apache / skywalking

APM, Application Performance Monitoring System
https://skywalking.apache.org/
Apache License 2.0

[Bug] SIGSEGV: segmentation violation in banyand/query.(*topNQueryProcessor).Rev.func1() #12219

Closed Almot77 closed 5 months ago

Almot77 commented 6 months ago

Apache SkyWalking Component

BanyanDB (apache/skywalking-banyandb)

What happened

The server crashes with a segmentation fault in Docker. I use the :latest SkyWalking and BanyanDB images (new releases).

What you expected to happen

{"level":"info","module":"STREAM-SEGMENT.SCHEDULER.RETENTION","name":"retention","now":"2024-05-13T14:49:52Z","time":"2024-05-13T14:49:52Z","message":"start"}
{"level":"info","module":"STREAM","group":"stream-browser_error_log","time":"2024-05-13T14:49:52Z","message":"creating a tsdb"}
{"level":"info","module":"STREAM-BROWSER_ERROR_LOG","path":"/tmp/stream-data/stream/stream-browser_error_log","time":"2024-05-13T14:49:52Z","message":"initialized"}
{"level":"info","module":"STREAM-BROWSER_ERROR_LOG.SCHEDULER.RETENTION","name":"retention","now":"2024-05-13T14:49:52Z","time":"2024-05-13T14:49:52Z","message":"start"}
{"level":"info","module":"STREAM","group":"stream-zipkin_span","time":"2024-05-13T14:49:52Z","message":"creating a tsdb"}
{"level":"info","module":"STREAM-ZIPKIN_SPAN","path":"/tmp/stream-data/stream/stream-zipkin_span","time":"2024-05-13T14:49:52Z","message":"initialized"}
{"level":"info","module":"STREAM-ZIPKIN_SPAN.SCHEDULER.RETENTION","name":"retention","now":"2024-05-13T14:49:52Z","time":"2024-05-13T14:49:52Z","message":"start"}
{"level":"error","module":"QUERY.TOPN.MEASURE-MINUTE.ENDPOINT_RESP_TIME_MINUTE_TOPN","error":"failed to query measure: unmarshal tag value: unsupported tag value type","req":{"groups":["measure-minute"], "name":"endpoint_resp_time_minute_topn", "timeRange":{"begin":"2024-05-13T14:20:00Z", "end":"2024-05-13T14:51:00Z"}, "topN":10, "agg":"AGGREGATION_FUNCTION_MEAN", "conditions":[{"name":"service_id", "op":"BINARY_OP_EQ", "value":{"str":{"value":"cGhwLW1zay1sZWdhY3k=.1"}}}], "fieldValueSort":"SORT_DESC"},"time":"2024-05-13T14:50:30Z","message":"fail to close the topn plan"}
panic: runtime error: invalid memory address or nil pointer dereference
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1251163]

goroutine 356 [running]:
github.com/apache/skywalking-banyandb/banyand/query.(*topNQueryProcessor).Rev.func1()
        /src/banyand/query/processor_topn.go:126 +0x23
panic({0x13d6e40?, 0x259ebd0?})
        /usr/local/go/src/runtime/panic.go:770 +0x132
github.com/apache/skywalking-banyandb/banyand/query.(*topNQueryProcessor).Rev(0xc000010fd8, {{0x156fd20, 0xc0085ce280}, {0x15f10a0, 0x5}, 0x17cf13deadbc52eb, 0x0})
        /src/banyand/query/processor_topn.go:133 +0xfd6
github.com/apache/skywalking-banyandb/pkg/bus.(*Bus).Subscribe.func1({0x1a71de0, 0xc000010fd8}, 0xc0001d09c0)
        /src/pkg/bus/bus.go:274 +0xfa
created by github.com/apache/skywalking-banyandb/pkg/bus.(*Bus).Subscribe in goroutine 1
        /src/pkg/bus/bus.go:270 +0x28f

How to reproduce

Docker Compose, started with: docker compose --profile banyandb up -d

version: '3.8'
services:
  elasticsearch:
    profiles:
      - "elasticsearch"
    image: itbgk/elasticsearch-oss:7.9.2
    container_name: skywalking-elasticsearch
    ports:
      - "9200:9200"
    networks:
      - skywalking
    volumes:
      - elastic-sw:/usr/share/elasticsearch/data
    healthcheck:
      test: [ "CMD-SHELL", "curl --silent --fail localhost:9200/_cluster/health || exit 1" ]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    restart: always
    environment:
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1

  banyandb:
    profiles:
      - "banyandb"
    image: ${BANYANDB_IMAGE:-apache/skywalking-banyandb:latest}
    container_name: banyandb
    restart: always
    networks:
      - skywalking
    expose:
      - 17912
    ports:
      - 17913:17913
    volumes:
      - banyandb-stream-data:/tmp/stream-data
      - banyandb-measure-data:/tmp/measure-data

    command: standalone --stream-root-path /tmp/stream-data --measure-root-path /tmp/measure-data
    healthcheck:
      test: [ "CMD", "sh", "-c", "nc -nz 127.0.0.1 17912" ]
      interval: 5s
      timeout: 60s
      retries: 120

  oap-base: &oap-base
    profiles: [ "none" ]
    image: ${OAP_IMAGE:-ghcr.io/apache/skywalking/oap:latest}
    ports:
      - "11800:11800"
      - "12800:12800"
      - "9099:9090"
      - "3100:3100"
    networks:
      - skywalking
    healthcheck:
      test: [ "CMD-SHELL", "curl http://localhost:12800/internal/l7check" ]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
#    restart: always
    environment: &oap-env
      TZ: Europe/Moscow
      SW_HEALTH_CHECKER: default
      SW_OTEL_RECEIVER: default
      SW_OTEL_RECEIVER_ENABLED_OC_RULES: vm
      SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES: vm
      SW_TELEMETRY: prometheus
      JAVA_OPTS: "-Xms2048m -Xmx2048m"
      SW_CORE_RECORD_DATA_TTL: 2 # https://skywalking.apache.org/docs/main/next/en/setup/backend/ttl/
      SW_CORE_METRICS_DATA_TTL: 2
      SW_DCS_MAX_INBOUND_MESSAGE_SIZE: 5000000000

  oap-es:
    <<: *oap-base
    profiles:
      - "elasticsearch"
    container_name: skywalking-server # rename to something else if switching to BanyanDB
    depends_on:
      elasticsearch:
        condition: service_healthy
    environment:
      <<: *oap-env
      SW_STORAGE: elasticsearch
      SW_STORAGE_ES_CLUSTER_NODES: elasticsearch:9200
      SW_CORE_RECORD_DATA_TTL: 2 # https://skywalking.apache.org/docs/main/next/en/setup/backend/ttl/
      SW_CORE_METRICS_DATA_TTL: 2
      SW_DCS_MAX_INBOUND_MESSAGE_SIZE: 5000000000

  oap-bdb:
    <<: *oap-base
    profiles:
      - "banyandb"
    container_name: skywalking-server-bdb # rename to oap if switching to Elasticsearch
    depends_on:
      banyandb:
        condition: service_healthy
    environment:
      <<: *oap-env
      SW_STORAGE: banyandb
      SW_STORAGE_BANYANDB_TARGETS: banyandb:17912
      SW_CORE_RECORD_DATA_TTL: 14 # https://skywalking.apache.org/docs/main/next/en/setup/backend/ttl/
      SW_CORE_METRICS_DATA_TTL: 14
      SW_DCS_MAX_INBOUND_MESSAGE_SIZE: 5000000000

  ui:
    image: ${UI_IMAGE:-ghcr.io/apache/skywalking/ui:latest}
    container_name: skywalking-ui
    ports:
      - "1010:8080"
    networks:
      - skywalking
    restart: always
    environment:
      <<: *oap-env
      SW_OAP_ADDRESS: http://skywalking-server-bdb:12800
      SW_ZIPKIN_ADDRESS: http://skywalking-server-bdb:9412

volumes:
  elastic-sw:
  banyandb-stream-data:
    external: true
  banyandb-measure-data:
    external: true

networks:
  skywalking:

Anything else

No response

wu-sheng commented 6 months ago

Your configuration is not well formatted. Please correct it. And what does SW_STORAGE: elasticsearch mean? I think we don't need Elasticsearch when you use BanyanDB.

And we don't have a banyandb-helm 0.2 release. How do you deploy the database?

lujiajing1126 commented 6 months ago

After checking the code, it seems the error is not handled properly.
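
For readers unfamiliar with this failure mode, here is a minimal sketch of the general Go pattern that would produce a trace like the one in the report, where the panic is raised inside a deferred closure (Rev.func1). This is not BanyanDB's actual code: `plan`, `newPlan`, `runUnsafe`, `runSafe`, and `panics` are hypothetical names, and the assumption is that a resource was left nil after a failed call and then used in a deferred cleanup.

```go
package main

import (
	"errors"
	"fmt"
)

// plan is a hypothetical stand-in for a query plan that must be closed.
type plan struct{ name string }

// Close reads a field through the receiver, so calling it on a nil *plan
// triggers a nil pointer dereference.
func (p *plan) Close() error {
	fmt.Println("closing", p.name)
	return nil
}

// newPlan returns a nil plan together with an error when name is empty.
func newPlan(name string) (*plan, error) {
	if name == "" {
		return nil, errors.New("unsupported tag value type")
	}
	return &plan{name: name}, nil
}

// runUnsafe registers the deferred Close before checking the error,
// so the deferred closure runs against a nil plan and panics.
func runUnsafe(name string) error {
	p, err := newPlan(name)
	defer func() {
		_ = p.Close() // panics when p is nil
	}()
	return err
}

// runSafe checks the error first and only then defers the cleanup.
func runSafe(name string) error {
	p, err := newPlan(name)
	if err != nil {
		return fmt.Errorf("build plan: %w", err)
	}
	defer func() { _ = p.Close() }()
	return nil
}

// panics reports whether calling f results in a panic.
func panics(f func()) (did bool) {
	defer func() {
		if r := recover(); r != nil {
			did = true
		}
	}()
	f()
	return false
}

func main() {
	fmt.Println("unsafe path panicked:", panics(func() { _ = runUnsafe("") }))
	fmt.Println("safe path error:", runSafe(""))
}
```

The point of `runSafe` is only the ordering: handle the error before deferring anything that touches the possibly nil value.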

image

wu-sheng commented 6 months ago

@lujiajing1126 In what case does this error occur?

Almot77 commented 6 months ago

Your configuration is not well formatted. Please correct it. And what does SW_STORAGE: elasticsearch mean? I think we don't need Elasticsearch when you use BanyanDB.

And we don't have a banyandb-helm 0.2 release. How do you deploy the database?

It's a docker-compose.yml file. I run SkyWalking with the selected DB profile: elasticsearch or banyandb.

The right way to run it: docker compose --profile banyandb up -d

wu-sheng commented 6 months ago

Are you using the Docker quick start? We haven't upgraded it to the latest versions. It needs OAP v10 and the latest BanyanDB, 0.6.

wu-sheng commented 6 months ago

@Almot77 This error is easy to fix, but we want to know how you triggered it, as we have run many tests to verify the features.

hanahmily commented 6 months ago

The nil error is fixed by https://github.com/apache/skywalking-banyandb/pull/445/files#diff-695073ea8dec3fcdaae77a3fcfb4eabc7290daade34399aee1de429999d7b476R124

But the error below is a bit tricky.

{"level":"error","module":"QUERY.TOPN.MEASURE-MINUTE.ENDPOINT_RESP_TIME_MINUTE_TOPN","error":"failed to query measure: unmarshal tag value: unsupported tag value type","req":{"groups":["measure-minute"], "name":"endpoint_resp_time_minute_topn", "timeRange":{"begin":"2024-05-13T14:20:00Z", "end":"2024-05-13T14:51:00Z"}, "topN":10, "agg":"AGGREGATION_FUNCTION_MEAN", "conditions":[{"name":"service_id", "op":"BINARY_OP_EQ", "value":{"str":{"value":"cGhwLW1zay1sZWdhY3k=.1"}}}], "fieldValueSort":"SORT_DESC"},"time":"2024-05-13T14:50:30Z","message":"fail to close the topn plan"}

@Almot77 Would you please use the latest BanyanDB image built from https://github.com/apache/skywalking-banyandb/pull/445 to output more context about this error?

If you have an appropriate docker environment, issuing make docker.build is all you need.
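
A side note on the `service_id` condition in the failing request quoted above: as far as I understand SkyWalking's ID convention (an assumption, not confirmed in this thread), the value is the Base64-encoded service name followed by `.` and a flag. The sketch below decodes it so the affected service can be identified; `decodeServiceID` is a hypothetical helper.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// decodeServiceID splits "<base64 name>.<flag>" at the last dot and
// decodes the name part. The meaning of the flag is not interpreted here.
func decodeServiceID(id string) (name, flag string, err error) {
	i := strings.LastIndex(id, ".")
	if i < 0 {
		return "", "", fmt.Errorf("no '.' separator in %q", id)
	}
	raw, err := base64.StdEncoding.DecodeString(id[:i])
	if err != nil {
		return "", "", fmt.Errorf("decode name: %w", err)
	}
	return string(raw), id[i+1:], nil
}

func main() {
	name, flag, err := decodeServiceID("cGhwLW1zay1sZWdhY3k=.1")
	if err != nil {
		panic(err)
	}
	fmt.Println(name, flag) // prints "php-msk-legacy 1"
}
```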

wu-sheng commented 6 months ago

I reopened this as we don't know how this happens.

@Almot77 We will need more information. Could you package the whole data folder and send it to us, so that we can identify the illegal data? Or could you share how we can reproduce this?

Almot77 commented 6 months ago

Okay, I'll test it now.

We use a PHP application in Docker, the SkyWalking PHP library for trace collection, and Grafana dashboards (made from the SkyWalking examples for Grafana).

Almot77 commented 6 months ago

How do I install the Go libraries? On Ubuntu 22.04 I did:

sudo apt install time nodejs npm
sudo npm cache clean -f
sudo npm install -g n
sudo n stable
wget https://go.dev/dl/go1.22.3.linux-amd64.tar.gz
sudo rm -rf /usr/local/go && sudo tar -C /usr/local -xzf go1.22.3.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin

$ make docker.build

make docker -C docker; \
if [ $? -ne 0 ]; then \
        exit 1; \
fi; \

make[1]: Entering directory '/home/srvdocker/skywalking/build/skywalking-banyandb/docker'
Build apache/skywalking-banyandb:latest
[+] Building 1.3s (14/18)                                                                                                                                       docker:default
 => [internal] load .dockerignore                                                                                                                                         0.0s
 => => transferring context: 2B                                                                                                                                           0.0s
 => [internal] load build definition from Dockerfile                                                                                                                      0.0s
 => => transferring dockerfile: 2.03kB                                                                                                                                    0.0s
 => [internal] load metadata for docker.io/library/busybox:stable-glibc                                                                                                   0.5s
 => [internal] load metadata for docker.io/library/alpine:edge                                                                                                            0.5s
 => [internal] load metadata for docker.io/library/golang:1.22                                                                                                            0.5s
 => [base 1/4] FROM docker.io/library/golang:1.22@sha256:b1e05e2c918f52c59d39ce7d5844f73b2f4511f7734add8bb98c9ecdd4443365                                                 0.0s
 => [internal] load build context                                                                                                                                         0.1s
 => => transferring context: 73.62kB                                                                                                                                      0.1s
 => CACHED [build-linux 1/4] FROM docker.io/library/busybox:stable-glibc@sha256:9bc27a72a82d22e54b4cc8bd7b99d3907a442869f77f075e0119104f2404953d                          0.0s
 => [certs 1/2] FROM docker.io/library/alpine:edge@sha256:e31c3b1cd47718260e1b6163af0a05b3c428dc01fa410baf72ca8b8076e22e72                                                0.0s
 => CACHED [certs 2/2] RUN apk add --no-cache ca-certificates && update-ca-certificates                                                                                   0.0s
 => CACHED [base 2/4] WORKDIR /src                                                                                                                                        0.0s
 => CACHED [base 3/4] COPY go.* ./                                                                                                                                        0.0s
 => CACHED [base 4/4] RUN go mod download                                                                                                                                 0.0s
 => ERROR [builder 1/2] RUN --mount=target=.             --mount=type=cache,target=/root/.cache/go-build             BUILD_DIR=/out BUILD_TAGS=prometheus make -C banyan  0.8s
------
 > [builder 1/2] RUN --mount=target=.             --mount=type=cache,target=/root/.cache/go-build             BUILD_DIR=/out BUILD_TAGS=prometheus make -C banyand banyand-server-static:
0.215 make: Entering directory '/src/banyand'
0.233 Building static binary
0.233 CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build \
0.233         -buildvcs=false \
0.233   -a --ldflags '-X github.com/apache/skywalking-banyandb/pkg/version.build=v0.6.0-1-gc827067-main -extldflags "-static"' -tags "netgo prometheus" -installsuffix netgo \
0.233   -o /out/banyand-server-static github.com/apache/skywalking-banyandb/banyand/cmd/server
0.434 ../api/data/data.go:24:2: no required module provides package github.com/apache/skywalking-banyandb/api/proto/banyandb/measure/v1; to add it:
0.434   go get github.com/apache/skywalking-banyandb/api/proto/banyandb/measure/v1
0.434 ../api/data/data.go:25:2: no required module provides package github.com/apache/skywalking-banyandb/api/proto/banyandb/stream/v1; to add it:
0.434   go get github.com/apache/skywalking-banyandb/api/proto/banyandb/stream/v1
0.434 ../pkg/run/run.go:36:2: no required module provides package github.com/apache/skywalking-banyandb/api/proto/banyandb/database/v1; to add it:
0.434   go get github.com/apache/skywalking-banyandb/api/proto/banyandb/database/v1
0.434 dquery/measure.go:25:2: no required module provides package github.com/apache/skywalking-banyandb/api/proto/banyandb/common/v1; to add it:
0.434   go get github.com/apache/skywalking-banyandb/api/proto/banyandb/common/v1
0.434 dquery/dquery.go:27:2: no required module provides package github.com/apache/skywalking-banyandb/api/proto/banyandb/model/v1; to add it:
0.434   go get github.com/apache/skywalking-banyandb/api/proto/banyandb/model/v1
0.434 metadata/schema/checker.go:27:2: no required module provides package github.com/apache/skywalking-banyandb/api/proto/banyandb/property/v1; to add it:
0.434   go get github.com/apache/skywalking-banyandb/api/proto/banyandb/property/v1
0.434 ../ui/embed.go:25:12: pattern dist: no matching files found
0.434 queue/pub/client.go:29:2: no required module provides package github.com/apache/skywalking-banyandb/api/proto/banyandb/cluster/v1; to add it:
0.434   go get github.com/apache/skywalking-banyandb/api/proto/banyandb/cluster/v1
0.440 make: *** [../scripts/build/build.mk:61: /out/banyand-server-static] Error 1
0.440 make: Leaving directory '/src/banyand'
------
Dockerfile:40
--------------------
  39 |
  40 | >>> RUN --mount=target=. \
  41 | >>>             --mount=type=cache,target=/root/.cache/go-build \
  42 | >>>             BUILD_DIR=/out BUILD_TAGS=prometheus make -C banyand banyand-server-static
  43 |     RUN --mount=target=. \
--------------------
ERROR: failed to solve: process "/bin/sh -c BUILD_DIR=/out BUILD_TAGS=prometheus make -C banyand banyand-server-static" did not complete successfully: exit code: 2
Command exited with non-zero status 1
0.15user 0.15system 0:01.51elapsed 20%CPU (0avgtext+0avgdata 48096maxresident)k
128inputs+56outputs (0major+10330minor)pagefaults 0swaps
make[1]: *** [../scripts/build/docker.mk:46: docker] Error 1
make[1]: Leaving directory '/home/srvdocker/skywalking/build/skywalking-banyandb/docker'
make: *** [Makefile:154: docker.build] Error 1

I tried to get the libs:

$ go get github.com/apache/skywalking-banyandb/api/proto/banyandb/measure/v1
go: github.com/apache/skywalking-banyandb/api/proto/banyandb/measure/v1: no matching versions for query "upgrade"

$ go get github.com/apache/skywalking-banyandb/api/proto/banyandb/property/v1
go: github.com/apache/skywalking-banyandb/api/proto/banyandb/property/v1: no matching versions for query "upgrade"

wu-sheng commented 6 months ago

This is the build doc: https://skywalking.apache.org/docs/skywalking-banyandb/latest/installation/binaries/#build-binaries

wu-sheng commented 6 months ago

Or simply try the dev image from here: https://github.com/apache/skywalking-banyandb/pkgs/container/skywalking-banyandb/215721861?tag=c8270670d47a9c6caa2661af434157656c4b7eaf

Almot77 commented 6 months ago

OK, the DB started successfully, but I have 3 problems:

  1. I lost data from my panel in Grafana. The bug was possibly in this place.

image

Query: endpoint_sla{parent_service='$service', layer='$layer', top_n='15', order='ASC'} / 100

Grafana query inspect:

{
  "request": {
    "url": "api/ds/query?ds_type=prometheus&requestId=Q362",
    "method": "POST",
    "data": {
      "queries": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "fdjzti6mhdam8c"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "endpoint_sla{parent_service='php-kz-prod', layer='GENERAL', top_n='15', order='ASC'} / 100",
          "format": "time_series",
          "instant": false,
          "legendFormat": "{{endpoint}}",
          "range": true,
          "refId": "A",
          "requestId": "71A",
          "utcOffsetSec": 10800,
          "interval": "",
          "datasourceId": 2,
          "intervalMs": 60000,
          "maxDataPoints": 1358
        }
      ],
      "from": "1715666087538",
      "to": "1715669687539"
    },
    "hideFromInspector": false
  },
  "response": {
    "results": {
      "A": {
        "error": "expected object type",
        "errorSource": "",
        "status": 200,
        "frames": [
          {
            "schema": {
              "refId": "A",
              "meta": {
                "typeVersion": [
                  0,
                  0
                ],
                "executedQueryString": "Expr: endpoint_sla{parent_service='php-kz-prod', layer='GENERAL', top_n='15', order='ASC'} / 100\nStep: 1m0s"
              },
              "fields": []
            },
            "data": {
              "values": []
            }
          }
        ],
        "refId": "A"
      }
    }
  }
}
  2. I have duplicates in my Slow Service Instance dashboard. image image

Query:

{
  "request": {
    "url": "api/ds/query?ds_type=prometheus&requestId=Q423",
    "method": "POST",
    "data": {
      "queries": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "fdjzti6mhdam8c"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "endpoint_sla{parent_service='php-kz-prod', layer='GENERAL', top_n='15', order='ASC'} / 100",
          "format": "time_series",
          "instant": false,
          "legendFormat": "{{endpoint}}",
          "range": true,
          "refId": "A",
          "requestId": "71A",
          "utcOffsetSec": 10800,
          "interval": "",
          "datasourceId": 2,
          "intervalMs": 60000,
          "maxDataPoints": 940
        }
      ],
      "from": "1715666368835",
      "to": "1715669968835"
    },
    "hideFromInspector": false
  },
  "response": {
    "results": {
      "A": {
        "error": "expected object type",
        "errorSource": "",
        "status": 200,
        "frames": [
          {
            "schema": {
              "refId": "A",
              "meta": {
                "typeVersion": [
                  0,
                  0
                ],
                "executedQueryString": "Expr: endpoint_sla{parent_service='php-kz-prod', layer='GENERAL', top_n='15', order='ASC'} / 100\nStep: 1m0s"
              },
              "fields": []
            },
            "data": {
              "values": []
            }
          }
        ],
        "refId": "A"
      }
    }
  }
}
  3. I have a lot of these in the logs:
    WARNING: Exception processing message
    io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: gRPC message exceeds maximum size 4194304: 5728590
        at io.grpc.Status.asRuntimeException(Status.java:525)
        at io.grpc.internal.MessageDeframer.processHeader(MessageDeframer.java:392)
        at io.grpc.internal.MessageDeframer.deliver(MessageDeframer.java:272)

    In docker-compose I set the env SW_DCS_MAX_INBOUND_MESSAGE_SIZE when I run SkyWalking:

  oap-bdb:
    <<: *oap-base
    profiles:
      - "banyandb"
    container_name: skywalking-server-bdb # rename to oap if switching to Elasticsearch
    depends_on:
      banyandb:
        condition: service_healthy
    environment:
      <<: *oap-env
      SW_STORAGE: banyandb
      SW_STORAGE_BANYANDB_TARGETS: banyandb:17912
      SW_CORE_RECORD_DATA_TTL: 14 # https://skywalking.apache.org/docs/main/next/en/setup/backend/ttl/
      SW_CORE_METRICS_DATA_TTL: 14
      SW_DCS_MAX_INBOUND_MESSAGE_SIZE: 5000000000

When I use Elasticsearch, I have no duplicates and all panels work, but I still get a lot of these gRPC messages :)

I attached my Docker log: log.tar.gz

wu-sheng commented 6 months ago

If your data is just for testing, could you tar the whole data folder and upload here?

It would be easier to verify your query against the same dataset.

Almot77 commented 6 months ago

It's data from the SkyWalking PHP exporter. I don't know how to collect and save it. Maybe export the Docker container with the collected data?

wu-sheng commented 6 months ago

SW_DCS_MAX_INBOUND_MESSAGE_SIZE is not for this case. We need to check the BanyanDB Java client (storage/banyandb/... in application.yml) for the relevant settings (they may be missing for now).
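
To put numbers on the RESOURCE_EXHAUSTED message quoted earlier: the limit of 4194304 bytes is exactly 4 MiB, which matches the usual gRPC default for inbound messages, so the receiving side appears to be running with an unchanged default. The sketch below only parses the two byte counts out of the error text and compares them; `parseOversize` is a hypothetical helper, and the sketch does not say which setting has to be raised.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// sizePattern matches the two byte counts in the gRPC error text.
var sizePattern = regexp.MustCompile(`exceeds maximum size (\d+): (\d+)`)

// parseOversize extracts the configured limit and the actual message size, in bytes.
func parseOversize(msg string) (limit, actual int64, ok bool) {
	m := sizePattern.FindStringSubmatch(msg)
	if m == nil {
		return 0, 0, false
	}
	var err error
	if limit, err = strconv.ParseInt(m[1], 10, 64); err != nil {
		return 0, 0, false
	}
	if actual, err = strconv.ParseInt(m[2], 10, 64); err != nil {
		return 0, 0, false
	}
	return limit, actual, true
}

func main() {
	const msg = "RESOURCE_EXHAUSTED: gRPC message exceeds maximum size 4194304: 5728590"
	limit, actual, ok := parseOversize(msg)
	if !ok {
		panic("message did not match the expected format")
	}
	const mib = 1 << 20
	fmt.Printf("limit:  %d bytes (%.2f MiB)\n", limit, float64(limit)/mib)
	fmt.Printf("actual: %d bytes (%.2f MiB)\n", actual, float64(actual)/mib)
	fmt.Printf("over by %d bytes\n", actual-limit)
}
```

For the message in this thread, the payload is about 5.46 MiB, roughly 1.5 MB over the limit.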

hanahmily commented 6 months ago

  banyandb:
    profiles:
      - "banyandb"
    image: ${BANYANDB_IMAGE:-apache/skywalking-banyandb:latest}
    container_name: banyandb
    restart: always
    networks:
      - skywalking
    expose:
      - 17912
    ports:
      - 17913:17913
    volumes:
      - <your host path>:/tmp

@Almot77 Could you mount a host path to BanyanDB's /tmp, then archive the whole path and upload it here? The path should look like this:

image

Almot77 commented 6 months ago

Sure. Image with the fix:

version: '3.8'
services:
  banyandb:
    profiles:
      - "banyandb"
    image: ${BANYANDB_IMAGE:-ghcr.io/apache/skywalking-banyandb:c8270670d47a9c6caa2661af434157656c4b7eaf}
    container_name: banyandb
#    restart: always
    networks:
      - skywalking
    expose:
      - 17912
    ports:
      - 17913:17913
    volumes:
      - ./tmp:/tmp

banyandb container log: banyandb_container_log.tar.gz

tmp: https://filetransfer.io/data-package/POXU16no#link

Screenshots from Grafana, Elasticsearch vs BanyanDB.

Elasticsearch: image

BanyanDB: image image

hanahmily commented 5 months ago

@Almot77 Could you try ghcr.io/apache/skywalking-banyandb:4270ef1ff8adab3c5de68f9b5c467e838d8bc8ae, which contains the patch raised by apache/skywalking-banyandb#447?

Almot77 commented 5 months ago

A few minutes after start I got:

  1. The BanyanDB and SkyWalking containers are still working, but traces stopped being collected.

BanyanDB logs: image

  2. I still have service_name duplicates on the SkyWalking graphs.
  3. top_n is still not working.

Queries:
top_n(endpoint_sla,10,asc)/100
top_n(endpoint_resp_time,10,des)
top_n(endpoint_cpm,10,des)

image

hanahmily commented 5 months ago

@Almot77 Thank you for your feedback.

Since there are several issues here, let's focus on the "errors" in BanyanDB's log. I have created a debug image docker.io/hanahmily/skywalking-banyandb:13af6cb01078c29a3b346342b89ec56882466891, which can provide additional messages to help with this issue. Please collect all the messages that BanyanDB prints to the console.

If you're interested, please join our Slack channel at https://skywalking.apache.org/docs/main/next/en/guides/community/. This will help us communicate more efficiently.

Almot77 commented 5 months ago

Hi! I sent a request to join, thank you. Here are the logs: bdb.logs.tar.gz

wu-sheng commented 5 months ago

Please verify through https://github.com/apache/skywalking-banyandb/pkgs/container/skywalking-banyandb/220984457?tag=6b79695976e8a7531c70f785a100193b6863495d

Both bugs should be fixed.

Almot77 commented 5 months ago

Much better.

  1. No duplicates in service instances.
  2. It works stably, with no errors. image

But I still have no data for Endpoint Success Rate in Current Service (%). image

The query top_n(endpoint_sla,10,asc)/100 fails with "Internal IO exception, query metrics error". Screenshot from SkyWalking: image

The same query from Grafana fails with "expected object type": image

PromQL query:

Expr: endpoint_sla{parent_service='php-msk-prod', layer='GENERAL', top_n='15', order='ASC'} / 100
Step: 1m0s

Response:

{
  "request": {
    "url": "api/ds/query?ds_type=prometheus&requestId=Q640",
    "method": "POST",
    "data": {
      "queries": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "fdjzti6mhdam8c"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "endpoint_sla{parent_service='php-***-prod', layer='GENERAL', top_n='15', order='ASC'} / 100",
          "format": "time_series",
          "instant": false,
          "legendFormat": "{{endpoint}}",
          "range": true,
          "refId": "A",
          "requestId": "71A",
          "utcOffsetSec": 10800,
          "interval": "",
          "datasourceId": 2,
          "intervalMs": 60000,
          "maxDataPoints": 407
        }
      ],
      "from": "1716747985680",
      "to": "1716751585680"
    },
    "hideFromInspector": false
  },
  "response": {
    "results": {
      "A": {
        "error": "expected object type",
        "errorSource": "",
        "status": 200,
        "frames": [
          {
            "schema": {
              "refId": "A",
              "meta": {
                "typeVersion": [
                  0,
                  0
                ],
                "executedQueryString": "Expr: endpoint_sla{parent_service='php-***-prod', layer='GENERAL', top_n='15', order='ASC'} / 100\nStep: 1m0s"
              },
              "fields": []
            },
            "data": {
              "values": []
            }
          }
        ],
        "refId": "A"
      }
    }
  }
}
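
One thing worth noting in the inspector output above: the result carries `"status": 200` together with a non-empty `"error"`, so a check on the status alone would miss the failure. A minimal sketch of detecting failed queries by the `error` field instead follows. The `inspectPayload` type and `failedQueries` function are hypothetical, and the struct models only the fields visible in the payload above, not Grafana's full schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// inspectPayload models only the parts of the query inspector output used here.
type inspectPayload struct {
	Response struct {
		Results map[string]struct {
			Error  string `json:"error"`
			Status int    `json:"status"`
		} `json:"results"`
	} `json:"response"`
}

// failedQueries maps each refId to its error text, regardless of the status field.
func failedQueries(raw []byte) (map[string]string, error) {
	var p inspectPayload
	if err := json.Unmarshal(raw, &p); err != nil {
		return nil, fmt.Errorf("parse inspector payload: %w", err)
	}
	failed := make(map[string]string)
	for refID, res := range p.Response.Results {
		if res.Error != "" {
			failed[refID] = res.Error
		}
	}
	return failed, nil
}

// sample is a trimmed version of the payload shown above.
const sample = `{"response":{"results":{"A":{"error":"expected object type","status":200,"frames":[]}}}}`

func main() {
	failed, err := failedQueries([]byte(sample))
	if err != nil {
		panic(err)
	}
	for refID, msg := range failed {
		fmt.Printf("query %s failed: %s\n", refID, msg)
	}
}
```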

wu-sheng commented 5 months ago

0.6.1 should have fixed all known issues.