SigNoz / signoz

SigNoz is an open-source observability platform native to OpenTelemetry with logs, traces and metrics in a single application. An open-source alternative to DataDog, NewRelic, etc. 🔥 🖥. 👉 Open source Application Performance Monitoring (APM) & Observability tool
https://signoz.io

"panic: runtime error: invalid memory address or nil pointer dereference",Issue at the time of migration from 8.1 to 9 #2704

Open resulraveendran opened 1 year ago

resulraveendran commented 1 year ago

Hi,

I am reaching out regarding an issue I encountered while attempting to migrate SigNoz from version 0.8.1 to 0.9.

During the migration process, I encountered an error in the otel-collector-metrics component, and I have been unable to identify the root cause. The specific error message I received is as follows:

2023/05/17 06:59:14 proto: duplicate proto type registered: jaeger.api_v2.PostSpansRequest
2023/05/17 06:59:14 proto: duplicate proto type registered: jaeger.api_v2.PostSpansResponse
time="2023-05-17T06:59:14Z" level=info msg="Executing:\nCREATE DATABASE IF NOT EXISTS signoz_metrics\n" component=clickhouse
time="2023-05-17T06:59:14Z" level=info msg="Executing:\nCREATE TABLE IF NOT EXISTS signoz_metrics.samples_v2 (\n\t\t\tmetric_name LowCardinality(String),\n\t\t\tfingerprint UInt64 Codec(DoubleDelta, LZ4),\n\t\t\ttimestamp_ms Int64 Codec(DoubleDelta, LZ4),\n\t\t\tvalue Float64 Codec(Gorilla, LZ4)\n\t\t)\n\t\tENGINE = MergeTree\n\t\t\tPARTITION BY toDate(timestamp_ms / 1000)\n\t\t\tORDER BY (metric_name, fingerprint, timestamp_ms)\n" component=clickhouse
time="2023-05-17T06:59:14Z" level=info msg="Executing:\nSET allow_experimental_object_type = 1\n" component=clickhouse
2023-05-17T06:59:14.546Z    info    builder/exporters_builder.go:255    Exporter was built. {"kind": "exporter", "name": "clickhousemetricswrite"}
2023-05-17T06:59:14.546Z    info    builder/pipelines_builder.go:223    Pipeline was built. {"name": "pipeline", "name": "metrics"}
2023-05-17T06:59:14.546Z    info    builder/receivers_builder.go:226    Receiver was built. {"kind": "receiver", "name": "prometheus", "datatype": "metrics"}
2023-05-17T06:59:14.546Z    info    builder/receivers_builder.go:226    Receiver was built. {"kind": "receiver", "name": "otlp", "datatype": "metrics"}
2023-05-17T06:59:14.546Z    info    service/service.go:82   Starting extensions...
2023-05-17T06:59:14.546Z    info    extensions/extensions.go:38 Extension is starting...    {"kind": "extension", "name": "health_check"}
2023-05-17T06:59:14.546Z    info    healthcheckextension/healthcheckextension.go:44 Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Port":0,"TCPAddr":{"Endpoint":"0.0.0.0:13133"},"Path":"/","CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2023-05-17T06:59:14.546Z    info    extensions/extensions.go:42 Extension started.  {"kind": "extension", "name": "health_check"}
2023-05-17T06:59:14.546Z    info    extensions/extensions.go:38 Extension is starting...    {"kind": "extension", "name": "zpages"}
2023-05-17T06:59:14.547Z    info    zpagesextension/zpagesextension.go:40   Register Host's zPages  {"kind": "extension", "name": "zpages"}
2023-05-17T06:59:14.547Z    info    zpagesextension/zpagesextension.go:53   Starting zPages extension   {"kind": "extension", "name": "zpages", "config": {"TCPAddr":{"Endpoint":"localhost:55679"}}}
2023-05-17T06:59:14.547Z    info    extensions/extensions.go:42 Extension started.  {"kind": "extension", "name": "zpages"}
2023-05-17T06:59:14.547Z    info    service/service.go:87   Starting exporters...
2023-05-17T06:59:14.547Z    info    builder/exporters_builder.go:40 Exporter is starting... {"kind": "exporter", "name": "clickhousemetricswrite"}
2023-05-17T06:59:14.547Z    info    builder/exporters_builder.go:48 Exporter started.   {"kind": "exporter", "name": "clickhousemetricswrite"}
2023-05-17T06:59:14.547Z    info    service/service.go:92   Starting processors...
2023-05-17T06:59:14.547Z    info    builder/pipelines_builder.go:54 Pipeline is starting... {"name": "pipeline", "name": "metrics"}
2023-05-17T06:59:14.547Z    info    builder/pipelines_builder.go:65 Pipeline is started.    {"name": "pipeline", "name": "metrics"}
2023-05-17T06:59:14.547Z    info    service/service.go:97   Starting receivers...
2023-05-17T06:59:14.547Z    info    builder/receivers_builder.go:68 Receiver is starting... {"kind": "receiver", "name": "prometheus"}
2023-05-17T06:59:14.548Z    info    builder/receivers_builder.go:73 Receiver started.   {"kind": "receiver", "name": "prometheus"}
2023-05-17T06:59:14.548Z    info    builder/receivers_builder.go:68 Receiver is starting... {"kind": "receiver", "name": "otlp"}
2023-05-17T06:59:14.548Z    info    otlpreceiver/otlp.go:69 Starting GRPC server on endpoint 0.0.0.0:4317   {"kind": "receiver", "name": "otlp"}
2023-05-17T06:59:14.548Z    info    otlpreceiver/otlp.go:87 Starting HTTP server on endpoint 0.0.0.0:4318   {"kind": "receiver", "name": "otlp"}
2023-05-17T06:59:14.548Z    info    otlpreceiver/otlp.go:147    Setting up a second HTTP listener on legacy endpoint 0.0.0.0:55681  {"kind": "receiver", "name": "otlp"}
2023-05-17T06:59:14.548Z    info    otlpreceiver/otlp.go:87 Starting HTTP server on endpoint 0.0.0.0:55681  {"kind": "receiver", "name": "otlp"}
2023-05-17T06:59:14.548Z    info    builder/receivers_builder.go:73 Receiver started.   {"kind": "receiver", "name": "otlp"}
2023-05-17T06:59:14.548Z    info    healthcheck/handler.go:129  Health Check state change   {"kind": "extension", "name": "health_check", "status": "ready"}
2023-05-17T06:59:14.548Z    info    service/telemetry.go:95 Setting up own telemetry...
2023-05-17T06:59:14.550Z    info    service/telemetry.go:115    Serving Prometheus metrics  {"address": ":8888", "level": "basic", "service.instance.id": "98ef90a5-db01-40d0-a558-0b0a016949e5", "service.version": "latest"}
2023-05-17T06:59:14.550Z    info    service/collector.go:229    Starting otelcontribcol...  {"Version": "latest", "NumCPU": 16}
2023-05-17T06:59:14.550Z    info    service/collector.go:124    Everything is ready. Begin running and processing data.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x1063552]

goroutine 107 [running]:
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/clickhousemetricsexporter.(*PrwExporter).export.func1()
    /src/exporter/clickhousemetricsexporter/exporter.go:279 +0xf2
created by github.com/open-telemetry/opentelemetry-collector-contrib/exporter/clickhousemetricsexporter.(*PrwExporter).export
    /src/exporter/clickhousemetricsexporter/exporter.go:275 +0x256

I have reviewed the documentation and migration guides provided by SigNoz, but I couldn't find a solution to this issue. I have also checked the configuration files and ensured that I updated them correctly.

Could you please provide guidance or steps to troubleshoot and resolve the error?

welcome[bot] commented 1 year ago

Thanks for opening this issue. A team member should give feedback soon. In the meantime, feel free to check out the contributing guidelines.

nityanandagohain commented 1 year ago

Please share your otel-collector-config.yaml file, it will help us to point at the issue.

srikanthccv commented 1 year ago

This is not related to the migration. The collector has an issue connecting to ClickHouse. Please make sure ClickHouse is running and accepts connections.
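One way to check that ClickHouse accepts connections is to attempt a TCP connection from the same network the collector runs in. The following is a minimal Python sketch, not part of SigNoz: the helper name `can_connect` and the timeout value are illustrative assumptions. It only verifies that the port accepts TCP connections, not that ClickHouse answers queries.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the hostname and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS resolution failures, refused connections, and timeouts.
        return False
```

Run from inside the collector's container or network, `can_connect("clickhouse", 9000)` would use the host and port from the exporter endpoint `tcp://clickhouse:9000/?database=signoz_metrics` quoted in this thread. A `False` result suggests the hostname does not resolve or the port is not reachable from there.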

resulraveendran commented 1 year ago

> Please share your otel-collector-config.yaml file, it will help us to point at the issue.

otel-collector-config.yaml

receivers:
  otlp/spanmetrics:
    protocols:
      grpc:
        endpoint: "localhost:12345"
  otlp:
    protocols:
      grpc:
      http:
        endpoint: 0.0.0.0:4318
        cors:
          allowed_origins:
            - "https://devl.claim.landing.medigy.com"
            - "https://devl.medigy.com"
  jaeger:
    protocols:
      grpc:
      thrift_http:
  hostmetrics:
    collection_interval: 60s
    scrapers:
      cpu:
      load:
      memory:
      disk:
      filesystem:
      network:
processors:
  batch:
    send_batch_size: 10000
    send_batch_max_size: 11000
    timeout: 10s
  signozspanmetrics/prometheus:
    metrics_exporter: prometheus
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1000ms, 1400ms, 2000ms, 5s, 10s, 20s, 40s, 60s]
    dimensions_cache_size: 10000
    dimensions:
      - name: service.namespace
        default: default
      - name: deployment.environment
        default: default
  # memory_limiter:
  #   # 80% of maximum memory up to 2G
  #   limit_mib: 1500
  #   # 25% of limit up to 2G
  #   spike_limit_mib: 512
  #   check_interval: 5s
  #
  #   # 50% of the maximum memory
  #   limit_percentage: 50
  #   # 20% of max memory usage spike expected
  #   spike_limit_percentage: 20
  # queued_retry:
  #   num_workers: 4
  #   queue_size: 100
  #   retry_on_failure: true
extensions:
  health_check: {}
  zpages: {}
exporters:
  clickhousetraces:
    datasource: tcp://clickhouse:9000/?database=signoz_traces
  clickhousemetricswrite:
    endpoint: tcp://clickhouse:9000/?database=signoz_metrics
    resource_to_telemetry_conversion:
      enabled: true
  prometheus:
    endpoint: "0.0.0.0:8889"
service:
  extensions: [health_check, zpages]
  pipelines:
    traces:
      receivers: [jaeger, otlp]
      processors: [signozspanmetrics/prometheus, batch]
      exporters: [clickhousetraces]
    metrics:
      receivers: [otlp, hostmetrics]
      processors: [batch]
      exporters: [clickhousemetricswrite]
    metrics/spanmetrics:
      receivers: [otlp/spanmetrics]
      exporters: [prometheus]

resulraveendran commented 1 year ago

otel-collector-metrics-config.yaml

receivers:
  otlp:
    protocols:
      grpc:
      http:

  # Data sources: metrics
  prometheus:
    config:
      scrape_configs:
        - job_name: "otel-collector"
          scrape_interval: 60s
          static_configs:
            - targets: ["otel-collector:8889"]
processors:
  batch:
    send_batch_size: 10000
    send_batch_max_size: 11000
    timeout: 10s
  # memory_limiter:
  #   # 80% of maximum memory up to 2G
  #   limit_mib: 1500
  #   # 25% of limit up to 2G
  #   spike_limit_mib: 512
  #   check_interval: 5s
  #
  #   # 50% of the maximum memory
  #   limit_percentage: 50
  #   # 20% of max memory usage spike expected
  #   spike_limit_percentage: 20
  # queued_retry:
  #   num_workers: 4
  #   queue_size: 100
  #   retry_on_failure: true
extensions:
  health_check: {}
  zpages: {}
exporters:
  clickhousemetricswrite:
    endpoint: tcp://clickhouse:9000/?database=signoz_metrics

service:
  extensions: [health_check, zpages]
  pipelines:
    metrics:
      receivers: [otlp, prometheus]
      processors: [batch]
      exporters: [clickhousemetricswrite]

resulraveendran commented 1 year ago

> This is not related to the migration. The collector has an issue connecting to ClickHouse. Please make sure ClickHouse is running and accepts connections.

My otel-collector-metrics container is down.

srikanthccv commented 1 year ago

It's down because it is unable to connect to ClickHouse. Please make sure ClickHouse is running and accepts connections.
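To see which host and port the exporter will try to reach, the endpoint string from the posted config can be split into its parts. This is a small Python sketch for illustration only; the function name `parse_endpoint` is hypothetical and this is not how the exporter itself parses the value.

```python
from urllib.parse import urlsplit, parse_qs


def parse_endpoint(endpoint: str) -> dict:
    """Split a tcp://host:port/?database=name endpoint into its parts."""
    parts = urlsplit(endpoint)
    query = parse_qs(parts.query)
    return {
        "host": parts.hostname,
        "port": parts.port,
        # parse_qs returns a list per key; take the first value if present.
        "database": query.get("database", [None])[0],
    }


print(parse_endpoint("tcp://clickhouse:9000/?database=signoz_metrics"))
# → {'host': 'clickhouse', 'port': 9000, 'database': 'signoz_metrics'}
```

The collector container therefore needs to be able to resolve the hostname `clickhouse` and reach port 9000 on it.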

resulraveendran commented 1 year ago

My ClickHouse is running, and it shows these logs:

Merging configuration file '/etc/clickhouse-server/config.d/docker_related_config.xml'.
Include not found: clickhouse_remote_servers
Include not found: clickhouse_compression
Logging trace to /var/log/clickhouse-server/clickhouse-server.log
Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
Processing configuration file '/etc/clickhouse-server/config.xml'.
Merging configuration file '/etc/clickhouse-server/config.d/docker_related_config.xml'.
Include not found: clickhouse_remote_servers
Include not found: clickhouse_compression
Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/config.xml'.
Processing configuration file '/etc/clickhouse-server/users.xml'.
Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/users.xml'.

/entrypoint.sh: running /docker-entrypoint-initdb.d/init-db.sql

Processing configuration file '/etc/clickhouse-server/config.xml'.
Merging configuration file '/etc/clickhouse-server/config.d/docker_related_config.xml'.
Include not found: clickhouse_remote_servers
Include not found: clickhouse_compression
Logging trace to /var/log/clickhouse-server/clickhouse-server.log
Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
Processing configuration file '/etc/clickhouse-server/config.xml'.
Merging configuration file '/etc/clickhouse-server/config.d/docker_related_config.xml'.
Include not found: clickhouse_remote_servers
Include not found: clickhouse_compression
Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/config.xml'.
Processing configuration file '/etc/clickhouse-server/users.xml'.
Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/users.xml'.

docker ps -a | grep apm-infra-experimental-com-clickhouse-server
163ade33f236   yandex/clickhouse-server:21.12.3.32   "/entrypoint.sh"   8 days ago   Up 8 days (healthy)   0.0.0.0:8123->8123/tcp, :::8123->8123/tcp, 9009/tcp, 0.0.0.0:9101->9000/tcp, :::9101->9000/tcp   apm-infra-experimental-com-clickhouse-server
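
The PORTS column in the `docker ps` output shows which container ports are published on the host. The Python sketch below parses that column into a mapping from container port to host ports; the function name `published_ports` and the regular expression are illustrative assumptions, not part of Docker or SigNoz.

```python
import re

# Matches published entries such as "0.0.0.0:9101->9000/tcp" or ":::9101->9000/tcp".
PUBLISHED = re.compile(
    r"(?P<host_ip>\S+):(?P<host_port>\d+)->(?P<container_port>\d+)/(?P<proto>\w+)"
)


def published_ports(ports_column: str) -> dict:
    """Map each container port to the set of host ports it is published on."""
    mapping = {}
    for entry in ports_column.split(","):
        match = PUBLISHED.fullmatch(entry.strip())
        if match:  # entries like "9009/tcp" are exposed but not published
            container_port = int(match["container_port"])
            mapping.setdefault(container_port, set()).add(int(match["host_port"]))
    return mapping


ports = (
    "0.0.0.0:8123->8123/tcp, :::8123->8123/tcp, 9009/tcp, "
    "0.0.0.0:9101->9000/tcp, :::9101->9000/tcp"
)
print(published_ports(ports))
# → {8123: {8123}, 9000: {9101}}
```

Here container port 9000 is published on host port 9101. A client running on the host would use port 9101, while a container on the same Docker network would use port 9000 with a hostname that resolves to this container. Whether the collector shares a network with this container is not shown in the thread.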