SigNoz / signoz

SigNoz is an OpenTelemetry-native, open-source observability platform with logs, traces, and metrics in a single application. It is an open-source alternative to DataDog, New Relic, and other APM & observability tools.
https://signoz.io

use externalClickhouse doesn't work #1589

Closed: yxiaoy6 closed this issue 2 years ago

yxiaoy6 commented 2 years ago

Bug description

Deploying with Helm directly, with the following values.yaml:

...
clickhouse:
  enabled: false
...
externalClickhouse:
  host: 192.168.xx.xx
  cluster: ck_cluster 
  database: signoz_metrics
  traceDatabase: signoz_traces
  user: "xxx_rw"
  password: "xxx"
  existingSecret:
  existingSecretPasswordKey:
  secure: false
  verify: false
  httpPort: 8123
  tcpPort: 9000
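For context on how these values are consumed, the chart renders them into a single `tcp://` DSN for query-service (the rendered env is visible in the dry-run manifest later in this thread). A rough sketch of the mapping, with the query-string layout assumed from that rendered manifest:

```python
from urllib.parse import urlencode

def clickhouse_url(host, tcp_port, database, user, password):
    """Assemble a ClickHouse TCP DSN the way the chart appears to:
    tcp://<host>:<tcpPort>?database=...&username=...&password=..."""
    query = urlencode({"database": database, "username": user, "password": password})
    return f"tcp://{host}:{tcp_port}?{query}"

# Values from the externalClickhouse block above (password redacted).
url = clickhouse_url("192.168.xx.xx", 9000, "signoz_traces", "xxx_rw", "xxx")
print(url)
# tcp://192.168.xx.xx:9000?database=signoz_traces&username=xxx_rw&password=xxx
```

Whatever string ends up in the `password` field of this URL is sent to ClickHouse verbatim.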

query-service fails:

2022-09-22T09:14:54.890Z    INFO   version/version.go:43   

SigNoz version   : v0.11.0
Commit SHA-1     : 73b00f4
Commit timestamp : 2022-08-24T13:32:19Z
Branch           : HEAD
Go version       : go1.17.13

For SigNoz Official Documentation,  visit https://signoz.io/docs
For SigNoz Community Slack,         visit http://signoz.io/slack
For discussions about SigNoz,       visit https://community.signoz.io

Licensed under the MIT License.
Copyright 2022 SigNoz

2022-09-22T09:14:54.890Z    WARN   query-service/main.go:61    No JWT secret key is specified.
main.main
    /go/src/github.com/signoz/signoz/pkg/query-service/main.go:61
runtime.main
    /usr/local/go/src/runtime/proc.go:255
2022-09-22T09:14:56.245Z    INFO   app/server.go:84    Using ClickHouse as datastore ...
2022-09-22T09:14:56.248Z    ERROR  clickhouseReader/reader.go:113  failed to initialize ClickHouse: error connecting to primary db: code: 516, message: xxx_rw: Authentication failed: password is incorrect or there is no user with such name
go.signoz.io/query-service/app/clickhouseReader.NewReader
    /go/src/github.com/signoz/signoz/pkg/query-service/app/clickhouseReader/reader.go:113
go.signoz.io/query-service/app.NewServer
    /go/src/github.com/signoz/signoz/pkg/query-service/app/server.go:85
main.main
    /go/src/github.com/signoz/signoz/pkg/query-service/main.go:66
runtime.main
    /usr/local/go/src/runtime/proc.go:255

I have made sure the account password is correct, and the signoz_logs database was automatically created in my ClickHouse. otel-collector:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x343dd12]

goroutine 175 [running]:
github.com/SigNoz/signoz-otel-collector/exporter/clickhousemetricsexporter.(*PrwExporter).export.func1()
    /src/exporter/clickhousemetricsexporter/exporter.go:280 +0xf2
created by github.com/SigNoz/signoz-otel-collector/exporter/clickhousemetricsexporter.(*PrwExporter).export
    /src/exporter/clickhousemetricsexporter/exporter.go:276 +0x256


How to reproduce

  1. Modify values.yaml: set `clickhouse.enabled` to `false` and fill in the `externalClickhouse` information


welcome[bot] commented 2 years ago

Thanks for opening this issue. A team member should give feedback soon. In the meantime, feel free to check out the contributing guidelines.

prashant-shahi commented 2 years ago

@yxiaoy6 This likely happens for various reasons:

  • when the user/password combination is not correct
  • the user passed does not have enough permissions to create DBs and tables
  • IP whitelisting not allowing the IPs that are trying to connect to ClickHouse

Could you please make sure the above are not the case here?

yxiaoy6 commented 2 years ago

@yxiaoy6 This likely happens for various reasons:

  • when the user/password combination is not correct
  • the user passed does not have enough permissions to create DBs and tables
  • IP whitelisting not allowing the IPs that are trying to connect to ClickHouse

Could you please make sure the above are not the case here?

Using a dry run to render the manifests, I can see that query-service does not have the environment variable CLICKHOUSE_PASSWORD:

spec:
  serviceName: signoz-prod-query-service
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: signoz
      app.kubernetes.io/instance: signoz-prod
      app.kubernetes.io/component: query-service
  template:
    metadata:
      annotations:
        checksum/config: 8e31e85c30508fec03799f93a7ae9159c3247f2d91d226defe4188079254a6a6
      labels:
        app.kubernetes.io/name: signoz
        app.kubernetes.io/instance: signoz-prod
        app.kubernetes.io/component: query-service
    spec:
      serviceAccountName: signoz-prod-query-service
      initContainers:
        - name: signoz-prod-query-service-init
          image: docker.io/busybox:1.35
          imagePullPolicy: IfNotPresent
          command:
            - sh
            - -c
      containers:
        - name: signoz-prod-query-service
          securityContext:
            {}
          image: docker.io/signoz/query-service:0.11.0
          imagePullPolicy: IfNotPresent
          args: ["-config=/root/config/prometheus.yml"]
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          env:
            - name: STORAGE
              value: clickhouse
            - name: ClickHouseUrl
              value: tcp://192.168.xx.xxx:9000?database=signoz_traces&username=xxx_rw&password=$(CLICKHOUSE_PASSWORD)
            - name: ALERTMANAGER_API_PREFIX
              value: http://signoz-prod-alertmanager:9093/api/
            - name: GODEBUG
              value: netdns=go
            - name: TELEMETRY_ENABLED
              value: "true"
            - name: DEPLOYMENT_TYPE
              value: kubernetes-helm
          livenessProbe:
            httpGet:
              path: /api/v1/version
              port: http
.......

After I manually set the password, query-service can run, but the otel-collector still cannot come up.
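This matches how Kubernetes expands `$(VAR)` references in container `env` values: a reference is substituted only when the variable is defined earlier in the same `env` list, and an unresolvable reference is passed through as the literal string. Since the rendered manifest never defines CLICKHOUSE_PASSWORD, the literal `$(CLICKHOUSE_PASSWORD)` ends up inside ClickHouseUrl and authentication fails. A rough model of that expansion rule (ignoring the `$$(VAR)` escape form):

```python
import re

def expand_env(env_list):
    """Approximate Kubernetes dependent-env-var expansion:
    $(VAR) is replaced only if VAR was defined earlier in the list;
    an unresolvable reference is left unchanged."""
    resolved = {}
    for name, value in env_list:
        value = re.sub(
            r"\$\((\w+)\)",
            lambda m: resolved.get(m.group(1), m.group(0)),
            value,
        )
        resolved[name] = value
    return resolved

env = [
    ("STORAGE", "clickhouse"),
    # CLICKHOUSE_PASSWORD is never defined, so the reference survives verbatim
    # and is sent to ClickHouse as the literal password string.
    ("ClickHouseUrl", "tcp://host:9000?username=u&password=$(CLICKHOUSE_PASSWORD)"),
]
print(expand_env(env)["ClickHouseUrl"])
# tcp://host:9000?username=u&password=$(CLICKHOUSE_PASSWORD)
```

This would explain the `Authentication failed` error from the query-service log above: ClickHouse received `$(CLICKHOUSE_PASSWORD)`, not the real password.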

yxiaoy6 commented 2 years ago

otel-collector log:

2022-09-26T02:19:03.239Z    info    service/telemetry.go:103    Setting up own telemetry...
2022-09-26T02:19:03.239Z    info    service/telemetry.go:138    Serving Prometheus metrics  {"address": "0.0.0.0:8888", "level": "basic"}
2022-09-26T02:19:03.242Z    info    clickhouselogsexporter/exporter.go:247  Running migrations from path:   {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter", "test": "/logsmigrations"}
2022-09-26T02:19:03.247Z    info    clickhouselogsexporter/exporter.go:261  Clickhouse Migrate finished {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter"}
time="2022-09-26T02:19:03Z" level=info msg="Executing:\nCREATE DATABASE IF NOT EXISTS signoz_metrics\n" component=clickhouse
time="2022-09-26T02:19:03Z" level=info msg="Executing:\nCREATE TABLE IF NOT EXISTS signoz_metrics.samples_v2 (\n\t\t\tmetric_name LowCardinality(String),\n\t\t\tfingerprint UInt64 Codec(DoubleDelta, LZ4),\n\t\t\ttimestamp_ms Int64 Codec(DoubleDelta, LZ4),\n\t\t\tvalue Float64 Codec(Gorilla, LZ4)\n\t\t)\n\t\tENGINE = MergeTree\n\t\t\tPARTITION BY toDate(timestamp_ms / 1000)\n\t\t\tORDER BY (metric_name, fingerprint, timestamp_ms)\n" component=clickhouse
time="2022-09-26T02:19:03Z" level=info msg="Executing:\nSET allow_experimental_object_type = 1\n" component=clickhouse
2022-09-26T02:19:03.262Z    info    clickhousetracesexporter/clickhouse_factory.go:82   Running migrations from path:   {"kind": "exporter", "data_type": "traces", "name": "clickhousetraces", "test": "/migrations"}
2022-09-26T02:19:03.269Z    info    clickhousetracesexporter/clickhouse_factory.go:94   Clickhouse Migrate finished {"kind": "exporter", "data_type": "traces", "name": "clickhousetraces", "error": "no change"}
2022-09-26T02:19:03.269Z    info    signozspanmetricsprocessor/processor.go:104 Building signozspanmetricsprocessor {"kind": "processor", "name": "signozspanmetrics/prometheus", "pipeline": "traces"}
2022-09-26T02:19:03.277Z    info    extensions/extensions.go:42 Starting extensions...
2022-09-26T02:19:03.277Z    info    extensions/extensions.go:45 Extension is starting...    {"kind": "extension", "name": "health_check"}
2022-09-26T02:19:03.277Z    info    healthcheckextension@v0.55.0/healthcheckextension.go:44 Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Port":0,"TCPAddr":{"Endpoint":"0.0.0.0:13133"},"Path":"/","CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2022-09-26T02:19:03.277Z    info    extensions/extensions.go:49 Extension started.  {"kind": "extension", "name": "health_check"}
2022-09-26T02:19:03.277Z    info    extensions/extensions.go:45 Extension is starting...    {"kind": "extension", "name": "zpages"}
2022-09-26T02:19:03.277Z    info    zpagesextension/zpagesextension.go:64   Registered zPages span processor on tracer provider {"kind": "extension", "name": "zpages"}
2022-09-26T02:19:03.278Z    info    zpagesextension/zpagesextension.go:74   Registered Host's zPages    {"kind": "extension", "name": "zpages"}
2022-09-26T02:19:03.278Z    info    zpagesextension/zpagesextension.go:86   Starting zPages extension   {"kind": "extension", "name": "zpages", "config": {"TCPAddr":{"Endpoint":"localhost:55679"}}}
2022-09-26T02:19:03.278Z    info    extensions/extensions.go:49 Extension started.  {"kind": "extension", "name": "zpages"}
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:74   Starting exporters...
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:78   Exporter is starting... {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:82   Exporter started.   {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:78   Exporter is starting... {"kind": "exporter", "data_type": "metrics", "name": "clickhousemetricswrite"}
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:82   Exporter started.   {"kind": "exporter", "data_type": "metrics", "name": "clickhousemetricswrite"}
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:78   Exporter is starting... {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter"}
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:82   Exporter started.   {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter"}
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:78   Exporter is starting... {"kind": "exporter", "data_type": "traces", "name": "clickhousetraces"}
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:82   Exporter started.   {"kind": "exporter", "data_type": "traces", "name": "clickhousetraces"}
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:86   Starting processors...
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:90   Processor is starting...    {"kind": "processor", "name": "batch", "pipeline": "traces"}
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:94   Processor started.  {"kind": "processor", "name": "batch", "pipeline": "traces"}
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:90   Processor is starting...    {"kind": "processor", "name": "signozspanmetrics/prometheus", "pipeline": "traces"}
2022-09-26T02:19:03.278Z    info    signozspanmetricsprocessor/processor.go:230 Starting signozspanmetricsprocessor {"kind": "processor", "name": "signozspanmetrics/prometheus", "pipeline": "traces"}
2022-09-26T02:19:03.278Z    info    signozspanmetricsprocessor/processor.go:250 Found exporter  {"kind": "processor", "name": "signozspanmetrics/prometheus", "pipeline": "traces", "signozspanmetrics-exporter": "prometheus"}
2022-09-26T02:19:03.278Z    info    signozspanmetricsprocessor/processor.go:258 Started signozspanmetricsprocessor  {"kind": "processor", "name": "signozspanmetrics/prometheus", "pipeline": "traces"}
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:94   Processor started.  {"kind": "processor", "name": "signozspanmetrics/prometheus", "pipeline": "traces"}
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:90   Processor is starting...    {"kind": "processor", "name": "batch", "pipeline": "metrics"}
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:94   Processor started.  {"kind": "processor", "name": "batch", "pipeline": "metrics"}
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:90   Processor is starting...    {"kind": "processor", "name": "batch", "pipeline": "logs"}
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:94   Processor started.  {"kind": "processor", "name": "batch", "pipeline": "logs"}
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:90   Processor is starting...    {"kind": "processor", "name": "batch", "pipeline": "metrics/generic"}
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:94   Processor started.  {"kind": "processor", "name": "batch", "pipeline": "metrics/generic"}
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:90   Processor is starting...    {"kind": "processor", "name": "resourcedetection", "pipeline": "metrics/generic"}
2022-09-26T02:19:03.278Z    info    internal/resourcedetection.go:136   began detecting resource information    {"kind": "processor", "name": "resourcedetection", "pipeline": "metrics/generic"}
2022-09-26T02:19:03.278Z    info    internal/resourcedetection.go:150   detected resource information   {"kind": "processor", "name": "resourcedetection", "pipeline": "metrics/generic", "resource": {"host.name":"cn-shanghai.192.168.31.238","os.type":"linux"}}
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:94   Processor started.  {"kind": "processor", "name": "resourcedetection", "pipeline": "metrics/generic"}
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:98   Starting receivers...
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:102  Receiver is starting... {"kind": "receiver", "name": "filelog/k8s", "pipeline": "logs"}
2022-09-26T02:19:03.278Z    info    adapter/receiver.go:54  Starting stanza receiver    {"kind": "receiver", "name": "filelog/k8s", "pipeline": "logs"}
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:106  Receiver started.   {"kind": "receiver", "name": "filelog/k8s", "pipeline": "logs"}
2022-09-26T02:19:03.278Z    info    pipelines/pipelines.go:102  Receiver is starting... {"kind": "receiver", "name": "otlp", "pipeline": "logs"}
2022-09-26T02:19:03.278Z    info    otlpreceiver/otlp.go:70 Starting GRPC server on endpoint 0.0.0.0:4317   {"kind": "receiver", "name": "otlp", "pipeline": "logs"}
2022-09-26T02:19:03.278Z    info    otlpreceiver/otlp.go:88 Starting HTTP server on endpoint 0.0.0.0:4318   {"kind": "receiver", "name": "otlp", "pipeline": "logs"}
2022-09-26T02:19:03.279Z    info    pipelines/pipelines.go:106  Receiver started.   {"kind": "receiver", "name": "otlp", "pipeline": "logs"}
2022-09-26T02:19:03.279Z    info    pipelines/pipelines.go:102  Receiver is starting... {"kind": "receiver", "name": "prometheus", "pipeline": "metrics"}
2022-09-26T02:19:03.279Z    info    pipelines/pipelines.go:106  Receiver started.   {"kind": "receiver", "name": "prometheus", "pipeline": "metrics"}
2022-09-26T02:19:03.279Z    info    pipelines/pipelines.go:102  Receiver is starting... {"kind": "receiver", "name": "otlp", "pipeline": "metrics"}
2022-09-26T02:19:03.279Z    info    pipelines/pipelines.go:106  Receiver started.   {"kind": "receiver", "name": "otlp", "pipeline": "metrics"}
2022-09-26T02:19:03.279Z    info    pipelines/pipelines.go:102  Receiver is starting... {"kind": "receiver", "name": "otlp/spanmetrics", "pipeline": "metrics"}
2022-09-26T02:19:03.279Z    info    otlpreceiver/otlp.go:70 Starting GRPC server on endpoint localhost:12345    {"kind": "receiver", "name": "otlp/spanmetrics", "pipeline": "metrics"}
2022-09-26T02:19:03.279Z    info    pipelines/pipelines.go:106  Receiver started.   {"kind": "receiver", "name": "otlp/spanmetrics", "pipeline": "metrics"}
2022-09-26T02:19:03.279Z    info    pipelines/pipelines.go:102  Receiver is starting... {"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics"}
2022-09-26T02:19:03.279Z    info    pipelines/pipelines.go:106  Receiver started.   {"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics"}
2022-09-26T02:19:03.279Z    info    pipelines/pipelines.go:102  Receiver is starting... {"kind": "receiver", "name": "kubeletstats", "pipeline": "metrics"}
2022-09-26T02:19:03.279Z    info    pipelines/pipelines.go:106  Receiver started.   {"kind": "receiver", "name": "kubeletstats", "pipeline": "metrics"}
2022-09-26T02:19:03.279Z    info    pipelines/pipelines.go:102  Receiver is starting... {"kind": "receiver", "name": "jaeger", "pipeline": "traces"}
2022-09-26T02:19:03.279Z    info    static/strategy_store.go:203    No sampling strategies provided or URL is unavailable, using defaults   {"kind": "receiver", "name": "jaeger", "pipeline": "traces"}
2022-09-26T02:19:03.279Z    info    pipelines/pipelines.go:106  Receiver started.   {"kind": "receiver", "name": "jaeger", "pipeline": "traces"}
2022-09-26T02:19:03.279Z    info    pipelines/pipelines.go:102  Receiver is starting... {"kind": "receiver", "name": "otlp", "pipeline": "traces"}
2022-09-26T02:19:03.279Z    info    pipelines/pipelines.go:106  Receiver started.   {"kind": "receiver", "name": "otlp", "pipeline": "traces"}
2022-09-26T02:19:03.279Z    info    healthcheck/handler.go:129  Health Check state change   {"kind": "extension", "name": "health_check", "status": "ready"}
2022-09-26T02:19:03.279Z    info    service/collector.go:215    Starting signoz-otel-collector...   {"Version": "latest", "NumCPU": 8}
2022-09-26T02:19:03.279Z    info    service/collector.go:128    Everything is ready. Begin running and processing data.
2022-09-26T02:19:03.502Z    info    fileconsumer/file.go:178    Started watching file   {"kind": "receiver", "name": "filelog/k8s", "pipeline": "logs", "component": "fileconsumer", "path": "/var/log/pods/cattle-system_rancher-849fc8b4df-p4fmb_ccf7f014-bde3-4639-9745-764fe6d2f9fa/rancher/0.log"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x343dd12]

goroutine 170 [running]:
github.com/SigNoz/signoz-otel-collector/exporter/clickhousemetricsexporter.(*PrwExporter).export.func1()
    /src/exporter/clickhousemetricsexporter/exporter.go:280 +0xf2
created by github.com/SigNoz/signoz-otel-collector/exporter/clickhousemetricsexporter.(*PrwExporter).export
    /src/exporter/clickhousemetricsexporter/exporter.go:276 +0x25
prashant-shahi commented 2 years ago

@yxiaoy6 Only ClickHouseUrl is needed for query-service. From the logs, it looks like the ClickHouse address is not right.

Could you please share the environment variables passed? You can redact sensitive information with * or x.

yxiaoy6 commented 2 years ago

@yxiaoy6 Only ClickHouseUrl is needed for query-service. From the logs, it looks like the ClickHouse address is not right.

Could you please share the environment variables passed? You can redact sensitive information with * or x.

values.yaml:

---
global:
  image:
    registry: null
  storageClass: null

fullnameOverride: ""

clusterDomain: cluster.local
clickhouse:
  cloud: other

  zookeeper:
    enabled: true
    persistence:
      enabled: true
      existingClaim: ""
      storageClass: alicloud-disk-essd

      accessModes:
        - ReadWriteOnce

      size: 20Gi
      annotations: {}

  enabled: false

  namespace: ""
  nameOverride: ""
  fullnameOverride: ""

  cluster: cluster
  database: signoz_metrics
  traceDatabase: signoz_traces
  user: admin
  password: 27ff0399-0d3a-4bd8-919d-17c2181e6fb9

  image:
    registry: docker.io
    repository: clickhouse/clickhouse-server
    tag: 22.4.5-alpine
    pullPolicy: IfNotPresent

  service:
    annotations: {}
    type: ClusterIP
    httpPort: 8123
    tcpPort: 9000

  secure: false
  verify: false
  externalZookeeper: {}

  tolerations: []
  affinity: {}
  resources: {}
  securityContext:
    enabled: true
    runAsUser: 101
    runAsGroup: 101
    fsGroup: 101
  useNodeSelector: false

  allowedNetworkIps:
    - "10.0.0.0/8"
    - "100.64.0.0/10"
    - "172.16.0.0/12"
    - "192.0.0.0/24"
    - "198.18.0.0/15"
    - "192.168.0.0/16"

  persistence:
    enabled: true

    existingClaim: ""

    storageClass: null
    accessModes:
      - ReadWriteOnce

    size: 30Gi

  profiles: {}
  defaultProfiles:
    default/allow_experimental_window_functions: "1"
    default/allow_nondeterministic_mutations: "1"

  layout:
    shardsCount: 1
    replicasCount: 1

  settings:
    prometheus/endpoint: /metrics
    prometheus/port: 9363
    # prometheus/metrics: true
    # prometheus/events: true
    # prometheus/asynchronous_metrics: true

  defaultSettings:
    format_schema_path: /etc/clickhouse-server/config.d/

  podAnnotations:
    signoz.io/scrape: 'true'
    signoz.io/port: '9363'
    signoz.io/path: /metrics

  # Cold storage configuration
  coldStorage:
    enabled: false
    defaultKeepFreeSpaceBytes: "10485760"
    endpoint: https://<bucket-name>.s3.amazonaws.com/data/
    role:
      enabled: false
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::******:role/*****
    accessKey: <access_key_id>
    secretAccess: <secret_access_key>

  installCustomStorageClass: false

  clickhouseOperator:
    name: operator

    version: 0.19.1

    image:
      registry: docker.io
      repository: altinity/clickhouse-operator
      tag: 0.19.1
      pullPolicy: IfNotPresent

    serviceAccount:
      create: true
      annotations: {}
      name:

    podAnnotations:
      signoz.io/port: '8888'
      signoz.io/scrape: 'true'

    nodeSelector: {}

    metricsExporter:
      name: metrics-exporter

      service:
        annotations: {}
        type: ClusterIP
        port: 8888

      image:
        registry: docker.io
        repository: altinity/metrics-exporter
        tag: 0.19.1
        pullPolicy: IfNotPresent

externalClickhouse:
  host: 192.168.31.226
  cluster: cluster 
  database: signoz_metrics
  traceDatabase: signoz_traces
  user: "xxx_rw"
  password: "xxxxx"
  existingSecret:
  existingSecretPasswordKey:
  secure: false
  verify: false
  httpPort: 8123
  tcpPort: 9000

queryService:
  name: "query-service"
  replicaCount: 1
  image:
    registry: docker.io
    repository: signoz/query-service
    tag: 0.11.0
    pullPolicy: IfNotPresent
  imagePullSecrets: []
  serviceAccount:
    create: true
    annotations: {}
    name:
  initContainers:
    init:
      enabled: true
      image:
        registry: docker.io
        repository: busybox
        tag: 1.35
        pullPolicy: IfNotPresent
      command:
        delay: 5
        endpoint: /ping
        waitMessage: "waiting for clickhouseDB"
        doneMessage: "clickhouse ready, starting query service now"
  configVars:
    storage: clickhouse
    # clickHouseUrl: tcp://my-release-clickhouse:9000/?database=signoz_traces&username=clickhouse_operator&password=clickhouse_operator_password
    clickHouseUrl: tcp://192.168.31.226:9000/?database=signoz_traces&username=xxx_rw&password=xxxxx
    goDebug: netdns=go
    telemetryEnabled: true
    deploymentType: kubernetes-helm

  podSecurityContext: {}
    # fsGroup: 2000

  securityContext: {}

  # Query-Service service
  service:
    annotations: {}
    type: ClusterIP
    port: 8080
    internalPort: 8085

  ingress:
    enabled: true
    className: ""
    annotations: {}
    hosts:
      - host: signoz-query-service-prod.xxx.com
        paths:
          - path: /
            pathType: ImplementationSpecific
            port: 8080
    tls: []

  resources:
    requests:
      cpu: 200m
      memory: 300Mi
    limits:
      cpu: 750m
      memory: 1000Mi

  nodeSelector: {}

  tolerations: []

  affinity: {}

  persistence:
    enabled: true

    storageClass: null

    accessModes:
      - ReadWriteOnce

    size: 30Gi

# Default values for frontend
frontend:
  name: "frontend"
  replicaCount: 1

  image:
    registry: docker.io
    repository: signoz/frontend
    tag: 0.11.0
    pullPolicy: IfNotPresent
  imagePullSecrets: []
  serviceAccount:
    create: true
    annotations: {}
    name:

  initContainers:
    init:
      enabled: true
      image:
        registry: docker.io
        repository: busybox
        tag: 1.35
        pullPolicy: IfNotPresent
      command:
        delay: 5
        endpoint: /api/v1/version
        waitMessage: "waiting for query-service"
        doneMessage: "clickhouse ready, starting frontend now"
  autoscaling:
    enabled: false
    minReplicas: 1
    maxReplicas: 11
    targetCPUUtilizationPercentage: 50
    targetMemoryUtilizationPercentage: 50
    behavior: {}

    autoscalingTemplate: []
    keda:
      enabled: false
      pollingInterval: "30"   # check metrics every 30 seconds
      cooldownPeriod: "300"   # after load decreases, wait 5 min before scaling down
      minReplicaCount: "1"    # should be >= replicaCount specified in values.yaml
      maxReplicaCount: "5"
      triggers:
        - type: memory
          metadata:
            type: Utilization
            value: "80"   # HPA keeps average Utilization <= 80 by adding new pods
        - type: cpu
          metadata:
            type: Utilization
            value: "80"   # HPA keeps average Utilization <= 80 by adding new pods

  configVars: {}

  podSecurityContext: {}
    # fsGroup: 2000

  securityContext: {}

  # Frontend service
  service:
    annotations: {}
    type: ClusterIP
    port: 3301

  ingress:
    enabled: true
    className: ""
    annotations: {}
    hosts:
      - host: signoz-frontend-prod.xxx.com
        paths:
          - path: /
            pathType: ImplementationSpecific
            port: 3301
    tls: []

  resources:
    requests:
      cpu: 100m
      memory: 100Mi
    limits:
      cpu: 200m
      memory: 200Mi

  nodeSelector: {}

  tolerations: []

  affinity: {}

alertmanager:
  name: "alertmanager"
  replicaCount: 1

  image:
    registry: docker.io
    repository: signoz/alertmanager
    pullPolicy: IfNotPresent
    # Overrides the image tag whose default is the chart appVersion.
    tag: 0.23.0-0.2

  command: []
  extraArgs: {}

  imagePullSecrets: []

  service:
    annotations: {}
    type: ClusterIP
    port: 9093
    nodePort: null

  serviceAccount:
    create: true
    annotations: {}
    name:

  initContainers:
    init:
      enabled: true
      image:
        registry: docker.io
        repository: busybox
        tag: 1.35
        pullPolicy: IfNotPresent
      command:
        delay: 5
        endpoint: /api/v1/version
        waitMessage: "waiting for query-service"
        doneMessage: "clickhouse ready, starting alertmanager now"

  podSecurityContext:
    fsGroup: 65534
  dnsConfig: {}
  securityContext:
    runAsUser: 65534
    runAsNonRoot: true
    runAsGroup: 65534

  additionalPeers: []

  livenessProbe:
    httpGet:
      path: /
      port: http

  readinessProbe:
    httpGet:
      path: /
      port: http

  ingress:
    enabled: true
    className: ""
    annotations: {}
    hosts:
      - host: signoz-alertmanager-prod.xxx.com
        paths:
          - path: /
            pathType: ImplementationSpecific
            port: 9093
    tls: []

  resources:
    requests:
      cpu: 100m
      memory: 100Mi
    limits:
      cpu: 200m
      memory: 200Mi

  nodeSelector: {}

  tolerations: []

  affinity: {}

  statefulSet:
    annotations: {}

  podAnnotations: {}
  podLabels: {}

  podDisruptionBudget: {}
    # maxUnavailable: 1
    # minAvailable: 1

  persistence:
    enabled: true
    storageClass: null

    accessModes:
      - ReadWriteOnce

    size: 20Gi

  configmapReload:
    enabled: false
    name: configmap-reload

    image:
      repository: jimmidyson/configmap-reload
      tag: v0.5.0
      pullPolicy: IfNotPresent

    resources: {}

# Default values for OtelCollector
otelCollector:
  name: "otel-collector"
  image:
    registry: docker.io
    repository: signoz/signoz-otel-collector
    tag: 0.55.0
    pullPolicy: IfNotPresent
    #pullPolicy: Always
  imagePullSecrets: []

  # OtelCollector service
  service:
    annotations: {}
    type: ClusterIP

  serviceAccount:
    create: true
    annotations: {}
    name:

  annotations: {}
  podAnnotations:
    signoz.io/scrape: 'true'
    signoz.io/port: '8889'
    signoz.io/path: /metrics

  minReadySeconds: 5
  initContainers:
    init:
      enabled: true
      image:
        registry: docker.io
        repository: busybox
        tag: 1.35
        pullPolicy: IfNotPresent
      command:
        delay: 5
        endpoint: /ping
        waitMessage: "waiting for clickhouseDB"
        doneMessage: "clickhouse ready, starting otel collector now"

  # Configuration for ports
  ports:
    otlp:
      enabled: true
      containerPort: 4317
      servicePort: 4317
      hostPort: 4317
      protocol: TCP
    otlp-http:
      enabled: true
      containerPort: 4318
      servicePort: 4318
      hostPort: 4318
      protocol: TCP
    jaeger-compact:
      enabled: false
      containerPort: 6831
      servicePort: 6831
      hostPort: 6831
      protocol: UDP
    jaeger-thrift:
      enabled: true
      containerPort: 14268
      servicePort: 14268
      hostPort: 14268
      protocol: TCP
    jaeger-grpc:
      enabled: true
      containerPort: 14250
      servicePort: 14250
      hostPort: 14250
      protocol: TCP
    zipkin:
      enabled: false
      containerPort: 9411
      servicePort: 9411
      hostPort: 9411
      protocol: TCP
    prometheus-metrics:
      enabled: false
      containerPort: 8889
      servicePort: 8889
      hostPort: 8889
      protocol: TCP
    metrics:
      enabled: true
      containerPort: 8888
      servicePort: 8888
      hostPort: 8888
      protocol: TCP
    zpages:
      enabled: false
      containerPort: 55679
      servicePort: 55679
      hostPort: 55679
      protocol: TCP

  livenessProbe:
    enabled: false
    port: 13133
    path: /
    initialDelaySeconds: 5
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 6
    successThreshold: 1
  readinessProbe:
    enabled: false
    port: 13133
    path: /
    initialDelaySeconds: 5
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 6
    successThreshold: 1

  customLivenessProbe: {}
  customReadinessProbe: {}

  ingress:
    enabled: true
    className: ""
    annotations: {}
    hosts:
      - host: signoz-otelcollector-prod.xxx.com
        paths:
          - path: /
            pathType: ImplementationSpecific
            port: 4318
    # -- OtelCollector Ingress TLS
    tls: []

  # adjust the resource requests and limit as necessary
  resources:
    requests:
      cpu: 200m
      memory: 400Mi
    limits:
      cpu: 1000m
      memory: 2Gi

  nodeSelector: {}

  tolerations: []

  affinity: {}

  autoscaling:
    enabled: false
    minReplicas: 1
    maxReplicas: 11
    targetCPUUtilizationPercentage: 50
    targetMemoryUtilizationPercentage: 50
    behavior: {}

    autoscalingTemplate: []
    keda:
      enabled: false
      pollingInterval: "30"   # check metrics every 30 seconds
      cooldownPeriod: "300"   # after load decreases, wait 5 min before scaling down
      minReplicaCount: "1"    # should be >= replicaCount specified in values.yaml
      maxReplicaCount: "5"
      triggers:
        - type: memory
          metadata:
            type: Utilization
            value: "80"   # HPA keeps average Utilization <= 80 by adding new pods
        - type: cpu
          metadata:
            type: Utilization
            value: "80"   # HPA keeps average Utilization <= 80 by adding new pods

  config:
    receivers:
      otlp/spanmetrics:
        protocols:
          grpc:
            endpoint: localhost:12345
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
      jaeger:
        protocols:
          grpc:
            endpoint: 0.0.0.0:14250
          thrift_http:
            endpoint: 0.0.0.0:14268
          # Uncomment to enable the thrift_compact receiver.
          # You will also have to enable it in `otelCollector.ports`.
          # thrift_compact:
          #   endpoint: 0.0.0.0:6831
      hostmetrics:
        collection_interval: 30s
        scrapers:
          cpu: {}
          load: {}
          memory: {}
          disk: {}
          filesystem: {}
          network: {}
      kubeletstats:
        collection_interval: 60s
        auth_type: serviceAccount
        endpoint: ${K8S_NODE_NAME}:10250
        insecure_skip_verify: true
        metric_groups:
          - container
          - node
          - pod
          - volume
      filelog/k8s:
        include:
          - /var/log/pods/*/*/*.log
        exclude:
          # Exclude logs from all containers named otel-collector
          - /var/log/pods/*/otel-collector/*.log
        start_at: beginning
        include_file_path: true
        include_file_name: false
        operators:
        # Detect which container runtime log format is used by Kubernetes
        - type: router
          id: get-format
          routes:
            - output: parser-docker
              expr: 'body matches "^\\{"'
            - output: parser-crio
              expr: 'body matches "^[^ Z]+ "'
            - output: parser-containerd
              expr: 'body matches "^[^ Z]+Z"'
        # Parse CRI-O format
        - type: regex_parser
          id: parser-crio
          regex: '^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'
          output: extract_metadata_from_filepath
          timestamp:
            parse_from: attributes.time
            layout_type: gotime
            layout: '2006-01-02T15:04:05.000000000-07:00'
        # Parse CRI-Containerd format
        - type: regex_parser
          id: parser-containerd
          regex: '^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'
          output: extract_metadata_from_filepath
          timestamp:
            parse_from: attributes.time
            layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        # Parse Docker format
        - type: json_parser
          id: parser-docker
          output: extract_metadata_from_filepath
          timestamp:
            parse_from: attributes.time
            layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        # Extract metadata from file path
        - type: regex_parser
          id: extract_metadata_from_filepath
          regex: '^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$'
          parse_from: attributes["log.file.path"]
        # Rename attributes
        - type: move
          from: attributes.stream
          to: attributes["log.iostream"]
        - type: move
          from: attributes.container_name
          to: attributes["k8s.container.name"]
        - type: move
          from: attributes.namespace
          to: attributes["k8s.namespace.name"]
        - type: move
          from: attributes.pod_name
          to: attributes["k8s.pod.name"]
        - type: move
          from: attributes.restart_count
          to: attributes["k8s.container.restart_count"]
        - type: move
          from: attributes.uid
          to: attributes["k8s.pod.uid"]
        - type: move
          from: attributes.log
          to: body
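        # Worked example (illustrative, not part of the chart defaults):
        # a containerd log line
        #   2022-09-22T09:14:54.890Z stdout F hello from app
        # in /var/log/pods/default_mypod_abc123/app/0.log matches the
        # 'body matches "^[^ Z]+Z"' route, so parser-containerd splits out
        # time/stream/logtag/log; extract_metadata_from_filepath then sets
        # k8s.namespace.name=default, k8s.pod.name=mypod, k8s.pod.uid=abc123,
        # k8s.container.name=app and k8s.container.restart_count=0, and the
        # final move operator puts "hello from app" into the body.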
      prometheus:
        config:
          global:
            scrape_interval: 30s
          scrape_configs:
            - job_name: otel-collector
              static_configs:
              - targets:
                - ${HOST_IP}:8888
    processors:
      batch:
        send_batch_size: 1000
        timeout: 10s
      # Ref: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/README.md
      resourcedetection:
        detectors: [env, system]  # Include ec2/eks for AWS, gce/gke for GCP and azure/aks for Azure
        # The env detector adds custom labels from the OTEL_RESOURCE_ATTRIBUTES env var
        timeout: 2s
        override: false
        system:
          hostname_sources: [os]  # Alternatively, use [dns,os] for setting FQDN as host.name and os as fallback
      signozspanmetrics/prometheus:
        metrics_exporter: prometheus
        latency_histogram_buckets:
          [
            100us,
            1ms,
            2ms,
            6ms,
            10ms,
            50ms,
            100ms,
            250ms,
            500ms,
            1000ms,
            1400ms,
            2000ms,
            5s,
            10s,
            20s,
            40s,
            60s,
          ]
        dimensions_cache_size: 10000
        dimensions:
          - name: service.namespace
            default: default
          - name: deployment.environment
            default: default
    extensions:
      health_check:
        endpoint: 0.0.0.0:13133
      zpages:
        endpoint: localhost:55679
      pprof:
        endpoint: localhost:1777
    exporters:
      clickhousetraces:
        datasource: tcp://${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT}/?database=${CLICKHOUSE_TRACE_DATABASE}&username=${CLICKHOUSE_USER}&password=${CLICKHOUSE_PASSWORD}
      clickhousemetricswrite:
        endpoint: tcp://${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT}/?database=${CLICKHOUSE_DATABASE}&username=${CLICKHOUSE_USER}&password=${CLICKHOUSE_PASSWORD}
        resource_to_telemetry_conversion:
          enabled: true
      clickhouselogsexporter:
        dsn: tcp://${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT}/?username=${CLICKHOUSE_USER}&password=${CLICKHOUSE_PASSWORD}
        timeout: 10s
        sending_queue:
          queue_size: 100
        retry_on_failure:
          enabled: true
          initial_interval: 5s
          max_interval: 30s
          max_elapsed_time: 300s
      prometheus:
        endpoint: 0.0.0.0:8889
    service:
      telemetry:
        metrics:
          address: 0.0.0.0:8888
      extensions: [health_check, zpages]
      pipelines:
        traces:
          receivers: [jaeger, otlp]
          processors: [signozspanmetrics/prometheus, batch]
          exporters: [clickhousetraces]
        metrics:
          receivers: [otlp]
          processors: [batch]
          exporters: [clickhousemetricswrite]
        metrics/generic:
          receivers: [hostmetrics, kubeletstats, prometheus]
          processors: [resourcedetection, batch]
          exporters: [clickhousemetricswrite]
        metrics/spanmetrics:
          receivers: [otlp/spanmetrics]
          exporters: [prometheus]
        logs:
          receivers: [filelog/k8s, otlp]
          processors: [batch]
          exporters: [clickhouselogsexporter]

# Default values for OtelCollectorMetrics
otelCollectorMetrics:
  name: "otel-collector-metrics"
  image:
    registry: docker.io
    repository: signoz/signoz-otel-collector
    tag: 0.55.0
    pullPolicy: IfNotPresent
    #pullPolicy: Always
  imagePullSecrets: []

  # OtelCollectorMetrics service
  service:
    annotations: {}
    type: ClusterIP

  serviceAccount:
    create: true
    annotations: {}
    name:

  annotations: {}
  podAnnotations:
    signoz.io/scrape: 'true'
    signoz.io/port: '8888'
    signoz.io/path: /metrics

  minReadySeconds: 5
  progressDeadlineSeconds: 120
  replicaCount: 1
  initContainers:
    init:
      enabled: true
      image:
        registry: docker.io
        repository: busybox
        tag: 1.35
        pullPolicy: IfNotPresent
      command:
        delay: 5
        endpoint: /ping
        waitMessage: "waiting for clickhouseDB"
        doneMessage: "clickhouse ready, starting otel collector metrics now"

  # Configuration for ports
  ports:
    metrics:
      enabled: false
      containerPort: 8888
      servicePort: 8888
      protocol: TCP
    zpages:
      enabled: false
      containerPort: 55679
      servicePort: 55679
      protocol: TCP
    health-check:
      enabled: true
      containerPort: 13133
      servicePort: 13133
      protocol: TCP
    pprof:
      enabled: false
      containerPort: 1777
      servicePort: 1777
      protocol: TCP

  livenessProbe:
    enabled: false
    port: 13133
    path: /
    initialDelaySeconds: 5
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 6
    successThreshold: 1
  readinessProbe:
    enabled: false
    port: 13133
    path: /
    initialDelaySeconds: 5
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 6
    successThreshold: 1

  ## Custom liveness and readiness probes
  customLivenessProbe: {}
  customReadinessProbe: {}

  ingress:
    enabled: true
    className: ""
    annotations: {}

    hosts:
      - host: signoz-otelcollector-metrics-prod.xxx.com
        paths:
          - path: /
            pathType: ImplementationSpecific
            port: 13133
    tls: []

  # adjust the resource requests and limit as necessary
  resources:
    requests:
      cpu: 200m
      memory: 400Mi
    limits:
      cpu: 1000m
      memory: 2Gi

  nodeSelector: {}

  tolerations: []

  affinity: {}

  config:
    receivers:
      k8s_cluster:
        collection_interval: 60s
        node_conditions_to_report: [Ready, MemoryPressure]
      # Data sources: metrics
      prometheus:
        config:
          scrape_configs:
            # otel-collector-metrics internal metrics
            - job_name: "otel-collector-metrics"
              scrape_interval: 60s
              static_configs:
                - targets:
                  - ${MY_POD_IP}:8888
            # generic prometheus metrics scraper (scraped when pod annotations are set)
            - job_name: "generic-collector"
              scrape_interval: 60s
              kubernetes_sd_configs:
                - role: pod
              relabel_configs:
                - source_labels:
                    [__meta_kubernetes_pod_annotation_signoz_io_scrape]
                  action: keep
                  regex: true
                - source_labels:
                    [__meta_kubernetes_pod_annotation_signoz_io_path]
                  action: replace
                  target_label: __metrics_path__
                  regex: (.+)
                - source_labels:
                    [
                      __meta_kubernetes_pod_ip,
                      __meta_kubernetes_pod_annotation_signoz_io_port,
                    ]
                  action: replace
                  separator: ":"
                  target_label: __address__
                - action: labelmap
                  regex: __meta_kubernetes_pod_label_(.+)
                - source_labels: [__meta_kubernetes_namespace]
                  action: replace
                  target_label: k8s_namespace
                - source_labels: [__meta_kubernetes_pod_name]
                  action: replace
                  target_label: k8s_pod
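                # Worked example (illustrative): a pod at IP 10.0.0.5 annotated with
                # signoz.io/scrape: "true", signoz.io/port: "9102" and
                # signoz.io/path: "/metrics" is kept by the first rule, scraped at
                # 10.0.0.5:9102/metrics, and gets its pod labels plus k8s_namespace
                # and k8s_pod attached.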
    processors:
      batch:
        send_batch_size: 1000
        timeout: 10s
      # -- Memory Limiter processor
      # If set to null, will be overridden with values based on k8s resource limits.
      memory_limiter: null
    extensions:
      health_check:
        endpoint: 0.0.0.0:13133
      zpages:
        endpoint: localhost:55679
      pprof:
        endpoint: localhost:1777
    exporters:
      clickhousemetricswrite:
        endpoint: tcp://${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT}/?database=${CLICKHOUSE_DATABASE}&username=${CLICKHOUSE_USER}&password=${CLICKHOUSE_PASSWORD}
    service:
      telemetry:
        metrics:
          address: 0.0.0.0:8888
      extensions: [health_check, zpages, pprof]
      pipelines:
        metrics:
          receivers: [k8s_cluster, prometheus]
          processors: [batch]
          exporters: [clickhousemetricswrite]
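A side note on the exporter settings above: the collector interpolates the `CLICKHOUSE_*` environment variables into the ClickHouse connection strings. A rough Python sketch of how the `clickhousetraces` DSN is composed, using the placeholder values from this issue (the env var values here are examples, not chart defaults):

```python
# Compose the clickhousetraces DSN the same way the collector config does,
# substituting the CLICKHOUSE_* env vars into the template string.
env = {
    "CLICKHOUSE_HOST": "192.168.xx.xx",       # externalClickhouse.host
    "CLICKHOUSE_PORT": "9000",                # externalClickhouse.tcpPort
    "CLICKHOUSE_TRACE_DATABASE": "signoz_traces",
    "CLICKHOUSE_USER": "xxx_rw",
    "CLICKHOUSE_PASSWORD": "xxx",
}

dsn = (
    "tcp://{CLICKHOUSE_HOST}:{CLICKHOUSE_PORT}/"
    "?database={CLICKHOUSE_TRACE_DATABASE}"
    "&username={CLICKHOUSE_USER}&password={CLICKHOUSE_PASSWORD}"
).format(**env)

print(dsn)
# tcp://192.168.xx.xx:9000/?database=signoz_traces&username=xxx_rw&password=xxx
```

If authentication fails (as in the error above), printing the composed DSN with the password redacted is a quick way to confirm which credentials actually reached the collector.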
yxiaoy6 commented 2 years ago

@yxiaoy6 Only ClickHouseUrl is needed for query-service. From logs, it looks like clickhouse address is not right.

Can you please share the environments passed? You can redact sensitive information with * or x?

otel-collector keeps restarting (screenshot attached)

yxiaoy6 commented 2 years ago


@prashant-shahi Looking forward to your response, thanks!

yxiaoy6 commented 2 years ago

It turned out to be because the ClickHouse version was too old.
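For anyone hitting the same thing, a quick sanity check is to compare the version reported by `SELECT version()` on the external server against the minimum the chart expects. A minimal sketch of such a check — the `22.4` minimum used here is an assumption for illustration; consult the SigNoz docs for the actual requirement:

```python
# Hypothetical helper: compare a ClickHouse server version string against a
# minimum. Only the major.minor components are compared.
def version_tuple(v: str) -> tuple:
    """Parse e.g. '21.8.10.19' into (21, 8)."""
    return tuple(int(p) for p in v.split(".")[:2])

def meets_minimum(server_version: str, minimum: str = "22.4") -> bool:
    """Return True if the server version is at or above the assumed minimum."""
    return version_tuple(server_version) >= version_tuple(minimum)

print(meets_minimum("21.8.10.19"))  # older server -> False
print(meets_minimum("22.8.1"))      # recent server -> True
```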

prashant-shahi commented 2 years ago

@yxiaoy6 Thank you for being patient and responding with the solution to the reported issue.

When using external ClickHouse, make sure: