SigNoz / signoz-otel-collector

SigNoz distro for OpenTelemetry Collector

Question: Is there any way to choose cluster name? #180

Closed alex1704 closed 1 day ago

alex1704 commented 1 year ago

From what I can see in the code, the cluster name is a constant set to 'cluster'. Is there a workaround?

srikanthccv commented 1 year ago

No, unfortunately, there is no way to configure this.

voriol commented 1 year ago

I am having the same problem. In my case, I am deploying with the Helm chart and an "external" ClickHouse.

Versions:

signoz-otel-collector time="2023-10-03T11:43:52Z" level=info msg="Executing:\nCREATE DATABASE IF NOT EXISTS signoz_metrics ON CLUSTER cluster\n" component=clickhouse
signoz-otel-collector 2023/10/03 11:43:52 Error creating clickhouse client: code: 170, message: Requested cluster 'cluster' not found  
signoz-otel-collector-metrics time="2023-10-03T11:43:39Z" level=info msg="Executing:\nCREATE DATABASE IF NOT EXISTS signoz_metrics ON CLUSTER cluster\n" component=clickhouse
signoz-otel-collector-metrics 2023/10/03 11:43:39 Error creating clickhouse client: code: 170, message: Requested cluster 'cluster' not found   

The worst part is that apparently the documentation allows you to assign an external clickhouse:

https://github.com/SigNoz/charts/blob/main/charts/signoz/templates/_clickhouse.tpl#L12

Could someone help with this problem?

Thank you

jhotmann commented 1 year ago

Just got bit by this. I have an external, single-node clickhouse server that I was trying to connect to.

srikanthccv commented 1 year ago

We are working on making this configurable (with env).

srikanthccv commented 1 year ago

@dhawal1248 Just FYI, there are also instances of hardcoded cluster name for creating mat columns in query-service https://github.com/search?q=repo%3ASigNoz%2Fsignoz+%22on+cluster%22&type=code; when we roll this out, we should update query-service to support it as well.

voriol commented 1 year ago

Hi @prashant-shahi! Thanks for the fix, but in my case it doesn't quite work as I expected. Details below:

If I understand correctly, SigNoz starts two "otel collectors": signoz-otel-collector and signoz-otel-collector-metrics. The "metrics" collector works fine, but the other returns the following error:

signoz-otel-collector 2023-10-23T07:26:17.782Z    info    exporter@v0.79.0/exporter.go:275    Stability level of component is undefined    {"kind": "exporter", "data_type": "traces", "name": "clickhousetraces"}
signoz-otel-collector {"level":"fatal","timestamp":"2023-10-23T07:26:17.789Z","caller":"signozcollector/main.go:78","msg":"failed to run service:","error":"failed to start collector service: failed to start : failed to build pipelines: failed to create \"clickhousetraces\" exporter for data type \"traces\": error connecting to primary db: code: 170, message: Requested cluster 'cluster' not found","stacktrace":"main.main\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/cmd/signozcollector/main.go:78\nruntime.main\n\t/opt/hostedtoolcache/go/1.20.10/x64/src/runtime/proc.go:250"}  

The image uses the latest tag, and the CLICKHOUSE_CLUSTER environment variable is defined:

Containers:
  mk8s1-tmb-signoz-otel-collector:
    Container ID:  containerd://db60f81f721c6f71d42753c3b6dbb516b2df2ff6c104d251695beaf8b1205057
    Image:         docker.io/signoz/signoz-otel-collector:0.79.10
    Image ID:      docker.io/signoz/signoz-otel-collector@sha256:2dc16f67e65ded72011848bf24b78c2ebd45eaa12c38d67c724bfd25e6a78d48

Environment:                                                                                                                                                     
  CLICKHOUSE_HOST:                  **********                                                                                                     
  CLICKHOUSE_PORT:                  **********
  CLICKHOUSE_HTTP_PORT:             **********
  CLICKHOUSE_CLUSTER:               default_cluster
  CLICKHOUSE_DATABASE:              signoz_metrics
  CLICKHOUSE_TRACE_DATABASE:        signoz_traces
  CLICKHOUSE_USER:                  **********
  CLICKHOUSE_PASSWORD:              <set to the key 'clickhouse-password' in secret 'clickhouse-password'>  Optional: false                                      
  CLICKHOUSE_SECURE:                false
  CLICKHOUSE_VERIFY:                false
  K8S_NODE_NAME:                     (v1:spec.nodeName)
  K8S_POD_IP:                        (v1:status.podIP)
  K8S_POD_NAME:                     signoz-otel-collector-7d998544b5-gld54 (v1:metadata.name)
  K8S_POD_UID:                       (v1:metadata.uid)
  K8S_NAMESPACE:                    tmb-signoz (v1:metadata.namespace)
  K8S_CLUSTER_NAME:                 k8s1
  SIGNOZ_COMPONENT:                 otel-collector
  OTEL_RESOURCE_ATTRIBUTES:         signoz.component=$(SIGNOZ_COMPONENT),k8s.cluster.name=$(K8S_CLUSTER_NAME),k8s.pod.uid=$(K8S_POD_UID),k8s.pod.ip=$(K8S_POD_IP)
  LOW_CARDINAL_EXCEPTION_GROUPING:  false

prashant-shahi commented 1 year ago

@voriol This was mainly resolved by @dhawal1248.

Regarding the issue, can you try the latest Helm chart with SigNoz OtelCollector v0.79.11?

voriol commented 1 year ago

OK @prashant-shahi, sorry for the confusion.

Regarding the issue, I am getting the same result as with the previous version:

Containers:
  signoz-otel-collector:
    Container ID:  containerd://e593544de02839042dacf7ed2317d0b0cc450666e6a6e88a6cc390b1fb316baf
    Image:         docker.io/signoz/signoz-otel-collector:0.79.11

[...]

    Environment:
      CLICKHOUSE_HOST:                  **********
      CLICKHOUSE_PORT:                  **********
      CLICKHOUSE_HTTP_PORT:             **********
      CLICKHOUSE_CLUSTER:               default_cluster

signoz-otel-collector {"level":"fatal","timestamp":"2023-10-25T06:23:10.695Z","caller":"signozcollector/main.go:78","msg":"failed to run service:","error":"failed to start collector service: failed to start : failed to build pipelines: failed to create \"clickhousetraces\" exporter for data type \"traces\": error connecting to primary db: code: 170, message: Requested cluster 'cluster' not found","stacktrace":"main.main\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/cmd/signozcollector/main.go:78\nruntime.main\n\t/opt/hostedtoolcache/go/1.20.10/x64/src/runtime/proc.go:250"}

Reviewing the changes in the new version, we see that they refer to "update default retention", which apparently has nothing to do with the name of the ClickHouse cluster.

@dhawal1248, would you be able to see what could be happening?

Thanks

bo8888 commented 5 months ago

I'm hitting this bug too. Why hasn't it been fixed?

srstrickland commented 3 months ago

Why was this issue closed? It is still not possible to launch signoz-otel-collector with an external clickhouse that has a cluster name other than 'cluster'.

My chart values (I realize this technically is for the charts repo, but the problem is in this component):

clickhouse:
  enabled: false

externalClickhouse:
  host: clickhouse-clickhouse.clickhouse-v2.svc.cluster.local
  cluster: default
  user: signoz
  existingSecret: clickhouse-auth
  existingSecretPasswordKey: password

k8s-infra:
  enabled: false

schemaMigrator:
  args:
    - --cluster-name=default

queryService:
  additionalArgs:
    - --cluster=default
  configVars:
    telemetryEnabled: false

signoz-otel-collector-metrics spins up fine. signoz-otel-collector emits the following error:

{
  "level": "error",
  "timestamp": "2024-07-22T17:02:07.717Z",
  "caller": "opamp/server_client.go:268",
  "msg": "Collector failed for restart during rollback",
  "component": "opamp-server-client",
  "error": "failed to build pipelines: failed to create \"clickhousetraces\" exporter for data type \"traces\": error connecting to primary db: code: 701, message: Requested cluster 'cluster' not found",
  "stacktrace": "github.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).reload\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/opamp/server_client.go:268\ngithub.com/SigNoz/signoz-otel-collector/opamp.(*agentConfigManager).applyRemoteConfig\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/opamp/config_manager.go:173\ngithub.com/SigNoz/signoz-otel-collector/opamp.(*agentConfigManager).Apply\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/opamp/config_manager.go:159\ngithub.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).onRemoteConfigHandler\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/opamp/server_client.go:209\ngithub.com/SigNoz/signoz-otel-collector/opamp.(*serverClient).onMessageFuncHandler\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/opamp/server_client.go:199\ngithub.com/open-telemetry/opamp-go/client/types.CallbacksStruct.OnMessage\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/types/callbacks.go:162\ngithub.com/open-telemetry/opamp-go/client/internal.(*receivedProcessor).ProcessReceivedMessage\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/internal/receivedprocessor.go:131\ngithub.com/open-telemetry/opamp-go/client/internal.(*wsReceiver).ReceiverLoop\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/internal/wsreceiver.go:57\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).runOneCycle\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/wsclient.go:243\ngithub.com/open-telemetry/opamp-go/client.(*wsClient).runUntilStopped\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/wsclient.go:265\ngithub.com/open-telemetry/opamp-go/client/internal.(*ClientCommon).StartConnectAndRun.func1\n\t/home/runner/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.5.0/client/internal/clientcommon.go:197"
}

Some relevant information from deployment:

Containers:
 signoz-otel-collector:
  Image:       dockerhub.docker.zooxlabs.com/signoz/signoz-otel-collector:0.102.2

  Command:
    /signoz-collector

  Args:
    --config=/conf/otel-collector-config.yaml
    --manager-config=/conf/otel-collector-opamp-config.yaml
    --copy-path=/var/tmp/collector-config.yaml
    --feature-gates=-pkg.translator.prometheus.NormalizeName

  Environment:
    CLICKHOUSE_HOST:                  clickhouse-clickhouse.clickhouse-v2.svc.cluster.local
    CLICKHOUSE_PORT:                  9000
    CLICKHOUSE_HTTP_PORT:             8123
    CLICKHOUSE_CLUSTER:               default
    CLICKHOUSE_DATABASE:              signoz_metrics
    CLICKHOUSE_TRACE_DATABASE:        signoz_traces
    CLICKHOUSE_LOG_DATABASE:          signoz_logs
    CLICKHOUSE_USER:                  signoz
    CLICKHOUSE_PASSWORD:              <set to the key 'password' in secret 'clickhouse-auth'>  Optional: false
    CLICKHOUSE_SECURE:                false
    CLICKHOUSE_VERIFY:                false
    SIGNOZ_COMPONENT:                 otel-collector
    OTEL_RESOURCE_ATTRIBUTES:         signoz.component=$(SIGNOZ_COMPONENT),k8s.cluster.name=$(K8S_CLUSTER_NAME),k8s.pod.uid=$(K8S_POD_UID),k8s.pod.ip=$(K8S_POD_IP)
    LOW_CARDINAL_EXCEPTION_GROUPING:  false

Some occurrences of hard-coded 'cluster' (I don't have full context on these, just started poking around):

A query in the metrics exporter. This is not the component that is erroring out, but this just seems like a potential issue.

A default setting for traces exporter -- is this externally modifiable? I'm guessing it's this string that's the source of the error, which originates from the traces exporter.

A constant which doesn't appear to be used.
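To illustrate the suspicion about the traces exporter default, here is a minimal sketch under the assumption that the exporter starts from a default config containing the cluster name and only replaces it when the loaded config supplies a value. The `Config` type and both functions are hypothetical and heavily simplified; they are not copied from signoz-otel-collector.

```go
package main

import "fmt"

// Config is a hypothetical, simplified exporter config.
// The real traces exporter may use different fields.
type Config struct {
	Cluster string
}

// createDefaultConfig mimics a factory default with the cluster name baked in.
func createDefaultConfig() Config {
	return Config{Cluster: "cluster"}
}

// applyOverrides copies only non-empty user-supplied values onto the defaults.
func applyOverrides(defaults, user Config) Config {
	merged := defaults
	if user.Cluster != "" {
		merged.Cluster = user.Cluster
	}
	return merged
}

func main() {
	// Nothing supplied by the user: the baked-in default is kept.
	fmt.Println(applyOverrides(createDefaultConfig(), Config{}).Cluster)

	// A value supplied through the config replaces the default.
	fmt.Println(applyOverrides(createDefaultConfig(), Config{Cluster: "default"}).Cluster)
}
```

In a setup like this, an environment variable such as CLICKHOUSE_CLUSTER would have no effect unless something actually feeds its value into the exporter's config, which would be consistent with the error still naming 'cluster'.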

If there's something I can do to work around this, via some setting or env var, I'm happy to make those changes. But it looks to me like this was never fully resolved.

@voriol were you able to find a workaround? It seems I'm facing the same issue as you.

prashant-shahi commented 3 months ago

@srikanthccv can you please look into this, if the issue still persists?

srstrickland commented 2 months ago

Any update here? I also see this issue, which was closed with a comment that setting the cluster name is supported, but it is not supported for the signoz-otel-collector, and there's no workaround that I know of (short of compiling my own).

srikanthccv commented 1 day ago

Fixed in https://github.com/SigNoz/signoz-otel-collector/commit/99de6779146dc3e3edb8f846cc698d0db2c2edce