[Closed] yxiaoy6 closed this issue 2 years ago
@yxiaoy6 This likely happens for various reasons:
- the user/password combination is not correct
- the user passed does not have enough permissions to create databases and tables
- IP whitelisting does not allow the IPs that are trying to connect to ClickHouse

Could you please make sure none of the above is the case here?
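Two of these conditions can be sanity-checked offline before touching the cluster. The sketch below is my own helper, not part of SigNoz; the CIDR list mirrors the chart's default `allowedNetworkIps`. It parses a ClickHouse DSN and flags missing credentials or a client IP outside the whitelist:

```python
import ipaddress
from urllib.parse import urlsplit, parse_qs

# CIDR ranges mirroring the chart's default `allowedNetworkIps` (see values.yaml)
ALLOWED = ["10.0.0.0/8", "100.64.0.0/10", "172.16.0.0/12",
           "192.0.0.0/24", "198.18.0.0/15", "192.168.0.0/16"]

def check_dsn(dsn: str, client_ip: str) -> list:
    """Return a list of problems found with a ClickHouse DSN and client IP."""
    problems = []
    query = parse_qs(urlsplit(dsn).query)
    if not query.get("username"):
        problems.append("no username in DSN")
    if not query.get("password"):
        problems.append("no password in DSN")
    ip = ipaddress.ip_address(client_ip)
    if not any(ip in ipaddress.ip_network(cidr) for cidr in ALLOWED):
        problems.append(f"{client_ip} not in allowed networks")
    return problems
```

The remaining condition (whether the user can actually `CREATE DATABASE`/`CREATE TABLE`) has to be verified against the server itself, e.g. with `clickhouse-client`.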
Using `helm --dry-run` to render a test manifest, I can see that query-service does not have the environment variable `CLICKHOUSE_PASSWORD`:
```yaml
spec:
  serviceName: signoz-prod-query-service
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: signoz
      app.kubernetes.io/instance: signoz-prod
      app.kubernetes.io/component: query-service
  template:
    metadata:
      annotations:
        checksum/config: 8e31e85c30508fec03799f93a7ae9159c3247f2d91d226defe4188079254a6a6
      labels:
        app.kubernetes.io/name: signoz
        app.kubernetes.io/instance: signoz-prod
        app.kubernetes.io/component: query-service
    spec:
      serviceAccountName: signoz-prod-query-service
      initContainers:
        - name: signoz-prod-query-service-init
          image: docker.io/busybox:1.35
          imagePullPolicy: IfNotPresent
          command:
            - sh
            - -c
      containers:
        - name: signoz-prod-query-service
          securityContext:
            {}
          image: docker.io/signoz/query-service:0.11.0
          imagePullPolicy: IfNotPresent
          args: ["-config=/root/config/prometheus.yml"]
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          env:
            - name: STORAGE
              value: clickhouse
            - name: ClickHouseUrl
              value: tcp://192.168.xx.xxx:9000?database=signoz_traces&username=xxx_rw&password=$(CLICKHOUSE_PASSWORD)
            - name: ALERTMANAGER_API_PREFIX
              value: http://signoz-prod-alertmanager:9093/api/
            - name: GODEBUG
              value: netdns=go
            - name: TELEMETRY_ENABLED
              value: "true"
            - name: DEPLOYMENT_TYPE
              value: kubernetes-helm
          livenessProbe:
            httpGet:
              path: /api/v1/version
              port: http
          # .......
```
After I manually set the password, the query service can run, but the otel-collector still cannot start.
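For reference, Kubernetes only expands a `$(VAR)` reference in an `env` value when `VAR` is defined earlier in the same container's `env` list; otherwise the literal string `$(CLICKHOUSE_PASSWORD)` is passed through unexpanded. A sketch of one way to make the expansion work (the Secret name here is hypothetical):

```yaml
env:
  # CLICKHOUSE_PASSWORD must appear before any variable that references it,
  # or Kubernetes will not substitute $(CLICKHOUSE_PASSWORD).
  - name: CLICKHOUSE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: clickhouse-credentials   # hypothetical Secret
        key: password
  - name: ClickHouseUrl
    value: tcp://192.168.xx.xxx:9000?database=signoz_traces&username=xxx_rw&password=$(CLICKHOUSE_PASSWORD)
```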
otel-collector log:
2022-09-26T02:19:03.239Z info service/telemetry.go:103 Setting up own telemetry...
2022-09-26T02:19:03.239Z info service/telemetry.go:138 Serving Prometheus metrics {"address": "0.0.0.0:8888", "level": "basic"}
2022-09-26T02:19:03.242Z info clickhouselogsexporter/exporter.go:247 Running migrations from path: {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter", "test": "/logsmigrations"}
2022-09-26T02:19:03.247Z info clickhouselogsexporter/exporter.go:261 Clickhouse Migrate finished {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter"}
time="2022-09-26T02:19:03Z" level=info msg="Executing:\nCREATE DATABASE IF NOT EXISTS signoz_metrics\n" component=clickhouse
time="2022-09-26T02:19:03Z" level=info msg="Executing:\nCREATE TABLE IF NOT EXISTS signoz_metrics.samples_v2 (\n\t\t\tmetric_name LowCardinality(String),\n\t\t\tfingerprint UInt64 Codec(DoubleDelta, LZ4),\n\t\t\ttimestamp_ms Int64 Codec(DoubleDelta, LZ4),\n\t\t\tvalue Float64 Codec(Gorilla, LZ4)\n\t\t)\n\t\tENGINE = MergeTree\n\t\t\tPARTITION BY toDate(timestamp_ms / 1000)\n\t\t\tORDER BY (metric_name, fingerprint, timestamp_ms)\n" component=clickhouse
time="2022-09-26T02:19:03Z" level=info msg="Executing:\nSET allow_experimental_object_type = 1\n" component=clickhouse
2022-09-26T02:19:03.262Z info clickhousetracesexporter/clickhouse_factory.go:82 Running migrations from path: {"kind": "exporter", "data_type": "traces", "name": "clickhousetraces", "test": "/migrations"}
2022-09-26T02:19:03.269Z info clickhousetracesexporter/clickhouse_factory.go:94 Clickhouse Migrate finished {"kind": "exporter", "data_type": "traces", "name": "clickhousetraces", "error": "no change"}
2022-09-26T02:19:03.269Z info signozspanmetricsprocessor/processor.go:104 Building signozspanmetricsprocessor {"kind": "processor", "name": "signozspanmetrics/prometheus", "pipeline": "traces"}
2022-09-26T02:19:03.277Z info extensions/extensions.go:42 Starting extensions...
2022-09-26T02:19:03.277Z info extensions/extensions.go:45 Extension is starting... {"kind": "extension", "name": "health_check"}
2022-09-26T02:19:03.277Z info healthcheckextension@v0.55.0/healthcheckextension.go:44 Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Port":0,"TCPAddr":{"Endpoint":"0.0.0.0:13133"},"Path":"/","CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2022-09-26T02:19:03.277Z info extensions/extensions.go:49 Extension started. {"kind": "extension", "name": "health_check"}
2022-09-26T02:19:03.277Z info extensions/extensions.go:45 Extension is starting... {"kind": "extension", "name": "zpages"}
2022-09-26T02:19:03.277Z info zpagesextension/zpagesextension.go:64 Registered zPages span processor on tracer provider {"kind": "extension", "name": "zpages"}
2022-09-26T02:19:03.278Z info zpagesextension/zpagesextension.go:74 Registered Host's zPages {"kind": "extension", "name": "zpages"}
2022-09-26T02:19:03.278Z info zpagesextension/zpagesextension.go:86 Starting zPages extension {"kind": "extension", "name": "zpages", "config": {"TCPAddr":{"Endpoint":"localhost:55679"}}}
2022-09-26T02:19:03.278Z info extensions/extensions.go:49 Extension started. {"kind": "extension", "name": "zpages"}
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:74 Starting exporters...
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:78 Exporter is starting... {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:82 Exporter started. {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:78 Exporter is starting... {"kind": "exporter", "data_type": "metrics", "name": "clickhousemetricswrite"}
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:82 Exporter started. {"kind": "exporter", "data_type": "metrics", "name": "clickhousemetricswrite"}
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:78 Exporter is starting... {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter"}
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:82 Exporter started. {"kind": "exporter", "data_type": "logs", "name": "clickhouselogsexporter"}
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:78 Exporter is starting... {"kind": "exporter", "data_type": "traces", "name": "clickhousetraces"}
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:82 Exporter started. {"kind": "exporter", "data_type": "traces", "name": "clickhousetraces"}
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:86 Starting processors...
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:90 Processor is starting... {"kind": "processor", "name": "batch", "pipeline": "traces"}
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:94 Processor started. {"kind": "processor", "name": "batch", "pipeline": "traces"}
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:90 Processor is starting... {"kind": "processor", "name": "signozspanmetrics/prometheus", "pipeline": "traces"}
2022-09-26T02:19:03.278Z info signozspanmetricsprocessor/processor.go:230 Starting signozspanmetricsprocessor {"kind": "processor", "name": "signozspanmetrics/prometheus", "pipeline": "traces"}
2022-09-26T02:19:03.278Z info signozspanmetricsprocessor/processor.go:250 Found exporter {"kind": "processor", "name": "signozspanmetrics/prometheus", "pipeline": "traces", "signozspanmetrics-exporter": "prometheus"}
2022-09-26T02:19:03.278Z info signozspanmetricsprocessor/processor.go:258 Started signozspanmetricsprocessor {"kind": "processor", "name": "signozspanmetrics/prometheus", "pipeline": "traces"}
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:94 Processor started. {"kind": "processor", "name": "signozspanmetrics/prometheus", "pipeline": "traces"}
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:90 Processor is starting... {"kind": "processor", "name": "batch", "pipeline": "metrics"}
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:94 Processor started. {"kind": "processor", "name": "batch", "pipeline": "metrics"}
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:90 Processor is starting... {"kind": "processor", "name": "batch", "pipeline": "logs"}
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:94 Processor started. {"kind": "processor", "name": "batch", "pipeline": "logs"}
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:90 Processor is starting... {"kind": "processor", "name": "batch", "pipeline": "metrics/generic"}
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:94 Processor started. {"kind": "processor", "name": "batch", "pipeline": "metrics/generic"}
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:90 Processor is starting... {"kind": "processor", "name": "resourcedetection", "pipeline": "metrics/generic"}
2022-09-26T02:19:03.278Z info internal/resourcedetection.go:136 began detecting resource information {"kind": "processor", "name": "resourcedetection", "pipeline": "metrics/generic"}
2022-09-26T02:19:03.278Z info internal/resourcedetection.go:150 detected resource information {"kind": "processor", "name": "resourcedetection", "pipeline": "metrics/generic", "resource": {"host.name":"cn-shanghai.192.168.31.238","os.type":"linux"}}
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:94 Processor started. {"kind": "processor", "name": "resourcedetection", "pipeline": "metrics/generic"}
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:98 Starting receivers...
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:102 Receiver is starting... {"kind": "receiver", "name": "filelog/k8s", "pipeline": "logs"}
2022-09-26T02:19:03.278Z info adapter/receiver.go:54 Starting stanza receiver {"kind": "receiver", "name": "filelog/k8s", "pipeline": "logs"}
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:106 Receiver started. {"kind": "receiver", "name": "filelog/k8s", "pipeline": "logs"}
2022-09-26T02:19:03.278Z info pipelines/pipelines.go:102 Receiver is starting... {"kind": "receiver", "name": "otlp", "pipeline": "logs"}
2022-09-26T02:19:03.278Z info otlpreceiver/otlp.go:70 Starting GRPC server on endpoint 0.0.0.0:4317 {"kind": "receiver", "name": "otlp", "pipeline": "logs"}
2022-09-26T02:19:03.278Z info otlpreceiver/otlp.go:88 Starting HTTP server on endpoint 0.0.0.0:4318 {"kind": "receiver", "name": "otlp", "pipeline": "logs"}
2022-09-26T02:19:03.279Z info pipelines/pipelines.go:106 Receiver started. {"kind": "receiver", "name": "otlp", "pipeline": "logs"}
2022-09-26T02:19:03.279Z info pipelines/pipelines.go:102 Receiver is starting... {"kind": "receiver", "name": "prometheus", "pipeline": "metrics"}
2022-09-26T02:19:03.279Z info pipelines/pipelines.go:106 Receiver started. {"kind": "receiver", "name": "prometheus", "pipeline": "metrics"}
2022-09-26T02:19:03.279Z info pipelines/pipelines.go:102 Receiver is starting... {"kind": "receiver", "name": "otlp", "pipeline": "metrics"}
2022-09-26T02:19:03.279Z info pipelines/pipelines.go:106 Receiver started. {"kind": "receiver", "name": "otlp", "pipeline": "metrics"}
2022-09-26T02:19:03.279Z info pipelines/pipelines.go:102 Receiver is starting... {"kind": "receiver", "name": "otlp/spanmetrics", "pipeline": "metrics"}
2022-09-26T02:19:03.279Z info otlpreceiver/otlp.go:70 Starting GRPC server on endpoint localhost:12345 {"kind": "receiver", "name": "otlp/spanmetrics", "pipeline": "metrics"}
2022-09-26T02:19:03.279Z info pipelines/pipelines.go:106 Receiver started. {"kind": "receiver", "name": "otlp/spanmetrics", "pipeline": "metrics"}
2022-09-26T02:19:03.279Z info pipelines/pipelines.go:102 Receiver is starting... {"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics"}
2022-09-26T02:19:03.279Z info pipelines/pipelines.go:106 Receiver started. {"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics"}
2022-09-26T02:19:03.279Z info pipelines/pipelines.go:102 Receiver is starting... {"kind": "receiver", "name": "kubeletstats", "pipeline": "metrics"}
2022-09-26T02:19:03.279Z info pipelines/pipelines.go:106 Receiver started. {"kind": "receiver", "name": "kubeletstats", "pipeline": "metrics"}
2022-09-26T02:19:03.279Z info pipelines/pipelines.go:102 Receiver is starting... {"kind": "receiver", "name": "jaeger", "pipeline": "traces"}
2022-09-26T02:19:03.279Z info static/strategy_store.go:203 No sampling strategies provided or URL is unavailable, using defaults {"kind": "receiver", "name": "jaeger", "pipeline": "traces"}
2022-09-26T02:19:03.279Z info pipelines/pipelines.go:106 Receiver started. {"kind": "receiver", "name": "jaeger", "pipeline": "traces"}
2022-09-26T02:19:03.279Z info pipelines/pipelines.go:102 Receiver is starting... {"kind": "receiver", "name": "otlp", "pipeline": "traces"}
2022-09-26T02:19:03.279Z info pipelines/pipelines.go:106 Receiver started. {"kind": "receiver", "name": "otlp", "pipeline": "traces"}
2022-09-26T02:19:03.279Z info healthcheck/handler.go:129 Health Check state change {"kind": "extension", "name": "health_check", "status": "ready"}
2022-09-26T02:19:03.279Z info service/collector.go:215 Starting signoz-otel-collector... {"Version": "latest", "NumCPU": 8}
2022-09-26T02:19:03.279Z info service/collector.go:128 Everything is ready. Begin running and processing data.
2022-09-26T02:19:03.502Z info fileconsumer/file.go:178 Started watching file {"kind": "receiver", "name": "filelog/k8s", "pipeline": "logs", "component": "fileconsumer", "path": "/var/log/pods/cattle-system_rancher-849fc8b4df-p4fmb_ccf7f014-bde3-4639-9745-764fe6d2f9fa/rancher/0.log"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x343dd12]
goroutine 170 [running]:
github.com/SigNoz/signoz-otel-collector/exporter/clickhousemetricsexporter.(*PrwExporter).export.func1()
/src/exporter/clickhousemetricsexporter/exporter.go:280 +0xf2
created by github.com/SigNoz/signoz-otel-collector/exporter/clickhousemetricsexporter.(*PrwExporter).export
/src/exporter/clickhousemetricsexporter/exporter.go:276 +0x25
@yxiaoy6 Only `ClickHouseUrl` is needed for query-service. From the logs, it looks like the ClickHouse address is not right. Can you please share the environments passed? You can redact sensitive information with `*` or `x`.
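When sharing environments, the password can be masked mechanically. A small sketch (my own helper, not part of SigNoz) that redacts the password in a ClickHouse DSN before pasting it:

```python
import re

def redact_dsn(dsn: str) -> str:
    """Replace the password value in a ClickHouse DSN with a placeholder."""
    return re.sub(r"(password=)[^&]*", r"\1xxxxx", dsn)
```

For example, `redact_dsn("tcp://host:9000/?database=signoz_traces&username=rw&password=secret")` keeps everything except the password value.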
values.yaml:
---
global:
image:
registry: null
storageClass: null
fullnameOverride: ""
clusterDomain: cluster.local
clickhouse:
cloud: other
zookeeper:
enabled: true
persistence:
enabled: true
existingClaim: ""
storageClass: alicloud-disk-essd
accessModes:
- ReadWriteOnce
size: 20Gi
annotations: {}
enabled: false
namespace: ""
nameOverride: ""
fullnameOverride: ""
cluster: cluster
database: signoz_metrics
traceDatabase: signoz_traces
user: admin
password: 27ff0399-0d3a-4bd8-919d-17c2181e6fb9
image:
registry: docker.io
repository: clickhouse/clickhouse-server
tag: 22.4.5-alpine
pullPolicy: IfNotPresent
service:
annotations: {}
type: ClusterIP
httpPort: 8123
tcpPort: 9000
secure: false
verify: false
externalZookeeper: {}
tolerations: []
affinity: {}
resources: {}
securityContext:
enabled: true
runAsUser: 101
runAsGroup: 101
fsGroup: 101
useNodeSelector: false
allowedNetworkIps:
- "10.0.0.0/8"
- "100.64.0.0/10"
- "172.16.0.0/12"
- "192.0.0.0/24"
- "198.18.0.0/15"
- "192.168.0.0/16"
persistence:
enabled: true
existingClaim: ""
storageClass: null
accessModes:
- ReadWriteOnce
size: 30Gi
profiles: {}
defaultProfiles:
default/allow_experimental_window_functions: "1"
default/allow_nondeterministic_mutations: "1"
layout:
shardsCount: 1
replicasCount: 1
settings:
prometheus/endpoint: /metrics
prometheus/port: 9363
# prometheus/metrics: true
# prometheus/events: true
# prometheus/asynchronous_metrics: true
defaultSettings:
format_schema_path: /etc/clickhouse-server/config.d/
podAnnotations:
signoz.io/scrape: 'true'
signoz.io/port: '9363'
signoz.io/path: /metrics
# Cold storage configuration
coldStorage:
enabled: false
defaultKeepFreeSpaceBytes: "10485760"
endpoint: https://<bucket-name>.s3.amazonaws.com/data/
role:
enabled: false
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::******:role/*****
accessKey: <access_key_id>
secretAccess: <secret_access_key>
installCustomStorageClass: false
clickhouseOperator:
name: operator
version: 0.19.1
image:
registry: docker.io
repository: altinity/clickhouse-operator
tag: 0.19.1
pullPolicy: IfNotPresent
serviceAccount:
create: true
annotations: {}
name:
podAnnotations:
signoz.io/port: '8888'
signoz.io/scrape: 'true'
nodeSelector: {}
metricsExporter:
name: metrics-exporter
service:
annotations: {}
type: ClusterIP
port: 8888
image:
registry: docker.io
repository: altinity/metrics-exporter
tag: 0.19.1
pullPolicy: IfNotPresent
externalClickhouse:
host: 192.168.31.226
cluster: cluster
database: signoz_metrics
traceDatabase: signoz_traces
user: "xxx_rw"
password: "xxxxx"
existingSecret:
existingSecretPasswordKey:
secure: false
verify: false
httpPort: 8123
tcpPort: 9000
queryService:
name: "query-service"
replicaCount: 1
image:
registry: docker.io
repository: signoz/query-service
tag: 0.11.0
pullPolicy: IfNotPresent
imagePullSecrets: []
serviceAccount:
create: true
annotations: {}
name:
initContainers:
init:
enabled: true
image:
registry: docker.io
repository: busybox
tag: 1.35
pullPolicy: IfNotPresent
command:
delay: 5
endpoint: /ping
waitMessage: "waiting for clickhouseDB"
doneMessage: "clickhouse ready, starting query service now"
configVars:
storage: clickhouse
# clickHouseUrl: tcp://my-release-clickhouse:9000/?database=signoz_traces&username=clickhouse_operator&password=clickhouse_operator_password
clickHouseUrl: tcp://192.168.31.226:9000/?database=signoz_traces&username=xxx_rw&password=xxxxx
goDebug: netdns=go
telemetryEnabled: true
deploymentType: kubernetes-helm
podSecurityContext: {}
# fsGroup: 2000
securityContext: {}
# Query-Service service
service:
annotations: {}
type: ClusterIP
port: 8080
internalPort: 8085
ingress:
enabled: true
className: ""
annotations: {}
hosts:
- host: signoz-query-service-prod.xxx.com
paths:
- path: /
pathType: ImplementationSpecific
port: 8080
tls: []
resources:
requests:
cpu: 200m
memory: 300Mi
limits:
cpu: 750m
memory: 1000Mi
nodeSelector: {}
tolerations: []
affinity: {}
persistence:
enabled: true
storageClass: null
accessModes:
- ReadWriteOnce
size: 30Gi
# Default values for frontend
frontend:
name: "frontend"
replicaCount: 1
image:
registry: docker.io
repository: signoz/frontend
tag: 0.11.0
pullPolicy: IfNotPresent
imagePullSecrets: []
serviceAccount:
create: true
annotations: {}
name:
initContainers:
init:
enabled: true
image:
registry: docker.io
repository: busybox
tag: 1.35
pullPolicy: IfNotPresent
command:
delay: 5
endpoint: /api/v1/version
waitMessage: "waiting for query-service"
doneMessage: "clickhouse ready, starting frontend now"
autoscaling:
enabled: false
minReplicas: 1
maxReplicas: 11
targetCPUUtilizationPercentage: 50
targetMemoryUtilizationPercentage: 50
behavior: {}
autoscalingTemplate: []
keda:
enabled: false
pollingInterval: "30" # check 30sec periodically for metrics data
cooldownPeriod: "300" # once the load decreased, it will wait for 5 min and downscale
minReplicaCount: "1" # should be >= replicaCount specified in values.yaml
maxReplicaCount: "5"
triggers:
- type: memory
metadata:
type: Utilization
value: "80" # HPA makes sure average utilization <=80 by adding new pods
- type: cpu
metadata:
type: Utilization
value: "80" # HPA makes sure average utilization <=80 by adding new pods
configVars: {}
podSecurityContext: {}
# fsGroup: 2000
securityContext: {}
# Frontend service
service:
annotations: {}
type: ClusterIP
port: 3301
ingress:
enabled: true
className: ""
annotations: {}
hosts:
- host: signoz-frontend-prod.xxx.com
paths:
- path: /
pathType: ImplementationSpecific
port: 3301
tls: []
resources:
requests:
cpu: 100m
memory: 100Mi
limits:
cpu: 200m
memory: 200Mi
nodeSelector: {}
tolerations: []
affinity: {}
alertmanager:
name: "alertmanager"
replicaCount: 1
image:
registry: docker.io
repository: signoz/alertmanager
pullPolicy: IfNotPresent
# Overrides the image tag whose default is the chart appVersion.
tag: 0.23.0-0.2
command: []
extraArgs: {}
imagePullSecrets: []
service:
annotations: {}
type: ClusterIP
port: 9093
nodePort: null
serviceAccount:
create: true
annotations: {}
name:
initContainers:
init:
enabled: true
image:
registry: docker.io
repository: busybox
tag: 1.35
pullPolicy: IfNotPresent
command:
delay: 5
endpoint: /api/v1/version
waitMessage: "waiting for query-service"
doneMessage: "clickhouse ready, starting alertmanager now"
podSecurityContext:
fsGroup: 65534
dnsConfig: {}
securityContext:
runAsUser: 65534
runAsNonRoot: true
runAsGroup: 65534
additionalPeers: []
livenessProbe:
httpGet:
path: /
port: http
readinessProbe:
httpGet:
path: /
port: http
ingress:
enabled: true
className: ""
annotations: {}
hosts:
- host: signoz-alertmanager-prod.xxx.com
paths:
- path: /
pathType: ImplementationSpecific
port: 9093
tls: []
resources:
requests:
cpu: 100m
memory: 100Mi
limits:
cpu: 200m
memory: 200Mi
nodeSelector: {}
tolerations: []
affinity: {}
statefulSet:
annotations: {}
podAnnotations: {}
podLabels: {}
podDisruptionBudget: {}
# maxUnavailable: 1
# minAvailable: 1
persistence:
enabled: true
storageClass: null
accessModes:
- ReadWriteOnce
size: 20Gi
configmapReload:
enabled: false
name: configmap-reload
image:
repository: jimmidyson/configmap-reload
tag: v0.5.0
pullPolicy: IfNotPresent
resources: {}
# Default values for OtelCollector
otelCollector:
name: "otel-collector"
image:
registry: docker.io
repository: signoz/signoz-otel-collector
tag: 0.55.0
pullPolicy: IfNotPresent
#pullPolicy: Always
imagePullSecrets: []
# OtelCollector service
service:
annotations: {}
type: ClusterIP
serviceAccount:
create: true
annotations: {}
name:
annotations: {}
podAnnotations:
signoz.io/scrape: 'true'
signoz.io/port: '8889'
signoz.io/path: /metrics
minReadySeconds: 5
initContainers:
init:
enabled: true
image:
registry: docker.io
repository: busybox
tag: 1.35
pullPolicy: IfNotPresent
command:
delay: 5
endpoint: /ping
waitMessage: "waiting for clickhouseDB"
doneMessage: "clickhouse ready, starting otel collector now"
# Configuration for ports
ports:
otlp:
enabled: true
containerPort: 4317
servicePort: 4317
hostPort: 4317
protocol: TCP
otlp-http:
enabled: true
containerPort: 4318
servicePort: 4318
hostPort: 4318
protocol: TCP
jaeger-compact:
enabled: false
containerPort: 6831
servicePort: 6831
hostPort: 6831
protocol: UDP
jaeger-thrift:
enabled: true
containerPort: 14268
servicePort: 14268
hostPort: 14268
protocol: TCP
jaeger-grpc:
enabled: true
containerPort: 14250
servicePort: 14250
hostPort: 14250
protocol: TCP
zipkin:
enabled: false
containerPort: 9411
servicePort: 9411
hostPort: 9411
protocol: TCP
prometheus-metrics:
enabled: false
containerPort: 8889
servicePort: 8889
hostPort: 8889
protocol: TCP
metrics:
enabled: true
containerPort: 8888
servicePort: 8888
hostPort: 8888
protocol: TCP
zpages:
enabled: false
containerPort: 55679
servicePort: 55679
hostPort: 55679
protocol: TCP
livenessProbe:
enabled: false
port: 13133
path: /
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 6
successThreshold: 1
readinessProbe:
enabled: false
port: 13133
path: /
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 6
successThreshold: 1
customLivenessProbe: {}
customReadinessProbe: {}
ingress:
enabled: true
className: ""
annotations: {}
hosts:
- host: signoz-otelcollector-prod.xxx.com
paths:
- path: /
pathType: ImplementationSpecific
port: 4318
# -- OtelCollector Ingress TLS
tls: []
# adjust the resource requests and limit as necessary
resources:
requests:
cpu: 200m
memory: 400Mi
limits:
cpu: 1000m
memory: 2Gi
nodeSelector: {}
tolerations: []
affinity: {}
autoscaling:
enabled: false
minReplicas: 1
maxReplicas: 11
targetCPUUtilizationPercentage: 50
targetMemoryUtilizationPercentage: 50
behavior: {}
autoscalingTemplate: []
keda:
enabled: false
pollingInterval: "30" # check 30sec periodically for metrics data
cooldownPeriod: "300" # once the load decreased, it will wait for 5 min and downscale
minReplicaCount: "1" # should be >= replicaCount specified in values.yaml
maxReplicaCount: "5"
triggers:
- type: memory
metadata:
type: Utilization
value: "80" # HPA makes sure average utilization <=80 by adding new pods
- type: cpu
metadata:
type: Utilization
value: "80" # HPA makes sure average utilization <=80 by adding new pods
config:
receivers:
otlp/spanmetrics:
protocols:
grpc:
endpoint: localhost:12345
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
jaeger:
protocols:
grpc:
endpoint: 0.0.0.0:14250
thrift_http:
endpoint: 0.0.0.0:14268
# Uncomment to enable the thrift_compact receiver.
# You will also have to enable it in `otelCollector.ports`.
# thrift_compact:
# endpoint: 0.0.0.0:6831
hostmetrics:
collection_interval: 30s
scrapers:
cpu: {}
load: {}
memory: {}
disk: {}
filesystem: {}
network: {}
kubeletstats:
collection_interval: 60s
auth_type: serviceAccount
endpoint: ${K8S_NODE_NAME}:10250
insecure_skip_verify: true
metric_groups:
- container
- node
- pod
- volume
filelog/k8s:
include:
- /var/log/pods/*/*/*.log
exclude:
# Exclude logs from all containers named otel-collector
- /var/log/pods/*/otel-collector/*.log
start_at: beginning
include_file_path: true
include_file_name: false
operators:
# Find out which format is used by kubernetes
- type: router
id: get-format
routes:
- output: parser-docker
expr: 'body matches "^\\{"'
- output: parser-crio
expr: 'body matches "^[^ Z]+ "'
- output: parser-containerd
expr: 'body matches "^[^ Z]+Z"'
# Parse CRI-O format
- type: regex_parser
id: parser-crio
regex: '^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'
output: extract_metadata_from_filepath
timestamp:
parse_from: attributes.time
layout_type: gotime
layout: '2006-01-02T15:04:05.000000000-07:00'
# Parse CRI-Containerd format
- type: regex_parser
id: parser-containerd
regex: '^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'
output: extract_metadata_from_filepath
timestamp:
parse_from: attributes.time
layout: '%Y-%m-%dT%H:%M:%S.%LZ'
# Parse Docker format
- type: json_parser
id: parser-docker
output: extract_metadata_from_filepath
timestamp:
parse_from: attributes.time
layout: '%Y-%m-%dT%H:%M:%S.%LZ'
# Extract metadata from file path
- type: regex_parser
id: extract_metadata_from_filepath
regex: '^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$'
parse_from: attributes["log.file.path"]
# Rename attributes
- type: move
from: attributes.stream
to: attributes["log.iostream"]
- type: move
from: attributes.container_name
to: attributes["k8s.container.name"]
- type: move
from: attributes.namespace
to: attributes["k8s.namespace.name"]
- type: move
from: attributes.pod_name
to: attributes["k8s.pod.name"]
- type: move
from: attributes.restart_count
to: attributes["k8s.container.restart_count"]
- type: move
from: attributes.uid
to: attributes["k8s.pod.uid"]
- type: move
from: attributes.log
to: body
prometheus:
config:
global:
scrape_interval: 30s
scrape_configs:
- job_name: otel-collector
static_configs:
- targets:
- ${HOST_IP}:8888
processors:
batch:
send_batch_size: 1000
timeout: 10s
# Ref: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/README.md
resourcedetection:
detectors: [env, system] # Include ec2/eks for AWS, gce/gke for GCP and azure/aks for Azure
# Using OTEL_RESOURCE_ATTRIBUTES envvar, env detector adds custom labels
timeout: 2s
override: false
system:
hostname_sources: [os] # Alternatively, use [dns,os] for setting FQDN as host.name and os as fallback
signozspanmetrics/prometheus:
metrics_exporter: prometheus
latency_histogram_buckets:
[
100us,
1ms,
2ms,
6ms,
10ms,
50ms,
100ms,
250ms,
500ms,
1000ms,
1400ms,
2000ms,
5s,
10s,
20s,
40s,
60s,
]
dimensions_cache_size: 10000
dimensions:
- name: service.namespace
default: default
- name: deployment.environment
default: default
extensions:
health_check:
endpoint: 0.0.0.0:13133
zpages:
endpoint: localhost:55679
pprof:
endpoint: localhost:1777
exporters:
clickhousetraces:
datasource: tcp://${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT}/?database=${CLICKHOUSE_TRACE_DATABASE}&username=${CLICKHOUSE_USER}&password=${CLICKHOUSE_PASSWORD}
clickhousemetricswrite:
endpoint: tcp://${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT}/?database=${CLICKHOUSE_DATABASE}&username=${CLICKHOUSE_USER}&password=${CLICKHOUSE_PASSWORD}
resource_to_telemetry_conversion:
enabled: true
clickhouselogsexporter:
dsn: tcp://${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT}/?username=${CLICKHOUSE_USER}&password=${CLICKHOUSE_PASSWORD}
timeout: 10s
sending_queue:
queue_size: 100
retry_on_failure:
enabled: true
initial_interval: 5s
max_interval: 30s
max_elapsed_time: 300s
prometheus:
endpoint: 0.0.0.0:8889
service:
telemetry:
metrics:
address: 0.0.0.0:8888
extensions: [health_check, zpages]
pipelines:
traces:
receivers: [jaeger, otlp]
processors: [signozspanmetrics/prometheus, batch]
exporters: [clickhousetraces]
metrics:
receivers: [otlp]
processors: [batch]
exporters: [clickhousemetricswrite]
metrics/generic:
receivers: [hostmetrics, kubeletstats, prometheus]
processors: [resourcedetection, batch]
exporters: [clickhousemetricswrite]
metrics/spanmetrics:
receivers: [otlp/spanmetrics]
exporters: [prometheus]
logs:
receivers: [filelog/k8s, otlp]
processors: [batch]
exporters: [clickhouselogsexporter]
# Default values for OtelCollectorMetrics
otelCollectorMetrics:
name: "otel-collector-metrics"
image:
registry: docker.io
repository: signoz/signoz-otel-collector
tag: 0.55.0
pullPolicy: IfNotPresent
#pullPolicy: Always
imagePullSecrets: []
# OtelCollectorMetrics service
service:
annotations: {}
type: ClusterIP
serviceAccount:
create: true
annotations: {}
name:
annotations: {}
podAnnotations:
signoz.io/scrape: 'true'
signoz.io/port: '8888'
signoz.io/path: /metrics
minReadySeconds: 5
progressDeadlineSeconds: 120
replicaCount: 1
initContainers:
init:
enabled: true
image:
registry: docker.io
repository: busybox
tag: 1.35
pullPolicy: IfNotPresent
command:
delay: 5
endpoint: /ping
waitMessage: "waiting for clickhouseDB"
doneMessage: "clickhouse ready, starting otel collector metrics now"
# Configuration for ports
ports:
metrics:
enabled: false
containerPort: 8888
servicePort: 8888
protocol: TCP
zpages:
enabled: false
containerPort: 55679
servicePort: 55679
protocol: TCP
health-check:
enabled: true
containerPort: 13133
servicePort: 13133
protocol: TCP
pprof:
enabled: false
containerPort: 1777
servicePort: 1777
protocol: TCP
livenessProbe:
enabled: false
port: 13133
path: /
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 6
successThreshold: 1
readinessProbe:
enabled: false
port: 13133
path: /
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 6
successThreshold: 1
## Custom liveness and readiness probes
customLivenessProbe: {}
customReadinessProbe: {}
ingress:
enabled: true
className: ""
annotations: {}
hosts:
- host: signoz-otelcollector-metrics-prod.xxx.com
paths:
- path: /
pathType: ImplementationSpecific
port: 13133
tls: []
# adjust the resource requests and limit as necessary
resources:
requests:
cpu: 200m
memory: 400Mi
limits:
cpu: 1000m
memory: 2Gi
nodeSelector: {}
tolerations: []
affinity: {}
config:
receivers:
k8s_cluster:
collection_interval: 60s
node_conditions_to_report: [Ready, MemoryPressure]
# Data sources: metrics
prometheus:
config:
scrape_configs:
# otel-collector-metrics internal metrics
- job_name: "otel-collector-metrics"
scrape_interval: 60s
static_configs:
- targets:
- ${MY_POD_IP}:8888
# generic Prometheus metrics scraper (scraped when pod annotations are set)
- job_name: "generic-collector"
scrape_interval: 60s
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels:
[__meta_kubernetes_pod_annotation_signoz_io_scrape]
action: keep
regex: true
- source_labels:
[__meta_kubernetes_pod_annotation_signoz_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels:
[
__meta_kubernetes_pod_ip,
__meta_kubernetes_pod_annotation_signoz_io_port,
]
action: replace
separator: ":"
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: k8s_namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: k8s_pod
processors:
batch:
send_batch_size: 1000
timeout: 10s
# -- Memory Limiter processor
# If set to null, will be overridden with values based on k8s resource limits.
memory_limiter: null
extensions:
health_check:
endpoint: 0.0.0.0:13133
zpages:
endpoint: localhost:55679
pprof:
endpoint: localhost:1777
exporters:
clickhousemetricswrite:
endpoint: tcp://${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT}/?database=${CLICKHOUSE_DATABASE}&username=${CLICKHOUSE_USER}&password=${CLICKHOUSE_PASSWORD}
service:
telemetry:
metrics:
address: 0.0.0.0:8888
extensions: [health_check, zpages, pprof]
pipelines:
metrics:
receivers: [k8s_cluster, prometheus]
processors: [batch]
exporters: [clickhousemetricswrite]
otel-collector always restarts.
@prashant-shahi Looking forward to your response, thanks!
It should be because the ClickHouse version is too low.
@yxiaoy6 Thank you for being patient and responding with the solution to the reported issue.

When using external ClickHouse, make sure the version is `22.4.x` for SigNoz `v0.8.x` till `v0.11.x` (and future releases).
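To confirm an external deployment meets this requirement, compare the string returned by ClickHouse's `SELECT version()` against the minimum. A minimal sketch (the helper name is mine):

```python
def meets_min_version(version: str, minimum: str = "22.4") -> bool:
    """True when the major.minor of `version` is at least that of `minimum`."""
    major_minor = lambda v: tuple(int(p) for p in v.split(".")[:2])
    return major_minor(version) >= major_minor(minimum)
```

For example, the chart's bundled image tag `22.4.5-alpine` passes, while anything older than `22.4` does not.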
Bug description

Deploying with Helm directly, using the values.yaml shown above: query-service fails, and otel-collector keeps restarting. I made sure the account password is correct and that the signoz_log database had been automatically created in my ClickHouse.