SigNoz / signoz

SigNoz is an open-source observability platform native to OpenTelemetry with logs, traces and metrics in a single application. An open-source alternative to DataDog, NewRelic, etc. 🔥 🖥. 👉 Open source Application Performance Monitoring (APM) & Observability tool
https://signoz.io

Different machine hostmetrics fetch fail. #3278

Open KokoTa opened 1 year ago

KokoTa commented 1 year ago

Bug description

I have two machines, with IPs 192.168.2.102 and 192.168.2.101.

SigNoz is installed on 192.168.2.101, and its dashboard shows host metrics successfully:

image

But following the "OpenTelemetry Binary Usage in Virtual Machine" chapter to fetch host metrics from 192.168.2.102 fails.

The collector output looks like this:

2023/08/07 01:15:04 proto: duplicate proto type registered: jaeger.api_v2.PostSpansRequest
2023/08/07 01:15:04 proto: duplicate proto type registered: jaeger.api_v2.PostSpansResponse
2023-08-07T01:15:04.913-0700    info    service/telemetry.go:111    Setting up own telemetry...
2023-08-07T01:15:04.913-0700    info    service/telemetry.go:141    Serving Prometheus metrics  {"address": "0.0.0.0:8888", "level": "Basic"}
2023-08-07T01:15:04.916-0700    info    service/service.go:89   Starting otelcol-contrib... {"Version": "0.66.0", "NumCPU": 8}
2023-08-07T01:15:04.916-0700    info    extensions/extensions.go:41 Starting extensions...
2023-08-07T01:15:04.916-0700    info    extensions/extensions.go:44 Extension is starting...    {"kind": "extension", "name": "health_check"}
2023-08-07T01:15:04.916-0700    info    healthcheckextension@v0.66.0/healthcheckextension.go:44 Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"ExtensionConfig":null,"Endpoint":"0.0.0.0:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"Path":"/","CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2023-08-07T01:15:04.916-0700    warn    internal/warning.go:51  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks    {"kind": "extension", "name": "health_check", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-08-07T01:15:04.916-0700    info    extensions/extensions.go:48 Extension started.  {"kind": "extension", "name": "health_check"}
2023-08-07T01:15:04.916-0700    info    extensions/extensions.go:44 Extension is starting...    {"kind": "extension", "name": "zpages"}
2023-08-07T01:15:04.916-0700    info    zpagesextension@v0.66.0/zpagesextension.go:64   Registered zPages span processor on tracer provider {"kind": "extension", "name": "zpages"}
2023-08-07T01:15:04.916-0700    info    zpagesextension@v0.66.0/zpagesextension.go:74   Registered Host's zPages    {"kind": "extension", "name": "zpages"}
2023-08-07T01:15:04.917-0700    info    zpagesextension@v0.66.0/zpagesextension.go:86   Starting zPages extension   {"kind": "extension", "name": "zpages", "config": {"ExtensionConfig":null,"TCPAddr":{"Endpoint":"localhost:55679"}}}
2023-08-07T01:15:04.917-0700    info    extensions/extensions.go:48 Extension started.  {"kind": "extension", "name": "zpages"}
2023-08-07T01:15:04.917-0700    info    pipelines/pipelines.go:74   Starting exporters...
2023-08-07T01:15:04.917-0700    info    pipelines/pipelines.go:78   Exporter is starting... {"kind": "exporter", "data_type": "metrics", "name": "otlp"}
2023-08-07T01:15:04.917-0700    info    pipelines/pipelines.go:82   Exporter started.   {"kind": "exporter", "data_type": "metrics", "name": "otlp"}
2023-08-07T01:15:04.917-0700    info    pipelines/pipelines.go:78   Exporter is starting... {"kind": "exporter", "data_type": "logs", "name": "otlp"}
2023-08-07T01:15:04.918-0700    info    pipelines/pipelines.go:82   Exporter started.   {"kind": "exporter", "data_type": "logs", "name": "otlp"}
2023-08-07T01:15:04.918-0700    info    pipelines/pipelines.go:78   Exporter is starting... {"kind": "exporter", "data_type": "traces", "name": "otlp"}
2023-08-07T01:15:04.918-0700    info    pipelines/pipelines.go:82   Exporter started.   {"kind": "exporter", "data_type": "traces", "name": "otlp"}
2023-08-07T01:15:04.918-0700    info    pipelines/pipelines.go:86   Starting processors...
2023-08-07T01:15:04.919-0700    info    pipelines/pipelines.go:90   Processor is starting...    {"kind": "processor", "name": "batch", "pipeline": "traces"}
2023-08-07T01:15:04.919-0700    info    pipelines/pipelines.go:94   Processor started.  {"kind": "processor", "name": "batch", "pipeline": "traces"}
2023-08-07T01:15:04.919-0700    info    pipelines/pipelines.go:90   Processor is starting...    {"kind": "processor", "name": "batch", "pipeline": "metrics/internal"}
2023-08-07T01:15:04.919-0700    info    pipelines/pipelines.go:94   Processor started.  {"kind": "processor", "name": "batch", "pipeline": "metrics/internal"}
2023-08-07T01:15:04.919-0700    info    pipelines/pipelines.go:90   Processor is starting...    {"kind": "processor", "name": "batch", "pipeline": "logs"}
2023-08-07T01:15:04.919-0700    info    pipelines/pipelines.go:94   Processor started.  {"kind": "processor", "name": "batch", "pipeline": "logs"}
2023-08-07T01:15:04.919-0700    info    pipelines/pipelines.go:90   Processor is starting...    {"kind": "processor", "name": "batch", "pipeline": "metrics"}
2023-08-07T01:15:04.919-0700    info    pipelines/pipelines.go:94   Processor started.  {"kind": "processor", "name": "batch", "pipeline": "metrics"}
2023-08-07T01:15:04.919-0700    info    pipelines/pipelines.go:98   Starting receivers...
2023-08-07T01:15:04.919-0700    info    pipelines/pipelines.go:102  Receiver is starting... {"kind": "receiver", "name": "otlp", "pipeline": "traces"}
2023-08-07T01:15:04.919-0700    warn    internal/warning.go:51  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks    {"kind": "receiver", "name": "otlp", "pipeline": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-08-07T01:15:04.919-0700    info    otlpreceiver@v0.66.0/otlp.go:71 Starting GRPC server    {"kind": "receiver", "name": "otlp", "pipeline": "traces", "endpoint": "0.0.0.0:4317"}
2023-08-07T01:15:04.920-0700    warn    internal/warning.go:51  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks    {"kind": "receiver", "name": "otlp", "pipeline": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-08-07T01:15:04.920-0700    info    otlpreceiver@v0.66.0/otlp.go:89 Starting HTTP server    {"kind": "receiver", "name": "otlp", "pipeline": "traces", "endpoint": "0.0.0.0:4318"}
2023-08-07T01:15:04.920-0700    info    pipelines/pipelines.go:106  Receiver started.   {"kind": "receiver", "name": "otlp", "pipeline": "traces"}
2023-08-07T01:15:04.920-0700    info    pipelines/pipelines.go:102  Receiver is starting... {"kind": "receiver", "name": "otlp", "pipeline": "metrics"}
2023-08-07T01:15:04.920-0700    info    pipelines/pipelines.go:106  Receiver started.   {"kind": "receiver", "name": "otlp", "pipeline": "metrics"}
2023-08-07T01:15:04.920-0700    info    pipelines/pipelines.go:102  Receiver is starting... {"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics"}
2023-08-07T01:15:04.921-0700    info    pipelines/pipelines.go:106  Receiver started.   {"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics"}
2023-08-07T01:15:04.921-0700    info    pipelines/pipelines.go:102  Receiver is starting... {"kind": "receiver", "name": "otlp", "pipeline": "logs"}
2023-08-07T01:15:04.921-0700    info    pipelines/pipelines.go:106  Receiver started.   {"kind": "receiver", "name": "otlp", "pipeline": "logs"}
2023-08-07T01:15:04.921-0700    info    healthcheck/handler.go:129  Health Check state change   {"kind": "extension", "name": "health_check", "status": "ready"}
2023-08-07T01:15:04.921-0700    info    service/service.go:106  Everything is ready. Begin running and processing data.
2023-08-07T01:15:10.239-0700    error   scraperhelper/scrapercontroller.go:214  Error scraping metrics  {"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics", "error": "error reading parent pid for process \"systemd\" (pid 1): invalid pid 0", "scraper": "process"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
    go.opentelemetry.io/collector@v0.66.0/receiver/scraperhelper/scrapercontroller.go:214
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
    go.opentelemetry.io/collector@v0.66.0/receiver/scraperhelper/scrapercontroller.go:193
2023-08-07T01:15:15.196-0700    error   scraperhelper/scrapercontroller.go:214  Error scraping metrics  {"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics", "error": "error reading parent pid for process \"systemd\" (pid 1): invalid pid 0", "scraper": "process"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
    go.opentelemetry.io/collector@v0.66.0/receiver/scraperhelper/scrapercontroller.go:214
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
    go.opentelemetry.io/collector@v0.66.0/receiver/scraperhelper/scrapercontroller.go:193
2023-08-07T01:15:20.194-0700    error   scraperhelper/scrapercontroller.go:214  Error scraping metrics  {"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics", "error": "error reading parent pid for process \"systemd\" (pid 1): invalid pid 0", "scraper": "process"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
    go.opentelemetry.io/collector@v0.66.0/receiver/scraperhelper/scrapercontroller.go:214
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
    go.opentelemetry.io/collector@v0.66.0/receiver/scraperhelper/scrapercontroller.go:193
2023-08-07T01:15:25.175-0700    error   scraperhelper/scrapercontroller.go:214  Error scraping metrics  {"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics", "error": "error reading parent pid for process \"systemd\" (pid 1): invalid pid 0", "scraper": "process"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
    go.opentelemetry.io/collector@v0.66.0/receiver/scraperhelper/scrapercontroller.go:214
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
    go.opentelemetry.io/collector@v0.66.0/receiver/scraperhelper/scrapercontroller.go:193
2023-08-07T01:15:30.246-0700    error   scraperhelper/scrapercontroller.go:214  Error scraping metrics  {"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics", "error": "error reading parent pid for process \"systemd\" (pid 1): invalid pid 0", "scraper": "process"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
    go.opentelemetry.io/collector@v0.66.0/receiver/scraperhelper/scrapercontroller.go:214
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
    go.opentelemetry.io/collector@v0.66.0/receiver/scraperhelper/scrapercontroller.go:193

Expected behavior

Fetch hostmetrics from a different machine.

How to reproduce

https://signoz.io/docs/tutorial/opentelemetry-binary-usage-in-virtual-machine/#plain-binary

srikanthccv commented 1 year ago

Can you share the config you used for the binary? Is it failing to scrape just one metric, or are all metrics not working?

KokoTa commented 1 year ago

@srikanthccv I used curl -sL https://github.com/SigNoz/benchmark/raw/main/dashboards/hostmetrics/hostmetrics-import.sh | bash from the chapter above. The machine running SigNoz has metrics on the dashboard, but every other machine fails. I tested a VM server and a cloud server, and neither delivers metrics. Maybe the problem comes from here: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/18232 ? I also tried a newer version of opentelemetry-collector-contrib and got the same result.

srikanthccv commented 1 year ago

While that issue is related, it wouldn't stop the other scrapers from working. You should still be able to see metrics from all the remaining scrapers; only the process scraper is affected.

KokoTa commented 1 year ago

@srikanthccv I upgraded SigNoz and opentelemetry-collector-contrib, and now I see this:

(The machine is using the Plain Binary, downloaded with wget https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.79.0/otelcol-contrib_0.79.0_linux_amd64.tar.gz)

2023-08-08T18:34:53.464-0700    info    service/telemetry.go:104    Setting up own telemetry...
2023-08-08T18:34:53.467-0700    info    service/telemetry.go:127    Serving Prometheus metrics  {"address": "0.0.0.0:8888", "level": "Basic"}
2023-08-08T18:34:53.472-0700    info    service/service.go:131  Starting otelcol-contrib... {"Version": "0.79.0", "NumCPU": 8}
2023-08-08T18:34:53.472-0700    info    extensions/extensions.go:30 Starting extensions...
2023-08-08T18:34:53.472-0700    info    extensions/extensions.go:33 Extension is starting...    {"kind": "extension", "name": "health_check"}
2023-08-08T18:34:53.472-0700    info    healthcheckextension@v0.79.0/healthcheckextension.go:34 Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Endpoint":"0.0.0.0:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2023-08-08T18:34:53.472-0700    warn    internal/warning.go:40  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks    {"kind": "extension", "name": "health_check", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-08-08T18:34:53.472-0700    info    extensions/extensions.go:37 Extension started.  {"kind": "extension", "name": "health_check"}
2023-08-08T18:34:53.472-0700    info    extensions/extensions.go:33 Extension is starting...    {"kind": "extension", "name": "zpages"}
2023-08-08T18:34:53.472-0700    info    zpagesextension@v0.79.0/zpagesextension.go:53   Registered zPages span processor on tracer provider {"kind": "extension", "name": "zpages"}
2023-08-08T18:34:53.472-0700    info    zpagesextension@v0.79.0/zpagesextension.go:63   Registered Host's zPages    {"kind": "extension", "name": "zpages"}
2023-08-08T18:34:53.473-0700    info    zpagesextension@v0.79.0/zpagesextension.go:75   Starting zPages extension   {"kind": "extension", "name": "zpages", "config": {"TCPAddr":{"Endpoint":"localhost:55679"}}}
2023-08-08T18:34:53.473-0700    info    extensions/extensions.go:37 Extension started.  {"kind": "extension", "name": "zpages"}
2023-08-08T18:34:53.475-0700    warn    internal/warning.go:40  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks    {"kind": "receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-08-08T18:34:53.475-0700    info    otlpreceiver@v0.79.0/otlp.go:83 Starting GRPC server    {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:4317"}
2023-08-08T18:34:53.475-0700    warn    internal/warning.go:40  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks    {"kind": "receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-08-08T18:34:53.475-0700    info    otlpreceiver@v0.79.0/otlp.go:101    Starting HTTP server    {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:4318"}
2023-08-08T18:34:53.475-0700    info    internal/resourcedetection.go:125   began detecting resource information    {"kind": "processor", "name": "resourcedetection", "pipeline": "metrics/internal"}
2023-08-08T18:34:53.475-0700    info    internal/resourcedetection.go:139   detected resource information   {"kind": "processor", "name": "resourcedetection", "pipeline": "metrics/internal", "resource": {"host.id":"9a32b987173a410b9dfbed9aa2746f2a","host.name":"192.168.2.102","os.type":"linux"}}
2023-08-08T18:34:53.476-0700    info    prometheusreceiver@v0.79.0/metrics_receiver.go:242  Starting discovery manager  {"kind": "receiver", "name": "prometheus", "data_type": "metrics"}
2023-08-08T18:34:53.841-0700    info    prometheusreceiver@v0.79.0/metrics_receiver.go:233  Scrape job added    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "otel-collector-binary"}
2023-08-08T18:34:53.842-0700    info    prometheusreceiver@v0.79.0/metrics_receiver.go:281  Starting scrape manager {"kind": "receiver", "name": "prometheus", "data_type": "metrics"}
2023-08-08T18:34:53.843-0700    info    healthcheck/handler.go:129  Health Check state change   {"kind": "extension", "name": "health_check", "status": "ready"}
2023-08-08T18:34:53.843-0700    info    service/service.go:148  Everything is ready. Begin running and processing data.
2023-08-08T18:34:55.135-0700    error   scraperhelper/scrapercontroller.go:213  Error scraping metrics  {"kind": "receiver", "name": "hostmetrics", "data_type": "metrics", "error": "error reading parent pid for process \"systemd\" (pid 1): invalid pid 0; error reading process executable for pid 2: readlink /proc/2/exe: no such file or directory; error reading parent pid for process \"kthreadd\" (pid 2): invalid pid 0; error reading process executable for pid 4: readlink /proc/4/exe: no such file or directory; error reading process executable for pid 6: readlink /proc/6/exe: no such file or directory; error reading process executable for pid 7: readlink /proc/7/exe: no such file or directory; error reading process executable for pid 8: readlink /proc/8/exe: no such file or directory; error reading process executable for pid 9: readlink /proc/9/exe: no such file or directory; error reading process executable for pid 10: readlink /proc/10/exe: no such file or directory; error reading process executable for pid 11: readlink /proc/11/exe: no such file or directory; error reading process executable for pid 12: readlink /proc/12/exe: no such file or directory; error reading process executable for pid 13: readlink /proc/13/exe: no such file or directory; error reading process executable for pid 14: readlink /proc/14/exe: no such file or directory; error reading process executable for pid 16: readlink /proc/16/exe: no such file or directory; error reading process executable for pid 17: readlink /proc/17/exe: no such file or directory; error reading process executable for pid 18: readlink /proc/18/exe: no such file or directory; error reading process executable for pid 19: readlink /proc/19/exe: no such file or directory; error reading process executable for pid 21: readlink /proc/21/exe: no such file or directory; error reading process executable for pid 22: readlink /proc/22/exe: no such file or directory; error reading process executable for pid 23: readlink /proc/23/exe: 
no such file or directory; error reading process executable for pid 24: readlink /proc/24/exe: no such file or directory; error reading process executable for pid 26: readlink /proc/26/exe: no such file or directory; error reading process executable for pid 27: readlink /proc/27/exe: no such file or directory; error reading process executable for pid 28: readlink /proc/28/exe: no such file or directory; error reading process executable for pid 29: readlink /proc/29/exe: no such file or directory; error reading process executable for pid 31: readlink /proc/31/exe: no such file or directory; error reading process executable for pid 32: readlink /proc/32/exe: no such file or directory; error reading process executable for pid 33: readlink /proc/33/exe: no such file or directory; error reading process executable for pid 34: readlink /proc/34/exe: no such file or directory; error reading process executable for pid 36: readlink /proc/36/exe: no such file or directory; error reading process executable for pid 37: readlink /proc/37/exe: no such file or directory; error reading process executable for pid 38: readlink /proc/38/exe: no such file or directory; error reading process executable for pid 39: readlink /proc/39/exe: no such file or directory; error reading process executable for pid 41: readlink /proc/41/exe: no such file or directory; error reading process executable for pid 42: readlink /proc/42/exe: no such file or directory; error reading process executable for pid 43: readlink /proc/43/exe: no such file or directory; error reading process executable for pid 44: readlink /proc/44/exe: no such file or directory; error reading process executable for pid 46: readlink /proc/46/exe: no such file or directory; error reading process executable for pid 48: readlink /proc/48/exe: no such file or directory; error reading process executable for pid 49: readlink /proc/49/exe: no such file or directory; error reading process executable for pid 50: readlink /proc/50/exe: no 
such file or directory; error reading process executable for pid 51: readlink /proc/51/exe: no such file or directory; error reading process executable for pid 52: readlink /proc/52/exe: no such file or directory; error reading process executable for pid 53: readlink /proc/53/exe: no such file or directory; error reading process executable for pid 54: readlink /proc/54/exe: no such file or directory; error reading process executable for pid 55: readlink /proc/55/exe: no such file or directory; error reading process executable for pid 56: readlink /proc/56/exe: no such file or directory; error reading process executable for pid 57: readlink /proc/57/exe: no such file or directory; error reading process executable for pid 58: readlink /proc/58/exe: no such file or directory; error reading process executable for pid 59: readlink /proc/59/exe: no such file or directory; error reading process executable for pid 65: readlink /proc/65/exe: no such file or directory; error reading process executable for pid 66: readlink /proc/66/exe: no such file or directory; error reading process executable for pid 67: readlink /proc/67/exe: no such file or directory; error reading process executable for pid 68: readlink /proc/68/exe: no such file or directory; error reading process executable for pid 76: readlink /proc/76/exe: no such file or directory; error reading process executable for pid 78: readlink /proc/78/exe: no such file or directory; error reading process executable for pid 79: readlink /proc/79/exe: no such file or directory; error reading process executable for pid 81: readlink /proc/81/exe: no such file or directory; error reading process executable for pid 83: readlink /proc/83/exe: no such file or directory; error reading process executable for pid 97: readlink /proc/97/exe: no such file or directory; error reading process executable for pid 133: readlink /proc/133/exe: no such file or directory; error reading process executable for pid 295: readlink /proc/295/exe: no 
such file or directory; error reading process executable for pid 296: readlink /proc/296/exe: no such file or directory; error reading process executable for pid 297: readlink /proc/297/exe: no such file or directory; error reading process executable for pid 300: readlink /proc/300/exe: no such file or directory; error reading process executable for pid 306: readlink /proc/306/exe: no such file or directory; error reading process executable for pid 307: readlink /proc/307/exe: no such file or directory; error reading process executable for pid 310: readlink /proc/310/exe: no such file or directory; error reading process executable for pid 311: readlink /proc/311/exe: no such file or directory; error reading process executable for pid 312: readlink /proc/312/exe: no such file or directory; error reading process executable for pid 313: readlink /proc/313/exe: no such file or directory; error reading process executable for pid 316: readlink /proc/316/exe: no such file or directory; error reading process executable for pid 317: readlink /proc/317/exe: no such file or directory; error reading process executable for pid 331: readlink /proc/331/exe: no such file or directory; error reading process executable for pid 344: readlink /proc/344/exe: no such file or directory; error reading process executable for pid 345: readlink /proc/345/exe: no such file or directory; error reading process executable for pid 346: readlink /proc/346/exe: no such file or directory; error reading process executable for pid 347: readlink /proc/347/exe: no such file or directory; error reading process executable for pid 348: readlink /proc/348/exe: no such file or directory; error reading process executable for pid 349: readlink /proc/349/exe: no such file or directory; error reading process executable for pid 350: readlink /proc/350/exe: no such file or directory; error reading process executable for pid 351: readlink /proc/351/exe: no such file or directory; error reading process executable 
for pid 352: readlink /proc/352/exe: no such file or directory; error reading process executable for pid 353: readlink /proc/353/exe: no such file or directory; error reading process executable for pid 354: readlink /proc/354/exe: no such file or directory; error reading process executable for pid 355: readlink /proc/355/exe: no such file or directory; error reading process executable for pid 439: readlink /proc/439/exe: no such file or directory; error reading process executable for pid 563: readlink /proc/563/exe: no such file or directory; error reading process executable for pid 594: readlink /proc/594/exe: no such file or directory; error reading process executable for pid 595: readlink /proc/595/exe: no such file or directory; error reading process executable for pid 596: readlink /proc/596/exe: no such file or directory; error reading process executable for pid 598: readlink /proc/598/exe: no such file or directory; error reading process executable for pid 599: readlink /proc/599/exe: no such file or directory; error reading process executable for pid 600: readlink /proc/600/exe: no such file or directory; error reading process executable for pid 601: readlink /proc/601/exe: no such file or directory; error reading process executable for pid 603: readlink /proc/603/exe: no such file or directory; error reading process executable for pid 648: readlink /proc/648/exe: no such file or directory; error reading process executable for pid 649: readlink /proc/649/exe: no such file or directory; error reading process executable for pid 812: readlink /proc/812/exe: no such file or directory; error reading process executable for pid 871: readlink /proc/871/exe: no such file or directory; error reading process executable for pid 1087: readlink /proc/1087/exe: no such file or directory; error reading process executable for pid 1950: readlink /proc/1950/exe: no such file or directory; error reading process executable for pid 18015: readlink /proc/18015/exe: no such file 
or directory; error reading process executable for pid 20425: readlink /proc/20425/exe: no such file or directory; error reading process executable for pid 20591: readlink /proc/20591/exe: no such file or directory; error reading process executable for pid 27585: readlink /proc/27585/exe: no such file or directory; error reading process executable for pid 27626: readlink /proc/27626/exe: no such file or directory; error reading process executable for pid 27743: readlink /proc/27743/exe: no such file or directory; error reading process executable for pid 28064: readlink /proc/28064/exe: no such file or directory; error reading process executable for pid 58704: readlink /proc/58704/exe: no such file or directory; error reading process executable for pid 99701: readlink /proc/99701/exe: no such file or directory; error reading process executable for pid 101271: readlink /proc/101271/exe: no such file or directory; error reading process executable for pid 103449: readlink /proc/103449/exe: no such file or directory; error reading process executable for pid 113258: readlink /proc/113258/exe: no such file or directory; error reading process executable for pid 117018: readlink /proc/117018/exe: no such file or directory; error reading process executable for pid 124484: readlink /proc/124484/exe: no such file or directory; error reading process executable for pid 124653: readlink /proc/124653/exe: no such file or directory; error reading process executable for pid 127679: readlink /proc/127679/exe: no such file or directory; error reading process executable for pid 128054: readlink /proc/128054/exe: no such file or directory; error reading process executable for pid 128305: readlink /proc/128305/exe: no such file or directory; error reading process executable for pid 128631: readlink /proc/128631/exe: no such file or directory; error reading process executable for pid 129132: readlink /proc/129132/exe: no such file or directory; error reading process executable for pid 
129290: readlink /proc/129290/exe: no such file or directory", "scraper": "process"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
    go.opentelemetry.io/collector/receiver@v0.79.0/scraperhelper/scrapercontroller.go:213
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
    go.opentelemetry.io/collector/receiver@v0.79.0/scraperhelper/scrapercontroller.go:188

image

srikanthccv commented 1 year ago

There should be "No Data" if the host didn't send any data. Are these two machines of the same kind?

KokoTa commented 1 year ago

@srikanthccv

The two machines are the same. This is the machine info:

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

srikanthccv commented 1 year ago

I am not sure what the issue is. Just to rephrase your setup based on my understanding: you have two machines, and one of them runs the SigNoz deployment. The machine that runs SigNoz has its host metrics working. The other machine uses the binary and has a pipeline that exports data to SigNoz using the OTLP exporter. You expect the other machine's host metrics to work, but they are not working. There are two broad possibilities: 1. Your second machine is not sending any data at all. 2. It sends data, but the dashboard is not working (which is less likely). Can you confirm whether the other machine is sending data? Can you check whether you see any errors in the console?

KokoTa commented 1 year ago

@srikanthccv Yes, after testing I think the second machine is not sending data. In the shell log above, the collector reports the error: error reading parent pid for process \"systemd\" (pid 1): invalid pid 0, and SigNoz does not receive any data. Maybe the error causes the data send to fail?

srikanthccv commented 1 year ago

To my knowledge, that shouldn't be the case, because the error comes from the process scraper, while the rest of the scrapers, such as cpu and memory, should all work. You could also be getting the same error on the SigNoz deployment machine.
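One way to test this is to run the collector on the second machine once without the process scraper. The fragment below is only a sketch: it assumes a hostmetrics receiver with the scraper list from the SigNoz binary tutorial config, and the only change is that the process entry is left out. This isolation step was not tried in this thread.

```yaml
receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      cpu: {}
      disk: {}
      load: {}
      filesystem: {}
      memory: {}
      network: {}
      paging: {}
      # "process" (per-process metrics) is intentionally omitted for this test.
      # "processes" is a separate scraper (system-wide process counts) and stays.
      processes: {}
```

If the scrape errors disappear but SigNoz still shows no data for the host, the process scraper is unlikely to be the cause, and the export path to SigNoz is the next thing to check.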

srikanthccv commented 1 year ago

Please share the collector contrib config on the second machine.

KokoTa commented 1 year ago

@srikanthccv

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  hostmetrics:
    collection_interval: 10s
    scrapers:
      cpu: {}
      disk: {}
      load: {}
      filesystem: {}
      memory: {}
      network: {}
      paging: {}
      process:
        mute_process_name_error: true
      processes: {}
  prometheus:
    config:
      global:
        scrape_interval: 10s
      scrape_configs:
        - job_name: otel-collector-binary
          static_configs:
            - targets: ['localhost:8888']
processors:
  batch:
    send_batch_size: 1000
    timeout: 10s
  # Ref: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/README.md
  resourcedetection:
    detectors: [env, system] # include ec2 for AWS, gcp for GCP and azure for Azure.
    # Using OTEL_RESOURCE_ATTRIBUTES envvar, env detector adds custom labels.
    timeout: 2s
    system:
      hostname_sources: [os] # alternatively, use [dns,os] for setting FQDN as host.name and os as fallback
extensions:
  health_check: {}
  zpages: {}
exporters:
  otlp:
    endpoint: 192.168.2.101:4317
    tls:
      insecure: true
  logging:
    # verbosity of the logging export: detailed, normal, basic
    verbosity: normal
service:
  telemetry:
    metrics:
      address: 0.0.0.0:8888
  extensions: [health_check, zpages]
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    metrics/internal:
      receivers: [prometheus, hostmetrics]
      processors: [resourcedetection, batch]
      exporters: [otlp]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]

I only changed the endpoint attribute.
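The config above defines a logging exporter, but no pipeline references it. As a debugging sketch (a suggestion, not something confirmed in this thread), it could be added to the metrics/internal pipeline so the collector prints what it exports to its own console. That would show whether host metrics are produced on 192.168.2.102 at all before they are sent to 192.168.2.101:4317. Only the exporters list changes; everything else is assumed to stay as posted.

```yaml
service:
  pipelines:
    metrics/internal:
      receivers: [prometheus, hostmetrics]
      processors: [resourcedetection, batch]
      # "logging" is added only for debugging; remove it once the check is done.
      exporters: [otlp, logging]
```

If metrics appear in the console but not in SigNoz, the problem is more likely in the network path or on the receiving side than in the hostmetrics receiver.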

srikanthccv commented 1 year ago

I will have to test this on a real machine with the same config to say anything more about it.

KokoTa commented 1 year ago

@srikanthccv Thanks for your help^ ^