DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0

[BUG] gohai logs error "[Debug] Error fetching info for pid 1" even when log level is >= info, in otel collector datadog exporter #21487

Open ringerc opened 10 months ago

ringerc commented 10 months ago

https://github.com/DataDog/datadog-agent/tree/main/pkg/gohai emits an error like

1702327020251532308 [Debug] Error fetching info for pid 1: user: unknown userid 10001

... when invoked by the OpenTelemetry Collector Datadog Exporter's hostmetrics collector on startup. This happens because the container has no /etc/passwd, NSS service, or similar; it is a bare-bones container with no OS userland, so the uid of pid 1 (10001 here) cannot be resolved to a username.

The Datadog exporter doesn't seem to use a logging adapter to send logs to the collector's log sink, so this message is emitted irrespective of log level. It also ignores the collector's configured log format and emits non-JSON logs when the collector is configured for JSON logging. And it's unnecessary, meaningless noise.

The message comes from

https://github.com/DataDog/datadog-agent/blob/45c774dba115b395c1b09a94fcd428f49d6d440a/pkg/gohai/processes/gops/process_info.go#L60

I'm not immediately sure where the returned err is transformed into a log message with the wrong logging adapter; I didn't dig that far.

If this message is necessary at all, it should:

See https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/14186 for details.

Agent Environment

N/A; this is about the OpenTelemetry Collector Datadog Exporter (which is managed by Datadog) running the gohai packages.

Describe what happened:

Annoying log message on every startup at all log levels. This message should NOT be emitted, given that the configured log level of my collector is

service:
  telemetry:
    logs:
      encoding: "json"
      level: "info"

Describe what you expected:

The message should not be emitted at all when the log level is above debug (info or higher).

When debug-level logs are enabled, the message should be emitted with proper JSON wrapping and a caller context that identifies where it came from.

Steps to reproduce the issue:

Run the example opentelemetry collector config provided by Datadog using the otel/opentelemetry-collector-contrib:0.90.1 image. Check the logs. Observe the error.

Additional environment details (Operating System, Cloud provider, etc):

N/A, you'll see this in docker or k8s or anywhere really.

ruben-chainalysis commented 5 months ago

Getting a different log line on otel/opentelemetry-collector-contrib:0.100.1:

[Debug] Error fetching info for pid 1: %!w(*fs.PathError=&{open /etc/passwd 2})

This comes up as the last log line after pods start up, with nothing logged after it. Can be quite confusing.

r0fls commented 2 weeks ago

Knowing the cause/fix here would be useful. I'm seeing this as well with version 0.94.0 (quite old, I know... will look to update)

r0fls commented 2 weeks ago

It sounds like this is a red herring though and not an issue with the exporter