elastic / apm

Elastic Application Performance Monitoring - resources and general issue tracking for Elastic APM.
https://www.elastic.co/apm
Apache License 2.0

Introduce global logging environment variables #869

Open Mpdreamz opened 5 months ago

Mpdreamz commented 5 months ago

In the .NET agent and OpenTelemetry distribution we are introducing new OTEL variables to enable global file logging:

Overview

If any of these are provided the agent or distribution will start logging, regardless of how the application is set up to log.

ELASTIC_OTEL_LOG_DIRECTORY defaults to:

We don't differentiate between agent and distro for the application moniker apm-agent-dotnet. Should we?

ELASTIC_OTEL_LOG_TARGETS

Semicolon separated list of options.

The default is none unless the other two variables are set, in which case it is file.

ELASTIC_OTEL_LOG_LEVEL

The name of the log level.
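The defaulting rules for the three variables could be sketched roughly as follows. This is an illustration in Python, not the .NET agent's actual code: the function name resolve_logging and the returned dictionary are hypothetical, target names are lowercased and passed through unvalidated because the accepted options are not listed here, and "the other two variables are set" is read as "either one is set", in line with the overview above.

```python
import os
from typing import Mapping, Optional


def resolve_logging(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical sketch of the defaulting rules described in this issue."""
    env = os.environ if env is None else env
    directory = env.get("ELASTIC_OTEL_LOG_DIRECTORY")
    level = env.get("ELASTIC_OTEL_LOG_LEVEL")
    raw_targets = env.get("ELASTIC_OTEL_LOG_TARGETS")

    if raw_targets:
        # Semicolon-separated list of options; blank entries are ignored.
        targets = [t.strip().lower() for t in raw_targets.split(";") if t.strip()]
    elif directory or level:
        # Assumption: setting either of the other variables implies "file".
        targets = ["file"]
    else:
        # Nothing set: the default is none, so global logging stays off.
        targets = []

    # The directory default is platform dependent and is not filled in here.
    return {"targets": targets, "directory": directory, "level": level}
```

Under these assumptions, resolve_logging({"ELASTIC_OTEL_LOG_LEVEL": "Trace"}) returns targets ["file"], which matches the "set one variable to get logs" workflow described under Benefit.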

Benefit

Troubleshooting the agent becomes rather easy for our users. We only have to instruct them to set ELASTIC_OTEL_LOG_LEVEL=Trace to get logs.

Compare that to what we point users to today: https://www.elastic.co/guide/en/apm/agent/dotnet/current/troubleshooting.html#collect-agent-logs where we first need to find out how the user is running the agent and then, depending on the actual stack, choose between various ways to turn on logging. This might be very .NET specific though.

Discussion

We expect our users to set these variables system wide, even if they don't technically have to.

Should we standardize on these variables to enable logging for our new distributions? Have one common way to start debugging and troubleshooting them?

Cc @elastic/apm-agent-devs

trentm commented 5 months ago

I would worry about the confusion of ELASTIC_OTEL_LOG_LEVEL with the OTel-defined OTEL_LOG_LEVEL var: https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/#general-sdk-configuration

(Note that the level names for OTEL_LOG_LEVEL aren't currently well-defined. Issue 2039 in https://github.com/open-telemetry/opentelemetry-specification (I didn't want a link back to this from there) is discussing defining those names.)

The Node.js OTel distro is currently using OTEL_LOG_LEVEL (defaulting to info) to determine its log level. It logs to stdout, with no option to target anything else. Yes, logging to stdout means mixing output in with the application's output, but I've never seen that being considered a problem with the classic agent.
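For comparison, the behaviour described for the Node.js distro (level taken from OTEL_LOG_LEVEL, defaulting to info, output always on stdout) might look like this when sketched in Python. This is not the distro's code: the function name, the logger name and the set of accepted level names are assumptions, the last one necessarily so because the spec does not pin the names down yet.

```python
import logging
import os
import sys
from typing import Mapping, Optional

# Assumed level names; the OTel spec does not currently define them.
_LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warn": logging.WARNING,
    "error": logging.ERROR,
}


def configure_stdout_logging(env: Optional[Mapping[str, str]] = None) -> logging.Logger:
    """Hypothetical sketch: level from OTEL_LOG_LEVEL, output always on stdout."""
    env = os.environ if env is None else env
    name = env.get("OTEL_LOG_LEVEL", "info").strip().lower()
    # Unknown names fall back to the info default instead of failing.
    level = _LEVELS.get(name, logging.INFO)

    logger = logging.getLogger("otel-distro-sketch")
    logger.setLevel(level)
    if not logger.handlers:
        # stdout is the only target; there is no file option in this model.
        logger.addHandler(logging.StreamHandler(sys.stdout))
    return logger
```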

.NET-land may have different expectations for wanting APM agent logs going somewhere else.