elastic / apm-agent-dotnet

https://www.elastic.co/guide/en/apm/agent/dotnet/current/index.html
Apache License 2.0
581 stars 207 forks source link

Agent hangs in .NET 7 containers (using DOTNET_STARTUP_HOOKS) #1973

Open z1c0 opened 1 year ago

z1c0 commented 1 year ago

APM Agent version

1.19.0

Environment

Operating system and version:

Linux Docker containers built with the .NET 7 publish command (see here).

.NET Framework/Core name and version:

.NET 7

Describe the bug

When injecting the agent into a Docker container build with the publish command, the agent does not start up any more.

Other agent injection methods are probably affected as well but have not been tested yet.

To Reproduce

Steps to reproduce the behavior:

  1. Use this approach to create a container.
  2. Inject the agent using the DOTNET_STARTUP_HOOKS method.
  3. Enable logging
  4. Agent hangs in the call to Elastic.Apm.StartupHook.Loader.Loader.Initialize.

Output:

[2023-01-12 20:50:04Z] Check if System.Diagnostics.DiagnosticSource is loaded
[2023-01-12 20:50:04Z] Assemblies loaded:
System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e
dotnet7_publish, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null
System.Runtime, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
ElasticApmAgentStartupHook, Version=1.20.0.0, Culture=neutral, PublicKeyToken=ae7400d2c189cf22
System.Collections, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
System.Runtime.Extensions, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
System.IO.FileSystem, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
System.Linq, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
System.Runtime.Loader, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
System.Text.RegularExpressions, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
System.Runtime.InteropServices, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
System.Console, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
System.Threading, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
Microsoft.Win32.Primitives, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
[2023-01-12 20:50:04Z] No System.Diagnostics.DiagnosticSource loaded. Load using AssemblyLoadContext.Default
[2023-01-12 20:50:04Z] System.Diagnostics.DiagnosticSource 7.0.0.0 loaded
[2023-01-12 20:50:04Z] Get Elastic.Apm.StartupHook.Loader.Loader type
[2023-01-12 20:50:04Z] Get Elastic.Apm.StartupHook.Loader.Loader.Initialize method
[2023-01-12 20:50:04Z] Invoke Elastic.Apm.StartupHook.Loader.Loader.Initialize method

Workaround

Initial research showed that the agent's metric collection initialization in the MetricsCollector constructor is the point where the agent hangs. Disabling the agent's metric collection via ELASTIC_APM_METRICS_INTERVAL=0 prevents the hang.

z1c0 commented 1 year ago

Further investigation showed that GcMetricsProvider causes the problem. Hence, the workaround can be narrowed down to disabling any GC-related metrics:

ELASTIC_APM_DISABLE_METRICS='clr.gc.*'

Could be related to https://github.com/dotnet/runtime/issues/53564

The specific line of code that causes the agent to hang is EnableEvents(eventSource, EventLevel.Informational, (EventKeywords)keywordGC); in GcEventListener.