aws / aws-logging-dotnet

.NET Libraries for integrating Amazon CloudWatch Logs with popular .NET logging libraries
Apache License 2.0
296 stars 133 forks source link

Lazily initialize the internal CloudWatch Logs client to avoid a deadlock when using log4net #269

Closed ashovlin closed 3 months ago

ashovlin commented 3 months ago

Issue #, if available:

253, https://github.com/aws/aws-sdk-net/issues/1968, DOTNET-5636

Description of changes:

There's a possible deadlock when using:

  1. AWS.Logger.Log4net
  2. AWSSDK.Core >= 3.7.1 (which introduced a configurable IMDS URI)
  3. Either EC2 IMDS credentials or no credentials

I wrote a longer description with stack traces in https://github.com/aws/aws-sdk-net/issues/1968#issuecomment-2121251710, but in summary:

  1. The application thread calls LogManager.GetLogger to activate our AWSAppender. This gets this lock in log4net's DefaultRepositorySelector, then initializes the internal CloudWatch Logs client. The client initialization needs to access the configured retry mode here, via the static FallbackInternalConfigurationFactory.RetryMode.
  2. AWSLoggerCore loads credentials which potentially kicks off this timer in DefaultInstanceProfileAWSCredentials. This starts a new thread for loading the IMDS credentials. As of AWSSDK.Core 3.7.1 when the IMDS endpoint became configurable, this relies on the static FallbackInternalConfigurationFactory.EC2MetadataServiceEndpoint for resolving the IMDS endpoint here. The static constructor for FallbackInternalConfigurationFactory will create EnvironmentVariableInternalConfiguration instance, which gets its own logger here.

This deadlocks when the static constructor for FallbackInternalConfigurationFactory is run on thread 2 because:

  1. Thread 1 is waiting on the FallbackInternalConfigurationFactory static constructor to finish while holding the log4net lock.
  2. Thread 2 is running the FallbackInternalConfigurationFactory static constructor, but is blocked on thread 1 when creating additional loggers inside the static constructor.

By delaying the initialization of the internal CloudWatch Logs client, we allow thread 1 above to finish before thread 2 starts.

Testing

  1. Existing test cases pass
  2. Manually tested with the application https://github.com/alverdal/awslog4net-example via https://github.com/aws/aws-sdk-net/issues/1968#issuecomment-1007317351

Alternatives I considered:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.