Azure / azure-functions-durable-extension

Durable Task Framework extension for Azure Functions
MIT License
711 stars 263 forks source link

Catch ObjectDisposedException in token renewal to prevent app crash #2776

Closed cgillum closed 3 months ago

cgillum commented 3 months ago

This fix comes from a crash dump analysis that was performed as part of ICM #487363612. In this particular case, the customer was using managed identities and there was a crash that occurred when we were writing logs about renewing a token.

024-03-20T02:21:16.215224146Z Unhandled exception on 03/20/2024 02:21:16: System.AggregateException: An error occurred while writing to logger(s). (IFeatureCollection has been disposed.
2024-03-20T02:21:16.215272947Z Object name: 'Collection'.)
2024-03-20T02:21:16.215351248Z  ---> System.ObjectDisposedException: IFeatureCollection has been disposed.
2024-03-20T02:21:16.215358648Z Object name: 'Collection'.
2024-03-20T02:21:16.215396648Z    at Microsoft.AspNetCore.Http.Features.FeatureReferences`1.ThrowContextDisposed()
2024-03-20T02:21:16.215401148Z    at Microsoft.AspNetCore.Http.DefaultHttpContext.get_Items()
2024-03-20T02:21:16.216846069Z    at Microsoft.Azure.WebJobs.Script.Extensions.HttpRequestExtensions.GetRequestPropertyOrDefault[TValue](HttpRequest request, String key) in /src/azure-functions-host/src/WebJobs.Script/Extensions/HttpRequestExtensions.cs:line 40
2024-03-20T02:21:16.216867470Z    at Microsoft.Azure.WebJobs.Script.Extensions.HttpRequestExtensions.GetRequestId(HttpRequest request) in /src/azure-functions-host/src/WebJobs.Script/Extensions/HttpRequestExtensions.cs:line 50
2024-03-20T02:21:16.216873470Z    at Microsoft.Azure.WebJobs.Script.WebHost.Diagnostics.SystemLogger.Log[TState](LogLevel logLevel, EventId eventId, TState state, Exception exception, Func`3 formatter) in /src/azure-functions-host/src/WebJobs.Script.WebHost/Diagnostics/SystemLogger.cs:line 150
2024-03-20T02:21:16.216885770Z    at Microsoft.Extensions.Logging.Logger.<Log>g__LoggerLog|12_0[TState](LogLevel logLevel, EventId eventId, ILogger logger, Exception exception, Func`3 formatter, List`1& exceptions, TState& state)
2024-03-20T02:21:16.216891170Z    --- End of inner exception stack trace ---
2024-03-20T02:21:16.216895470Z    at Microsoft.Extensions.Logging.Logger.ThrowLoggingError(List`1 exceptions)
2024-03-20T02:21:16.219139703Z    at Microsoft.Extensions.Logging.Logger.Log[TState](LogLevel logLevel, EventId eventId, TState state, Exception exception, Func`3 formatter)
2024-03-20T02:21:16.219156403Z    at Microsoft.Extensions.Logging.LoggerExtensions.Log(ILogger logger, LogLevel logLevel, EventId eventId, Exception exception, String message, Object[] args)
2024-03-20T02:21:16.219222304Z    at Microsoft.Extensions.Logging.LoggerExtensions.LogInformation(ILogger logger, String message, Object[] args)
2024-03-20T02:21:16.220280119Z    at Microsoft.Azure.WebJobs.Extensions.DurableTask.EndToEndTraceHelper.RetrievingToken(String hubName, String resource) in D:\a\_work\1\s\src\WebJobs.Extensions.DurableTask\EndToEndTraceHelper.cs:line 988
2024-03-20T02:21:16.220302220Z    at Microsoft.Azure.WebJobs.Extensions.DurableTask.Auth.AzureCredentialFactory.OnRenewing(TokenRenewalState state) in D:\a\_work\1\s\src\WebJobs.Extensions.DurableTask\Auth\AzureCredentialFactory.cs:line 118
2024-03-20T02:21:16.220366020Z    at Microsoft.Azure.WebJobs.Extensions.DurableTask.Auth.AzureCredentialFactory.RenewTokenAsync(TokenRenewalState state, CancellationToken cancellationToken) in D:\a\_work\1\s\src\WebJobs.Extensions.DurableTask\Auth\AzureCredentialFactory.cs:line 87
2024-03-20T02:21:16.220479422Z    at Microsoft.WindowsAzure.Storage.Auth.TokenCredential.RenewTokenAsync(Object state)
2024-03-20T02:21:16.220490822Z    at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_1(Object state)
2024-03-20T02:21:16.220496122Z    at System.Threading.QueueUserWorkItemCallback.<>c.<.cctor>b__6_0(QueueUserWorkItemCallback quwi)
2024-03-20T02:21:16.220527923Z    at System.Threading.ExecutionContext.RunForThreadPoolUnsafe[TState](ExecutionContext executionContext, Action`1 callback, TState& state)
2024-03-20T02:21:16.220532323Z    at System.Threading.QueueUserWorkItemCallback.Execute()
2024-03-20T02:21:16.220536223Z    at System.Threading.ThreadPoolWorkQueue.Dispatch()
2024-03-20T02:21:16.220547123Z    at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
2024-03-20T02:21:16.220551423Z    at System.Threading.Thread.StartCallback()

Unfortunately, some Kusto logs appear to be missing, which prevents us from narrowing down the exact root cause, but I suspect it's related to a host shutdown, since that's when we typically see these types of ObjectDisposedExceptions occur.

The fix is to wrap the token management calls with try/catch logic to swallow ObjectDisposedException, which will prevent the app from crashing.

NOTE: I'm actually not sure if this code applies to the v3.x branch, which has a separate auth mechanism. It might be a fix that's only needed for Durable Functions v2.x, but more investigation may be required.

Pull request checklist

cgillum commented 3 months ago

Closing this for now as I'm not sure it's the right solution, given new information about the case.