CDCgov / prime-reportstream

ReportStream is a public intermediary tool for delivery of data between different parts of the healthcare ecosystem.
https://reportstream.cdc.gov
Creative Commons Zero v1.0 Universal
71 stars 39 forks source link

BUG: Prod exception: Azure.RequestFailedException: at Azure.Storage.Blobs.BlobRestClient #3116

Open jimduff-usds opened 2 years ago

jimduff-usds commented 2 years ago

This exception occurred at 1:45am ET on Thurs Nov 18: Azure.RequestFailedException: at Azure.Storage.Blobs.BlobRestClient+d__42.MoveNext

It does not appear to have occurred in our code stack. I see no tasks nor actions occurring around that time, in the database, nor any data failures.

This does seem to fit with a pattern of glitchiness occurring in the middle of the night recently. Various forms of this same exception have occurred dozens of times in the last month, sometimes with Microsoft.WindowsAzure.Storage.StorageException and other times with Azure.RequestFailedException

Regarding this specific occurrence: There was no activity in the Azure pdhprod-functionapp 'Activity Log' at that time. Immediately prior to the exception, this trace message appears in the logs:

Singleton lock renewal failed for blob 'pdhprod-functionapp-candidate/Host.Functions.scheduled-remote-connection-check.Listener' with error code 403: AuthorizationFailure. The last successful renewal completed at 2021-11-18T06:44:57.055Z (30204 milliseconds ago) with a duration of 8 milliseconds. The lease period was 60000 milliseconds.

From the Exception:

This request is not authorized to perform this operation.
RequestId:8331a772-a01e-0016-2447-dce6e3000000
Time:2021-11-18T06:45:27.2518918Z
Status: 403 (This request is not authorized to perform this operation.)
ErrorCode: AuthorizationFailure

Exception Call stack:

Azure.RequestFailedException:
   at Azure.Storage.Blobs.BlobRestClient+<RenewLeaseAsync>d__42.MoveNext (Azure.Storage.Blobs, Version=12.9.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1+ConfiguredTaskAwaiter.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Azure.Storage.Blobs.Specialized.BlobLeaseClient+<RenewInternal>d__29.MoveNext (Azure.Storage.Blobs, Version=12.9.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1+ConfiguredTaskAwaiter.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Azure.Storage.Blobs.Specialized.BlobLeaseClient+<RenewAsync>d__28.MoveNext (Azure.Storage.Blobs, Version=12.9.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.WebJobs.Script.BlobLeaseDistributedLockManager+SingletonLockHandle+<RenewAsync>d__17.MoveNext (Microsoft.Azure.WebJobs.Script, Version=3.3.0.0, Culture=neutral, PublicKeyToken=null: /src/azure-functions-host/src/WebJobs.Script/Host/DistributedLockManagers/BlobLeaseDistributedLockManager.cs:334)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.WebJobs.Host.SingletonManager+RenewLeaseCommand+<ExecuteAsync>d__4.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.31.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Singleton\SingletonManager.cs:337)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.WebJobs.Host.Timers.TaskSeriesTimer+<RunAsync>d__14.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.31.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Timers\TaskSeriesTimer.cs:147)
jimduff-usds commented 2 years ago

Occurred at 10:39:21 on 11/20 with this error immediately prior: Singleton lock renewal failed for blob 'pdhprod-functionapp-candidate/Host.Functions.emailScheduleEngine.Listener' with error code 403: AuthorizationFailure. The last successful renewal completed at 2021-11-20T15:38:51.507Z (30251 milliseconds ago) with a duration of 6 milliseconds. The lease period was 60000 milliseconds.

And on 11/21 with this error immediately prior:

Singleton lock renewal failed for blob 'pdhprod-functionapp-candidate/Host.Functions.scheduled-remote-connection-check.Listener' with error code 403: AuthorizationFailure. The last successful renewal completed at 2021-11-21T09:59:53.055Z (30252 milliseconds ago) with a duration of 5 milliseconds. The lease period was 60000 milliseconds.

jimduff-usds commented 2 years ago

We continue to see this exception on a regular basis - slightly more than once per day. Not sure I see a pattern.

This request is not authorized to perform this operation.

Azure.RequestFailedException:
   at Azure.Storage.Blobs.BlobRestClient+<RenewLeaseAsync>d__42.MoveNext (Azure.Storage.Blobs, Version=12.9.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1+ConfiguredTaskAwaiter.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Azure.Storage.Blobs.Specialized.BlobLeaseClient+<RenewInternal>d__29.MoveNext (Azure.Storage.Blobs, Version=12.9.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1+ConfiguredTaskAwaiter.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Azure.Storage.Blobs.Specialized.BlobLeaseClient+<RenewAsync>d__28.MoveNext (Azure.Storage.Blobs, Version=12.9.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.WebJobs.Script.BlobLeaseDistributedLockManager+SingletonLockHandle+<RenewAsync>d__17.MoveNext (Microsoft.Azure.WebJobs.Script, Version=3.3.0.0, Culture=neutral, PublicKeyToken=null: /src/azure-functions-host/src/WebJobs.Script/Host/DistributedLockManagers/BlobLeaseDistributedLockManager.cs:334)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.WebJobs.Host.SingletonManager+RenewLeaseCommand+<ExecuteAsync>d__4.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.31.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Singleton\SingletonManager.cs:337)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.WebJobs.Host.Timers.TaskSeriesTimer+<RunAsync>d__14.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.31.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Timers\TaskSeriesTimer.cs:147)
jimduff-usds commented 2 years ago

More examples: image.png

jimduff-usds commented 2 years ago

Here's something interesting: Staging got the exact same StorageException as Prod at 10:28pm on 12/31. Prod exception was at 10:28:30 and Staging was at 10:28:43. Coincidence?

The timings of the other 'not authorized' exceptions on 12/31 and 1/1 did not line up that way.

cwinters-usds commented 2 years ago

@jimduff-usds it doesn't look like the Azure client libraries do retries in the presence of 401/403 errors. Do you think that's something we can build into our code to deal with such hiccups?