Azure / azure-webjobs-sdk

Azure WebJobs SDK

AppService Crashing Every 10 minutes #3091

Open Sukhdev841 opened 4 months ago

Sukhdev841 commented 4 months ago

We are using the WebJobs SDK to deploy timer-triggered functions on Azure AppService. However, the AppService crashes and restarts every 10 minutes due to the following exception:

Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: Azure.RequestFailedException
   at Azure.Storage.Blobs.BlobRestClient+d50.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
   at Azure.Storage.Blobs.Specialized.BlobLeaseClient+d32.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
   at Azure.Storage.Blobs.Specialized.BlobLeaseClient+d31.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
   at Microsoft.Azure.WebJobs.Host.BlobLeaseDistributedLockManager+SingletonLockHandle+d17.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
   at Microsoft.Azure.WebJobs.Host.SingletonManager+RenewLeaseCommand+d4.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Microsoft.Azure.WebJobs.Host.Timers.TaskSeriesTimer+d14.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Threading.ThreadHelper.ThreadStart()

Repro steps

Simply deploying to AppService reproduces the issue. We are not sure what triggers the crashes.

Expected behavior

Application deployed in AppService should not crash.

jviau commented 4 months ago

@Sukhdev841 Was this ever working for this application? This is most likely an app configuration issue. What have you changed regarding your blob storage or app service recently? Have you switched to managed identity? Added or changed a VNET? Changed IConfiguration values for your application?

Azure.RequestFailedException will contain an HTTP error code and further information on what failed. Do you have that available in the logs?
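Once the status and error code are recovered from the logs, they narrow down the cause considerably. The following is a minimal Python sketch of that triage step (the app in this thread is .NET; Python is used here only for illustration). The `triage` function, the `HINTS` table, and the hint wording are hypothetical. The error-code strings are ones listed among the Azure Storage REST API error codes, but the table is not exhaustive, and none of these values were actually reported in this issue.

```python
from typing import Optional

# Hypothetical triage table: (HTTP status, storage error code) -> likely cause.
# The hints are illustrative guesses, not an official diagnosis.
HINTS = {
    (403, "AuthorizationPermissionMismatch"):
        "the identity lacks a data-plane RBAC role on the storage account",
    (403, "KeyBasedAuthenticationNotPermitted"):
        "shared key authorization is disabled but the request used a key or key-signed SAS",
    (403, "AuthorizationFailure"):
        "the request was rejected, for example by firewall or VNET rules",
    (404, "ContainerNotFound"): "the container holding the lock blob is missing",
    (404, "BlobNotFound"): "the lock blob is missing",
    (409, "LeaseIdMismatchWithLeaseOperation"):
        "the lease was lost and is now held under a different lease ID",
}


def triage(status: int, error_code: Optional[str] = None) -> str:
    """Return a rough hint for a failed storage request."""
    hint = HINTS.get((status, error_code))
    if hint is not None:
        return hint
    # Fall back to the status code alone when the error code is unknown.
    if status in (401, 403):
        return "authentication or authorization problem; check identity and role assignments"
    if status == 404:
        return "resource not found; check container and blob names"
    if status == 409:
        return "conflict; possibly lease contention between instances"
    if status >= 500:
        return "service-side or transient error; check whether it persists"
    return "unrecognized; inspect the full exception message"


if __name__ == "__main__":
    print(triage(403, "AuthorizationPermissionMismatch"))
    print(triage(409))
```

Given the migration to managed identity described in this thread, the 403 rows would be the first ones to rule out, though that is only a guess until the actual status code is known.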

Sukhdev841 commented 4 months ago

Hi @jviau, thanks for your comment.

The setup was working for us previously. And yes, we recently moved to managed identity, but the setup was working with MI as well.

Do you think switching to Managed Identity along with disabling SAS on the connected Storage Account could lead to this issue?

At the moment I don't have any information other than the exception's stack trace. I will try enabling Trace Logs on AppService to see if any additional related logs can be obtained.

jviau commented 4 months ago

In what order did you perform the migration? Did this start occurring immediately after disabling SAS?

Sukhdev841 commented 4 months ago

  1. We first moved to MI-based auth and then disabled SAS on the Storage Account.
  2. I'm not sure about the second part. We learned about the issue around 10th of July and could see crashes going back to 1st of July. We disabled SAS around 26th June. Since AppService rotates the log file every 7 days, we no longer have the history to tell whether the crashes started immediately after disabling SAS.

On a side note, we have a similar PPE setup where Storage SAS is disabled and MI auth is enabled. There, such crashes were observed around 1st-3rd July and again around 17th July, but in the Prod setup crashes are happening continuously, every 10-15 minutes.

jviau commented 4 months ago

@Sukhdev841 there should be another log with more details:

https://github.com/Azure/azure-webjobs-sdk/blob/dev/src/Microsoft.Azure.WebJobs.Host.Storage/Singleton/BlobLeaseDistributedLockManager.cs#L409

Sukhdev841 commented 4 months ago

Hi @jviau, I tried enabling Request Trace Error logs as well, but there are no additional logs. Please share which logs I need to check and how to enable them if you know, though I'm guessing you might not be familiar with the AppService side of things.

jviau commented 4 months ago

The logs you need to check depend on your application. The log is produced using Microsoft.Extensions.Logging ILogger. Where it goes is controlled by how you configure logging for your application.

Do you use application insights? Or some other logger provider?

Sukhdev841 commented 4 months ago

Application Insights requires Storage Account connectivity with SAS enabled. For security reasons we had to disable SAS.

The logs I shared with you were from the internal log files of the AppService platform.

jviau commented 4 months ago

Current app insights does not require storage. https://learn.microsoft.com/en-us/azure/azure-monitor/app/opentelemetry-enable?tabs=aspnetcore

Sukhdev841 commented 3 months ago

I see it requires code changes and uses a connection string to link the Application Insights resource to the AppService. We will evaluate on our end whether we can do this.