Open ahmelsayed opened 7 years ago
Thanks Ahmed. Offending line of code is here: https://github.com/Azure/azure-webjobs-sdk/blob/dev/src/Microsoft.Azure.WebJobs.Host/Tables/TableExtension.cs#L71.
While addressing this, we should review other synchronous wait code like this.
@fabiocav says this specific issue with table storage should be resolved with the .NET Core migration.
@mathewc still warns that this pattern might exist in other places.
Eduardo says we should look at the number of times we see this exception.
@christopheranderson it's not an exception, just a call stack of the deadlock. We can't tell how many people run into this, because detecting it requires either attaching a live debugger or analyzing a memory dump.
This code is:
var account = Task.Run(() => this._accountProvider.GetStorageAccountAsync(attribute, CancellationToken.None)).GetAwaiter().GetResult();
The extra goo there should give this the same semantics as Thread.Start/Thread.Join and ensure we're not deadlocking. We use this pattern pretty broadly, so if this doesn't work, we have a bigger problem.
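To make the pattern concrete, here is a minimal standalone sketch of the same blocking shape. `GetAccountAsync` is a hypothetical stand-in for `GetStorageAccountAsync`, not the SDK's implementation, and the 100 ms delay just simulates I/O.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SyncOverAsyncSketch
{
    // Hypothetical stand-in for GetStorageAccountAsync (assumption, not SDK code).
    public static async Task<string> GetAccountAsync(CancellationToken token)
    {
        await Task.Delay(100, token); // simulated I/O
        return "account";
    }

    public static string GetAccountBlocking()
    {
        // Task.Run starts the async call on a thread-pool thread, so its
        // continuations do not need the caller's SynchronizationContext.
        // GetAwaiter().GetResult() then blocks the calling thread until the
        // task finishes, much like Thread.Start followed by Thread.Join.
        return Task.Run(() => GetAccountAsync(CancellationToken.None))
            .GetAwaiter()
            .GetResult();
    }
}
```

Note that this avoids the classic context deadlock but still holds the calling thread for as long as the inner task runs, so a slow inner call shows up as many blocked threads rather than as an exception.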
We may be blocked waiting on the GetStorageAccountAsync() task (and not actually deadlocking).
That task eventually calls into this long-running operation here: https://github.com/Azure/azure-webjobs-sdk/blob/5ea3892dbb31182d48b8d965b9fc3df585fdc8ed/src/Microsoft.Azure.WebJobs.Host/Executors/DefaultStorageCredentialsValidator.cs#L40
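If the wait really is just a long-running call, one way to bound it is to pass a token with a timeout instead of `CancellationToken.None`. This is only a sketch: `ValidateAsync` is a hypothetical placeholder for the validation call, and it assumes the underlying operation actually honors the token.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutSketch
{
    // Hypothetical validation call that never completes unless cancelled
    // (assumption, used only to simulate a hung request).
    public static async Task<bool> ValidateAsync(CancellationToken token)
    {
        await Task.Delay(Timeout.Infinite, token);
        return true;
    }

    // Blocks the caller for at most roughly 'timeout', then gives up.
    public static bool TryValidate(TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource(timeout))
        {
            try
            {
                return Task.Run(() => ValidateAsync(cts.Token))
                    .GetAwaiter()
                    .GetResult();
            }
            catch (OperationCanceledException)
            {
                // The token fired before validation finished.
                return false;
            }
        }
    }
}
```

With this shape a stuck validation turns into a visible failure after the timeout instead of a thread that waits indefinitely.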
Got a new dump for this from the thread "Function app experienced downtime several times a day".
A couple of user threads, ~200 threads waiting on GetTable, and ~200 waiting on AsyncInvoker (created from AsyncConverter<Attr, CloudTable>).
I think that @MikeStall is onto something in the issue - it's possible we're stuck validating the account via http requests.
GetServicePropertiesAsync has a comment mentioning that the call can fail if the storage account name is incorrect. https://github.com/Azure/azure-webjobs-sdk/blob/dev/src/Microsoft.Azure.WebJobs.Host/Executors/DefaultStorageCredentialsValidator.cs#L41
This may be exacerbating the problem – if we have multiple concurrent requests and validation is slow, there may be many calls to ValidateCredentialsAsync. https://github.com/Azure/azure-webjobs-sdk/blob/dev/src/Microsoft.Azure.WebJobs.Host/Executors/DefaultStorageAccountProvider.cs#L194
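To illustrate the concurrency concern, here is a rough sketch of letting concurrent callers share a single in-flight validation per account. `ValidationCacheSketch` and the `validate` delegate are hypothetical names for illustration, not part of the SDK, and this is not necessarily how the provider should be changed.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public sealed class ValidationCacheSketch
{
    // One lazily started validation task per account name.
    private readonly ConcurrentDictionary<string, Lazy<Task>> _validations =
        new ConcurrentDictionary<string, Lazy<Task>>();

    // Hypothetical delegate standing in for ValidateCredentialsAsync.
    private readonly Func<string, Task> _validate;

    public ValidationCacheSketch(Func<string, Task> validate)
    {
        _validate = validate;
    }

    public Task ValidateOnceAsync(string accountName)
    {
        // GetOrAdd may construct more than one Lazy under contention, but only
        // one is stored, and Lazy's default thread-safety mode runs its factory
        // once. Every caller therefore awaits the same task.
        // Caveat: a failed validation stays cached in this sketch.
        return _validations
            .GetOrAdd(accountName, name => new Lazy<Task>(() => _validate(name)))
            .Value;
    }
}
```

Under this approach, a slow validation would be paid for once per account rather than once per concurrent request.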
A couple ideas to help with this:
Deadlock call stack is below. I also have a memory dump. More context in an email with subject "Function monitoring page stuck".