ericsuhong closed this issue 4 years ago
Not sure if this is related, but I'm getting a similar situation with only 1 queue reader -
Console.WriteLine("Before getting messages");
var messages = await _queue.GetMessagesAsync(32, TimeSpan.FromMinutes(5), null, null);
Console.WriteLine("After getting messages");
My code will sit on the await call, then after around 15 minutes will get -
Microsoft.Azure.Storage.StorageException: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
...
ErrorCode:AuthenticationFailed
After this exception happens, all is OK and the reader starts processing the messages again.
Also, if I kill the service while it is stuck on the await call and start it back up (before the 15 minutes have passed), it processes normally.
This happens randomly: it can work fine for a day, then hang, then hang again an hour later, then be fine for another day.
Further to the above, I've changed my code to -
var messages = await _pingsQueue.GetMessagesAsync(
    32,
    TimeSpan.FromMinutes(5),
    new QueueRequestOptions() {
        RetryPolicy = new Microsoft.Azure.Storage.RetryPolicies.NoRetry(),
        MaximumExecutionTime = TimeSpan.FromSeconds(60)
    },
    null);
I don't get the authentication error anymore, I get -
Microsoft.Azure.Storage.StorageException: The client could not finish the operation within specified timeout.
Generally the call to GetMessagesAsync takes around 100 ms, so I'm not sure what causes it to run for the full 60 seconds before timing out.
We are seeing similar issues in GetMessage* APIs, and also ListBlobs APIs. We are also using .NET Core 2.2, but we are using WindowsAzure.Storage v9.3.3.
Memory dump callstacks are included in the linked issue: https://github.com/Azure/azure-functions-durable-extension/issues/961
CloudQueue.GetMessage() works at times, then blocks and doesn't return. Single-threaded program.
To diagnose further, we created a simple console program that does the following:
AddMessage()
GetMessage() followed by DeleteMessage(). Repeat this until the queue is empty.
Here are our findings:
n = 1000 was more than sufficient to reproduce this in no time.
n up to 10,000.
Recommendation: If you encounter blocking calls with Azure Storage such as queues or blobs (e.g. GetMessage(), AddMessage(), DeleteMessage(), etc.), then check if you are running the old Azure Storage Emulator. If that is the case, you can try using the new open-source Azurite emulator.
Related keywords (to make it easier for search engines): hangs - blocking call - deadlock
We are also seeing Azure Storage Queue calls hang using Microsoft.Azure.Storage.Queue v11.1.2. We're seeing it in an 8-year-old Worker service that had been rock solid until we migrated to this NuGet package from the older WindowsAzure.Storage v8.6 package. Now we see periodic hangs on simple methods like GetMessages() or DeleteMessage().
We only see this in our production systems that are processing high volumes of messages (approx 1 million per day), but the hang seems random. It can process a couple million messages before hanging, then I restart my container and it hangs after just tens of thousands, then back to running for a million.
This seems like a pretty fundamental bug that ought to be a very high priority, yet this dates back a full 8 months with no resolution. What gives?
After some experimentation we have found that the problem goes away when using the async versions of the queue methods that accept a CancellationToken. We had been using the synchronous calls for years, and in this NuGet package those sync calls were causing issues, while the async calls (only the ones that accept a cancellation token) do not exhibit the problem.
I still feel like this is a hugely destabilizing bug that ought to be a high priority. Anyone walking up to Azure and using this nuget will likely walk away from Azure thinking the storage system is unreliable.
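For anyone wanting to try the workaround described above, here is a minimal sketch of one way to always pass a CancellationToken with a deadline to an async call. The helper name QueueCallGuard.RunWithDeadlineAsync and the 60-second deadline are hypothetical and not part of the SDK; the sketch only uses standard .NET types, and it relies on the wrapped operation actually observing the token.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class QueueCallGuard
{
    // Hypothetical helper (not part of the Azure Storage SDK):
    // creates a token that is cancelled after `deadline` and hands it
    // to the supplied async operation. If the operation observes the
    // token, the await throws an OperationCanceledException (or a
    // derived type) instead of hanging indefinitely.
    public static async Task<T> RunWithDeadlineAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        TimeSpan deadline)
    {
        using (var cts = new CancellationTokenSource(deadline))
        {
            return await operation(cts.Token).ConfigureAwait(false);
        }
    }
}

// Assumed usage with the overload of GetMessagesAsync that accepts a
// CancellationToken (the _queue field is taken from the earlier posts):
//
// var messages = await QueueCallGuard.RunWithDeadlineAsync(
//     token => _queue.GetMessagesAsync(
//         32, TimeSpan.FromMinutes(5), null, null, token),
//     TimeSpan.FromSeconds(60));
```

Note that this does not protect against an operation that ignores the token, so it is only useful together with the token-accepting overloads mentioned above.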
This has been addressed in 11.1.4.
Which service (blob, file, queue, table) does this issue concern?
queue
Which version of the SDK was used?
Microsoft.Azure.Storage.Queue 10.0.3, but issue was still there in 9.4.2
Which platform are you using? (ex: .NET Core 2.1)
.NET Core 2.2
What problem was encountered?
We have code where multiple role instances read simultaneously from the same queue.
Each role instance creates a single queue instance and starts reading from the queue in a single thread, similar to the following code:
How can we reproduce the problem in the simplest way?
Try to run the above code with multiple role instances (~100 instances). Some role instances will suddenly go into deadlock, stuck after printing "Before get message" and never making progress.
Same issue happens in all GetMessage* API flavors (GetMessages, GetMessageAsync, GetMessagesAsync).
Have you found a mitigation/solution?
We found that the deadlock does not occur with the WindowsAzure.Storage NuGet package, so this is a regression introduced by the move to Microsoft.Azure.Storage.Queue.