Closed SeanFeldman closed 5 years ago
Unfortunately it’s difficult for me to provide exact numbers related to how many of our customers this impacts, but just from an anecdotal perspective, just yesterday we had support cases related to this exception from 3 4 separate customers.
Well, for transactions, you cannot really have a retry within the same transaction.
The only way out would be to use RetryPolicy.None
and have an uber retry outside of the transaction. Within a transaction its difficult to even define a retry. The first retry needs to fail.
The fix we will have to do in SDK is to throw an exception if they have a retry policy set within a transaction.
SBMP client did not have this issue. How did this work with SBMP client?
Looks like the old client didn't have this issue and was bubbling connectivity exception up right away (MessagingCommunicationException
). I've added a second project with the old client to show that.
SBMP client did not have this issue as the RetryPolicy in the older client never retried if there was an active transaction. That needs to be implemented in this client as well.
Explaining it with details for the benefit of everyone -
using (new txn)
{
// retryable operation.
}
In case of SendAsync()
, which is an easier point to talk about, lets say first send failed and the second send goes through. Lets say the first send failed with CommunicationException
. At this point we don’t know whether the first send completed or not. So what is expected in this case as a client? Should the second send even happen within the same transaction since the first one failed? Transaction rules dictate all-or-none. Do the two operations behave as independent operations or a single operation?
Now if we talk about CompleteAsync()
, it gets even more tricky. The first complete() went through all the way to broker, but again the client encountered CommunicationException
. So client doesn’t know if the first one actually reached broker or not. In this case, client cannot retry. Lets say it retries, first complete will not succeed till the transaction is committed/aborted. On the service the second complete() is “blocked” on the first complete. And the transaction on the client will not complete till the second complete succeeds. It’s a deadlock and the only way out is when after 2 minutes the transaction times out. It’s a pretty bad experience for customers since every retry will timeout for sure after 2 minutes.
The fix is simple – NO RETRIES during transaction case. The RetryPolicy should understand whether the operation is within a transaction scope or not. If it is, it should never retry. And this should be documented properly.
If you are stuck with this problem,
the immediate fix – make sure you don’t use any retry policy (i.e., use RetryPolicy.None
instead).
Or almost immediate fix – raise a PR here where a retry doesn’t happen if transaction is active.
I hope that makes sense.
When an ambient transaction exists, operations such as
SendAsync
enlist into existing ambient transaction.RetryPolicy
is taking that operation later and executes. When an operation is failing to execute due to connectivity problems,RetryPolicy
re-executes that operation, which includes repeated transaction enlistment. This in turn throws an exceptionSystem.InvalidOperationException: 'Local transactions are not supported with other resource managers/DTC.'
thrown byAmqpTransactionManager.EnlistAsync()
, completely masquerading connectivity problem.Stacktrace
at Microsoft.Azure.ServiceBus.Core.MessageSender.OnSendAsync(IList`1 messageList) in C:\source\azure-service-bus-dotnet\src\Microsoft.Azure.ServiceBus\Core\MessageSender.cs:line 562 at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout) in C:\source\azure-service-bus-dotnet\src\Microsoft.Azure.ServiceBus\RetryPolicy.cs:line 83 at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout) in C:\source\azure-service-bus-dotnet\src\Microsoft.Azure.ServiceBus\RetryPolicy.cs:line 105 at Microsoft.Azure.ServiceBus.Core.MessageSender.SendAsync(IList`1 messageList) in C:\source\azure-service-bus-dotnet\src\Microsoft.Azure.ServiceBus\Core\MessageSender.cs:line 252 at ReproApp.Program.Main(String[] args) in C:\Users\Sean\Desktop\RetryPolicyRepro\ReproApp\Program.cs:line 33 at ReproApp.Program.Expected Behavior
Steps to reproduce this issue:
Versions