Azure / azure-sdk-for-net

This repository is for active development of the Azure SDK for .NET. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/dotnet/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-net.
MIT License
5.23k stars 4.58k forks source link

[BUG] Errant use of TransactionScope causes messages to be silently dropped #44993

Closed ericleigh007 closed 1 week ago

ericleigh007 commented 1 month ago

Library name and version

Azure.Messaging.ServiceBus 7.14.4

Describe the bug

A user program making use of Service bus through an abstract interface used TransactionScope for its internal purposes, not knowing that Service Bus also honors it. Messages were dropped intermittently that were meant to be sent, with no indication back to the user program.

Only if

   using (AzureEventSourceListener listener = AzureEventSourceListener.CreateConsoleLogger(EventLevel.Verbose))
   {
    //  o o o user code here

is added,

Is the warning message issued:

[Warning] Azure-Messaging-ServiceBus: RunOperation encountered an exception and will retry. Exception: Azure.Messaging.ServiceBus.ServiceBusException: Transaction hasn't been declared yet, or has already been discharged (GeneralError). For troubleshooting information, see https://aka.ms/azsdk/net/servicebus/exceptions/troubleshoot.
   at Azure.Messaging.ServiceBus.Amqp.AmqpSender.<SendBatchInternalAsync>d__23.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Azure.Messaging.ServiceBus.Amqp.AmqpSender.<SendBatchInternalAsync>d__23.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Azure.Messaging.ServiceBus.Amqp.AmqpSender.<>c.<<SendAsync>b__24_0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.<>c__22`1.<<RunOperation>b__22_0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Threading.Tasks.ValueTask`1.get_Result()
   at System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable`1.ConfiguredValueTaskAwaiter.GetResult()
   at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.<RunOperation>d__23`2.MoveNext()

It would seem that such an error would be raised as an exception. Also, don't believe there is any retry, so I "expect" this kind of error was added to the wrong exception category.

Expected behavior

I would expect such a non-transient error to be signalled to the user code so it may be retried. By definition, a transaction cannot be retried by the SDK.

Actual behavior

The warning message above is issued, but UNLESS the user turns on all the diagnostic tracing, the sent messages seem to be sent fine, but just go missing.

Reproduction Steps

For this problem, I don't believe a code example is necessary.

Environment

This is a Framework 4.7.2 project running on Windows in a VM in Azure running Windows 10, but not sure this matters.

github-actions[bot] commented 1 month ago

Thank you for your feedback. Tagging and routing to the team member best able to assist.

jsquire commented 1 month ago

Hi @ericleigh007. Thanks for reaching out and we regret that you're experiencing difficulties. The log that you're capturing indicates a transient failure that was retried. If no exception is raised, then the retry succeeded. If all retries were exhausted and the operation was not successful, the exception is surfaced to callers.

With respect to the transaction behavior, I'm not sure that I understand the end-to-end scenario and there's not enough context to attempt a repro. Can you help us better understand the scenario and provide either clear repro steps or a small, stand-alone app that reproduces the behavior that you're inquiring about?

github-actions[bot] commented 1 month ago

Hi @ericleigh007. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

ericleigh007 commented 1 month ago

No repro ready yet... 24hr a day job.

But I can refute what was said. In this case, the outgoing message was silently dropped. No exception was raised or I would not have created the issue.

The problem is a silent failure (unless you have diagnostic messages turned on) where the SendAsync fails with that error, and the user knows nothing, other than SEQ numbers are skipped, and the message isn't sent.

jsquire commented 1 month ago

@ericleigh007: That would imply that the send was successful - the client received an ACK from the service and then the transaction was rolled back. Does that line up with your scenario?

github-actions[bot] commented 1 month ago

Hi @ericleigh007. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

ericleigh007 commented 1 month ago

Just wondering what you meant by "ACK" that the client received. The send was successful, as far as could be understood in our code, but the seq number is skipped, and there is no message on the queue.

Also, it bears mentioning that the problem is not solid. It is intermittent, because sometimes the send succeeds, the seq number is NOT skipped, and the message reaches the receiver.

To prevent any weird problems with the receiver, the test I'm running here, though, is with no receiver, so we know that the message should remain on the queue after it was sent.

I was busy again this evening, but will try to get the time to work on a small replication case. Stay tuned. ;)

jsquire commented 1 month ago

@ericleigh007: When the client sends a message, the operation does not complete until the service acknowledges receipt of the message and takes responsibility for it. If the client does not receive this acknowledgement (ACK) from the service, an exception is surfaced to callers. The lack of an exception seen by your application indicates that the client's "send" call was successful. The service received the message and acknowledged that it has taken responsibility.

The question that we need to figure out is whether or not the application rolled back a transaction, signaling to the service intent to undo that send or whether this may be a service issue.

github-actions[bot] commented 1 month ago

Hi @ericleigh007. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

github-actions[bot] commented 1 month ago

Hi @ericleigh007, we're sending this friendly reminder because we haven't heard back from you in 7 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

ericleigh007 commented 1 month ago

Hello again. The caller of my package that uses the sdk to send and receive messages, as explained above, is getting interfered with by a TransactionScope created outside of the package, as the user is using an SQL database.

This appears to work, in order to have the service bus layer opt-out of the transaction.

using (var ts = new TransactionScope(TransactionScopeOption.Suppress, TransactionScopeAsyncFlowOption.Enabled))
{
    await _sender.SendMessageAsync(...)
}

Do we see any problem with this?

jsquire commented 4 weeks ago

@ericleigh007: Ah, I think that connects the dots for me. Your caller is using a transaction explicitly for database operations and did not expect that Service Bus would take over the flow. Any rollback was removing the message from Service Bus, as it was intended to do, but since the transaction was intended for a different scope, your caller found this to be unexpected behavior.

Because Service Bus does not participate in distributed coordination, you would normally see an error when SQL Server and Service Bus fall into the same transaction scope unless, for some reason, SQL Server did not elevate this to a DTC flow. I think what you're proposing above is safe and should opt Service Bus out of the transaction.

@JoshLove-msft - I'd appreciate your thoughts here to make sure that I'm not overlooking something.

ericleigh007 commented 4 weeks ago

Okay, I'll report back here on how it goes. We should know tomorrow.

jsquire commented 2 weeks ago

It seems that things are working with the approach from the discussion thread above. I'm going to mark this as resolved. Please unresolve if further assistance is needed.

github-actions[bot] commented 2 weeks ago

Hi @ericleigh007. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

github-actions[bot] commented 1 week ago

Hi @ericleigh007, since you haven’t asked that we /unresolve the issue, we’ll close this out. If you believe further discussion is needed, please add a comment /unresolve to reopen the issue.