Closed — Floopy-Doo closed this issue 1 year ago
Hi @Floopy-Doo. According to the documentation (and I admit I haven't verified it myself yet), messages that are moved to the bridge error queue should have the FailedQ header set to allow scripted retries. Is this not happening in your system?
Yes, we are missing this header (and possibly more). We are using the bridge to close the gap between MSMQ and ASB.
Packages:
NServiceBus 8.0.3
NServiceBus.Transport.AzureServiceBus 3.1.2
NServiceBus.Transport.Bridge 1.0.1
NServiceBus.Transport.Msmq 2.0.0
Headers we have on the messages (message from the ASB side):
NServiceBus.Transport.Encoding
NServiceBus.MessageId
NServiceBus.MessageIntent
NServiceBus.ConversationId
NServiceBus.CorrelationId
NServiceBus.OriginatingMachine
NServiceBus.OriginatingEndpoint
$.diagnostics.originating.hostid
NServiceBus.ContentType
NServiceBus.EnclosedMessageTypes
NServiceBus.Version
NServiceBus.TimeSent
NServiceBus.Transport.Bridge.ExceptionInfo.ExceptionType
NServiceBus.ExceptionInfo.InnerExceptionType
NServiceBus.Transport.Bridge.ExceptionInfo.HelpLink
NServiceBus.Transport.Bridge.ExceptionInfo.Message
NServiceBus.Transport.Bridge.ExceptionInfo.Source
NServiceBus.Transport.Bridge.ExceptionInfo.StackTrace
NServiceBus.Transport.Bridge.TimeOfFailure
According to NuGet we are up to date (except NServiceBus.Transport.AzureServiceBus, which is one patch version behind).
Do we need to configure something except the ErrorQueue?
If a message is moved to the error queue, the endpoint on which the error occurred will add its own queue to the FailedQ header. As a result, once the error is solved, the message can be sent back to that queue so it can be retried by the endpoint. For example, if Sales receives the message in the sales queue and NServiceBus decides to send it to the error queue, it adds sales to the FailedQ header.
It's possible that Sales is on Azure Service Bus (ASB) and the error queue is on RabbitMQ, for example. The bridge picks up the message from ASB and shovels it into the error queue on RabbitMQ. You can see in this line of code that the FailedQ header is even transformed on messages that are sent to the error queue.
Now imagine the bridge fails at this and moves the message to the bridge.error queue. If it overwrites the FailedQ header, the message can never be sent back and retried.
Now, if I recall correctly, messages that end up in the bridge.error queue should always be transferred back to the queue where the bridge picked them up. And that's what your problem currently seems to be: there's no automated way to figure out which queue that is?
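To make the problem concrete, here is an illustrative sketch of the header handling described above. This is a Python model with hypothetical names, not the real .NET bridge code; it only shows why overwriting FailedQ loses the return address:

```python
# Hypothetical model of the bridge moving a failed message to its own error
# queue. In this scenario the bridge rewrites NServiceBus.FailedQ, and the
# queue it originally received the message from is not recorded anywhere.

def shovel_to_error_queue(headers: dict) -> dict:
    """Simulate the bridge shoveling a failed message to bridge.error."""
    moved = dict(headers)
    moved["NServiceBus.FailedQ"] = "bridge.error"  # original value is lost
    return moved

# A message that failed in the Sales endpoint carries its return address:
original = {"NServiceBus.MessageId": "abc-123", "NServiceBus.FailedQ": "sales"}

# After the bridge itself fails and shovels the message to bridge.error,
# a retry script can no longer recover "sales" from the headers alone:
in_error_queue = shovel_to_error_queue(original)
print(in_error_queue["NServiceBus.FailedQ"])  # no longer "sales"
```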
@dvdstelt Yes, that's exactly the problem. We could parse the "NServiceBus.Transport.Bridge.ExceptionInfo.Message" header, because its content mentions the queue the message failed from, but it seems flimsy to rely on the error message never changing, and on text parsing in general.
I would envision an additional header like "NServiceBus.Transport.Bridge.FailedQ" that would not override the normal NServiceBus.FailedQ header.
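A retry script built on that proposal could then prefer the bridge-specific header and fall back to the standard one. The following Python sketch is hypothetical: the NServiceBus.Transport.Bridge.FailedQ header name is the commenter's suggestion, not an existing NServiceBus feature at this point in the thread:

```python
# Hypothetical retry-routing logic for the proposed bridge-specific header.
# Only NServiceBus.FailedQ exists in NServiceBus at this point; the bridge
# variant is the suggested addition.

def resolve_retry_queue(headers: dict):
    """Pick the queue a failed message should be returned to."""
    # Prefer the bridge-specific header, so a failure inside the bridge
    # does not clobber the endpoint-level return address.
    return (headers.get("NServiceBus.Transport.Bridge.FailedQ")
            or headers.get("NServiceBus.FailedQ"))

# A message that failed in a regular endpoint:
endpoint_failure = {"NServiceBus.FailedQ": "sales"}

# A message the bridge failed to shovel, with both headers present:
bridge_failure = {
    "NServiceBus.FailedQ": "error",
    "NServiceBus.Transport.Bridge.FailedQ": "sales@msmq",
}
```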
This sounds like a good addition. We do regular enhancements to the platform and will make sure this is added to the pool. I can't say for sure when it'll be addressed but don't be afraid to comment here to ask for updates. Some of us really enjoy seeing the words "Floopy-Doo" come through our queue.
This feature becomes more critical when handling messages put on bridge.error due to the service crashing. In these cases the exception message does not contain the queue, just an IServiceProvider error. We're left having to figure out manually where each message is supposed to go.
Any update on this issue? Or any idea when this will be addressed?
@MarcWils Still no set date but by asking about it, you've implicitly bumped the priority when we look for ways to improve the platform.
@Floopy-Doo @MarcWils @ThreePinkApples Could you share the reasons for messages to be forwarded to the error queue? This information can help us improve the messaging bridge.
@ramonsmits So far we have had the following reasons:
We are using MSMQ and AzureServiceBus.
@ramonsmits For us it's mainly our "BridgeService" crashing during startup; the messages that are being processed at that moment are then dumped into the error queue. This can also happen during a regular restart/redeploy of the service, without any crash. Configuration issues do happen but are mostly prevented through unit tests.
The crashes are mostly caused by connection timeouts towards RabbitMQ, and we haven't found any solution to that issue. We have 39 MSMQ endpoints and 25 RabbitMQ endpoints being bridged and almost every time we redeploy/restart our "BridgeService" it needs several attempts before it is able to set up all RabbitMQ connections without any of the connections timing out. We experimented with increasing RabbitMQ resources and timeout settings but it hasn't helped us, and we've shifted our focus to moving more and more over to the RabbitMQ side so we can reduce the number of endpoints being bridged.
The crashes are mostly caused by connection timeouts towards RabbitMQ, and we haven't found any solution to that issue.
Do you have the same timeout issues with regular endpoints? Not just the Bridge?
@ThreePinkApples @Floopy-Doo If you have any details when/why the bridge fails at (re)start that would be helpful.
If you have many endpoints that need to be bridged you could consider hosting multiple bridges with a reduced number of endpoints.
@ThreePinkApples @Floopy-Doo A shutdown of the messaging bridge should not result in any messages being moved to the error queue.
The crashes are mostly caused by connection timeouts towards RabbitMQ, and we haven't found any solution to that issue.
Do you have the same timeout issues with regular endpoints? Not just the Bridge?
No, but I guess the Bridge is unique since it is one service setting up so many connections at the same time
@ThreePinkApples @Floopy-Doo If you have any details when/why the bridge fails at (re)start that would be helpful.
If you have many endpoints that need to be bridged you could consider hosting multiple bridges with a reduced number of endpoints.
We did look into splitting the service into smaller chunks. The problem is that there is so much "crosstalk" between services that few endpoints could be isolated into their own Bridge service. We do have a few cases where an MSMQ service only communicates with RabbitMQ services, so we're moving those over to RabbitMQ first; that should at least significantly reduce the number of messages that end up in the error queue.
@ThreePinkApples @Floopy-Doo A shutdown of the messaging bridge should not result in any messages being moved to the error queue.
We get the following error:
System.Exception: Failed to shovel message for endpoint somequeue with id 7b13bb61-fc3b-4c09-b7fb-6ff7e2e4b7f3\452901742 from msmq to azureservicebus ---> System.ObjectDisposedException: ServiceBusSender has already been closed and cannot perform the requested operation.
Object name: 'ServiceBusSender'.
at Azure.Core.Argument.AssertNotDisposed(Boolean wasDisposed, String targetName)
at Azure.Messaging.ServiceBus.ServiceBusSender.<CreateMessageBatchAsync>d__30.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at NServiceBus.Transport.AzureServiceBus.MessageDispatcher.<DispatchBatchForDestination>d__7.MoveNext() in /_/src/Transport/Sending/MessageDispatcher.cs:line 143
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at NServiceBus.Transport.AzureServiceBus.MessageDispatcher.<Dispatch>d__5.MoveNext() in /_/src/Transport/Sending/MessageDispatcher.cs:line 97
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at MessageShovel.<TransferMessage>d__1.MoveNext() in /_/src/NServiceBus.Transport.Bridge/MessageShovel.cs:line 65
--- End of inner exception stack trace ---
at MessageShovel.<TransferMessage>d__1.MoveNext() in /_/src/NServiceBus.Transport.Bridge/MessageShovel.cs:line 73
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at NServiceBus.Transport.Msmq.ReceiveStrategy.<TryProcessMessage>d__7.MoveNext() in /_/src/NServiceBus.Transport.Msmq/ReceiveStrategy.cs:line 112
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at NServiceBus.Transport.Msmq.ReceiveOnlyNativeTransactionStrategy.<ProcessMessage>d__2.MoveNext() in /_/src/NServiceBus.Transport.Msmq/ReceiveOnlyNativeTransactionStrategy.cs:line 93
We get the following error:
@Floopy-Doo Does that happen immediately when shutting down? I would not expect the code to throw an ObjectDisposedException during a regular graceful shutdown. Such exceptions could occur only after the endpoint shutdown times out, because the code would then be in an undefined state.
By the way, this could just be an issue with https://github.com/Particular/NServiceBus.Transport.AzureServiceBus and not the bridge.
Do you also see this for regular endpoints in your system?
@ramonsmits
We get the following error:
@Floopy-Doo Does that happen immediately when shutting down? I would not expect the code to throw an ObjectDisposedException during a regular graceful shutdown. Such exceptions could occur only after the endpoint shutdown times out, because the code would then be in an undefined state. By the way, this could just be an issue with https://github.com/Particular/NServiceBus.Transport.AzureServiceBus and not the bridge.
We have this issue when we stop the service for maintenance work (so a graceful shutdown) while its queues are not completely empty (which is the case a majority of the time).
Do you also see this for regular endpoints in your system?
We haven't had that issue with any other endpoints so far, just the bridge. But only a small portion of our services has been migrated to ASB.
The newly added header will be called NServiceBus.MessagingBridge.FailedQ and will be released soon with version 2.0.1.
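Once that header is in place, a scripted retry can read it to return messages from the bridge error queue. Here is a minimal Python sketch; the send function is a placeholder, since a real script would use the native client of whichever transport hosts the bridge error queue:

```python
# Sketch of a scripted retry using the NServiceBus.MessagingBridge.FailedQ
# header announced above. The send() call is a hypothetical placeholder,
# not a real transport API.

BRIDGE_FAILED_Q = "NServiceBus.MessagingBridge.FailedQ"

def send(queue: str, message: dict) -> None:
    """Placeholder for a transport-specific send (MSMQ, ASB, RabbitMQ, ...)."""
    print(f"sending {message['headers']['NServiceBus.MessageId']} to {queue}")

def retry(message: dict) -> str:
    """Return a failed message to the queue the bridge picked it up from."""
    destination = message["headers"].get(BRIDGE_FAILED_Q)
    if destination is None:
        # Messages that predate the header still need manual routing.
        raise ValueError("no bridge FailedQ header; route manually")
    send(destination, message)
    return destination

msg = {"headers": {"NServiceBus.MessageId": "abc-123",
                   BRIDGE_FAILED_Q: "sales@msmq"}}
```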
@Floopy-Doo @MarcWils @ThreePinkApples We just released version 2.0.1, which adds the header to messages moved to the bridge error queue. Documentation can be found here. We'll announce it shortly on https://discuss.particular.net/. Let us know if this works better for you, even though it's not optimal, as we don't yet provide tooling to move messages back. That is on our list of things to do, although we may still have some discussions about the best approach. For example, if people move from TransportA to TransportB and ServiceControl is running on TransportB, there's no way to use ServicePulse to retry those messages. So we might need something different, but we're not sure yet what that would be.
We are currently working on a system to recover messages from the error queue of a bridge transport and retry them once the error is fixed. Currently there is no header indicating which queue the original message resided in before it was moved to the bridge error queue. (The normal NServiceBus error-queue flow adds a FailedQ header to the message.)
Our feature request is to add a FailedQ (or similar) header to messages put into the bridge transport's error queue.
Pro
Con