Open Jokelab opened 1 month ago
hi @Jokelab thanks for raising this. We'll try and reproduce the issue and then post an update here.
@Jokelab I followed your repro steps but I couldn't reproduce the problem. After the upgrade to 5.2.0, messages were ingested from the audit queue and forwarded to the audit log queue, as expected.
FYI I made each message approximately 2MB in size.
Is there anything missing from your repro steps?
@adamralph Thank you very much for taking the effort in trying to reproduce the issue. I must be overlooking something. I tried to reproduce the issue myself today with a fresh Azure ServiceBus namespace and a new Function App. The issue didn't occur this time.
Therefore I tried to update our actual instance again, hoping it was just a fluke, but now the issue re-appears :( . I noticed that not all messages fail to be forwarded, just a few of them, but I cannot find anything specific about the failing messages when I investigate them in the RavenDb database. They have different types and sizes. I will try to find the differences between my repro steps and the actual situation a bit more and come back here once I figure it out.
So I investigated the issue a little more and have found a way to consequently reproduce it. It turns out that it's not related to the update, but also occurs with a fresh install of ServiceControl5.2.0.
To reproduce the issue, I slightly modified the ASBWorker sample code for Azure Functions (isolated) on your website. The code is attached to this comment. The HttpSender function now has the following parameters:
This works fine: /api/HttpSender?followup=10&sizekb=100&immediate=false
It also works fine when sent immediately: /api/HttpSender?followup=10&sizekb=100&immediate=true
But when I increase the message size to 110Kb, I get the error message: /api/HttpSender?followup=10&sizekb=110&immediate=true
I hope this helps to investigate the issue. Best regards!
@Jokelab thanks for the detailed repro steps. I've managed to reproduce this now in the SC Audit instance. I'll continue to investigate and I'll comment back here when I have more info.
BTW, this alternative repro didn't work for me:
But when I increase the message size to 110Kb, I get the error message: /api/HttpSender?followup=10&sizekb=110&immediate=true
The messages were both sent and received without any errors.
Thank you for looking into this!
BTW, this alternative repro didn't work for me:
But when I increase the message size to 110Kb, I get the error message: /api/HttpSender?followup=10&sizekb=110&immediate=true
The messages were both sent and received without any errors.
I think this depends on the batch size. For me 110 was enough to trigger to exception. Maybe increasing this to a larger number (500) triggers it for you as well.
@Jokelab I found the problem with the alternate repro. It has to be set to immediate=false
, not immediate=true
.
I.e. this reproduces the problem:
/api/HttpSender?followup=10&sizekb=110&immediate=false
@Jokelab we've found the source of the problem.
By default, outgoing messages from a message handler are sent in batches, after the handler exits (that can be suppressed with SendOptions.RequireImmediateDispatch
as your repro code demonstrates).
We've discovered that, if the size of the entire batch is more than the allowed size per message (which we are setting to 102,400 KB in this example), the Azure SDK client throws this rather misleading exception.
This manifests as a bug in our Azure Service Bus Transport, which in turn manifests as a bug in ServiceControl, because the transport is used internally in ServiceControl.
We will follow up with the Azure SDK team to see what can be done about this and get back to you.
I'm not sure if there's another workaround available in the meantime other than sticking with an older version of ServiceControl, but if we do identify one then we'll let you know.
@adamralph Thank you for clarifying! Is it safe to say that the issue that we can observe on the endpoint side (when triggering /api/HttpSender?followup=10&sizekb=110&immediate=false
) is the same as on the ServiceControl side?
For this reason we sometimes use immediate dispatch instead of the preferred method (in batch), with all the risk that entails.
For now we'll stick on our current version (5.0.4).
Is it safe to say that the issue that we can observe on the endpoint side (when triggering /api/HttpSender?followup=10&sizekb=110&immediate=false) is the same as on the ServiceControl side?
@Jokelab yes, it's exactly the same problem.
@Jokelab FYI I've raised a corresponding bug for the transport — https://github.com/Particular/NServiceBus.Transport.AzureServiceBus/issues/994
@Jokelab please follow both this issue and https://github.com/Particular/NServiceBus.Transport.AzureServiceBus/issues/994 for updates on the status of the bug fixes.
If at any point, the bugs fixes become critical for you, please request critical support from https://particular.net/support
@Jokelab FYI we've prioritized this near the top of our list and will work on it as soon as we can.
Describe the bug
This bug is caused by https://github.com/Particular/NServiceBus.Transport.AzureServiceBus/issues/994.
Description
After upgrading from ServiceControl 5.0.4 to 5.2.0, audit message ingestion fails and messages are not forwarded to the configured audit log queue.
Expected behavior
Audit messages are ingested and forwarded to the configured audit log queue.
Actual behavior
Versions
ServiceControl 5.2.0 Transport: Azure ServiceBus (premium) I explicitly mention this because our system sometimes needs to process messages slightly bigger than 1MB.
Our endpoints are Azure Functions using the following versions: NServiceBus 9.0.0 NServiceBus.Transport.AzureServiceBus 4.0.0
NServiceBus.AzureFunctions.Worker.ServiceBus 5.0.0
Steps to reproduce
Expected: messages are ingested from the audit queue and forwarded to the audit log queue Actual: messages stay in the audit queue and are not forwarded to the audit log queue.
Relevant log output
Additional Information
Workarounds
Reinstalling everything with ServiceControl version to 5.0.4 resolved the issue.
Additional information
I tried re-importing the messages as suggested in the documentation https://docs.particular.net/servicecontrol/import-failed-messages, but this keeps triggering the same exception. To make sure it is related to this version and that the issue doesn't occur in 5.0.4, I installed 5.0.4. again and tested that everything is ok. Then upgraded to 5.2.0 again and the issue appears.