Azure / azure-functions-servicebus-extension

Service Bus extension for Azure Functions
MIT License
65 stars 35 forks source link

ServiceBusTrigger Fires Again Immediately After Exception in User Code #161

Closed phil000 closed 2 years ago

phil000 commented 3 years ago

We are using ServiceBusTrigger in Azure Functions.

"FUNCTIONS_WORKER_RUNTIME": "dotnet-isolated"

Microsoft.Azure.Functions.Worker.Extensions.ServiceBus 4.2.1 Microsoft.Azure.Functions.Worker.Sdk 1.0.2 Microsoft.Azure.Functions.Worker 1.1.0

When an exception occurs in user code the same message is tried again without any delay. I would have expected that the lock timeout would have been allowed to expire first. Otherwise we get {max dequeue count} errors occurring within about 5 seconds and the message moves to the dead letter queue.

phil000 commented 3 years ago

Just some debug.writeline to show message id, dequeue count, and time it is being processed after an exception occurs and is thrown out of user code:

13/05/2021 4:04:03 pm - Processing Message Id: 679b17da-6d1b-405f-9262-ec8535cb22ea, Dequeue Count: 1 13/05/2021 4:04:05 pm - Exception in user code SqlException. Will be thrown out to runtime.

13/05/2021 4:04:06 pm - Processing Message Id: 679b17da-6d1b-405f-9262-ec8535cb22ea, Dequeue Count: 2 13/05/2021 4:04:06 pm - Exception in user code SqlException. Will be thrown out to runtime.

13/05/2021 4:04:06 pm - Processing Message Id: 679b17da-6d1b-405f-9262-ec8535cb22ea, Dequeue Count: 3 13/05/2021 4:04:06 pm - Exception in user code SqlException. Will be thrown out to runtime.

13/05/2021 4:04:07 pm - Processing Message Id: 679b17da-6d1b-405f-9262-ec8535cb22ea, Dequeue Count: 4 13/05/2021 4:04:07 pm - Exception in user code SqlException. Will be thrown out to runtime.

13/05/2021 4:04:07 pm - Processing Message Id: 679b17da-6d1b-405f-9262-ec8535cb22ea, Dequeue Count: 5 13/05/2021 4:04:08 pm - Exception in user code SqlException. Will be thrown out to runtime.

13/05/2021 4:04:08 pm - Processing Message Id: 679b17da-6d1b-405f-9262-ec8535cb22ea, Dequeue Count: 6 13/05/2021 4:04:08 pm - Exception in user code SqlException. Will be thrown out to runtime.

13/05/2021 4:04:08 pm - Processing Message Id: 679b17da-6d1b-405f-9262-ec8535cb22ea, Dequeue Count: 7 13/05/2021 4:04:08 pm - Exception in user code SqlException. Will be thrown out to runtime.

phil000 commented 3 years ago

Any idea how we can get the reties after an exception to occur AFTER the message lock timeout we have configured has expired, to give us time to solve any issue? Really no point having all 12 dequeues of the message happing in 5 seconds, we already use transient fault handling so the dequeues need to be far enough apart to deal with longer more persistent problems. With Azure Storage Queues this all works properly. After an exception you'll see the message again after the visibility timeout expires, so in our case 30 minutes later.

phil000 commented 3 years ago

29 days?

austinkispotta-gep commented 3 years ago

I have the same issue. Is there any inbuild solution to this? Or any alternatives?

NoPanicBanick commented 2 years ago

Hopefully I understand your issue correctly. It sounds like the retry is happening too quickly and you would like a delay between retries. There is some built in support for this using the host.json in your Azure Function. Here is an example of what that might look like:

{
  "version": "2.0",
  "retry": {
      "strategy": "fixedDelay",
      "maxRetryCount": 10,
      "delayInterval": "00:00:30"
  }
}

Checkout the Microsoft Docs on this for more info: https://docs.microsoft.com/en-us/azure/azure-functions/functions-host-json#retry

phil000 commented 2 years ago

Thanks. We have already set up these parameters on the service bus subscription through maximum dequeue count and visibility timeout etc. We would actually expect these settings to be honoured. And with the storage queue trigger we use elsewhere it does appear to work this way. If an exception occurs and propagates out of the function we get the message back again but only after the visibility timeout expires (30 minutes). With service bus trigger we get the message back again immediately - like less than a second later. This seems like a bug. The message is received again from the underlying source with a incremented dequeue count without any delay, totally defeating the purpose of retrying beyond very transient issues. Storage queue triggers work 'as expected' one way when unhandled failures occur, but service bus trigger doesn't.

We could use 'retry', but how does 'retry' combine with retry settings built into triggers like queue and service bus? If we set 'retry count' to 5 but the subscription dequeue count is 10 do we get 5 x 10 = 50 attempts to process the messages?? We don't want that. We want what we've configured in the subscription to be honoured. Don't 'complete' message processing if an unhandled exception occurs, or at least make it configurable.

Bottom line, if we were 'coding this up ourselves' in the event of an unhandled exception o\we wouldn't 'complete' the message processing and the message would appear again 5 minutes later, as expected.

phil000 commented 2 years ago

@Bubba95x I've had a look at a few things and it's not looking good.

The FixedDelayRetry attribute applied directly to service bus triggered functions does nothing. Seems completely ignored both in terms of retry count and delay.

When using the 'retry' settings in host.json the retry count is used for 'local' retries, but the delay is completely ignored. Aka, with a maxRetryCount of 1 and delay of 10 seconds for example, you get this pattern:

  1. message delivered with service bus 'deliveryCount' = 1
  2. message delivered again without any delay and with service bus 'deliveryCount' = 1
  3. message delivered again without any delay and with service bus 'deliveryCount' = 2
  4. message delivered again without any delay and with service bus 'deliveryCount' = 2

e.g 29/08/2021 10:59:45 am - Message received. DeliveryCount: 1, MessageId: ba33a8e9-cf4b-4a51-b7e7-adf343ecf0b5 29/08/2021 10:59:46 am - Message received. DeliveryCount: 1, MessageId: ba33a8e9-cf4b-4a51-b7e7-adf343ecf0b5 29/08/2021 10:59:47 am - Message received. DeliveryCount: 2, MessageId: ba33a8e9-cf4b-4a51-b7e7-adf343ecf0b5 29/08/2021 10:59:48 am - Message received. DeliveryCount: 2, MessageId: ba33a8e9-cf4b-4a51-b7e7-adf343ecf0b5

phil000 commented 2 years ago

A few things have been updated in the last 3 months, still no improvement.

Microsoft.Azure.Functions.Worker.Extensions.ServiceBus 4.2.1 Microsoft.Azure.Functions.Worker.Sdk 1.0.4 Microsoft.Azure.Functions.Worker 1.5.1