Open: jsquire opened this issue 3 years ago.
This is such a critical, overdue feature. Having 'delivery counts' is useless without this, because any time something fails, the message is just retried N times in rapid succession and dead-lettered anyway. This extra processing just makes a bad situation N times worse. We need to be able to 'update' the message properties AND (more importantly) reschedule the original message to run with exponential back-off or whatever algorithm we want. We could control this by, for example, storing the original message time in the user properties collection and computing the next delay from the current delivery count. Using transactions to re-enqueue a new message while completing the existing one is not a good option; we should not have to resend the entire message payload. I would recommend simply updating the AbandonAsync method with overloads that accept an updated scheduled enqueue time in addition to the updated user properties.
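Computing the next delay from the delivery count, as suggested above, can be sketched on the client side. This is a minimal Python sketch, not Service Bus SDK code: the base delay, the cap, and the original_enqueued_utc property name are hypothetical, and the delivery count is assumed to be 1 on the first delivery. It only computes when the message should become visible again; actually hiding the message until then is what this issue asks for.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy values; tune them for your workload.
BASE_DELAY = timedelta(seconds=5)
MAX_DELAY = timedelta(minutes=10)
ORIGINAL_TIME_KEY = "original_enqueued_utc"  # hypothetical user-property name


def next_retry_delay(delivery_count: int) -> timedelta:
    """Exponential back-off: BASE_DELAY * 2^(delivery_count - 1), capped at MAX_DELAY.

    delivery_count is assumed to be 1 on the first delivery.
    """
    if delivery_count < 1:
        raise ValueError("delivery_count is expected to start at 1")
    return min(BASE_DELAY * 2 ** (delivery_count - 1), MAX_DELAY)


def next_retry_time(user_properties: dict, delivery_count: int, now: datetime) -> datetime:
    """Record when the first attempt happened (only once), then return the UTC
    time at which the message should become visible again."""
    user_properties.setdefault(ORIGINAL_TIME_KEY, now.isoformat())
    return now + next_retry_delay(delivery_count)


props = {}
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
retry_at = next_retry_time(props, delivery_count=3, now=start)  # 20 seconds after start
```

Keeping the original time in the properties lets a consumer also give up based on total elapsed time rather than only on the number of attempts.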
A scheduled message really can't 'be in line' when it's scheduled. It's just at a theoretical point in time. When that point in time elapses, the message should just 'get in line' at that point in time (end of the line). The 'delay' is the more important functionality, not the specific time. Queued messages are queued and are delayed by nature.
+1 for this, please. There's not much use in DDoS'ing our own services. An exponential back-off policy would be a fantastic feature to add.
One way to accomplish this already today is to use message deferral combined with a scheduled message. For this, you would defer the message and place its sequence number in a scheduled message. When the scheduled message comes in, use the sequence number to retrieve the deferred message and process it. Please let us know if this works for you.
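The deferral-plus-pointer pattern can be illustrated with a toy in-memory queue. This Python sketch does not use the Service Bus SDK; the InMemoryQueue class, its methods, and the deferred_seq property are hypothetical stand-ins for defer, schedule, and receive-deferred-by-sequence-number.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass
class Message:
    body: str
    sequence_number: int
    visible_at: float  # seconds; a scheduled message becomes receivable at this time
    properties: dict = field(default_factory=dict)


class InMemoryQueue:
    """Toy model of a queue with scheduling and deferral (not the real SDK)."""

    def __init__(self):
        self._seq = count(1)
        self.active = []    # messages receivable now or scheduled for later
        self.deferred = {}  # sequence_number -> message set aside by defer()

    def send(self, body, now, delay=0.0, properties=None):
        msg = Message(body, next(self._seq), now + delay, properties or {})
        self.active.append(msg)
        return msg

    def receive(self, now) -> Optional[Message]:
        # A scheduled message joins the line only once its time has arrived.
        ready = [m for m in self.active if m.visible_at <= now]
        if not ready:
            return None
        msg = min(ready, key=lambda m: (m.visible_at, m.sequence_number))
        self.active.remove(msg)
        return msg

    def defer(self, msg):
        self.deferred[msg.sequence_number] = msg

    def receive_deferred(self, sequence_number) -> Message:
        return self.deferred.pop(sequence_number)


q = InMemoryQueue()
original = q.send("order-42", now=0)
msg = q.receive(now=0)
# Processing failed: set the message aside and schedule a pointer to it.
q.defer(msg)
q.send("retry-pointer", now=0, delay=30, properties={"deferred_seq": msg.sequence_number})

too_early = q.receive(now=10)  # None: the pointer is not visible yet
pointer = q.receive(now=30)
retried = q.receive_deferred(pointer.properties["deferred_seq"])
```

Each retry needs a defer, a second (pointer) message, and a lookup by sequence number; that bookkeeping is the overhead discussed throughout this thread.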
@EldertGrootenboer I interpret the original request as asking for built-in functionality that reduces the code complexity required for what should be a simple message disposition. Whenever a workaround involves several operations, it not only incurs additional cost at the service level but also adds cognitive load and complexity to codebases. Walking through the workaround, here's what needs to happen:
To sum it up, there are scenarios where abandoning with a custom delay is necessary and workarounds cannot provide the same value a feature would. I hope this helps.
Thank you for your feedback! Although this is not something that should be done with either Abandon or Defer, as it would change the semantics of those actions, it is something we want to look into putting on the backlog. I would like to align with you for this, to get the details for your scenario. @Bnjmn83, @triynko and @SeanFeldman could you drop me a message on egrootenboer@microsoft.com, and we can take it from there.
@RichardGaoF, abandoning is never about deferring. With regular abandon operation the message goes back to the queue and is available right away. With this feature, the ask is for the message to be delayed for the provided time span upon abandoning, and then become available automatically.
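A toy model makes the requested semantics concrete. This Python sketch is not an existing API: abandon with a delay argument is the hypothetical operation this issue asks for, modeled as releasing the lock while hiding the same message until the delay elapses.

```python
from dataclasses import dataclass


@dataclass
class QueuedMessage:
    message_id: str
    delivery_count: int = 0
    visible_at: float = 0.0  # seconds


def try_deliver(msg: QueuedMessage, now: float) -> bool:
    """Lock and deliver the message if it is visible; each delivery bumps the count."""
    if now < msg.visible_at:
        return False
    msg.delivery_count += 1
    return True


def abandon(msg: QueuedMessage, now: float, delay: float = 0.0) -> None:
    """Hypothetical abandon-with-delay: release the lock but keep the SAME message
    hidden until now + delay. delay=0 reproduces today's abandon behavior."""
    msg.visible_at = now + delay


m = QueuedMessage("order-42")
first = try_deliver(m, now=0)
abandon(m, now=0, delay=60)
early = try_deliver(m, now=30)  # False: still hidden by the delay
later = try_deliver(m, now=60)  # True: same message id, delivery_count is now 2
```

No new message, sequence number, or pointer is involved; only the time at which the next delivery can happen changes.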
Thanks @EldertGrootenboer. I have just one question: it seems the deferral time must be a fixed timespan set when scheduling the message? In other words, if the timespan is set to 10 minutes, does that mean the message will be enqueued in 10 minutes (scheduled) and then also be deferred another 10 minutes each time the consumer retrieves it and checks some custom conditions, OR the message will be enqueued in 10 minutes (scheduled) and then the consumer will not retrieve the message UNTIL some custom conditions are met (working like an event trigger)?
I am expecting the latter, but it looks like it's actually the former (you can only set a fixed timespan for a deferred message)?
@RichardGaoF You don't set the timespan on the deferred message, but on the scheduled message instead. The deferred message will stay on the queue until it is explicitly retrieved using its sequence number.
The scheduled message will be enqueued after the timespan has elapsed, and will be placed at the back of the queue. Once it is picked up by a consumer, that consumer will then use the sequence number which was added to the scheduled message to retrieve the deferred message.
Thank you @EldertGrootenboer. So if I implement this in a loop, the deferred message will always stay set aside in the queue until it is received, handled, and completed, and each iteration of the loop needs a newly created scheduled message with two properties: its timespan works like the loop interval, and its messageID should always be the sequence number of the deferred message. Correct?
@RichardGaoF you’re confusing message deferral and abandoning with a time-out. With this feature you don’t need to use message’s sequence number. The message won’t change its ID or anything else besides DeliveryCount because it will be the same message. Have a look at how abandoning works and add to that a back-off time that would be added. That’s it.
@SeanFeldman @EldertGrootenboer Thanks. I may know each concept individually (peek-lock, abandon, lock expiry, DLQ, TTL expiry, scheduled messages, deferred messages, ...), but there never seems to be an article on the Internet (including MSDN) that clearly describes how they all work together. Maybe some of them never work together, but if that isn't stated, readers don't know, or at least aren't sure, just like me right now. Anyway, please allow me to try to describe the following typical scenario using all of these concepts together.
We have just a "general" queue. The queue itself has a TTL value, which means a message will be moved to the DLQ if it has not been consumed before the TTL expires (assuming dead-lettering on expiration is enabled). With peek-lock, when a consumer requests a message, the queue locks the next message and sends it to the consumer. If the consumer cannot process the message and abandons it, or the processing time exceeds the lock timeout, the queue unlocks the message so it becomes visible to all consumers again. There is also a max delivery count, and the message will be moved to the DLQ if it exceeds that count.
There is a to-be-scheduled message and a to-be-deferred message, and the to-be-scheduled message's ID is the to-be-deferred message's sequence number. Schedule the to-be-scheduled message with a timespan and defer the to-be-deferred message.
The scheduled message will not be enqueued until the timespan has elapsed.
A consumer requests a message; the queue locks the scheduled message and delivers it to the consumer. The consumer uses the scheduled message's ID (just the sequence number of the deferred message) to retrieve the deferred message and TRIES to process it.
Here are the QUESTIONS: if the deferred message is NOT ready to be processed (or, say, 'failed to be processed'), will the consumer (4-1) 1) directly create a new scheduled message; 2) schedule/enqueue the new message; 3) defer a new copy of the deferred message; 4) complete the original deferred message; 5) complete the original scheduled message? OR (4-2) 1) abandon the deferred message so it is visible in the queue again?
If the former (4-1), the deferred message will never exceed the max delivery count and be moved to the DLQ (in fact, each deferred message will be delivered only once). If the latter (4-2), once the number of abandons exceeds the max delivery count, the deferred message will be moved to the DLQ, but no new scheduled messages or new copies of the deferred message will ever be created.
Which of the above is the real behavior of message deferral?
I personally prefer the former, but I'm not very sure, because the MSDN docs lack details and examples, and this article with an example seems to confuse scheduled messages and deferred messages.
@RichardGaoF, there's no deferral for this feature. Plain and simple. This issue is talking about the ability to abandon a message and specify a timeout. When a message is abandoned today, it goes back to the queue and is available for processing right away if there are no other messages in the queue. What this issue is about is adding a delay to an abandoned message, so that rather than appearing immediately, it would be delayed. It's the same message. No need to create a new message, no need to defer and hold on to a message sequence number, none of that.
The delivery count and dead-lettering would continue to work exactly the same way because the message is the same message.
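The peek-lock rules discussed in this exchange (abandon and lock expiry both release the message, each delivery increments the count, and the message is dead-lettered once it reaches the maximum) can be replayed with a small Python model. It is a simplification, assuming a hypothetical limit of 3 and ignoring TTL; it is not the broker's implementation.

```python
MAX_DELIVERY_COUNT = 3  # hypothetical queue setting


def run_peek_lock(outcomes, max_delivery_count=MAX_DELIVERY_COUNT):
    """Replay consumer outcomes for one message under peek-lock.

    Each outcome is "complete", "abandon", or "lock_expired". Abandon and lock
    expiry both release the lock; the next delivery increments the count.
    Returns (final_state, delivery_count).
    """
    delivery_count = 0
    for outcome in outcomes:
        delivery_count += 1  # receive: the message is locked and delivered
        if outcome == "complete":
            return "completed", delivery_count
        if outcome not in ("abandon", "lock_expired"):
            raise ValueError(f"unknown outcome: {outcome}")
        # Lock released: visible again unless the delivery limit has been reached.
        if delivery_count >= max_delivery_count:
            return "dead-lettered", delivery_count
    return "active", delivery_count
```

For example, run_peek_lock(["abandon", "lock_expired", "abandon"]) ends dead-lettered after 3 deliveries. A delay on abandon would only change when the next delivery happens; the counting above would stay the same.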
If this still doesn't answer your question, I suggest moving a discussion to an email.
@SeanFeldman I re-read the whole conversation thread to better understand the context of the issue.
Yes, delaying an abandoned message before it becomes visible again in the queue is not provided by any out-of-the-box (OOB) Azure Service Bus feature right now, so the method @EldertGrootenboer recommended (message scheduling + deferral) can be understood as a workaround, with the shortcoming that it's a one-time operation/deferral rather than a "do-deferral-while" operation. Under this one-time deferral, just as you said, the delivery count and DLQ work normally if we abandon the deferred message in our consumer.
On the other hand, as in my current business logic, a typical business scenario is continually deferring a message until some condition(s) are met (do-deferral-while), instead of deferring a message only once. Therefore, some people have implemented this do-deferral-while behavior by repeatedly creating a new scheduled message and a new deferred message to re-enqueue, for example, one I found on the Internet.
In short, referring to my last post: with 4-1, there is no abandon and the max delivery count is never exceeded; the do-deferral-while logic is realized by repeatedly creating a new scheduled message and a new deferred message to re-enqueue. With 4-2, after a one-time deferral using a scheduled message and message deferral, an abandoned message becomes visible in the queue again immediately.
Thanks for the invitation; I might join your email discussion if I run into more problems while implementing my business logic.
One scenario where this request from @SeanFeldman would be useful is when you have sessions enabled and want to implement a circuit breaker on top. For example, I have multiple projects that send messages to a queue, the session ID is the customer ID, and messages for the same customer need to be processed in order. But if processing one of the messages for a customer fails and requires some manual intervention, a separate notification/workflow can be kicked off for manual investigation (say, a product is missing and needs to be created), and then reprocessing of the messages can continue. Being able to Abandon with a delay would be helpful so that processing for that specific session/customer 'pauses' while the issue is addressed, and the in-order requirement is not broken. It would at least make the solution to that requirement simpler, I suspect.
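The per-session pause described here can be sketched on the application side. This Python sketch is an assumption about how one might track paused sessions; the class and method names are hypothetical, and it only decides whether a session may be processed. Without the requested feature, a consumer that skips a paused session still has to release it, and the session may be offered again right away, which is the gap an abandon-with-delay would close.

```python
class SessionCircuitBreaker:
    """Application-side record of sessions (e.g. one per customer) whose
    processing is paused while someone fixes the underlying problem."""

    def __init__(self):
        self._paused_until = {}  # session_id -> time (seconds) when it may resume

    def pause(self, session_id: str, now: float, seconds: float) -> None:
        self._paused_until[session_id] = now + seconds

    def resume(self, session_id: str) -> None:
        # Manual intervention finished early: lift the pause.
        self._paused_until.pop(session_id, None)

    def can_process(self, session_id: str, now: float) -> bool:
        until = self._paused_until.get(session_id)
        return until is None or now >= until


breaker = SessionCircuitBreaker()
breaker.pause("customer-1", now=0, seconds=300)
```

Only the failing customer's session is paused; other sessions keep flowing, and ordering within the paused session is preserved because none of its messages are completed out of turn.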
A feature like @SeanFeldman proposes would definitely simplify many solutions. I would propose to additionally have a delay on deferral to let the message go back to normal queue after delay time. This way we don't need to keep track of sequence number in all cases. Of course there are some things to think about like TTL of the message when returned to the queue.
The reason to have both is that I would like to differentiate between an exception (e.g. a resource not available) and "I want to handle this later" (e.g. ordered processing). Abandon would raise the delivery count, while defer would not.
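The distinction proposed here maps naturally onto exception types. This Python sketch only decides which disposition to use; the abandon and defer actions with a delay are the hypothetical dispositions being requested, and the exception classes and the 5-second base delay are assumptions.

```python
from dataclasses import dataclass


class TransientError(Exception):
    """A dependency is temporarily unavailable (e.g. a resource is down)."""


class NotReadyYet(Exception):
    """The message should simply be handled later (e.g. ordered processing)."""


@dataclass
class Disposition:
    action: str                      # "complete", "abandon", or "defer"
    delay_seconds: float = 0.0
    counts_as_failure: bool = False  # would the delivery count go up?


def choose_disposition(handler, delivery_count: int, base_delay: float = 5.0) -> Disposition:
    try:
        handler()
        return Disposition("complete")
    except TransientError:
        # A failure: back off exponentially and let the delivery count rise.
        return Disposition("abandon", base_delay * 2 ** (delivery_count - 1), True)
    except NotReadyYet:
        # Not a failure: postpone without consuming a delivery attempt.
        return Disposition("defer", base_delay, False)


def ok():
    pass


def backend_down():
    raise TransientError("resource not available")


def waiting_for_predecessor():
    raise NotReadyYet("earlier message in the sequence not handled yet")
```

With this split, a long wait for an ordering dependency never pushes a message toward the dead-letter queue, while repeated real failures still do.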
We have put this on our backlog, thank you for everyone who gave their input. There are no implementation details or timelines to share yet, but we will update this thread as we progress.
It's been more than a year without any updates on this. Just checking: is it still part of the backlog?
This is indeed in the backlog, and we are currently creating a design for this. There are no timelines yet to share, but we will update this issue when we have more information.
So happy this is in the backlog and in design phase. Basically, when we fail to process a message, it's because of some transient error. Maybe a database is unavailable, or some async action it depends on having completed hasn't yet completed. So we abandon the message.
The problem is that it gets picked up right away, fails again, we abandon it, then we repeat this N times based on max delivery count. Because there's virtually no delay between retries, there's no time for the transient error condition to resolve itself, so we're basically DOSing our own system unnecessarily, and after N deliveries, the message deadletters anyway. All is lost.
All we want is ability to abandon a message and specify some delay (or scheduled future date) before it will be picked up by subscribed processors again, which we can compute ourselves as some exponential back-off based on the current delivery count. Semantically, introducing this delay in the Abandon call makes the most sense to us. We want to release the lock on the message, but we want a delay introduced before it gets picked up for processing again automatically.
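The arithmetic behind "there's no time for the transient error condition to resolve itself" is easy to make concrete. This Python sketch assumes a max delivery count of 10 and a hypothetical 5-second base delay; it computes how long a transient outage can last before the message is dead-lettered.

```python
from typing import Optional


def retry_window_seconds(max_delivery_count: int, base: float, cap: Optional[float] = None) -> float:
    """Total time between the first and the last delivery when retry k waits
    base * 2^(k-1) seconds (optionally capped). base=0 models today's
    immediate redelivery after abandon."""
    total = 0.0
    for retry in range(1, max_delivery_count):  # N deliveries have N - 1 gaps
        delay = base * 2 ** (retry - 1)
        if cap is not None:
            delay = min(delay, cap)
        total += delay
    return total


immediate = retry_window_seconds(10, base=0)        # 0.0
backoff = retry_window_seconds(10, base=5)          # 2555.0 (about 43 minutes)
capped = retry_window_seconds(10, base=5, cap=300)  # 1215.0 (about 20 minutes)
```

With immediate redelivery, all ten attempts land within the same moment; with back-off, the same ten attempts span tens of minutes without adding any load.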
The workaround of just rescheduling the message is a bad idea for a few reasons. A completely new message resets the delivery count to zero, so we'd have to create and track our own delivery count. It also increases the message size, since we need metadata for the retry, like 'which pieces of processing failed'. For example, we have 'subscriptions' attached to handlers (these subscribers represent downstream systems that need to be notified that the message has arrived), so if 2 of 3 subscribers fail to be notified about the message, we have to embed these failed subscriptions in the rescheduled message and retry processing. That's a problem because this property increases the original message size, and we risk failing to reschedule. Tracking the delivery count on our own is also a bad idea, because if we fail to update our internal count and the lock times out, we lose a count. So we'd have to use the SUM of the Azure-managed DeliveryCount and our InternalDeliveryCount. So there are 3 problems there.
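The "sum" arrived at here can be sketched as follows. This is a Python sketch under assumptions: the property names are hypothetical, and the resend itself (complete plus scheduled send) is left out. It shows why the broker's count has to be part of the total: it is the only counter that also sees lock expirations.

```python
RESEND_COUNT_KEY = "resend-count"              # hypothetical application property
FAILED_SUBSCRIBERS_KEY = "failed-subscribers"  # hypothetical application property


def effective_attempts(broker_delivery_count: int, properties: dict) -> int:
    """Attempts so far = deliveries of this copy (tracked by the broker,
    including lock expirations) + attempts carried over from earlier copies."""
    return broker_delivery_count + properties.get(RESEND_COUNT_KEY, 0)


def properties_for_resend(broker_delivery_count: int, properties: dict, failed_subscribers) -> dict:
    """Application properties for the rescheduled copy of the message."""
    new_props = dict(properties)
    new_props[RESEND_COUNT_KEY] = effective_attempts(broker_delivery_count, properties)
    new_props[FAILED_SUBSCRIBERS_KEY] = ",".join(sorted(failed_subscribers))
    return new_props


# First copy: delivered twice (one lock expiry, one failure), then resent.
copy2_props = properties_for_resend(2, {}, {"crm", "billing"})
# Second copy: first delivery of the new copy.
attempts_now = effective_attempts(1, copy2_props)  # 3
```

Every retry still carries the full payload plus this metadata, which is the size concern raised in the comment.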
Now, the workaround that EldertGrootenboer came up with to defer the original message and submit a scheduled message with just the sequence number of the original message solves most, but not all, of those problems:
It also creates a new problem. We now have these 'deferred' messages, which are harder to work with, plus these extra, smaller scheduled messages, which artificially increase the message counts in our queues and mess with alarm thresholds. There's also the risk of leaving a message deferred indefinitely if something goes wrong processing the scheduled message that holds the identifier. It's all unnecessary complexity that wouldn't exist if this simple and obvious feature were implemented.
Of course, ALL of this would be solved by the feature requested here. When we pick up a message and processing fails because of a transient error, we could call Abandon and just supply a delay, so the message is not picked up by subscribed processors until after that delay, rather than immediately. (Note that the term 'subscribed processors' here is different from the 'subscribers' I mentioned earlier; our 'subscribers' represent downstream systems that need to be notified about a message being processed.)
I have been using some workarounds for this issue, and I just discovered an undesired side effect: if you are using topics/subscriptions, sending a new message to the topic when processing fails means that all of the subscriptions receive it, which is quite unfortunate.
I really hope that this feature will be implemented soon as it is long overdue. Is there some estimate on when we can expect a possibility to delay the message processing without re-sending?
@ilya-scale I've done the same workaround and found the same side effect. My workaround adds an additional header to the new message with the name of the subscription that triggered the "deferral", so the other subscriptions check this header and just ignore the message if it is not addressed to them... something like a wireless protocol.
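The header-based workaround described here can be sketched as a check applied by each subscriber. This Python sketch assumes a hypothetical target-subscription application property; it illustrates the idea and is not the commenter's code.

```python
TARGET_KEY = "target-subscription"  # hypothetical application property


def retry_properties(properties: dict, my_subscription: str) -> dict:
    """Mark a resent copy as a retry meant for one subscription only."""
    new_props = dict(properties)
    new_props[TARGET_KEY] = my_subscription
    return new_props


def should_handle(properties: dict, my_subscription: str) -> bool:
    """Original messages have no target and are handled by every subscription;
    resent retries are handled only by the subscription that requested them."""
    target = properties.get(TARGET_KEY)
    return target is None or target == my_subscription
```

Every subscription still receives and discards the copy. A subscription rule (SQL filter) on the same property could keep retries out of the other subscriptions entirely, at the cost of managing a rule per subscription.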
This is indeed in the backlog, and we are currently creating a design for this. There are no timelines yet to share, but we will update this issue when we have more information.
How did the designing go? Did you run into any issues? Curious for an update!
Well put by @triynko:
The problem is that it gets picked up right away, fails again, we abandon it, then we repeat this N times based on max delivery count. Because there's virtually no delay between retries, there's no time for the transient error condition to resolve itself, so we're basically DOSing our own system unnecessarily, and after N deliveries, the message deadletters anyway. All is lost.
This is even more important with the Azure Function trigger. In case of failure, the same function is retriggered immediately.
Looking forward to a fix sooner for this :)
That's a problem we are facing right now. There are some possible workarounds, like catching the exception and rethrowing it after a delay, but that's far from ideal. Hopefully this will soon be included in Service Bus ;)
When using a scheduled message to act as a pointer to a deferred message, where are people sending the scheduled message? In our use case we're dealing with topics, and sending a scheduled message back to the topic would impact all subscriptions (unless subscription filters are used).
So this leaves us having to set up a dedicated queue for scheduled messages acting as deferred message pointers. But then when you have multiple applications subscribing to a topic, each application needs its own queue to handle scheduling of deferred messages.
It would be great if this could be handled internally by Service Bus; maybe, as an extension to Complete & Abandon, there could be a DeferUntil(TimeSpan)?
This is indeed in the backlog, and we are currently creating a design for this. There are no timelines yet to share, but we will update this issue when we have more information.
We are facing the same challenge as was previously described by @nzthiago, having a service bus triggered function with sessions enabled where we need to delay (backoff/circuit-break) the repetitive execution of a whole sequence of messages in a single session when the server is busy or under maintenance.
As the feature request was backlogged almost a year ago, any update or timeline would be much appreciated. Thanks in advance.
Any updates on this? It has now been nearly 2 years since you got this request. It seems so easy to just add a delay when we abandon a message. We introduced Service Bus as a means of decoupling message transfers to systems that are not 100% available, so sometimes they just fail, and we want to delay retries for some minutes or hours when backends aren't available. We now see that Azure Service Bus was the wrong design choice, as it is not improving, even for such basic features that one would use such a system for. If nothing changes, we may need to switch to other solutions. So we would need some help from your side to avoid this.
So what is the conceptual challenge of providing a delay when abandoning a message that would take over a year to think over? Can you please describe where the current design process is stuck and which challenges you cannot solve yet? Maybe the community can help.
I think the deferral-plus-scheduled-message approach suggested earlier can be used as a workaround. However, it adds some additional administration and logic/plumbing to implement.
The easiest way to implement this feature would indeed be an AbandonAsync method overload that allows you to specify a scheduledEnqueueTimeUtc value. I've just tried passing it in, but this doesn't seem to work, as the updated properties you can pass in to the AbandonAsync method are treated as custom message properties instead.
Providing a quick update on this. The design for this is ready, once this is picked up for development we will provide another update on this issue.
Really glad to hear this is being worked on. I started working with Service Bus for the first time this week and found myself in need of this feature almost immediately. Can't wait for it to come out!
Same here - really is a must-have if you're building with Functions
Same here. I am looking forward to this functionality to delay the execution of a service bus session's messages on a topic so that I don't have to implement workarounds when I have an outlying long-running process running from a message and I can maintain a max concurrent session/concurrent calls per session on a single machine.
This will make my life easier.
Thank you for your feedback on this item. We are currently doing active development on this feature, and expect to have more to share around its release in the next couple of months.
When is this feature going to be available? Our business has multiple subscriptions, and none of the strategies will work except this feature. I got the latest Azure.Messaging.ServiceBus NuGet package (7.16.2) and still don't see this feature.
@kimberlyyong, the ETA was provided here.
Oh, I did not see this message this morning, lol. Thank you, @SeanFeldman! Bummer, this sucks so much.
On the contrary, it's great that the feature is being worked on.
Why would you have an "abandon" method without specifying when it should be retried? This is an unthoughtful design to begin with. Any retry mechanism must think about a retry cadence strategy. Can you imagine Polly not having this kind of retry policy? Also, Abandon is a bad name for this method. And there is no way I can solve my own problem: no one can do any workaround other than creating another Service Bus topic/queue to handle their subscription's retries. How ridiculous is this? Why is the message retry/visible time a property that users cannot modify? It's people defaulting to making everything "get" instead of thinking about what should be "get/set", another unthoughtful coding standard. Sorry I'm ranting, but I expect better from this team.
Also it's been 2 years since an idea of a fix is suggested and who knows how long it's been a problem before that. Probably another 5 years.
@kimberlyyong While you are correct that the lack of an "abandon" feature seems like a significant oversight, it is also important to remember that a team of highly intelligent people worked hard on this and likely have a good reason for designing it the way they did. I think we can all benefit a lot from trying to understand the initial intent and working together to drive a better solution. I don't think berating a team on a public forum fosters a good culture or community for programmers. I would encourage everyone to be understanding and constructive with our comments; especially since each and every programmer out there has made similarly flawed design/implementation decisions at some point in their career.
Any updates on this ?
+1
+1
Waiting anxiously to rid ourselves from current custom solution which essentially generates new messages instead of actual re-delivery through abandoning.
@ultrabstrong So commenting on bad design is discouraged? How are we ever going to improve anything?
@kimberlyyong I don't think (nor did I say) commenting on bad design should be discouraged. Being disrespectful and berating people is not a good ingredient for making progress. There are a lot of ways to respectfully suggest improvement.
@ultrabstrong I did not think I was being disrespectful or berating. Maybe we grew up in different cultures; let's agree to strongly disagree.
As I said, I was just ranting and in my opinion not enough thought went in to design as I have specifically typed out the reasons why.
Do you know the product well enough to say that my calling these designs "unthoughtful" is untrue? I would love to know more about the background and details of these design decisions.
Also, there are a lot of highly intelligent people, and I expect more from Microsoft (as I typed at the end of my comment); everyone is highly intelligent in their own way, so calm down.
PS: my partner told me the word "unthoughtful" could be viewed as a personal attack, even though it was used about design/coding decisions. I will consider this in the future.
I think we should just forget the "AbandonMessage" method and replace it with "DeferMessage" taking a TimeSpan or DateTime parameter; if people want to do what Abandon did, they can just call Defer with a zero TimeSpan or the current DateTime. Or, at a minimum, let people change the visible-time property when they Abandon the message (again, this is a really bad method name).
At the risk of being redundant with others, I wanted to add a comment expressing my desire for this feature, with my own context. I was very surprised to find out that the retry logic didn't work at all the way I thought it would, and that even when handling message completion myself, there was no way to Abandon the message with a delay.
Our use case is to receive a message from our system that some data was added/updated/deleted and then keep an Azure AI Search index as in-sync as possible.
We already have our own retry logic built into handling transient failures with the AI Search Index so network/connectivity blips should be handled reasonably. The point of us trying to use Service Bus was to increase reliability in case connectivity (self-caused or otherwise) was lost between our "AI Search Index Updater" Azure Function and the AI Search service. If messages failed to be processed, they could be delayed and processed again later, only dead-lettering in the case of long outages, in which case we'd be alerted and could perform a re-drive.
As it stands, the message just retries immediately over and over until it hits its max delivery count and then bombs out to the deadletter queue, which somewhat negates our reason for wanting to use Service Bus to begin with. It's still better than just failing and losing the message on the initial API call but a deadletter redrive should be the exception, not the rule. I don't want Ops folks getting alerted in the middle of the night because our function has no ability to self-heal.
Now I need to reconsider whether we want to move forward with this approach, and honestly it's a pretty big let-down for me given how excited I was to use the feature to begin with.
Issue Transfer
This issue has been transferred from the Azure SDK for .NET repository, #9473.
Please be aware that @Bnjmn83 is the author of the original issue, and include them in any questions or replies.
Details
This is still a desired feature and totally makes sense in many scenarios. Is there any effort to implement this in the future?
Original Discussion
@msftbot[bot] commented on Tue Jan 14 2020
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @jfggdl
@jsquire commented on Tue Jan 14 2020
@nemakam and @binzywu: Would you be so kind as to offer your thoughts?
@nemakam commented on Tue Jan 14 2020
@Bnjmn83, this is a feature ask that we could work on in the future, but we don't have an ETA right now. As an alternative, you can implement this yourself on the client using the transaction feature: complete() the message and send a new message with an appropriate "scheduleTime" within the same transaction. That should behave similarly.
@axisc commented on Thu Aug 13 2020
I think @nemakam's recommendation of completing the message and sending a scheduled message is a better approach.
Service Bus (or any message/command broker) is a server/sender side cursor. When a receiver/client wants to control when the message is visible again (i.e. custom delay/retry) it must take over the cursor from the sender. This can be achieved with the below options -
Do let me know if this approach is too cumbersome and we can revisit. If not, I can close this issue.
@mack0196 commented on Wed Mar 31 2021
If the subscription\queue has messages in there, will the scheduled message 'jump to the front of the line' at its scheduled time?
@ramya-rao-a commented on Mon Nov 01 2021
@shankarsama Please consider moving this issue to https://github.com/Azure/azure-service-bus/issues where you track feature requests for the service