Closed: ramya-rao-a closed this issue 2 years ago
Is time sync between clients and the service an issue? I don't think we're super reliant on it now (aside from doing auto-lock-renewal, but we don't delete or remove the message based on that expiring).
It kind of breaks our "let the service validate" model and seems a bit error prone if there's clock drift.
Another little thought - the entire point of prefetching is that you (as the caller) are so performant that you want credits to just be continually added, etc...
If they are getting messages served up that are expiring then they are failing there, and should consider tuning their setup. If we just filter these messages out they have no signal that's occurring.
The way it works in .NET is that messages are released if there is no receiver waiting for it when the message comes over the wire. This makes it unlikely for a customer to get served an expired message. This is specifically for the case where Prefetch is NOT enabled. If prefetch is enabled, then it is possible for customers to get expired messages.
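The .NET behavior described above can be sketched as a small model. Everything here is illustrative (the class and method names are hypothetical, not the actual SDK internals): when a transfer arrives over the wire and no receive call is waiting, the delivery is settled as released rather than buffered, so it cannot sit locally and expire.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model of a receiver link that releases messages arriving
// while no receive call is in flight (prefetch disabled).
class ReceiverLinkModel {
    enum Outcome { DELIVERED, RELEASED }

    private final Queue<String> waitingReceivers = new ArrayDeque<>();

    /** A receive call starts waiting for a message. */
    void registerReceiveCall(String callerId) {
        waitingReceivers.add(callerId);
    }

    /** Called when a message arrives over the wire. */
    Outcome onTransfer(String message) {
        String caller = waitingReceivers.poll();
        if (caller == null) {
            // No receive call in flight: release instead of buffering,
            // so the message cannot expire in a local buffer.
            return Outcome.RELEASED;
        }
        // Hand the message to the waiting caller.
        return Outcome.DELIVERED;
    }
}
```

In this model, an expired message can only reach a caller if it expired after being handed over, which is the caller's own latency, not the client's buffering.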
Thanks @JoshLove-msft!
@shankarsama, If we can support the releasing of messages, then that should help avoid the delivery count problem. Do you know if the Java AMQP implementation supports this?
@richardpark-msft This issue is for the case where prefetch is not enabled.
@yvgopal just confirmed that the proton-j library does indeed support setting the message state as released
`void disposition(DeliveryState state)` is the same method that is used to settle messages and it can be used to set the state as released
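To illustrate why the released outcome is the attractive one here, the sketch below is a toy model of AMQP settlement outcomes (assumed semantics, not proton-j itself; the single `disposition` method only mirrors the shape of proton-j's `disposition(DeliveryState)`): a released message goes back to the service without the delivery-count penalty that an abandon (modified with delivery-failed set) carries.

```java
// Toy model of AMQP settlement outcomes. Assumed semantics: MODIFIED with
// delivery-failed counts as a failed delivery attempt; RELEASED does not.
class DeliveryModel {
    enum State { UNSETTLED, ACCEPTED, RELEASED, MODIFIED }

    State state = State.UNSETTLED;
    int deliveryCount = 0;

    /** Settle the delivery with the given outcome. */
    void disposition(State outcome) {
        if (outcome == State.MODIFIED) {
            deliveryCount++;  // abandon: increments the delivery count
        }
        state = outcome;      // RELEASED/ACCEPTED leave the count unchanged
    }
}
```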
I still wonder why we don't just want to use drain here. It's built for exactly this purpose.
As discussed in a recent Service Bus office hours, one issue with drain is that it can result in inefficiencies when dealing with concurrent receives. For instance, when using the processor in .NET, the default behavior is to have concurrent receive calls. If we drained credits when one receive call completes, we may end up stomping on other receive calls that are still in flight. We would also end up sending credit flows more often. As such, it may be more performant to leave the credits on the line when there could be concurrent receives. Releasing the messages also has the benefit of being very simple to implement as it is a standard AMQP disposition that we already have API support for.
A lot of good points @JoshLove-msft, as expected! :)
> Releasing the messages also has the benefit of being very simple to implement as it is a standard AMQP disposition that we already have API support for.
This is a good point. Drain is also a part of AMQP (it's a bit set on the flow frame, and is documented as part of the AMQP spec). Either way is equivalent over time, but drain has the benefit of avoiding excess transfers (even if you ultimately just Release() them).
I think this very much depends on how your client API is structured. Ultimately adding more credits on a single link isn't going to result in faster parallel receives, unless you're faster than the caller's dispatch/handling rate, similar to a prefetch scenario. So this is really more about how clients retrieve messages from your receiver client.
I think there are two styles here:
I don't know if either approach is better than the other, but I think the strategy you choose here does depend on what you consider to be idiomatic to your client.
> I think this very much depends on how your client API is structured. Ultimately adding more credits on a single link isn't going to result in faster parallel receives, unless you're faster than the caller's dispatch/handling rate, similar to a prefetch scenario. So this is really more about how clients retrieve messages from your receiver client.
Well you would be avoiding superfluous credit transfers and stomping on existing receives, so it does seem like it should be more efficient to avoid this when you are in the concurrent receive model.
Imagine that receive call A is still in flight, and receive call B drains the link. I think in most libraries, you'd end up with receive call A still waiting for the full timeout despite the fact that there are no longer any credits on the link.
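That scenario can be made concrete with a tiny simulation (hypothetical names; a real link tracks credit via AMQP flow frames, not a plain counter): two receive calls share one link's credit, and a drain issued when call B completes leaves call A waiting on an empty link until its timeout fires.

```java
// Toy model of two concurrent receive calls sharing one link's credit.
class SharedLink {
    int credit;

    SharedLink(int credit) { this.credit = credit; }

    /** Drain zeroes any outstanding credit on the link. */
    void drain() { credit = 0; }

    /** A receive call attempts to consume one credit's worth of delivery. */
    boolean takeDelivery() {
        if (credit == 0) {
            return false;  // no credit: the call waits until its timeout
        }
        credit--;
        return true;
    }
}
```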
> Well you would be avoiding superfluous credit transfers and stomping on existing receives, so it does seem like it should be more efficient to avoid this when you are in the concurrent receive model.

> Imagine that receive call A is still in flight, and receive call B drains the link. I think in most libraries, you'd end up with receive call A still waiting for the full timeout despite the fact that there are no longer any credits on the link.
Yes, this is why I'm being very explicit - there are two different models for this API. If you want to handle locking/dispatching in the client and allow parallel `ReceiveMessages()` calls, then draining after each receive completes doesn't really make sense (i.e., we are in agreement).
I don't know which particular style Java has, but they might have choices in this matter depending on what they've already published as a guarantee.
For the synchronous receiver, we have an outer loop: while there is work and the message buffer is not empty, keep emitting messages downstream. In the inner loop, for each buffered message, we try to emit it downstream. If we can't, we currently push it back into the message buffer.
I think in this case we'd want to update this section to instead "Release" the message and break out of that while loop: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/servicebus/azure-messaging-servicebus/src/main/java/com/azure/messaging/servicebus/SynchronousMessageSubscriber.java#L175-L177
It's possible that by the next iteration of the outer while loop, a sync work item was added, so we can't just clear the entire message buffer.
Note: We should double-check the `REQUESTED` count to make sure the number of messages we request upstream aligns with the current state of our subscriber.
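A minimal sketch of the proposed inner-loop change, assuming a buffer-backed synchronous receiver (the names below are illustrative, not the actual SynchronousMessageSubscriber): when a buffered message cannot be emitted downstream, release it and break out of the loop rather than pushing it back, leaving the rest of the buffer for the next outer-loop pass in case a new sync work item has arrived.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

// Illustrative model of the synchronous receiver's inner drain loop.
class SyncDrainLoop {
    final Deque<String> buffer = new ArrayDeque<>();
    final List<String> released = new ArrayList<>();

    /** tryEmit returns true while a sync work item can still accept messages. */
    void drainBuffer(Predicate<String> tryEmit) {
        while (!buffer.isEmpty()) {
            String msg = buffer.poll();
            if (!tryEmit.test(msg)) {
                released.add(msg);  // release instead of re-buffering
                break;              // keep remaining messages: a new sync work
                                    // item may show up on the next outer pass
            }
        }
    }
}
```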
I would echo Richard's point on "If they are getting messages served up that are expiring then they are failing there, and should consider tuning their setup. If we just filter these messages out they have no signal that's occurring."
For Python users who encounter expired messages, chances are that the message-handling code blocks the program (given Python's single-threaded model), causing messages in the local buffer to expire. Usually, I would recommend that users receive messages one by one or handle each message in a separate thread.
I do think it's important to let users know and think about why messages are arriving expired, and after this thinking process, users can tweak the configs and test out the application.
However, I don't think what I mentioned above turns down the idea of dropping expired messages. I can't think of ways users could utilize those expired messages except for logging -- I assume most users would just ignore and re-receive the message.
I think providing this option would simplify users' lives; however, it should be based on the fact that users understand their application pattern and the consequence (messages being resent), yet still want to ignore the errors.
Are we sure that clock drift is not going to be an issue here? Until this point the drift between the client and the service hasn't mattered too much (maybe some inaccuracy with lock renewal).
We'd be actively and silently not returning things back to the user if we activated this feature.
> Are we sure that clock drift is not going to be an issue here? Until this point the drift between the client and the service hasn't mattered too much (maybe some inaccuracy with lock renewal). We'd be actively and silently not returning things back to the user if we activated this feature.
Can you clarify where the expiry time comes into play here? I think the proposal is to just not serve messages when no one is listening for them - not to filter based on expiry time.
> Are we sure that clock drift is not going to be an issue here? Until this point the drift between the client and the service hasn't mattered too much (maybe some inaccuracy with lock renewal). We'd be actively and silently not returning things back to the user if we activated this feature.

> Can you clarify where the expiry time comes into play here? I think the proposal is to just not serve messages when no one is listening for them - not to filter based on expiry time.
Possible I'm confused.
The original proposal in the issue is talking about dropping messages based on time, not if there's an active listener.
(line from the issue text): "The check of whether a message is expired can be made by looking at the lockedUntil property on the message."
@yunhaoling's latest reply (not trying to single you out Adam!):
> However, I don't think what I mentioned above turns down the idea of dropping expired messages.
Indicates that we're still thinking about dropping them based on some expiration criteria. My point is that that's dangerous territory and I don't think that should be on the table.
The issue has had a few forks/threads though, so maybe the actual discussion has morphed a bit. Might be worth updating the issue summary to indicate what we're dropping and what we're keeping from it.
Issue description has been updated. Thanks @richardpark-msft!
This has now been released as part of version 7.6.0 of the `azure-messaging-servicebus` package.
As reported in #25259, there is a chance for `ServiceBusReceiverClient.receiveMessages()` to return expired messages to the user even when prefetch is disabled. One of the side effects of this is that the user gets such messages served one more time. This issue was logged after a discussion with @shankarsama to explore the option of the client dropping such expired messages.

Update: The approach chosen is to release messages as soon as they are received by the client if there is no active receive call from the user. This will apply only when prefetch is disabled.

The check of whether a message is expired can be made by looking at the `lockedUntil` property on the message. The potential downside of this approach is that a message might get dropped in such a manner multiple times, reaching the max delivery count and getting pushed to the dead-letter queue without giving the user a chance to process it. This invalidates the at-least-once promise. Therefore, it might be beneficial to add this feature behind an option and document that it is recommended to have a higher max delivery count when using this option.

Adding @shankarsama and @yvgopal to provide any additional thoughts.
cc @conniey, @anuchandy, @JoshLove-msft
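For reference, the client-side expiry check mentioned in the description could look like the following sketch. It assumes the message's `lockedUntil` value is available (the Java SDK exposes a lock expiry time on received messages) and, as raised earlier in the thread, any such comparison is sensitive to clock drift between the client and the service.

```java
import java.time.OffsetDateTime;

// Illustrative expiry check: a message whose lock window has already
// passed "now" (per the client's clock) is considered locally expired.
final class ExpiryCheck {
    static boolean isExpired(OffsetDateTime lockedUntil, OffsetDateTime now) {
        return lockedUntil.isBefore(now);
    }
}
```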