Closed otbe closed 1 year ago
@otbe your concerns are valid, I am not sure at this point what is the right approach. It sounds like a general problem with SQS FIFO rather than something Spring Cloud AWS specific, but lets keep this issue open for some time, perhaps we will get some insights from the community.
@tomazfernandes do you have perhaps some thoughts on this?
Sure, in 3.0 simply set "messageVisibilitySeconds" and, for FIFO
queues, the visibility will be extended for each message in the group before processing, not unlike the @otbe's solution if I understand correctly.
Just make sure one message doesn't take longer than the time set.
Ok so this is a non issue in 3.0, right?
Ok so this is a non issue in 3.0, right?
Yup, using this feature in 3.0, the visibility extension is applied to all unprocessed messages as each message gets processed, so it won't be a problem.
A feature we could look into in the future would be having a separate process to extend message visibility even as the message is being processed, but that seems not too idiomatic TBH. But I've seen it in an SQS Lib somewhere.
Type: Bug
Component: SQS
Describe the bug Im not quite sure if this is a bug report, a feature request or just a misunderstanding on my side. So please bear with me 😇
We use a fifo queue to process "commands" that affect a 3rd party system. We use a message group id that groups commands that should be processed sequentially. One day we saw an increased latency in processing these commands and we saw exceptions like the following in our logs
The reason for this is simple, we have hit our visibility timeout. Every message (of the same receiveMessage call) after the first exception also printed the same exception.
That made me think about the implementation of the
MessageGroupExecutor
here. Theres https://github.com/awspring/spring-cloud-aws/pull/192 that fixes the case where we should stop processing messages whenever an exception was raised but I think theres another issue.Imagine we call
receiveMessage
with the default message count of 10 and (worst case scenario) we receive 10 messages for the same message group. As soon as we receive them the default visibility timeout of the queue will start ticking. Lets say we didnt change the default, this means we have a default of 30s for each message. Really? I think no. In the scenario above we will have 30s for all 10 messages because if we spend 30s to process the first 5 messages, we wont be able to delete the sixth messages (see error above).That basically means we have processed the message, we were not able to delete the message and it will be visible to the other consumers. Actually we will process it twice.
I can only think of some ways to work around this right now (from a userland perspective):
Visibility
when using fifo queues and callextend
at the very beginning of your handler to force an exception if thereceiptHandle
has already timed outOption 2 is basically our current approach. We have a little helper class that runs "side effects" of SQS handlers which basically will call
extend
as the very first action and then starts a timer task that extends the timeout every X while executing the real action. At the end we introduced an eager check if the messages is still valid in the sense of calling an SQS API 🤔Of course this will only work after https://github.com/awspring/spring-cloud-aws/pull/192 has been merged to master/released.
Would be happy if someone with deeper knowledge could enlighten me here? Am I totally wrong with assumptions about SQS, Fifo or the implementation here? 😇