We think that multiple workers were running the same indexing operation for a given item. This suggests that the way we combine multiple (Docker pod) workers with RabbitMQ queues is not well configured. So, should we change our RabbitMQ set-up to avoid duplication and errors?
Step 1: What is actually happening to the workers?
@rhysrevans3 thinks that Rabbit should be clever enough to deliver each message to only one client at a time.
But if the worker does not acknowledge the message before a timeout occurs, Rabbit will put the message back on to the queue, where another worker can pick it up.
What is the timeout for our system?
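To make the suspected mechanism concrete, here is a toy Python model of at-least-once delivery. It is an illustration under stated assumptions, not RabbitMQ's real implementation: `ToyBroker`, `deliver`, `ack` and `expire` are hypothetical names, and the 30-second `timeout` is arbitrary. Each message goes to only one worker at a time, but a delivery that is not acknowledged within `timeout` is put back on the queue and marked as redelivered, so a slow worker and a second worker end up processing the same item.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Delivery:
    """A message handed to a worker but not yet acknowledged."""
    body: str
    worker: str
    delivered_at: float  # seconds, on the toy clock
    redelivered: bool


@dataclass
class ToyBroker:
    """Toy model of at-least-once delivery (NOT RabbitMQ's real implementation)."""
    timeout: float  # seconds a worker may hold a message without acknowledging it
    ready: deque = field(default_factory=deque)  # (body, redelivered) pairs
    unacked: Dict[int, Delivery] = field(default_factory=dict)  # delivery tag -> Delivery
    next_tag: int = 0

    def publish(self, body: str) -> None:
        self.ready.append((body, False))

    def deliver(self, worker: str, now: float) -> Optional[Tuple[int, str]]:
        """Give the next ready message to one worker; None if the queue is empty."""
        self.expire(now)  # toy simplification: timeouts are only checked here
        if not self.ready:
            return None
        body, redelivered = self.ready.popleft()
        self.next_tag += 1
        self.unacked[self.next_tag] = Delivery(body, worker, now, redelivered)
        return self.next_tag, body

    def ack(self, tag: int) -> bool:
        """Acknowledge a delivery; False if it already timed out (or never existed)."""
        return self.unacked.pop(tag, None) is not None

    def expire(self, now: float) -> None:
        """Put any delivery held longer than `timeout` back at the front of the queue."""
        for tag, d in list(self.unacked.items()):
            if now - d.delivered_at > self.timeout:
                del self.unacked[tag]
                self.ready.appendleft((d.body, True))


broker = ToyBroker(timeout=30.0)
broker.publish("index item 42")

# worker-a receives the message at t=0, but indexing takes 45 s
tag_a, body_a = broker.deliver("worker-a", now=0.0)

# at t=31 worker-a's delivery has timed out, so worker-b gets the same message
tag_b, body_b = broker.deliver("worker-b", now=31.0)
print(body_a == body_b)   # True: both workers index item 42
print(broker.ack(tag_a))  # False: worker-a's acknowledgement arrives too late
print(broker.ack(tag_b))  # True
```

In real RabbitMQ, unacknowledged messages are also requeued when a worker's channel or connection closes (e.g. a pod restart), and the acknowledgement timeout is the broker's `consumer_timeout` setting. Recent releases default to 30 minutes, but it is worth checking the version and `rabbitmq.conf` we actually run.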
A possible solution (NOTE: This one doesn't work)
Possible different approach:
When a message is read (by the first worker), claim it, e.g. by adding:
{"worker": "<worker-host-id>", "timestamp": "<datetime>"}
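A minimal Python sketch of building that claim record, assuming the timestamp is an ISO 8601 UTC string and that the worker ID defaults to the pod's hostname. `make_claim` and `claim_is_stale` are hypothetical helpers; the staleness check is an assumption about how the timestamp would be used (to tell a live claim from one abandoned after a timeout).

```python
import json
import socket
from datetime import datetime, timedelta, timezone
from typing import Optional


def make_claim(worker_id: Optional[str] = None, now: Optional[datetime] = None) -> dict:
    """Hypothetical helper: build a {"worker", "timestamp"} claim record."""
    return {
        # assumption: a Docker pod's hostname identifies the worker
        "worker": worker_id or socket.gethostname(),
        "timestamp": (now or datetime.now(timezone.utc)).isoformat(),
    }


def claim_is_stale(claim: dict, timeout: timedelta, now: Optional[datetime] = None) -> bool:
    """Hypothetical helper: True if the claim is older than `timeout`."""
    claimed_at = datetime.fromisoformat(claim["timestamp"])
    return (now or datetime.now(timezone.utc)) - claimed_at > timeout


t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
claim = make_claim("worker-a", now=t0)
print(json.dumps(claim))
# {"worker": "worker-a", "timestamp": "2024-01-01T12:00:00+00:00"}
print(claim_is_stale(claim, timedelta(minutes=30), now=t0 + timedelta(minutes=45)))  # True
```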
THIS WILL NOT WORK - YOU CANNOT ACKNOWLEDGE A MESSAGE THAT IS NOT AT THE FRONT OF THE QUEUE.
Next idea?