agstephens commented 2 years ago

We think that multiple workers were running the same indexing operation for a given item. This suggests that the way we combine multiple (Docker pod) workers with RabbitMQ queues is not optimal. So, should we change our RabbitMQ set-up to avoid duplication and errors?

Step 1: What is actually happening to the workers?

@rhysrevans3 thinks that RabbitMQ should be clever enough to deliver each message to only one client. However, if the client does not acknowledge the message before a timeout, RabbitMQ puts the message back on the queue, where another worker can pick it up.
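To make this failure mode concrete, here is a toy, in-memory model of at-least-once delivery with manual acknowledgements. It is not RabbitMQ's API: `ToyQueue`, its methods and the 30-second timeout are hypothetical. It also simplifies by requeueing individual deliveries, whereas RabbitMQ enforces its delivery acknowledgement timeout (`consumer_timeout`) by closing the consumer's channel, which requeues all of that channel's unacknowledged messages. The same requeueing happens if a worker's connection drops, e.g. when a pod is killed.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyQueue:
    """Hypothetical model of a queue with manual acks and an ack timeout.

    Not RabbitMQ's API; it only mimics "requeue if not acked in time".
    """
    ack_timeout: float
    ready: deque = field(default_factory=deque)
    unacked: dict = field(default_factory=dict)  # tag -> (message, worker, delivered_at)
    next_tag: int = 0

    def publish(self, message):
        self.ready.append(message)

    def expire(self, now):
        # Put back every delivery whose ack deadline has passed.
        for tag, (msg, _worker, delivered_at) in list(self.unacked.items()):
            if now - delivered_at > self.ack_timeout:
                del self.unacked[tag]
                self.ready.appendleft(msg)

    def get(self, worker, now):
        # Deliver the next ready message to `worker`, or None if nothing is ready.
        self.expire(now)
        if not self.ready:
            return None
        msg = self.ready.popleft()
        self.next_tag += 1
        self.unacked[self.next_tag] = (msg, worker, now)
        return self.next_tag, msg

    def ack(self, tag):
        # Acking an expired delivery fails: the message has already been requeued.
        if tag not in self.unacked:
            raise KeyError(f"unknown or expired delivery tag {tag}")
        del self.unacked[tag]


q = ToyQueue(ack_timeout=30.0)
q.publish("index item-42")
tag_a, msg_a = q.get("worker-a", now=0.0)   # worker-a starts a slow indexing job
tag_b, msg_b = q.get("worker-b", now=31.0)  # ack deadline missed: item is redelivered
print(msg_a == msg_b)  # True: the same item is now being indexed twice
```

If indexing can take longer than the timeout, either the timeout needs raising or the indexing step needs to be idempotent, so that a duplicate delivery is harmless.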

What is the timeout for our system?

A possible solution (NOTE: This one doesn't work)

Possible different approach:

When a message is read (by the first worker), claim it by adding a field such as:
- {"worker": "<worker-host-id>", "timestamp": <datetime>}
This would need to cope with workers that get stuck, e.g.:
- Remove a claim when the time since it was claimed exceeds THRESHOLD.
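A minimal sketch of the claim record and the staleness check, assuming the timestamp is stored as an ISO 8601 string in UTC. `make_claim`, `claim_is_stale` and the 30-minute `CLAIM_THRESHOLD` are hypothetical; the thread does not fix a threshold value, and this says nothing about where the claim is stored.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: a claim older than this is treated as abandoned.
CLAIM_THRESHOLD = timedelta(minutes=30)


def make_claim(worker_id, now=None):
    """Build a claim record in the {"worker", "timestamp"} shape."""
    if now is None:
        now = datetime.now(timezone.utc)
    return {"worker": worker_id, "timestamp": now.isoformat()}


def claim_is_stale(claim, now=None, threshold=CLAIM_THRESHOLD):
    """Return True if the claim is older than `threshold`, so it can be removed."""
    if now is None:
        now = datetime.now(timezone.utc)
    claimed_at = datetime.fromisoformat(claim["timestamp"])
    return now - claimed_at > threshold


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
claim = make_claim("pod-1", now=t0)
print(claim_is_stale(claim, now=t0 + timedelta(minutes=10)))  # False: still held
print(claim_is_stale(claim, now=t0 + timedelta(minutes=45)))  # True: stuck worker
```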
RabbitMQ messages cannot be modified in place, so the workflow would be:
1. Read message M1
2. Edit a local copy of the message -> M2
3. Acknowledge M1
4. Re-queue M2 (publish it back to the queue)
5. When the work is done, acknowledge M2

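THIS WILL NOT WORK: YOU CANNOT ACKNOWLEDGE A MESSAGE THAT HAS NOT BEEN DELIVERED TO YOU. Once M2 is re-queued, it waits in the queue (behind any other messages) until it is delivered, possibly to a different worker, so the original worker cannot acknowledge it when its work is done.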

Next idea?

agstephens commented 2 years ago

@rhysrevans3 @Mahir-Sparkess @spepler: Your thoughts on this would be interesting.

agstephens commented 2 years ago

#166 looks like a better approach for resolving this.

cedadev / search-futures

Should we change the RabbitMQ setup (to avoid duplicate indexing)? #157
