FirebaseExtended / firebase-queue

The in_progress state #76

Closed christianalfoni closed 7 years ago

christianalfoni commented 7 years ago

We had some weird bugs in our application and narrowed it down to having multiple specs with the same in_progress ID.

What happened is that, seemingly at random and often in connection with errors in our code, the first registered Queue would pick up tasks that it should not have (based on its start_state).

It seems that the in_progress state ID affects which queue handles a task. I cannot find any documentation on how this happens or on how a task in the in_progress state might be picked up by Queues.

Would love some information on this and would be happy to contribute to docs, just not sure how this actually works :-)

cbraynor commented 7 years ago

This has to do with the way that queue workers arbitrate among themselves. There's no central coordinator, so each worker listens for tasks in its spec's in_progress_state (specifically here). If a task has been in that state for too long (as defined by the timeout of the worker's current spec), it's presumed that the previous worker has died, and one of the other workers will reset the task to its spec's start_state.
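To make the timeout check concrete, here is a minimal in-memory sketch of the behaviour described above. It is not firebase-queue's actual code: `resetStaleTasks` is a hypothetical helper, and the task fields `_state`, `_state_changed` and `_owner` are assumed to behave as the library's task metadata does.

```javascript
// Simplified model of the reset described above: a worker looks at tasks in
// its own spec's in_progress_state and resets any that have been there for
// at least spec.timeout milliseconds. Hypothetical helper, not library code.
function resetStaleTasks(tasks, spec, now) {
  const resetIds = [];
  for (const [id, task] of Object.entries(tasks)) {
    const inProgress = task._state === spec.in_progress_state;
    const timedOut = now - task._state_changed >= spec.timeout;
    if (inProgress && timedOut) {
      // The previous owner is presumed dead, so the task goes back to the
      // start_state of the spec this worker is running.
      task._state = spec.start_state;
      task._state_changed = now;
      delete task._owner;
      resetIds.push(id);
    }
  }
  return resetIds;
}

// Illustrative spec and tasks; the state names are made up for this example.
const resizeSpec = {
  start_state: "resize_start",
  in_progress_state: "resize_in_progress",
  timeout: 10000,
};
const resizeTasks = {
  a: { _state: "resize_in_progress", _state_changed: 0, _owner: "worker-1" },
  b: { _state: "resize_in_progress", _state_changed: 9000, _owner: "worker-2" },
};

// At t = 12000 ms, task "a" has exceeded the timeout and task "b" has not.
const staleIds = resetStaleTasks(resizeTasks, resizeSpec, 12000);
console.log(staleIds, resizeTasks.a._state, resizeTasks.b._state);
```

Note that the check only compares the task's state with the spec's in_progress_state. Nothing in the task records which spec originally claimed it.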

By using the same in_progress_state for all your specs, the workers on the spec with the shortest timeout will always reset the task to their start_state. If you have multiple specs with the same timeout, there's a race, so it probably won't be the one you want.
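The effect of a shared in_progress_state can be sketched the same way. The example below is an assumption-laden model, not the library's implementation: `firstSpecToReset` and the spec names `resize` and `email` are hypothetical, and ties are resolved here by iteration order, whereas between real workers they would be a race.

```javascript
// Two specs that (incorrectly) share one in_progress_state.
const sharedSpecs = {
  resize: { start_state: "resize_start", in_progress_state: "in_progress", timeout: 10000 },
  email: { start_state: "email_start", in_progress_state: "in_progress", timeout: 60000 },
};

// Returns the spec whose timeout expires first for a stalled task, i.e. the
// spec whose workers would reset it. Hypothetical helper for illustration.
function firstSpecToReset(specs, task) {
  let winner = null;
  for (const [name, spec] of Object.entries(specs)) {
    if (spec.in_progress_state !== task._state) continue;
    const deadline = task._state_changed + spec.timeout;
    if (winner === null || deadline < winner.deadline) {
      winner = { name, deadline };
    }
  }
  return winner;
}

// A task that an "email" worker claimed at t = 0 and then stalled on.
const stalledEmailTask = { _state: "in_progress", _state_changed: 0 };

const resetBy = firstSpecToReset(sharedSpecs, stalledEmailTask);
stalledEmailTask._state = sharedSpecs[resetBy.name].start_state;
console.log(resetBy.name, stalledEmailTask._state);
```

In this model the stalled email task ends up in the resize spec's start_state, which matches the symptom reported in the issue. Giving each spec its own in_progress_state (for example `resize_in_progress` and `email_in_progress`) means only one spec ever matches a given task.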

The states are intended to be very specific about the precise state a task is in, which is why you're required to specify an in_progress_state rather than falling back to a default. Since a Queue is specific to a spec, it's not really possible to add any programmatic checks across specs. If you have any specific ideas about how we can make that clearer in the docs, then I'll take them into consideration.
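Although a Queue cannot check across specs, an application that defines all of its specs in one place could run a check of its own before writing them. The sketch below assumes exactly that; `findSharedInProgressStates` is a hypothetical helper and is not part of firebase-queue.

```javascript
// Groups spec names by in_progress_state and returns only the states that
// are used by more than one spec. Hypothetical application-side check.
function findSharedInProgressStates(specs) {
  const namesByState = new Map();
  for (const [name, spec] of Object.entries(specs)) {
    const names = namesByState.get(spec.in_progress_state) || [];
    names.push(name);
    namesByState.set(spec.in_progress_state, names);
  }
  const shared = {};
  for (const [state, names] of namesByState) {
    if (names.length > 1) shared[state] = names;
  }
  return shared;
}

// Illustrative specs; "resize" and "email" collide, "archive" does not.
const appSpecs = {
  resize: { start_state: "resize_start", in_progress_state: "in_progress", timeout: 10000 },
  email: { start_state: "email_start", in_progress_state: "in_progress", timeout: 60000 },
  archive: { start_state: "archive_start", in_progress_state: "archive_in_progress", timeout: 30000 },
};

const collisions = findSharedInProgressStates(appSpecs);
console.log(collisions);
```

An application could refuse to start, or log a warning, whenever the returned object is non-empty.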

christianalfoni commented 7 years ago

Thanks for a great explanation @drtriumph ! I will make a pull request with suggested doc changes :)