Closed: edgurgel closed this 5 years ago.
Updated algorithm:
Each time a job is moved to a queue's `inprogress` list, this node is added to `verk_nodes` (`SADD verk_nodes node_id`) and the queue is added to `verk:node:#{node_id}:queues` (e.g. `SADD verk:node:123:queues queue_name`).
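The tracking step can be sketched as follows. This is a minimal illustration, not Verk's code: `FakeRedis` is a hypothetical in-memory stand-in for the Redis sets and lists, and the list key names `queue:<name>` and `inprogress:<queue>:<node_id>` are assumptions made for the example. Only `verk_nodes` and `verk:node:<node_id>:queues` come from the description above.

```python
from collections import defaultdict, deque


class FakeRedis:
    """Hypothetical in-memory stand-in for the Redis sets and lists used here."""

    def __init__(self):
        self.sets = defaultdict(set)     # keys written with SADD
        self.lists = defaultdict(deque)  # queue and inprogress lists

    def sadd(self, key, member):
        self.sets[key].add(member)


def move_to_inprogress(redis, node_id, queue):
    """Move one job from the queue to its inprogress list, then track node and queue."""
    pending = redis.lists[f"queue:{queue}"]  # hypothetical key name
    if not pending:
        return None
    job = pending.popleft()
    redis.lists[f"inprogress:{queue}:{node_id}"].append(job)  # hypothetical key name
    redis.sadd("verk_nodes", node_id)                         # SADD verk_nodes node_id
    redis.sadd(f"verk:node:{node_id}:queues", queue)          # SADD verk:node:<id>:queues queue
    return job


# Usage: node "123" starts working on a job from the "default" queue.
r = FakeRedis()
r.lists["queue:default"].extend(["job-1", "job-2"])
started = move_to_inprogress(r, "123", "default")
```

Because both structures are sets, repeated `SADD`s for the same node and queue are harmless, so the tracking can run on every move without extra bookkeeping.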
Every `frequency` (the heartbeat interval), we refresh the node key so that it expires in `2 * frequency`: `PSETEX verk:node:#{node_id} <2 * frequency> alive`. Note that `PSETEX` takes the TTL in milliseconds.
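The heartbeat and its TTL can be modelled without a Redis server. In this sketch `ExpiringStore` is a hypothetical in-memory model of keys with a millisecond TTL, and time is passed in explicitly as `now_ms` so the expiry is deterministic; the 30 000 ms interval is an arbitrary example value, not a Verk default.

```python
class ExpiringStore:
    """Hypothetical in-memory model of string keys with a millisecond TTL (like PSETEX)."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at_ms)

    def psetex(self, key, ttl_ms, value, now_ms):
        self._data[key] = (value, now_ms + ttl_ms)

    def get(self, key, now_ms):
        entry = self._data.get(key)
        if entry is None or entry[1] <= now_ms:
            return None  # missing or expired
        return entry[0]


def heartbeat(store, node_id, frequency_ms, now_ms):
    """Refresh the node key so it lives for two heartbeat intervals."""
    store.psetex(f"verk:node:{node_id}", 2 * frequency_ms, "alive", now_ms)


FREQUENCY_MS = 30_000  # hypothetical heartbeat interval
store = ExpiringStore()
heartbeat(store, "123", FREQUENCY_MS, now_ms=0)
# One missed heartbeat is tolerated: the key is still there after one interval.
alive_after_one_interval = store.get("verk:node:123", now_ms=FREQUENCY_MS)
# After two intervals without a refresh the key is gone.
alive_after_two_intervals = store.get("verk:node:123", now_ms=2 * FREQUENCY_MS)
```

Using `2 * frequency` rather than `frequency` presumably gives a node one interval of slack, so a single slow heartbeat does not immediately mark it as dead.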
We check the key of every node in `verk_nodes`. If a node's key has expired, that node is considered dead.
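A sketch of the dead-node check, assuming the caller already knows when each node key expires. The `node_key_expiry_ms` dictionary is a hypothetical stand-in for asking Redis whether each `verk:node:<node_id>` key still exists; a node with no key at all is treated the same as one whose key has expired.

```python
def find_dead_nodes(verk_nodes, node_key_expiry_ms, now_ms):
    """Return nodes from verk_nodes whose heartbeat key is missing or expired.

    node_key_expiry_ms maps "verk:node:<id>" to the absolute time (ms) its TTL runs out;
    it is a hypothetical stand-in for querying Redis for each key.
    """
    dead = []
    for node_id in sorted(verk_nodes):
        expires_at = node_key_expiry_ms.get(f"verk:node:{node_id}")
        if expires_at is None or expires_at <= now_ms:
            dead.append(node_id)
    return dead


nodes = {"123", "456", "789"}
expiry = {"verk:node:123": 60_000, "verk:node:456": 10_000}  # 789 has no key at all
dead = find_dead_nodes(nodes, expiry, now_ms=20_000)
```

Here node `456` stopped refreshing its key and node `789` never had one, so both are reported as dead, while `123` is still alive.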
To recover, we go through all the running queues of that node (`verk:node:#{node_id}:queues`) and enqueue their jobs from `inprogress` back to the queue. Each "enqueue back from in progress" operation is atomic (a Lua script <3), so we won't get duplicates.
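To see why atomicity prevents duplicates, here is a small simulation in which two live nodes try to recover the same dead node at the same moment. It is a sketch under stated assumptions: `FakeRedis` is hypothetical, a single `threading.Lock` models Redis executing each Lua script one at a time, and `requeue_inprogress` is a made-up stand-in for the script, reusing the assumed `queue:<name>` and `inprogress:<queue>:<node_id>` key names.

```python
import threading
from collections import defaultdict, deque


class FakeRedis:
    """Hypothetical in-memory Redis; one lock models Redis running each Lua script atomically."""

    def __init__(self):
        self.lists = defaultdict(deque)
        self.sets = defaultdict(set)
        self.script_lock = threading.Lock()

    def requeue_inprogress(self, queue, node_id):
        """Stand-in for the Lua script: move every job from inprogress back to the queue."""
        with self.script_lock:
            inprogress = self.lists[f"inprogress:{queue}:{node_id}"]
            moved = len(inprogress)
            while inprogress:
                self.lists[f"queue:{queue}"].append(inprogress.popleft())
            return moved


def recover_node(redis, dead_node_id):
    """Requeue the in-progress jobs of every queue the dead node was running."""
    return sum(
        redis.requeue_inprogress(queue, dead_node_id)
        for queue in sorted(redis.sets[f"verk:node:{dead_node_id}:queues"])
    )


r = FakeRedis()
r.sets["verk:node:123:queues"] = {"default", "mailers"}
r.lists["inprogress:default:123"].extend(["job-1", "job-2"])
r.lists["inprogress:mailers:123"].append("job-3")

# Two live nodes notice the failure at the same time and both try to recover it.
results = []
threads = [threading.Thread(target=lambda: results.append(recover_node(r, "123"))) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever caller runs the script first moves a queue's jobs; the other finds that `inprogress` list already empty, so every job ends up back in its queue exactly once. The sketch covers only the requeue step.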
How to use: set `generate_node_id` to `true`. If it's not `true`, this new code is not used and Verk basically works as before.
Here are some changes I made relative to the original PR #159:
- `QueueManager` keeps `verk_nodes` and `verk:node:#{node_id}:queues` up to date. This avoids possible de-synchronization between Redis and the local state of the Verk node. The main idea behind this change is: "If any job is added to the `inprogress` list, this node and this queue must be tracked so that other nodes can rescue its jobs if it fails."
- `QueueManager` maintains these data structures only if `generate_node_id` is `true`.
- `Node.Manager` no longer crashes if `heartbeat` fails.
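The last two changes can be illustrated together. This is a hypothetical Python model, not Verk's Elixir code: `QueueManagerSketch` tracks the node and queue only when `generate_node_id` is true and at least one job actually entered `inprogress`, and `safe_heartbeat` logs a failed heartbeat instead of raising. Catching `ConnectionError` is an assumption standing in for whatever error the Redis client reports.

```python
import logging

logger = logging.getLogger("verk_sketch")  # hypothetical logger name


class QueueManagerSketch:
    """Hypothetical model of the tracking rule: only track when generate_node_id is true."""

    def __init__(self, node_id, generate_node_id):
        self.node_id = node_id
        self.generate_node_id = generate_node_id
        self.verk_nodes = set()   # stands in for verk_nodes
        self.node_queues = set()  # stands in for verk:node:<node_id>:queues

    def on_jobs_moved_to_inprogress(self, queue, count):
        # Track only when the feature is enabled and jobs really entered inprogress.
        if self.generate_node_id and count > 0:
            self.verk_nodes.add(self.node_id)
            self.node_queues.add(queue)


def safe_heartbeat(send_heartbeat):
    """Run one heartbeat; log and report a failure instead of letting the manager crash."""
    try:
        send_heartbeat()
        return True
    except ConnectionError as exc:
        logger.warning("heartbeat failed: %s", exc)
        return False


enabled = QueueManagerSketch("123", generate_node_id=True)
enabled.on_jobs_moved_to_inprogress("default", 2)
disabled = QueueManagerSketch("123", generate_node_id=False)
disabled.on_jobs_moved_to_inprogress("default", 2)


def failing_heartbeat():
    raise ConnectionError("redis unavailable")


heartbeat_ok = safe_heartbeat(failing_heartbeat)
```

With the flag off, nothing is tracked, which matches the "works as before" behaviour; a failed heartbeat simply returns `False`, and the node key will expire on its own if the failures persist.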