Closed edgurgel closed 5 years ago
Hey! After releasing 1.4 my plan is to somehow introduce this as "experimental" so it won't affect current users and they can try it out. I should have something "ready" in 2 weeks? I need to figure out how to run integration tests.
Hi @edgurgel, any update?
@tlvenn, not really, but I intend to get back to this, probably next week. I need to find a simple way of making this optional for now so that we can release a non-major version 🤔
Maybe if no node_id was defined, Verk could generate one and keep track of these automatically generated ids, etc.
I also don't know which kind of configuration we should expose, for example:
And I'm not 100% sure how to run nice integration tests running 2 Verk instances etc...
Coordination is hard 😢
@edgurgel is this PR ready to be tested as is?
@mikeastock yes, it works as expected! I need to add more tests and settle a few other considerations. My goal is to have the next minor version with an option to use this to control your node ids. And my "release date" is end of October, maybe before that.
I said end of October but it will probably be mid November 😶
I'm adding some tests and ensuring this can be used as optional until it's robust enough to be used by all users.
@edgurgel maybe a xmas gift in the end ? ;)
@tlvenn, you joke but that's the plan :D! I will have some free time before christmas haha :)
I'm very close btw! Happy New Year! 🎉
Happy new year to you too @edgurgel !
SADD nodes node_id
PSETEX verk:node:#{node_id} 2 * frequency alive
Each time a node starts working on a queue the queue name is added to the verk:node:#{node_id}:queues set;
Each time a node stops working on a queue the queue name is removed from the verk:node:#{node_id}:queues set;
Each frequency seconds we set the node key to expire in 2 * frequency:
PSETEX verk:node:#{node_id} 2 * frequency alive
Also check for all the keys of all nodes. If the key expired it means that this node is dead.
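To make the heartbeat idea a bit more concrete, here is a minimal sketch in Python. It is not the code from the PR: FakeRedis is a hypothetical in-memory stand-in for the three Redis commands involved, the clock is advanced by hand, and FREQUENCY_MS assumes the 60 second frequency floated in this thread, converted to milliseconds because PSETEX takes its TTL in milliseconds.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in for SADD, SMEMBERS, PSETEX and EXISTS."""

    def __init__(self):
        self.now_ms = 0      # simulated clock, advanced manually below
        self.sets = {}       # set key -> set of members
        self.expiring = {}   # string key -> (value, expires_at_ms)

    def sadd(self, key, member):
        self.sets.setdefault(key, set()).add(member)

    def smembers(self, key):
        return set(self.sets.get(key, set()))

    def psetex(self, key, ttl_ms, value):
        self.expiring[key] = (value, self.now_ms + ttl_ms)

    def exists(self, key):
        entry = self.expiring.get(key)
        return entry is not None and entry[1] > self.now_ms


FREQUENCY_MS = 60_000  # assumption: 60 seconds, expressed in milliseconds


def node_key(node_id):
    return f"verk:node:{node_id}"


def start_node(redis, node_id):
    # On startup: register the node and set its first expiring key.
    redis.sadd("nodes", node_id)
    heartbeat(redis, node_id)


def heartbeat(redis, node_id):
    # Every FREQUENCY_MS: push the expiry out to 2 * frequency from now.
    redis.psetex(node_key(node_id), 2 * FREQUENCY_MS, "alive")


def dead_nodes(redis):
    # A registered node whose key has expired is considered dead.
    return sorted(n for n in redis.smembers("nodes") if not redis.exists(node_key(n)))


redis = FakeRedis()
start_node(redis, "node-a")
start_node(redis, "node-b")

redis.now_ms += FREQUENCY_MS
heartbeat(redis, "node-a")        # node-b misses this heartbeat

redis.now_ms += FREQUENCY_MS + 1  # node-b's key is now past 2 * frequency
print(dead_nodes(redis))          # ['node-b']
```

Using 2 * frequency as the TTL means a node has to miss a full heartbeat before anyone treats it as dead, so a single slightly late heartbeat does not trigger a restore.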
To restore we go through all the running queues of that node and enqueue their jobs from in progress back to the queue. Each "enqueue back from in progress" is atomic (<3 lua) so we won't have duplicates. It's also not a problem if multiple nodes notice the dead node as they can't add the job more than once.
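Here is a rough Python sketch of why an atomic "enqueue back" keeps duplicates out even when two nodes restore the same dead node at once. It does not reproduce Verk's Lua script, which is not shown in this thread: the Queues class and its method names are hypothetical, and a lock stands in for the atomicity Redis gives a Lua script.

```python
import threading


class Queues:
    """Hypothetical model of one queue and its in-progress list."""

    def __init__(self):
        # The lock plays the role of Lua script atomicity in Redis.
        self._lock = threading.Lock()
        self.queue = []
        self.inprogress = []

    def enqueue_back(self, job):
        """Move a job back to the queue only if it is still in progress."""
        with self._lock:
            if job not in self.inprogress:
                return False  # another node already restored it
            self.inprogress.remove(job)
            self.queue.append(job)
            return True


queues = Queues()
queues.inprogress = ["job-1", "job-2"]


def restore(jobs):
    # What a node does after noticing that the owner of these jobs is dead.
    for job in jobs:
        queues.enqueue_back(job)


# Two nodes notice the same dead node and both try to restore its jobs.
threads = [threading.Thread(target=restore, args=(["job-1", "job-2"],)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(queues.queue))  # ['job-1', 'job-2']
print(queues.inprogress)     # []
```

Because the check and the move happen as one step, whichever node gets there second finds nothing left in progress and does nothing.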
How to use: set generate_node_id to true.
If it's not true it won't use this new code. It will basically work as before.
I still have some minor things to change but the bulk of the work is done 👍
Hey team here is my first stab at solving this issue: https://github.com/edgurgel/verk/pull/159/files
The idea is:
(frequency) = 60 seconds ?
We may need to review some edge cases, like what happens if we still have unfinished jobs while removing a queue from the list of running queues, but I will work on them case by case.
I need to review this as clearly it's just a stab at the final solution. I've played with some instances running locally and so far so good.
Related to https://github.com/edgurgel/verk/issues/157