LearnBoost / up

Zero-downtime reloads and requests load balancer based on distribute.
540 stars 73 forks source link

Exponentially backed-off respawns with maximum attempts #60

Open cmawhorter opened 11 years ago

cmawhorter commented 11 years ago

I find this useful when used in concert with minExpectedLifetime: '0s'

Adds 3 new options: -backoffRespawns: null to disable; -1 to enable with no max. attempts; positive sets maximum attempts -backoffDelay: node-backoff "initialDelay" option in milli -backoffMaxDelay: node-backoff ""maxDelay" option in milli

Relies on node-backoff for the backoff implementation.

Sorry... no tests =/

cmawhorter commented 11 years ago

Should help with #2 and #21

fintanf commented 11 years ago

I like the proposed functionality, but in my testing the worker changes to terminated before spawned. As a result, the worker is not added to the workers array, the ~self.workers.indexOf(w) test fails, and the callback (fn) is not called. Moving fn && fn(w.readyState) outside of the if seems to fix this, though I'm not certain it doesn't break anything else.

Also, as a result of the initial respawnWorker call being inside that same if, no respawns take place when the first set of workers fail. This is true of the existing keepalive single respawn of course, but it would be nice to deal with this here also.

Probably the reason why my workers are 'terminated' before they are 'spawned' is down to using the assumeReady:false and up.ready() for asynchronously declaring readiness of the worker - the worker terminates before that callback fires.

cmawhorter commented 11 years ago

I have some time tonight and need this functionality for something I'm working on so I'll see if I can take a look and get some tests going. Thanks for having a look.

cmawhorter commented 11 years ago

Oops... committed on accident and don't feel like fixing. Plz stand by.

cmawhorter commented 11 years ago

Added tests that check that new events are fired correctly.

@fintanf The reason workers are terminated before they're spawned seems to be that when child process workers exit immediately, they're not alive long enough to receive the message that they've been spawned and emit that event. Thus, they go straight from spawning -> terminated which was causing the problem.

The PR now includes the original backoff respawns w/max. retries, but also includes the unsuccessful event that fires when a worker doesn't reach it's minimum lifetime. Order now and this pull request can be yours for 3 easy payments of $19.95.

Also, the special case of a min. lifetime of 0 is now officially supported and will not emit unsuccessful (with tests).

fintanf commented 11 years ago

This seems to work nicely in the synchronous worker loading case (assumeReady:true). For async worker loading (assumeReady:false) it only retries once though. To recreate you should just have to use assumeReady:false and make your worker throw an error immediately.