Documentation request - how to safely kill workers

jackdpeterson commented 8 years ago

Hello all,

When we are rolling out new releases of our code, we would like to ensure that jobs stop and start safely without just barfing and dying midway through.

$ sudo service supervisord restart seems like a terrible but highly effective approach to stopping and restarting processes (okay only if we can accept data loss anywhere and everywhere, with any job).

Similarly, using the internal supervisor tooling is another approach, but both of these approaches are external to the running script and to SlmQueue's operations.

Assuming that an average job takes, say, 10-30 seconds to complete for a given workload type, I could see having each worker stop and restart after one job as being the safest, but also the most computationally expensive, route to take.

Is there another option / recommendation with regards to how you handle rolling out updates and safely killing / restarting said jobs?

basz commented 8 years ago

Simply set the stopwaitsecs option of the supervisor config to something higher than your longest-running job.

Setting the maximum number of jobs to be executed to 1 doesn't solve your problem.

Where my longest-running jobs take at most one minute, I use:

stopwaitsecs=120 ; max num secs to wait b4 SIGKILL (default 10)
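
For context, the line above sits inside a program section of the supervisor config. Here is a minimal sketch of such a section. The program name and the worker command are hypothetical placeholders, not taken from this thread, so adjust them to your own setup:

```ini
; Hypothetical program section; name and command are placeholders.
[program:slmqueue-worker]
command=php /path/to/worker-entrypoint.php ; placeholder, use your real worker command
autostart=true
autorestart=true
stopsignal=TERM      ; supervisor's default stop signal
stopwaitsecs=120     ; max num secs to wait b4 SIGKILL (default 10)
```

With a section like this, `supervisorctl restart slmqueue-worker` sends the stop signal, waits up to stopwaitsecs for the process to exit on its own, and only then resorts to SIGKILL before starting it again.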

jackdpeterson commented 8 years ago

So, just for clarity... when supervisor sends a stop signal ... no further jobs are processed by the worker process? Then it's just a matter of waiting for the job to actually die off. Correct?

basz commented 8 years ago

Yes, as long as PHP has the pcntl extension installed.
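
To illustrate why pcntl matters here, the following is a rough sketch of the general pattern, not SlmQueue's actual worker code. It assumes the pcntl extension is loaded. The function name `runWorker` and the job list are hypothetical. The handler only records the stop request, so the job in progress runs to completion and no further job is picked up:

```php
<?php
// Illustrative sketch only; not SlmQueue's implementation.
// Requires the pcntl extension.

function runWorker(iterable $jobs): array
{
    $stopRequested = false;

    // Dispatch signal handlers without needing declare(ticks=1).
    pcntl_async_signals(true);

    // supervisor sends SIGTERM by default when stopping a program.
    pcntl_signal(SIGTERM, function () use (&$stopRequested): void {
        // Only record the request; the running job is not interrupted.
        $stopRequested = true;
    });

    $completed = [];
    foreach ($jobs as $name => $job) {
        if ($stopRequested) {
            break; // a stop was requested: pick up no further jobs
        }
        $job(); // runs to completion even if SIGTERM arrives meanwhile
        $completed[] = $name;
    }

    // Restore the default SIGTERM behaviour before returning.
    pcntl_signal(SIGTERM, SIG_DFL);

    return $completed;
}
```

Without pcntl, PHP cannot install such a handler, so the stop signal terminates the process immediately, even in the middle of a job. That is why stopwaitsecs only helps when the extension is available.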

jackdpeterson commented 8 years ago

Cheers!

Webador / SlmQueue

Documentation request - how to safely kill workers #172