Supervisor / supervisor

Supervisor process control system for Unix (supervisord)
http://supervisord.org

Random Backoff for Load Distribution #1640

Open MosbyTheGreat opened 2 months ago

MosbyTheGreat commented 2 months ago

Hi!

I’ve been using supervisord to manage multiple programs (~20) and noticed that when several fail at once, they all restart together, which causes a big spike in load.

My understanding is that the current "startretries" setting doesn't prevent this. What if we add a random backoff between x and y seconds? This way, when programs restart after a failure, they do so at random intervals instead of all at once, helping spread out the load.
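A minimal sketch of the idea in Python, written as a standalone wrapper rather than as a supervisord feature. Nothing here is an existing supervisord option: the script name `jitter_start.py`, the function `random_backoff`, and the `MIN MAX COMMAND` argument layout are all hypothetical, chosen only for illustration.

```python
import os
import random
import sys
import time


def random_backoff(min_seconds, max_seconds, rng=random):
    """Return a delay drawn uniformly from [min_seconds, max_seconds]."""
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("need 0 <= min_seconds <= max_seconds")
    return rng.uniform(min_seconds, max_seconds)


def main(argv):
    # Hypothetical usage: jitter_start.py MIN MAX COMMAND [ARGS...]
    delay = random_backoff(float(argv[1]), float(argv[2]))
    time.sleep(delay)
    # Replace this process with the real program so that signals sent by
    # the process manager reach the program directly.
    os.execvp(argv[3], argv[3:])


# Only act as a wrapper when MIN, MAX and a command were actually given.
if __name__ == "__main__" and len(sys.argv) >= 4:
    main(sys.argv)
```

Under these assumptions, a program's `command=` line could point at the wrapper, for example `command=python jitter_start.py 2 10 /path/to/program`, so that each restart waits between 2 and 10 seconds. One caveat: supervisord would probably count the sleep as part of the process's running time, so this spreads out the load but is not a true backoff inside supervisord itself.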

I would love to hear your thoughts on this, or any advice on implementing it in my setup. Maybe I just missed a setting that would solve the problem.

Thanks!

Moritz