bensheldon / good_job

Multithreaded, Postgres-based, Active Job backend for Ruby on Rails.
https://goodjob-demo.herokuapp.com/
MIT License

tools for managing a 'fleet' of processes #150

Open jrochkind opened 4 years ago

jrochkind commented 4 years ago

Hi, we were talking on reddit some time ago and I suggested it would be useful to have tools for managing a fleet or cluster of separate worker processes -- since on MRI that's the only way to take advantage of multiple cores, which you probably want to be doing when you have a separate host just for bg workers, which is usually what you want at even moderate scale.

We agreed it's a bit tricky to figure out how to implement that, especially for those of us not experienced in "systems programming".

Recently someone brought this project to my attention, which hypothetically takes care of it for you! https://github.com/stripe/einhorn

It's a bit under-documented (and the README basically says "you're welcome to use this, but don't ask us questions or bother filing bug reports without PRs"), but I've been playing with it a bit and looking at the code, and it looks really nice!

The only real requirement it has is that your worker processes catch a USR2 signal as a message to do a graceful shutdown. So I'm mentioning this in part to get it on the record, so you don't accidentally use USR2 for anything else, which would require a backwards-incompatible change to become compatible with einhorn. :( (resque uses USR2 for something else, alas. Sidekiq uses USR2 appropriately for einhorn, I think because sidekiq-enterprise actually uses einhorn.)
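To make that requirement concrete: the handler can be as small as flipping a flag that the work loop checks. A minimal, generic Ruby sketch (the polling loop is a stand-in for a real worker loop, not good_job's actual shutdown path):

```ruby
# Treat USR2 as a graceful-shutdown request (e.g. from einhorn).
shutdown_requested = false
Signal.trap("USR2") { shutdown_requested = true }

until shutdown_requested
  sleep 1 # stand-in for dequeuing and performing one job
end

# Finish any in-flight work here, then exit cleanly so the
# supervisor can replace the process.
puts "USR2 received; shutting down gracefully"
```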

bensheldon commented 3 years ago

I've been thinking more about this lately.

I was searching for projects, and the forked gem looks like it might fit the bill, though I didn't see anything in it specifically about zombie management, which is something I would like to trust is taken care of (and the complexity of that is also why I'm eager to find a maintained gem that can do it for me).

I also like the look of https://github.com/salsify/delayed_job_worker_pool

jrochkind commented 3 years ago

@bensheldon Does Einhorn's lack of maintenance make you reluctant? It does seem to be some pretty sophisticated code. It's too bad that I can't find an equally high-quality option that is maintained, either.

sandstrom commented 3 years ago

If this isn't an issue anymore, maybe we could close it or move it to a discussion.

bensheldon commented 3 years ago

I'm going to close this Issue for now, but am open to continuing the conversation. I do think that having a complete Puma-like fork+multithreaded executable would be really nice, but don't plan to implement that myself in the near future.

rgaufman commented 2 years ago

Why not just use systemd for this? I've played around with a lot of different tools for forking and managing processes, including Bluepill, God, and Eye. Eye was the best, but it was still significantly more resource-intensive. Even with sidekiq, I just do systemctl start sidekiq, which starts all my sidekiq processes (e.g. sidekiq@worker1, sidekiq@worker2, etc.), and stop conversely stops them all.

It's not like there is a need for a shared socket with job processing.
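For reference, a systemd template unit for good_job along the lines described above might look like this (a hypothetical sketch: the unit name, user, and paths are assumptions about a typical deploy; `good_job start` is the gem's CLI command):

```ini
# /etc/systemd/system/good_job@.service (hypothetical)
[Unit]
Description=good_job worker %i
After=network.target

[Service]
Type=simple
User=deployer
WorkingDirectory=/var/www/app/current
ExecStart=/usr/bin/env bundle exec good_job start
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

With that in place, `systemctl start good_job@worker1 good_job@worker2` starts two independent worker processes, each loading its own full copy of the Rails app.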

bensheldon commented 2 years ago

Memory. Precious memory, especially in containerized environments.

Also, I agree on systemd, but people want to daemonize 🤷‍♀️

rgaufman commented 2 years ago

How would the forked gem save memory vs. starting 2 processes with systemd?

Hmm, in dev I "daemonize" with foreman; in prod, systemd :) - I can understand why this is useful when you need to share a single socket (at the expense of wasting memory!), but I still don't see how it would save memory in this case?

For example, with Puma: you start a single worker and it takes 7% RAM; you start a 2-worker cluster and it takes 21% (!!):

```
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
21339 deployer  20   0  568076 279440  13304 S  35.9 7.0 628:27.11 puma: cluster worker 0: 4963 [current]
21342 deployer  20   0  548076 278540  13328 S  34.6 7.0 632:45.05 puma: cluster worker 1: 4963 [current]
 4963 deployer  20   0  558076 278440  22048 S   1.0 7.0 881:10.39 puma 5.6.4 (tcp://192.168.187.71:3000) [current]
```

An extra 7% of RAM wasted on the process manager.

bensheldon commented 2 years ago

Are you using Puma's preload_app!?

Puma has a lot of other interesting copy-on-write optimizations: https://github.com/puma/puma/blob/master/docs/fork_worker.md

rgaufman commented 2 years ago

Yes, I am. Interesting, will have a read.

rgaufman commented 2 years ago

"fork_worker option and refork command for reduced memory usage by forking from a worker process instead of the master process. " - Ah, ok, so no more master process, saves 7% ram, but will still take the equivalent of starting 2 processes, so no saving in the case of good_job from what I understand?

bensheldon commented 2 years ago

Sorry, I meant to emphasize preload_app!. That's what saves memory through copy-on-write. The different forking strategies I linked to are further attempts to optimize loading as many Ruby constants as possible before forking.
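For readers following along, enabling it is a one-line change in Puma's config. Something like this (an illustrative config/puma.rb sketch; the worker/thread counts and the on_worker_boot body are assumptions, not good_job's configuration):

```ruby
# config/puma.rb (illustrative)
workers 2      # number of forked worker processes
threads 5, 5   # min/max threads per worker

# Load the application in the master process before forking, so
# workers share memory pages via copy-on-write until written to.
preload_app!

on_worker_boot do
  # Per-process resources (like DB connections) must be
  # re-established after fork.
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord)
end
```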

rgaufman commented 2 years ago

Interesting, just having a read through this: https://shopify.engineering/ruby-execution-models

jrochkind commented 2 years ago

Note that einhorn, the ruby tool to "run (and keep alive) multiple copies of a single long-lived process" -- originally from stripe, and for a long time basically unmaintained -- has now been adopted by mperham of sidekiq.

I believe einhorn is used by sidekiq pro for managing multiple sidekiq worker processes, and it could probably be used by good_job as well. Perhaps with a few tweaks to good_job, like interpreting SIGUSR2 as a graceful-shutdown request, and possibly more to take full advantage of things like the pre-forking management built into einhorn.
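Launched that way, a good_job fleet might look something like this (an untested sketch; -n is einhorn's flag for the number of copies to spin up, and the exact invocation syntax should be checked against einhorn's README):

```sh
# Supervise four good_job worker processes with einhorn;
# each would need to treat USR2 as "shut down gracefully".
einhorn -n 4 -- bundle exec good_job start
```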