celery / celery

Distributed Task Queue (development branch)
https://docs.celeryq.dev
Other

Restart worker after idle period #9031

Open prhbrt opened 5 months ago

prhbrt commented 5 months ago

I'll work around this myself, so this is just a suggestion to help fellow AI comrades, whose workloads often involve long-running tasks and heavy hw/memory requirements.

Related Issues and Possible Duplicates

Related Issues

Possible Duplicates

Brief Summary

To leverage loading (AI/LLM-)models in (GPU-)memory and having memory available, an option similar to worker_max_tasks_per_child that kills a worker after an idle period, freeing its memory, would be helpful. This naturally implements caching: the model stays loaded while the worker is busy and is loaded again only when needed.

Design

Architectural Considerations

I couldn't quite grasp the celery consumer loop, so I'm going to mimic this behavior by spinning off a thread that checks the idle time and releases memory once the worker has been idle for too long.

However, my best guess is that this is the consumer loop. My next guess is that it has some sort of condition variable to wait for jobs. I'd suggest adding a timeout there so that the loop periodically checks worker_max_idle.
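For anyone wanting the same workaround, the idle-checking thread described above could look roughly like this. It is only a sketch using the standard library, not Celery internals: `IdleWatchdog`, `task_started`, `task_finished` and `on_idle` are names made up for illustration, and the wait-with-timeout is done on a `threading.Event` rather than on whatever the real consumer loop blocks on.

```python
import threading
import time


class IdleWatchdog:
    """Hypothetical helper: calls `on_idle` once after `max_idle` idle seconds."""

    def __init__(self, max_idle, on_idle, poll_interval=1.0, clock=time.monotonic):
        self.max_idle = max_idle
        self.on_idle = on_idle
        self.poll_interval = poll_interval
        self._clock = clock
        self._lock = threading.Lock()
        self._active = 0                 # number of tasks currently running
        self._last_activity = clock()    # when the last task finished
        self._fired = False              # avoid firing repeatedly while idle
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()

    def task_started(self):
        with self._lock:
            self._active += 1
            self._fired = False

    def task_finished(self):
        with self._lock:
            self._active = max(0, self._active - 1)
            self._last_activity = self._clock()

    def check(self):
        """Run one idle check; return True if `on_idle` was called."""
        with self._lock:
            idle_for = self._clock() - self._last_activity
            fire = self._active == 0 and not self._fired and idle_for >= self.max_idle
            if fire:
                self._fired = True
        if fire:
            self.on_idle()  # e.g. free the model, or exit the process
        return fire

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self.poll_interval):
            self.check()
```

In a Celery app, `task_started` and `task_finished` could presumably be wired to the `task_prerun` and `task_postrun` signals, with `on_idle` releasing the model from (GPU) memory. With the prefork pool those signals fire inside the child processes, so the watchdog would have to live there as well; I have not verified this against the worker internals.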

Proposed Behavior

After a worker has been idle for at least worker_max_idle, the worker is either killed or restarted.

Proposed UI/UX

worker_max_idle=120
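As a configuration fragment, this might sit next to the existing per-child limits. Note that `worker_max_idle` is only the setting proposed in this issue and does not exist in Celery; the unit (seconds) is also an assumption, and the value 100 for the existing setting is arbitrary.

```python
# celeryconfig.py (sketch)

# Existing Celery setting: recycle a pool child after it has run this many tasks.
worker_max_tasks_per_child = 100

# PROPOSED in this issue, not an existing Celery setting:
# kill/restart the worker after this many seconds without a running task.
worker_max_idle = 120
```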

thedrow commented 1 month ago

Contributions are welcome!

Nusnus commented 1 month ago

@prhbrt

Proposed Behavior

After a worker has been idle for at least worker_max_idle, the worker is either killed or restarted.

I’m not sure about the “restarted” part, but “killed” can make sense. A host system can start a new worker if the previous one was killed, which effectively “restarts” the consumer.
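To illustrate the "host system starts a new worker" idea, here is a minimal supervisor loop. It is a sketch only: `supervise`, `max_runs` and `backoff` are hypothetical names, and in practice this job would more likely be left to a process manager or container orchestrator than to a hand-written loop.

```python
import subprocess
import time


def supervise(cmd, max_runs=None, backoff=0.0):
    """Re-run `cmd` every time it exits; return the collected exit codes.

    With max_runs=None the loop never ends, which is what a real
    supervisor would do. A finite max_runs is useful for trying it out.
    """
    exit_codes = []
    while max_runs is None or len(exit_codes) < max_runs:
        # Blocks until the worker process exits (e.g. after an idle kill).
        exit_codes.append(subprocess.run(cmd).returncode)
        time.sleep(backoff)  # small pause before starting the next worker
    return exit_codes
```

Usage would be something like `supervise(["celery", "-A", "proj", "worker"])`, where `proj` is a placeholder application name.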

Proposed UI/UX

worker_max_idle=120

Would you like this value to mean, “Kill the worker 120 seconds after the last running task finished (and no other task is currently running)”?

Lastly,

To leverage loading (AI/LLM-)models in (GPU-)memory and having memory available

Can you better define the problem you’re trying to solve with this?