RAM / CPU cores partitioning for multiple agents on the same machine

allegroai / clearml-agent

ClearML Agent - ML-Ops made easy. ML-Ops scheduler & orchestration solution

Apache License 2.0

242 stars 92 forks source link

Hi, I have several workstations with many CPU cores and a lot of RAM. All agents run Ubuntu. I would like to run multiple ClearML agents (barebones, no k8s) on each of the workstations.

Can I somehow prevent one agent running a job from hoarding all of the resources (e.g. CPU cores) and guarantee each agent a minimum quota (prefably dynamic e.g. at least 8 cores and 1/3rd of RAM but more if avaialble)?

Or, alternatively, can I prevent the accidental scheduling of another job to the machine which is bogged down by other tasks?

Is there any way to achieve that with ClearML? Documentation suggests that the only resource agents "manage" is GPUs.

EDIT: Just started to wonder, should I file it here or under ClearML project?

allegroai / clearml-agent

RAM / CPU cores partitioning for multiple agents on the same machine #176