ihowell opened this issue 2 years ago (status: Open)
Hi @ihowell, I have the same issue. Did you find a solution? Regards
Edit: I was running the tasks from within another repo, assuming it would pass the right parameters. However, running the tasks as explained in the examples solved my problem.
jobs = []
with executor.batch():
    for arg in whatever:
        job = executor.submit(myfunc, arg)
        jobs.append(job)
In general, the LocalExecutor has fewer features than the SlurmExecutor, and indeed if you start 100 jobs using LocalExecutor they will all run at once, without regard for the hardware requested or the hardware available on your machine. In short, we haven't implemented a queue for LocalExecutor. This is a major footgun, but also not something easily fixable; we will need to think about how to implement it. For example, I feel we would like to spawn the subprocess ASAP to be able to return a process ID, which serves as the job ID, while making sure the jobs actually start one after the other.
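To make the "spawn ASAP, start later" idea concrete, here is a minimal standard-library sketch. It is not submitit code and not how LocalExecutor works: the function name spawn_gated, its parameters, and the stdin-based gate are all assumptions made up for illustration. Every child is created immediately (so a process ID is available right away) but blocks until the parent releases it.

```python
import subprocess
import sys
import time


def spawn_gated(payloads, max_parallel=1, poll_interval=0.01):
    """Spawn one child per payload right away, but release at most
    `max_parallel` of them at a time. Returns (pids, return_codes).

    Hypothetical helper for illustration; not part of submitit.
    """
    # Every child blocks on stdin until the parent sends a line, so its
    # process ID exists immediately even though its work has not started.
    gate = "import sys; sys.stdin.readline()\n"
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", gate + code],
            stdin=subprocess.PIPE,
            text=True,
        )
        for code in payloads
    ]
    pids = [p.pid for p in procs]

    waiting = list(procs)
    running = []
    while waiting or running:
        # Drop children that have exited (poll() also records returncode).
        running = [p for p in running if p.poll() is None]
        while waiting and len(running) < max_parallel:
            child = waiting.pop(0)
            try:
                child.stdin.write("go\n")
                child.stdin.close()  # flushes the line and opens the gate
            except OSError:
                pass  # child died before it was released
            running.append(child)
        time.sleep(poll_interval)
    return pids, [p.returncode for p in procs]
```

Two caveats with this sketch: the parent has to stay alive to run the release loop, and every waiting child still holds an idle interpreter in memory, so 100 queued jobs are cheaper than 100 running jobs but not free.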
Personally, I often use the DebugExecutor, which runs exactly one job at a time in the current process.
Thanks for the tip. I would, however, like to be able to run, say, 4 jobs at once (the number of cores on my machine). Maybe we could use the multiprocessing library instead of subprocesses? I believe this would allow us to use a semaphore while still returning a job construct with a process ID.
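As a stopgap outside of submitit, the "at most 4 at once" behavior can be approximated with the standard library alone. The sketch below is an assumption-laden illustration, not submitit's API: run_limited is a hypothetical name, and it swaps the multiprocessing semaphore suggested above for a fixed-size thread pool, where each worker thread launches one subprocess and waits for it.

```python
import subprocess
import sys
import threading
from concurrent.futures import ThreadPoolExecutor


def run_limited(commands, max_parallel=4):
    """Run each command as a subprocess, at most `max_parallel` at a time.

    Returns (return_codes, peak), where peak is the highest number of
    subprocesses that were in flight simultaneously.
    Hypothetical helper for illustration; not part of submitit.
    """
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def run_one(cmd):
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        try:
            return subprocess.run(cmd, check=False).returncode
        finally:
            with lock:
                state["running"] -= 1

    # The pool size bounds concurrency: a worker thread only picks up the
    # next command once its previous subprocess has exited.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return_codes = list(pool.map(run_one, commands))
    return return_codes, state["peak"]


if __name__ == "__main__":
    # Eight short jobs, never more than four running together.
    cmds = [[sys.executable, "-c", "import time; time.sleep(0.2)"] for _ in range(8)]
    codes, peak = run_limited(cmds, max_parallel=4)
    print(codes, peak)
```

Unlike spawning everything up front, this approach does not hand back a process ID for a job until that job actually starts, which is the trade-off discussed earlier in the thread.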
Hi!
Are there any updates on this? Has it been solved?
The same thing happens for me when using the Slurm launcher in Hydra on clusters!
The expected behavior of this parameter setting when using the LocalExecutor (or in my case the AutoExecutor on a non-slurm node) would be to keep the number of spawned processes to 1. I use executor.batch() to perform a delayed batch, which then spawns a process for each job, which quickly overwhelms my computer.
The issue seems to be that a controller process is spawned per job: https://github.com/facebookincubator/submitit/blob/main/submitit/local/local.py#L163 Each controller process is spawned and run immediately, with no check that the number of running controllers is less than the number of tasks allowed.