camall3n / onager

Lightweight Python library for launching experiments and tuning hyperparameters, either locally or on a cluster
MIT License
20 stars 4 forks

Commit for adding tasks_per_job to onager #41

Closed: samlobel closed this 2 years ago

samlobel commented 3 years ago

Passing --tasks-per-job to onager launch lets you run multiple tasks on the same resources, for example, two tasks on one GPU. The flag is silently ignored on non-Slurm backends.
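To make the grouping concrete, here is a minimal Python sketch of how a flat list of task commands could be split into jobs of at most tasks_per_job tasks each. The helper group_tasks and the example train.py commands are hypothetical and only illustrate the idea; this is not the code in this PR or in onager.

```python
from typing import List, Sequence


def group_tasks(tasks: Sequence[str], tasks_per_job: int) -> List[List[str]]:
    """Split task commands into consecutive jobs of at most tasks_per_job tasks.

    Hypothetical helper for illustration; not onager's implementation.
    """
    if tasks_per_job < 1:
        raise ValueError("tasks_per_job must be at least 1")
    # Slice the task list in steps of tasks_per_job; the final slice may be shorter.
    return [list(tasks[i:i + tasks_per_job]) for i in range(0, len(tasks), tasks_per_job)]


if __name__ == "__main__":
    tasks = [f"python train.py --seed {seed}" for seed in range(5)]
    for job_id, job in enumerate(group_tasks(tasks, tasks_per_job=2), start=1):
        print(job_id, job)
```

With five tasks and tasks_per_job=2, this yields three jobs, and the last one holds a single task.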

The main current limitation is that all tasks in a job send their logs to the same place, so if a job contains two tasks, their logs can be jumbled and confusing. Also, I think that when tasks_per_job is 1, it would be simpler to make wrapper.sh look like it used to.
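One way to keep the logs of tasks in a single job apart is to give each task its own output files. The sketch below assumes a bash wrapper in which every task runs in the background with separate stdout/stderr files, followed by `wait`; the function make_wrapper, the log naming scheme, and the script layout are all hypothetical and do not reflect onager's actual wrapper.sh.

```python
import shlex
from typing import Sequence


def make_wrapper(commands: Sequence[str], log_dir: str, job_name: str) -> str:
    """Return a bash script that runs all commands concurrently, one log pair per task.

    Hypothetical sketch; the layout and log names are assumptions, not onager's wrapper.sh.
    """
    lines = ["#!/usr/bin/env bash"]
    for task_idx, cmd in enumerate(commands):
        stem = f"{log_dir}/{job_name}_task{task_idx}"
        # Background each task (&) and give it separate stdout/stderr files,
        # so output from different tasks in the same job cannot interleave.
        lines.append(f"{cmd} > {shlex.quote(stem + '.o')} 2> {shlex.quote(stem + '.e')} &")
    # Keep the job alive until every background task has exited.
    lines.append("wait")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_wrapper(["python train.py --seed 0", "python train.py --seed 1"], "logs", "exp"), end="")
```

Commands are inserted verbatim as shell text; only the log paths are quoted. A bare `wait` returns status 0 even if a task fails, so a real wrapper would likely need to collect each task's exit status separately.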

camall3n commented 2 years ago

@samlobel Sorry for the delay in getting to this!

My initial reaction is that this is a great idea for a feature, but that this implementation is rather complicated, and I'm worried about adding the extra complexity.

That said, here's a sketch of a different implementation: use the existing onager functionality to have onager launch itself on each node, except that each launched copy uses the local backend to service the jobs. I think this would take care of the multiprocessing, keep the logs from overwriting each other, and avoid the need for __filler__ ids.
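The nested-launch idea can be approximated with a small local runner: each node would receive a list of tasks and run them concurrently, with each task writing to its own log files. The following Python sketch uses subprocess and concurrent.futures from the standard library; run_locally, its parameters, and the taskN.out/taskN.err naming are hypothetical stand-ins for onager's local backend, not its real interface.

```python
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List, Sequence, Tuple


def run_locally(commands: Sequence[Sequence[str]], log_dir: str, max_workers: int) -> List[int]:
    """Run commands concurrently on this machine, one log pair per task.

    Hypothetical stand-in for a nested local-backend launch; not onager's API.
    Returns each command's exit code, in the same order as `commands`.
    """
    log_path = Path(log_dir)
    log_path.mkdir(parents=True, exist_ok=True)

    def run_one(item: Tuple[int, Sequence[str]]) -> int:
        idx, cmd = item
        # Each task writes to its own .out/.err pair, so concurrent tasks never share a log file.
        with open(log_path / f"task{idx}.out", "w") as out, open(log_path / f"task{idx}.err", "w") as err:
            return subprocess.run(list(cmd), stdout=out, stderr=err).returncode

    # At most max_workers tasks run at once; e.g. 2 if two tasks should share one GPU.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, enumerate(commands)))


if __name__ == "__main__":
    # Demo: four short Python processes, two at a time, logging to a temporary directory.
    cmds = [[sys.executable, "-c", f"print({i} * {i})"] for i in range(4)]
    log_dir = tempfile.mkdtemp()
    print(run_locally(cmds, log_dir, max_workers=2))
    print((Path(log_dir) / "task3.out").read_text().strip())
```

Here max_workers plays the role of tasks_per_job: at most that many tasks share the node at a time, and the exit codes come back in input order because ThreadPoolExecutor.map preserves ordering.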

If you feel inclined to experiment with something like that, feel free to update this PR or open a new one!