flatironinstitute / disBatch

Tool to distribute a list of computational tasks over a pool of compute resources. The pool can grow or shrink.
Apache License 2.0

Specifying memory usage per task? #19

Open lgarrison opened 3 years ago

lgarrison commented 3 years ago

disBatch has worked great for me for dealing with heterogeneous tasks that take different amounts of CPU time. But I'm now facing a set of jobs with heterogeneous memory usage, and I'm stuck requesting a very conservative number of jobs per node so that no node exhausts its memory. I can estimate each job's memory usage, and I'm wondering if it would be possible to communicate this information to disBatch so that it would dispatch a task to a node only when that node has enough free memory for it. I guess the syntax might be something like:

#DISBATCH MEM 8GB
job1
job2
#DISBATCH MEM 20GB
job3
etc...
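For concreteness, I imagine the dispatch side could be a greedy check that reserves memory when a task is assigned and releases it on completion. This is just a rough sketch of the idea in Python, not anything in disBatch today; pick_node and the bookkeeping names are made up:

# Hypothetical sketch of memory-aware dispatch; not disBatch's actual scheduler.
def pick_node(task_mem, free_mem):
    """Return a node with at least task_mem bytes free, or None to hold the task."""
    for node, free in free_mem.items():
        if free >= task_mem:
            return node
    return None

free_mem = {"worker01": 64 * 2**30, "worker02": 32 * 2**30}  # free bytes per node
task_mem = 20 * 2**30  # from a "#DISBATCH MEM 20GB" directive

node = pick_node(task_mem, free_mem)
if node is not None:
    free_mem[node] -= task_mem  # reserve on dispatch
    # ... run the task on node; on completion: free_mem[node] += task_mem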

disBatch would also need to know about the available memory on each node, which I guess it could learn through Slurm. For non-Slurm backends, maybe it could be specified manually.
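For the Slurm case, the per-node totals look obtainable from sinfo's format fields (%N for the node name, %m for its memory in megabytes). A sketch, assuming sinfo is on the PATH; slurm_node_memory is just an illustrative name:

import subprocess

def slurm_node_memory():
    """Return {node_name: memory_in_bytes} as reported by Slurm."""
    out = subprocess.run(
        ["sinfo", "-h", "-N", "-o", "%N %m"],  # one line per node, no header
        check=True, capture_output=True, text=True,
    ).stdout
    mem = {}
    for line in out.splitlines():
        node, mb = line.split()
        # sinfo reports megabytes; strip a possible trailing "+" defensively.
        mem[node] = int(mb.rstrip("+")) * 1024 * 1024
    return mem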

Do you think a feature like this makes sense for disBatch? I'd be happy to take a stab at this feature if you think so!

njcarriero commented 3 years ago

Thanks for the suggestion.

This is something I have been thinking about for a while (along with the related problem of tasks that need a variable number of cores).

I haven't come up with a solution that doesn't involve reinventing a non-trivial portion of a resource manager. If you think you have a good idea, go for it. FYI, the dynamicdb branch will soon(-ish) become release 2.0.

In a pinch, a user can clump small tasks together via shell ops, e.g. using a task like "job1 & job2 ; wait". But that defeats one design goal, which was to provide per-task record keeping.
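For example (with made-up job names), a task file using that trick might look like:

small_job1 & small_job2 ; wait
small_job3 & small_job4 ; wait
big_job1

Each line is still a single disBatch task, so the two small jobs share one slot and one record-keeping entry.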

lgarrison commented 3 years ago

Thanks; I didn't have an implementation immediately in mind, and I agree that one doesn't want to reinvent a resource manager! In the last few weeks I think my need for this feature has lessened, but I may return to this in the future. The task-grouping idea may help too; thanks for the suggestion.

xiuliren commented 2 years ago

I was looking for this feature as well. My pipeline consists of over ten tasks with different resource requirements. I am currently running them manually one by one!