esi-neuroscience / acme

Asynchronous Computing Made ESI
https://esi-acme.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
Test cluster.adapt for SLURMCluster initialization #14

Closed (pantaray closed 3 years ago)

pantaray commented 3 years ago

In slurmfun, jobs would all go into the queue and run when resources became available. Maybe we should be using cluster.adapt() instead?

KatharineShapcott commented 3 years ago

I tested it. Currently SLURMCluster.adapt() doesn't work: it just starts minimum_jobs and never changes the number of jobs after that :(

It's quite easy to check though: I just replaced cluster.scale(total_workers) in esi_cluster_setup with cluster.adapt(minimum_jobs=1, maximum_jobs=total_workers).

It looks like someone else had the same problem (https://github.com/dask/dask-jobqueue/issues/463) and fixed it in this pull request: https://github.com/dask/distributed/pull/4155/files. I'll keep an eye on it, and once it gets merged into dask we can test again.
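
For reference, the swap described above could be sketched roughly like this. It is only an illustration, not the actual esi_cluster_setup code: start_workers, use_adapt and RecordingCluster are hypothetical names, and the stand-in class just records calls so the snippet runs without SLURM or dask installed. The only calls assumed from the real cluster object are scale(n) and adapt(minimum_jobs=..., maximum_jobs=...), as quoted in this thread.

```python
def start_workers(cluster, total_workers, use_adapt=False):
    """Request workers from a dask-jobqueue style cluster object.

    Hypothetical helper for illustration; not part of acme or dask-jobqueue.
    """
    if use_adapt:
        # Adaptive scaling: ask the cluster to keep between 1 and
        # total_workers jobs, depending on load.
        cluster.adapt(minimum_jobs=1, maximum_jobs=total_workers)
    else:
        # Fixed scaling: request total_workers up front (current behavior).
        cluster.scale(total_workers)
    return cluster


class RecordingCluster:
    """Stand-in for SLURMCluster that only records which method was called."""

    def __init__(self):
        self.calls = []

    def scale(self, n):
        self.calls.append(("scale", n))

    def adapt(self, minimum_jobs=None, maximum_jobs=None):
        self.calls.append(("adapt", minimum_jobs, maximum_jobs))


if __name__ == "__main__":
    fixed = start_workers(RecordingCluster(), total_workers=4)
    adaptive = start_workers(RecordingCluster(), total_workers=4, use_adapt=True)
    print(fixed.calls)
    print(adaptive.calls)
```

Keeping the choice behind a flag like use_adapt would make it easy to leave the adapt path switched off until the upstream fix is released, and to flip it on for a quick test against a real SLURMCluster.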

pantaray commented 3 years ago

Thank you for looking into this! Nice detective work. The PR looks exactly like what we need. I suggest I go ahead and incorporate the necessary code (particularly #12, #13), comment out the adapt part, and we enable it once the PR is merged and available in a new dask release.