JuliaParallel / ClusterManagers.jl

refactor LSFManager to use job arrays #135

Closed bjarthur closed 3 years ago

bjarthur commented 4 years ago

job arrays are much more efficient with the scheduler's resources than interactive jobs.

i've also added a means to add cluster workers to a personal workstation via a new ssh_cmd keyword argument.

bjarthur commented 4 years ago

now with tests!

vchuravy commented 4 years ago

@bjarthur I gave you commit privileges, feel free to merge when you feel this is ready.

juliohm commented 3 years ago

@bjarthur it would be great to have this PR merged, and perhaps updated to use blaunch instead, as suggested by @vchuravy. I am currently using a hack on our cluster to distribute Julia processes, and would very much like to stop relying on it once addprocs_lsf is refactored.

See here for more information about the hack: https://github.com/JuliaLang/julia/issues/37526

bjarthur commented 3 years ago

i use ssh to launch workers on the cluster from my personal workstation. the latter doesn't have blaunch, so your suggested refactoring wouldn't work for me. happy to merge though as it has survived some battlefield testing.

juliohm commented 3 years ago

So your LSF cluster doesn't have blaunch? I thought it was part of the distribution, but I'm not sure. The issue with launching via bsub as you're doing is that you're creating many tiny jobs instead of one big job with multiple processes, right? I remember trying to run the PR on the cluster and it didn't work, but I can double-check.

On Sat, Sep 12, 2020, 14:38 Ben Arthur notifications@github.com wrote:

Merged #135 (https://github.com/JuliaParallel/ClusterManagers.jl/pull/135) into master.
