JuliaParallel / ClusterManagers.jl

Limiting number of cores per node with LSF #191

Open raminammour opened 1 year ago

raminammour commented 1 year ago

Hello,

How do I limit the number of cores per node with the LSFManager?

I would normally use -R "span[ptile=...]", but the option seems to be ignored when passed through to LSF.

I assume this is because the job array launches each core separately? @bjarthur would probably know :)

Thank you!

bjarthur commented 1 year ago

details like this are usually very dependent on how your sysadmins have configured your cluster.

would help if you posted your code, as well as logs showing why you think it is ignored.

raminammour commented 12 months ago

Sorry for the late response @bjarthur, I had no access to an LSF system until now.

The jobs are all launched on the same node, as in the example below:

julia> addprocs_lsf(4,bsub_flags=`-R "span[ptile=2]"`)
[ Info: `bsub -R 'span[ptile=2]' -cwd ... -J 'julia-649580[1-4]' .../julia --worker=HD6PDHgBP`

julia> pmap(i->gethostname(),1:nworkers()) |> unique
1-element Vector{String}:
 "r310n11"
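
Note that pmap does not guarantee one task per worker, so the check above can miss a host if a fast worker picks up several tasks. A more direct check, sketched below, asks every worker for its hostname and counts workers per host. The helper name `workers_per_host` is hypothetical and not part of ClusterManagers.jl; it only uses `workers`, `remotecall_fetch`, and `gethostname` from Julia itself.

```julia
using Distributed

# Hypothetical helper (not part of ClusterManagers.jl):
# ask each worker for its hostname and count workers per host.
function workers_per_host(pids = workers())
    counts = Dict{String,Int}()
    for pid in pids
        # Runs gethostname() on the process `pid` and fetches the result.
        host = remotecall_fetch(gethostname, pid)
        counts[host] = get(counts, host, 0) + 1
    end
    return counts
end
```

Calling `workers_per_host()` after `addprocs_lsf(...)` returns a dictionary mapping each hostname to the number of workers running there, which makes it easy to see whether the ptile limit was honored.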

Without job arrays, the job is spread over two nodes as expected:

bsub -Is -n 4 -R "span[ptile=2]" mpirun hostname
Job <512551> is submitted to default queue ...
<<Waiting for dispatch ...>>
<<Starting on r309n05>>
r309n07
r309n07
r309n05
r309n05

I appreciate the help!

bjarthur commented 11 months ago

i have never used the R flag, so i'm not sure i can help. sorry.
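
Since the single parallel job above does honor span[ptile], one possible workaround (a sketch only, not something confirmed in this thread) is to start Julia inside such a job and add workers on the hosts LSF allocated, instead of going through the job array. The sketch assumes that LSF exports the allocated hosts in the LSB_HOSTS environment variable as a space-separated list with one entry per slot, and that SSH between compute nodes is permitted, since `addprocs` with a machine list uses SSH. The helper name `machines_from_hosts` is hypothetical.

```julia
using Distributed

# Hypothetical helper: turn a space-separated host list (one entry per slot,
# as LSB_HOSTS is assumed to provide) into the (host, count) tuples that
# Distributed.addprocs accepts as machine specifications.
function machines_from_hosts(hosts::AbstractString)
    counts = Dict{String,Int}()
    order = String[]              # remember first-seen order of hosts
    for h in split(hosts)
        host = String(h)
        haskey(counts, host) || push!(order, host)
        counts[host] = get(counts, host, 0) + 1
    end
    return [(host, counts[host]) for host in order]
end

# Only try to add workers when running inside an LSF allocation.
# Note: this starts one worker per allocated slot, in addition to the
# master process that is already running on the first host.
if haskey(ENV, "LSB_HOSTS")
    addprocs(machines_from_hosts(ENV["LSB_HOSTS"]))
end
```

For the allocation shown earlier, `machines_from_hosts("r309n05 r309n05 r309n07 r309n07")` yields two workers on each of the two hosts. Whether this is usable depends on how the cluster is configured, as noted above.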