I don't believe this claim is right. The launcher portion of the batch script works by starting each ray worker with --num-cpus $SLURM_CPUS_ON_NODE.
By SLURM's nature, a job step has exclusive access to all the resources on the nodes for which it was scheduled, meaning that if you start a ray worker within a job step, it will be able to use all the resources of the node on which it's running. See the --exclusive section of the srun man page:
This option applies to job and job step allocations, and has two slightly different meanings for each one. When used to initiate a job, the job allocation cannot share nodes with other running jobs (or just other users with the "=user" option or "=mcs" option). If user/mcs are not specified (i.e. the job allocation can not share nodes with other running jobs), the job is allocated all CPUs and GRES on all nodes in the allocation, but is only allocated as much memory as it requested. This is by design to support gang scheduling, because suspended jobs still reside in memory. To request all the memory on a node, use --mem=0. The default shared/exclusive behavior depends on system configuration and the partition's OverSubscribe option takes precedence over the job's option. This option can also be used when initiating more than one job step within an existing resource allocation (default), where you want separate processors to be dedicated to each job step. If sufficient processors are not available to initiate the job step, it will be deferred. This can be thought of as providing a mechanism for resource management to the job within its allocation (--exact implied).
The exclusive allocation of CPUs applies to job steps by default, but --exact is NOT the default. In other words, the default behavior is this: job steps will not share CPUs, but job steps will be allocated all CPUs available to the job on all nodes allocated to the steps.
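To make that concrete, here is a condensed sketch of this kind of launcher, loosely following the pattern in Ray's SLURM deployment docs (linked elsewhere in this thread); the port, the sleep, and the node-list handling are illustrative rather than the exact lines of the batch script in this repo:

# start the ray head on the first allocated node, then one worker per remaining node;
# each "ray start" runs inside its own srun job step and is given the CPUs SLURM
# allocated to the job on that node
nodes=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
nodes_array=($nodes)
head_node=${nodes_array[0]}
head_node_ip=$(srun --nodes=1 --ntasks=1 -w "$head_node" hostname --ip-address)

srun --nodes=1 --ntasks=1 -w "$head_node" \
    ray start --head --node-ip-address="$head_node_ip" --port=6379 \
    --num-cpus "$SLURM_CPUS_ON_NODE" --block &
sleep 10

for ((i = 1; i < SLURM_JOB_NUM_NODES; i++)); do
    srun --nodes=1 --ntasks=1 -w "${nodes_array[$i]}" \
        ray start --address="$head_node_ip:6379" \
        --num-cpus "$SLURM_CPUS_ON_NODE" --block &
done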
There are multiple ways to start a ray cluster inside a SLURM job, and this is only one of them. If you run this script with varying values of --ntasks-per-node, you will see that your ray cluster possesses more resources and your screen runs faster.
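If you want to check the cluster's resources directly, one option (an illustrative one-liner, not part of the batch script) is to connect a driver to the running cluster and print what ray sees:

python -c "import ray; ray.init(address='auto'); print(ray.cluster_resources())"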
try this:

#SBATCH -N 1
#SBATCH -p ???
#SBATCH --ntasks-per-node 1
#SBATCH -c 4

echo $SLURM_CPUS_ON_NODE
# say I have 32 cpus on that node, but this will return 4 instead of 32, since I am asking for 4 cpus.
You should read up on the SLURM documentation if you're confused by this. You requested 4 CPUs, so SLURM gave you 4 even if the node has 32. If you want all 32, then ask for them.
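For example, on a hypothetical 32-CPU node you could request every CPU explicitly:

#SBATCH -N 1
#SBATCH --ntasks-per-node 1
#SBATCH -c 32

or take the whole node, using --mem=0 to also claim all of its memory as the quoted man page suggests:

#SBATCH -N 1
#SBATCH --exclusive
#SBATCH --mem=0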
Hi,
I think there are two potential bugs in run_pyscreener_distributed_example.batch (and in run_molpal.batch as well). I found that with this script ray cannot leverage all the resources on a cluster node.
1) In line 7, "#SBATCH --ntasks-per-node 4": I think "--ntasks-per-node" should always be 1. Quote: "this will be used to guarantee that each Ray worker runtime will obtain the proper resources"; see https://docs.ray.io/en/latest/cluster/slurm.html.
2) In lines 31 and 41, you start the ray cluster with "--num-cpus $SLURM_CPUS_ON_NODE". However, this only lets ray use part of the CPUs on a node. For instance, say you ask for 1 node and set "-c 4" and "--ntasks-per-node 4": ray can only use 4*4=16 cpus even though you have 32 cpus on the node ($SLURM_CPUS_ON_NODE will return 16 instead of 32).
suggested config:
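A minimal sketch along the lines of points 1) and 2) above; the node count, CPU count, and the head-node address variable are placeholders rather than values taken from the original script:

# one job step (and therefore one ray worker runtime) per node
#SBATCH -N 2
#SBATCH -p ???
#SBATCH --ntasks-per-node 1
#SBATCH -c 32            # ask for every CPU on the node so $SLURM_CPUS_ON_NODE reflects the full node

# each worker is then started with all of the CPUs the job was given on its node
srun --nodes=1 --ntasks=1 ray start --address="$head_node_ip:6379" \
    --num-cpus "$SLURM_CPUS_ON_NODE" --block &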