Open rueberger opened 1 year ago
I haven't forgotten about this issue.
`cores_per_node` is also used in the scaling code for another purpose: to figure out how many new workers to expect when starting (or ending) a block of workers. So it would be an expected situation to see both `exclusive` and `cores_per_node` set at the same time.
The use of the `cores_per_node` parameter for multiple-but-similar purposes bothers me, but I don't have a good feel for what the user interface should look like instead.
I think, though, it's probably right to set `exclusive` to `False` by default, to inherit the default behaviour of the underlying batch system.
When `--exclusive` is set in slurm, which is done by default in the slurm provider, the number of cpus requested per node is ignored, as the entire node is assigned by slurm. This is a subtle footgun which overrides attempts by the user to allocate resources at a finer grain than whole nodes.

Furthermore, parsl only spins up the number of workers originally requested by the user. For example, this config will spin up four workers per node, leaving 20 cores idle on a cluster with 24-core nodes:

As far as I can tell, the only reason to set `--exclusive` is on clusters where oversubscription is allowed.

`exclusive` should be set to `False` by default, or at the very least, a warning should be raised when `exclusive` is set to `True` and `cores_per_node` is not `None`.
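The config referred to above is not reproduced in this extract. A minimal sketch of the kind of setup being described (assuming `HighThroughputExecutor` and `SlurmProvider`; the partition name, walltime, and worker counts are illustrative, not the poster's actual values) might look like:

```python
# Hypothetical illustration, not the original poster's config: with
# exclusive=True (the provider default), slurm assigns a whole 24-core
# node, but parsl still starts only four workers, leaving 20 cores idle.
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import SlurmProvider

config = Config(
    executors=[
        HighThroughputExecutor(
            label="htex",
            max_workers=4,  # four workers per node
            provider=SlurmProvider(
                partition="compute",   # illustrative partition name
                nodes_per_block=1,
                cores_per_node=4,      # ignored by slurm when exclusive=True
                exclusive=True,        # the default being discussed
                walltime="01:00:00",
            ),
        )
    ]
)
```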
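The warning proposed in the last sentence could be sketched as follows; `check_provider_config` is a hypothetical helper for illustration, not parsl's actual code:

```python
import warnings


def check_provider_config(exclusive, cores_per_node):
    """Hypothetical sketch of the proposed check, not parsl's actual code:
    warn when --exclusive would silently override a finer-grained request."""
    if exclusive and cores_per_node is not None:
        warnings.warn(
            "exclusive=True means slurm assigns whole nodes, so "
            f"cores_per_node={cores_per_node!r} will not limit the allocation",
            UserWarning,
        )


# This combination triggers the warning: whole-node allocation plus a
# finer-grained per-node core request.
check_provider_config(exclusive=True, cores_per_node=4)
```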