Open annawoodard opened 4 years ago
cc @mattwelborn @Lnaden
If you remove max_workers and instead use only cores_per_worker, then what is the way to limit the number of workers which can be created? If you define max_blocks, cores_per_worker, nodes_per_block, and cores_per_node (resource hint), then all that says is how to allocate the resources between the workers, and with some math you can work out the max workers as:
max_workers = max_blocks * nodes_per_block * cores_per_node / cores_per_worker
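As a quick worked example of the equation above, here is a minimal sketch in plain Python. The helper name `max_concurrent_workers` is hypothetical and not part of Parsl, and rounding down to a whole number of workers per node is an assumption made for illustration.

```python
import math


def max_concurrent_workers(max_blocks, nodes_per_block, cores_per_node, cores_per_worker=1.0):
    """Upper bound on simultaneous workers across all blocks (hypothetical helper)."""
    # Assumption: a node can only host a whole number of workers, so round down per node.
    workers_per_node = math.floor(cores_per_node / cores_per_worker)
    return max_blocks * nodes_per_block * workers_per_node


# 2 blocks of 3 nodes, 24 cores per node, 6 cores per worker: 2 * 3 * 4
print(max_concurrent_workers(max_blocks=2, nodes_per_block=3, cores_per_node=24, cores_per_worker=6))  # 24
```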
If you assume that workers cannot be oversubscribed (1 task per worker), then you can also work out the max tasks. But Parsl provides the ability to oversubscribe workers by setting cores_per_worker < 1. From the HTEX API docs:
cores_per_worker (float) – cores to be assigned to each worker. Oversubscription is possible by setting cores_per_worker < 1.0. Default=1
How can we prevent Parsl from then trying to just overassign tasks to the workers such that each worker is only able to process 1 task? Or am I misreading that and the oversubscription is on cores (i.e. 1 core can be assigned to multiple workers)?
max_workers = max_blocks * nodes_per_block * cores_per_node / cores_per_worker
This is not quite right. max_workers is the max workers per manager, so it is independent of max_blocks and nodes_per_block.
Say you have 24-core nodes. Currently, you can define max_workers=4, or equivalently, you can define cores_per_worker=6. Both settings will lead to 4 workers launched per node, whether or not you define cores_per_node (because the manager, which launches the workers, is on the node and can figure out how many cores are on the node). What defining cores_per_node does is allow the scaling machinery of Parsl to make a better guess about how many workers it will get given a certain number of blocks launched.
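To make the 24-core example concrete, here is a rough sketch of how the two settings could each yield 4 workers on a node. This is not Parsl's actual code: `workers_per_manager` is a hypothetical helper, and taking the tighter of the two limits is an assumption about how they combine.

```python
import math


def workers_per_manager(cores_on_node, cores_per_worker=1.0, max_workers=float("inf")):
    """Workers one manager would launch on its node (hypothetical helper)."""
    # Limit implied by dividing the node's cores among workers.
    by_cores = math.floor(cores_on_node / cores_per_worker)
    # Assumption: whichever limit is tighter wins.
    return int(min(max_workers, by_cores))


print(workers_per_manager(24, max_workers=4))       # 4
print(workers_per_manager(24, cores_per_worker=6))  # 4
```

Neither call needs cores_per_node, since in this sketch the core count is read on the node itself.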
How can we prevent Parsl from then trying to just overassign tasks to the workers such that each worker is only able to process 1 task? Or am I misreading that and the oversubscription is on cores (i.e. 1 core can be assigned to multiple workers)?
The oversubscription is on cores: one core can be assigned to multiple workers. Each worker always runs one task.
(so to clarify: if max_workers were eliminated, the way to limit the number of workers would be with cores_per_worker)
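For illustration, a small sketch of what oversubscribing cores means under the reading above (one task per worker, several workers per core). The function `oversubscription` and its return keys are hypothetical names, not Parsl API.

```python
import math


def oversubscription(cores_on_node, cores_per_worker):
    """Summarize worker and task counts for one node (hypothetical helper)."""
    workers = math.floor(cores_on_node / cores_per_worker)
    return {
        "workers": workers,
        # Each worker runs exactly one task, so concurrent tasks equal workers.
        "concurrent_tasks": workers,
        # Values above 1 mean several workers share each core.
        "workers_per_core": workers / cores_on_node,
    }


print(oversubscription(cores_on_node=24, cores_per_worker=0.5))
# {'workers': 48, 'concurrent_tasks': 48, 'workers_per_core': 2.0}
```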
I should have been a bit more clear: the equation I wrote, assuming the control variable max_workers is removed, was meant to be "Calculate the maximum number of concurrent workers, and thus tasks, which this Parsl Config can run." In that context, is the equation correct?
Ah yes, sorry, I misunderstood. Your equation is correct.
So, since each worker runs one task, does that clarify your original question?
Yes it does! And there is no way for 1 worker to run multiple tasks at the same time?
On a related question, if one does not provide the resource hints, how does Parsl determine the number of cores each node has for workers to consume? i.e., if you request a node which has 24 cores and launch 1 block on the whole node, when/how will Parsl detect the node is consumed?
max_workers is just an alternative way of defining cores_per_worker (except on heterogeneous queues, where you probably want cores_per_worker anyway). It is just another thing for users to have to understand, and it is confusing whether this means max workers per node, per block, or across all blocks. Misconfiguring max_workers can seriously affect scaling in unexpected ways (see #1341).
I favor removing it entirely. If there is resistance to that idea, we should at least rename it to max_workers_per_manager.
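A small sketch of why the two settings coincide on uniform nodes but diverge on heterogeneous queues. The helper `equivalent_cores_per_worker` is hypothetical, and the 48-core node is an assumed example rather than anything from the issue.

```python
def equivalent_cores_per_worker(cores_on_node, max_workers):
    """Cores each worker effectively gets when only max_workers is set (hypothetical helper)."""
    return cores_on_node / max_workers


# Same max_workers, different node sizes: the per-worker share is not constant.
print(equivalent_cores_per_worker(24, 4))  # 6.0
print(equivalent_cores_per_worker(48, 4))  # 12.0
```

With a fixed cores_per_worker, by contrast, the worker count would scale with the node size instead.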