pbecker93 opened this issue 1 year ago
On the contrary, I would prefer changing the implementation so that it is "per node" and not "per slurm job". :D I think it is very convenient to do it on a per-node basis, because ain't nobody got time for figuring out how many parallel things can be started in a job if there are n nodes with m GPUs.
@pbecker93, how do you feel about `parallel_runs_per_slurm`?
@ScheiklP, but clusterduck doesn't control how many nodes there are, just the number of slurm jobs that are started. And in theory you can have slurm jobs that use multiple nodes. I'm not quite sure I understand what needs to be "figured out" in your use case, maybe you can explain.
I guess my expected / desired behavior for num_nodes > 1 would be that clusterduck just scales stuff up.
So for `total_runs_per_node = 10`, `parallel_runs_per_node = 4`, `num_nodes = 2` -> 8 parallel runs per job, with 20 runs in total.
But as you said, there is only ever one node :D
@balazsgyenes: better, but `parallel_runs_per_slurm_job` might be even more explicit?
@ScheiklP I am with Balazs here: on any normal cluster (i.e. not Horeka), the same number of jobs might not even end up on the same number of physical nodes at different times, and there is no control over how many nodes you get.
@pbecker93 Sure you can. That's what the `nodes` parameter of SLURM is for. A job with `nodes=2` will be like one machine.
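For reference, this is the plain SLURM flag being discussed; a minimal batch-script sketch (not clusterduck config), and whether the two machines feel "like one" depends entirely on what the job does with them:

```bash
#!/bin/bash
# Minimal sketch: request an allocation that spans two machines.
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1

srun hostname   # one task per node, so this prints two different hostnames
```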
Paul, I'm still not totally sure what you mean by "there is only ever one node".
If I understood you correctly, I would have nothing against `num_parallel_slurm_jobs`, where a user can specify either that or `total_runs_per_slurm_job` (but not both), and the hydra jobs would be distributed among the slurm jobs accordingly. I think it might be a bit confusing for a first-time user, but maybe worth it. Is this what you want?
The jobs that clusterduck submits only request 1 node per job, right?
It's configurable. The intended use is to request all the resources that you know a node has, and then request one node, but slurm isn't actually supposed to work like that. If I say `num_nodes=2`, that means that each job is spread over 2 nodes, but doesn't necessarily have exclusive access to them.
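A sketch of that intended "request one node and everything you know it has" pattern; the resource numbers below are placeholders, not a real node:

```bash
#!/bin/bash
# Sketch of the intended usage: one node, plus all the resources you
# believe that node has (numbers are placeholders).
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=64
#SBATCH --gres=gpu:4
#SBATCH --mem=0          # --mem=0 requests all memory on the node

# clusterduck then runs its parallel hydra jobs inside this one allocation.
```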
Exclusivity of resources on a node is defined by the cluster maintainer. For Alex, for example, nodes are not exclusive, but GPUs are. For Horeka, all resources on a node are exclusive. So I am not sure what you mean.
So the current behavior is `total_runs_per_node = 10`, `parallel_runs_per_node = 4`, `num_nodes = 2` -> 4 parallel runs per job with 10 runs in total, so 2 parallel runs per node?
`num_nodes` is a slurm parameter that is very different from the rest. If you specify `n_tasks=4` and `num_nodes=2`, your four tasks will be spread across 2 nodes, requiring inter-process communication to synchronize them. With a single task and multiple nodes, I think it duplicates tasks across nodes, but I'm not 100% certain.
So with `total_runs_per_node = 10`, `parallel_runs_per_node = 4` -> 4 parallel runs per slurm job with up to 10 hydra jobs each; each hydra job might get run twice (I'm not sure), and you have no control over which nodes your slurm jobs run on.
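In plain SLURM terms, the "n_tasks is a total" reading described above would look roughly like this (a sketch, not clusterduck's generated script):

```bash
#!/bin/bash
# One reading of n_tasks=4, num_nodes=2: four tasks in total,
# distributed over the two nodes by SLURM.
#SBATCH --nodes=2
#SBATCH --ntasks=4

srun python train.py   # srun launches 4 copies, typically 2 per node
```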
I have a very vague memory that if `n_tasks = m`, the same thing you run will just be executed m times in parallel on each node. So if your script says `python train.py` and you have `num_nodes = 2` and `n_tasks = 4`, it will run `python train.py` a total of 8 times, 4 in parallel per node.
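That per-node reading corresponds to `--ntasks-per-node` in plain SLURM, roughly:

```bash
#!/bin/bash
# The per-node reading: 2 nodes x 4 tasks per node = 8 copies in total.
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4

srun python train.py   # 8 copies overall, 4 running in parallel on each node
```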
Should we rename the "per_node" fields (i.e. `parallel_runs_per_node`, `total_runs_per_node`), since they are not per node but per slurm job? I get the issue of everything being called a "job", so maybe we really never use "job" standalone, but always say "slurm job" or "hydra job"? Or maybe someone has a better solution.
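For illustration, hypothetical multirun overrides with the renamed fields; the launcher name, option paths, and the `seed` parameter are assumptions for this sketch, not the current clusterduck API:

```bash
# Hypothetical command line after the proposed rename. A 10-value sweep
# would then fit into a single slurm job, running 4 hydra jobs at a time.
python train.py --multirun \
    hydra/launcher=clusterduck_slurm \
    hydra.launcher.parallel_runs_per_slurm_job=4 \
    hydra.launcher.total_runs_per_slurm_job=10 \
    seed=0,1,2,3,4,5,6,7,8,9
```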