Open das046 opened 3 years ago
Thanks for posting @das046 -- I'll take a look at this soon. This issue has some implications for the specification that might break backward compatibility. I just tested this on my end, and I was able to inject something as follows:
```yaml
- name: run-lulesh
  description: Run LULESH.
  run:
    cmd: |
      #SBATCH --cpus-per-task=2
      $(LAUNCHER) -c 2 $(LULESH)/build/lulesh2.0 -s $(SIZE) -i $(ITERATIONS) -p > $(outfile)
    depends: [make-lulesh]
    nodes: 2
    procs: 27
    exclusive: True
    walltime: "00:10:00"
```
A workaround to just get you running would be to hardcode the `#SBATCH --cpus-per-task=<num>` line as I did above. That'll get you moving in the meantime while I figure out the best way to tackle the actual issue.
Thanks for your issue! :)
Thanks for the quick response. I will try your method; it should work for my case.
Thank you, Da
I tried this solution and in my case it worked. I checked that the CPU affinity (`os.sched_getaffinity(0)`) in Python did indeed match `#SBATCH --cpus-per-task`.
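For reference, here is a minimal sketch of that affinity check, assuming it runs inside the allocated SLURM job step (where `SLURM_CPUS_PER_TASK` is exported when `--cpus-per-task` is set):

```python
import os

# Compare the CPUs this process is allowed to run on with the value SLURM
# exported for --cpus-per-task. Assumes this runs inside the job step.
allocated = os.sched_getaffinity(0)  # set of CPU ids available to this process
requested = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))

print(f"affinity: {sorted(allocated)} ({len(allocated)} CPUs), "
      f"SLURM_CPUS_PER_TASK={requested}")
assert len(allocated) >= requested, "fewer CPUs visible than requested"
```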
I am not sure why `procs_in_batch` is only taken from the `self._batch` setting and not from the current `step.run.procs`. It is quite unintuitive that if `nodes` is set in the step and `procs` is not set in the batch block but is set in the step, the step's `procs` setting is ignored. IMO, an intuitive interpretation of `nodes: n` and `procs: p` is n nodes with p cores per node. I am new to SLURM though; I was using Torque/PBS before.
Hi @kftsehk -- thanks for the comment. I'll have to revisit the logic in the SLURM adapter. On a quick skim, I'm not sure why the logic is that way and it's been a little while since I've looked into that particular code. I can look into it a bit later today. Is it holding you up or are you encountering a bug?
I understand what you mean about the confusion of `procs` and agree -- it's a misnomer on my part from when I first made the adapter (I was new to SLURM myself as well). It's actually meant to represent total tasks. Recently, the Flux and LSF adapters have started using `cores per task` and other less ambiguous keys, so we're planning to start exploring making the specification more consistent among the different schedulers.
Also @kftsehk -- please feel free to post a reproducer specification if you have one that's misbehaving and describe how you expected it to behave. That would help us better understand.
Hi,
I was trying to use maestrowf to submit jobs. I could set the number of CPUs (cores per task) of the job in the YAML file, something like below.
It would generate the following SLURM script to be submitted.
The `cores per task` option corresponded to the `-c 2` on the `srun` line. But there is no `#SBATCH --cpus-per-task=2` option above, which causes errors in the SLURM setup on our cluster. Could you modify the code and add this option? I believe it won't break other parts.

Thank you,
Da Shi
Frederick National Lab
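As a rough illustration of the requested fix, here is a hypothetical sketch -- the names are illustrative, not the actual maestrowf API -- that derives both the batch header directive and the launcher flag from a single cores-per-task value so the two cannot disagree:

```python
# Hypothetical sketch: emit the #SBATCH directive and the srun flag from one
# cores-per-task value. Not the actual maestrowf SLURM adapter code.
def render_slurm_step(cores_per_task: int, command: str) -> str:
    lines = [
        "#!/bin/bash",
        f"#SBATCH --cpus-per-task={cores_per_task}",
        "",
        f"srun -c {cores_per_task} {command}",
    ]
    return "\n".join(lines)

print(render_slurm_step(2, "./lulesh2.0 -s 30 -i 100 -p"))
```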