Closed jbwexler closed 3 years ago
Merging #574 (36167f1) into master (6c49cfe) will decrease coverage by
0.33%
. The diff coverage is36.36%
.
@@ Coverage Diff @@
## master #574 +/- ##
==========================================
- Coverage 89.22% 88.88% -0.34%
==========================================
Files 149 149
Lines 13054 13087 +33
==========================================
- Hits 11647 11632 -15
- Misses 1407 1455 +48
Impacted Files | Coverage Δ | |
---|---|---|
reproman/interface/run.py | 100.00% <ø> (ø) |
|
reproman/support/jobs/submitters.py | 65.29% <36.36%> (-4.07%) |
:arrow_down: |
reproman/distributions/tests/test_venv.py | 86.32% <0.00%> (-9.41%) |
:arrow_down: |
reproman/distributions/tests/test_docker.py | 94.73% <0.00%> (-5.27%) |
:arrow_down: |
reproman/resource/aws_ec2.py | 59.06% <0.00%> (-2.60%) |
:arrow_down: |
reproman/tests/skip.py | 95.55% <0.00%> (-2.23%) |
:arrow_down: |
reproman/resource/docker_container.py | 92.35% <0.00%> (-1.92%) |
:arrow_down: |
reproman/resource/ssh.py | 88.09% <0.00%> (-0.80%) |
:arrow_down: |
reproman/utils.py | 86.97% <0.00%> (-0.17%) |
:arrow_down: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 6c49cfe...6bfb104. Read the comment docs.
Great, thank you @jbwexler !
I wonder if it is easyiesh to install/"provision" the submitter on top of an existing slurm instance?
If so -- it would be great to get some "unit" test to ensure that it is all working. We have https://github.com/ReproNim/reproman/blob/master/tools/ci/setup-slurm-container.sh and some testing of interactions with slurm in https://github.com/ReproNim/reproman/blob/master/reproman/support/jobs/tests/test_orchestrators.py
@kyleam could you please review the PR?
Re
As I write this it occurs to me that we should also add some of the if statements in slurm_launcher.template to slurm.template in order to fix #549
would be much appreciated -- might better be in a separate PR to not make it contingent on this one.
Thanks for the feedback! @kyleam I like your solution and am working on it now.
One question: I need to add --ntasks to slurm.template, which can take a job parameter called 'num_tasks'. My slight concern is that some documentation describes --ntasks as the number of processes, which is confusing considering reproman has a different job parameter called num_processes, which is assigned to --cpus-per-task. For what it's worth, the above documentation describes --cpus-per-task as the number of threads. My understanding of the differences between procs/threads/tasks is limited so I thought I would get your opinion on this.
That's a good question, and I think in general how to handle these parameters across the different submitters is an unresolved issue:
I think I made a bad choice in mapping num_processes
to --cpus-per-task
. The Slurm template came after PBS and Condor. For PBS it maps to ppn
, so I guess --ntasks-per-node
(without --ntasks
) would be the closest match. For Condor it is request_cpus
and maybe --ntasks
would be the best fit. My suggestion for now would be to drop the use of --cpus-per-tasks
and go with --ntasks
or --ntasks-per-node
.
Fwiw I've added launcher to the container that we test the current slurm submitter against. Based on quick testing, it looks like that may be enough to work with your new template. I'll try to wire that up after you push the next iteration.
https://github.com/ReproNim/reproman-slurm/commit/1cd5e8859e31d72fec4c1b3542d7659e018c7b2e
I ended up creating a new branch here: https://github.com/ReproNim/reproman/pull/577
Sorry for the delay, I've finally been able to work on this. This fix turns out to be pretty simple and seems to work well so far but I haven't done any kind of extensive testing yet. Just thought I would let you guys take a look. Let me know if you have questions. I think there's probably a more elegant way of creating the SlurmLauncherSubmitter class than how did it, which was simply to copy and paste the SlurmSubmitter class and make a few minor edits.
As I write this it occurs to me that we should also add some of the if statements in slurm_launcher.template to slurm.template in order to fix https://github.com/ReproNim/reproman/issues/549
Some example output: