Closed kyleam closed 4 years ago
@kyleam - you can take a look at this here: https://github.com/nipy/nipype/blob/master/nipype/pipeline/plugins/slurm.py
and at a much simpler worker in the new engine: https://github.com/nipype/pydra/blob/master/pydra/engine/workers.py#L165
also in pydra we are testing slurm in a container: https://github.com/nipype/pydra/blob/master/ci/slurm.sh
@satra Thanks for the pointers.
Just in terms of my question about getting structured output about jobs, what I could gather from a quick skim suggests that sadly we're going to have to stick with parsing the unstructured output with a regexp.
indeed. the slurm output can be customized by a center.
one of the things we will try to do in pydra, unless someone has already done it, is to analyze the slurm configuration to find out the available resources, qos, and partitions. we could also consider a test job to determine how to parse output. but these are all fancy things relative to the user simply saying this is where to go and run.
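A minimal sketch of the regexp approach discussed above, assuming `sbatch` prints its stock `Submitted batch job <id>` message. As noted, a center can customize Slurm's output, so the pattern is an assumption rather than a guarantee, and `parse_sbatch_output` is a hypothetical helper name, not something taken from reproman, nipype, or pydra.

```python
import re

# Stock sbatch success message. Sites may wrap or alter it, so this
# pattern is an assumption that may need adjusting per cluster.
_SUBMIT_RE = re.compile(r"Submitted batch job (?P<jobid>\d+)")


def parse_sbatch_output(text):
    """Return the job ID found in sbatch's stdout, or None if absent."""
    match = _SUBMIT_RE.search(text)
    return match.group("jobid") if match else None
```

Using `search` rather than `match` lets the helper tolerate extra banner lines that a site might print before the submission message.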
Merging #494 into master will decrease coverage by 5.01%. The diff coverage is 23.30%.
@@ Coverage Diff @@
## master #494 +/- ##
==========================================
- Coverage 89.64% 84.63% -5.02%
==========================================
Files 148 148
Lines 12209 12272 +63
==========================================
- Hits 10945 10386 -559
- Misses 1264 1886 +622
| Impacted Files | Coverage Δ | |
|---|---|---|
| reproman/support/jobs/tests/test_orchestrators.py | 32.37% <10.93%> (-61.07%) | :arrow_down: |
| reproman/support/jobs/submitters.py | 51.95% <32.25%> (-24.40%) | :arrow_down: |
| reproman/tests/skip.py | 93.25% <87.50%> (-4.28%) | :arrow_down: |
| reproman/resource/tests/test_ssh.py | 27.53% <0.00%> (-72.47%) | :arrow_down: |
| reproman/support/jobs/orchestrators.py | 46.56% <0.00%> (-45.28%) | :arrow_down: |
| reproman/interface/tests/test_execute.py | 71.84% <0.00%> (-28.16%) | :arrow_down: |
| reproman/resource/ssh.py | 75.00% <0.00%> (-13.34%) | :arrow_down: |
| reproman/interface/execute.py | 86.62% <0.00%> (-8.29%) | :arrow_down: |
| ... and 8 more | | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
This is an initial stab at Slurm support (gh-484). There are still things to flesh out (most of the known ones should have to-do comment placeholders), and I bet someone familiar with Slurm could suggest better ways to do things (e.g., is it possible to get the job status as a JSON record?). But I was able to submit simple commands (with a single job and with multiple subjobs), so things seem to be wired up correctly at least at a basic level.
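On the question of structured job status: one option that avoids free-form regexp matching is `sacct` with `--parsable2`, which emits pipe-delimited fields that are easy to turn into records. The sketch below is illustrative only. It assumes job accounting is enabled on the cluster, and the field list and the helper names `sacct_command` and `parse_sacct` are hypothetical choices, not part of this PR.

```python
import json

# Fields to request from sacct; an illustrative selection.
FIELDS = ("JobID", "State", "ExitCode")


def sacct_command(job_id, fields=FIELDS):
    """Build a sacct invocation that prints pipe-delimited rows, no header."""
    return [
        "sacct",
        f"--jobs={job_id}",
        f"--format={','.join(fields)}",
        "--parsable2",
        "--noheader",
    ]


def parse_sacct(text, fields=FIELDS):
    """Turn sacct --parsable2 output into a list of dicts, one per job step."""
    records = []
    for line in text.splitlines():
        values = line.split("|")
        if len(values) != len(fields):
            # Skip blank or unexpected lines instead of guessing at them.
            continue
        records.append(dict(zip(fields, values)))
    return records


# Example with canned output, since no Slurm installation is assumed here.
sample = "123|COMPLETED|0:0\n123.batch|COMPLETED|0:0\n"
print(json.dumps(parse_sacct(sample), indent=2))
```

The command list can be passed to `subprocess.run(..., capture_output=True, text=True)` on a host where `sacct` is available.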
Given that I don't have access to an environment with Slurm, setting that up was the more involved part. Here are the details:
Slurm setup
* Clone the Docker-based Slurm setup and apply this sshd patch:
```diff
diff --git a/Dockerfile b/Dockerfile
index d143635..197e92d 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -34,6 +34,8 @@ RUN set -ex \
         psmisc \
         bash-completion \
         vim-enhanced \
+        openssh-clients \
+        openssh-server \
     && yum clean all \
     && rm -rf /var/cache/yum
 
@@ -83,10 +85,21 @@ RUN set -x \
     && chown -R slurm:slurm /var/*/slurm* \
     && /sbin/create-munge-key
 
+RUN echo 'root:root' |chpasswd
+
+RUN sed -ri 's/^#?PermitRootLogin\s+.*/PermitRootLogin yes/' /etc/ssh/sshd_config
+RUN sed -ri 's/UsePAM yes/#UsePAM yes/g' /etc/ssh/sshd_config
+
+RUN mkdir /root/.ssh
+
 COPY slurm.conf /etc/slurm/slurm.conf
 COPY slurmdbd.conf /etc/slurm/slurmdbd.conf
 
 COPY docker-entrypoint.sh /usr/local/bin/docker-entrypoint.sh
 ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]
 
+EXPOSE 22
+ENV NOTVISIBLE "in users profile"
+RUN echo "export VISIBLE=now" >> /etc/profile
+
 CMD ["slurmdbd"]
diff --git a/docker-compose.yml b/docker-compose.yml
index f0862be..74fcb0e 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -37,8 +37,11 @@ services:
       - etc_slurm:/etc/slurm
       - slurm_jobdir:/data
       - var_log_slurm:/var/log/slurm
+    ports:
+      - "22"
     expose:
       - "6817"
+      - "22"
     depends_on:
       - "slurmdbd"
 
diff --git a/docker-entrypoint.sh b/docker-entrypoint.sh
index 9a1203a..1a7a16f 100755
--- a/docker-entrypoint.sh
+++ b/docker-entrypoint.sh
@@ -23,6 +23,10 @@ fi
 
 if [ "$1" = "slurmctld" ]
 then
+    echo "---> Starting sshd ..."
+    ssh-keygen -A
+    /usr/sbin/sshd
+
     echo "---> Starting the MUNGE Authentication service (munged) ..."
     gosu munge /usr/sbin/munged
 
```
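Because the compose change publishes container port 22 without pinning a host port, Docker assigns the host port itself. `docker-compose port <service> 22` prints the resulting mapping as `host:port`. The sketch below parses that line so the port can be handed to an SSH client. The helper name `parse_port_mapping` is hypothetical, and the service name in the usage note is an assumption based on the `slurmctld` branch of the entrypoint script.

```python
def parse_port_mapping(text):
    """Split a 'host:port' line, as printed by `docker-compose port`."""
    # rpartition keeps bracketed IPv6 hosts such as "[::]" intact.
    host, _, port = text.strip().rpartition(":")
    if not port.isdigit():
        raise ValueError(f"unexpected port mapping: {text!r}")
    return host, int(port)
```

For example, if `docker-compose port slurmctld 22` (service name assumed) printed `0.0.0.0:32768`, the helper would return `("0.0.0.0", 32768)`, and `ssh -p 32768 root@localhost` would then reach the container's sshd.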