bgruening / docker-galaxy-stable

:whale::bar_chart::books: Docker Images tracking the stable Galaxy releases.
http://bgruening.github.io/docker-galaxy-stable
MIT License

Python Library for configure_slurm.py. #314

Open jmchilton opened 7 years ago

jmchilton commented 7 years ago

It'd be nice if we had a uniform set of variables and such for dealing with this inside and outside of Ansible, as well as inside and outside of Docker (I think the place this script was originally developed was Pulsar testing years ago - https://github.com/galaxyproject/pulsar/blob/master/scripts/configure_test_slurm.py). It'd also be nice if pip install slurm_configure==<version> were used for version handling across all these projects.

bgruening commented 7 years ago

What about ephemeris or ansible-extras?

jmchilton commented 7 years ago

@bgruening I don't really like either option - ansible-galaxy-extras isn't a library that can be readily used by Pulsar testing, for instance, and ephemeris should ultimately be Galaxy-centric and admin-centric, I would think. This script is useful outside the context of Galaxy. I get the desire to keep things simple though.

bgruening commented 7 years ago

Ok, makes sense. Under galaxyproject or my account? That will answer the question of whether it's you or me ;)

jmchilton commented 7 years ago

I was thinking galaxyproject or my account - I was thinking about this as a @jmchilton issue.

bgruening commented 7 years ago

Go for it!

chambm commented 7 years ago

I noticed this issue and didn't know where else to post about configure_slurm.py, so I'll post here. On the Galaxy Jetstream image, SLURM can be pretty finicky about getting the hostname right. I've even seen it report having an old IP when the instance is redeployed, e.g. I've seen something like:

root@js-56-78:~# hostnamectl
   Static hostname: js-12-34
Transient hostname: js-56-78.jetstream-cloud.org

Should configure_slurm.py handle this kind of quirk, or does the Jetstream image itself need some additional hostname finagling?
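To make the quirk concrete, here is a minimal sketch of how a script could detect the mismatch by parsing `hostnamectl` output. The function names `parse_hostnamectl` and `hostname_mismatch` are hypothetical and are not part of configure_slurm.py; the sample text mirrors the output quoted above.

```python
def parse_hostnamectl(text):
    """Parse 'Key: value' lines from hostnamectl output into a dict.

    Keys are lower-cased and stripped so that the right-aligned labels
    printed by hostnamectl compare cleanly.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields


def hostname_mismatch(text):
    """Return True if the static and transient short hostnames differ.

    Hypothetical helper: only the part before the first dot is compared,
    so 'js-56-78' and 'js-56-78.jetstream-cloud.org' count as the same host.
    """
    fields = parse_hostnamectl(text)
    static = fields.get("static hostname", "")
    transient = fields.get("transient hostname", "")
    if not static or not transient:
        # Nothing to compare if either field is missing from the output.
        return False
    return static.split(".")[0] != transient.split(".")[0]


# Sample input copied from the hostnamectl output quoted in this comment.
SAMPLE = """\
   Static hostname: js-12-34
Transient hostname: js-56-78.jetstream-cloud.org
"""

if __name__ == "__main__":
    if hostname_mismatch(SAMPLE):
        print("static and transient hostnames disagree")
```

Whether a check like this belongs in configure_slurm.py or in the image's own boot scripts is exactly the open question here; the sketch only shows that the condition is cheap to detect.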

Less Jetstream-specific: can I also recommend that configure_slurm.py set up SlurmDBD to keep track of jobs between reboots? It's frustrating to have a job counted as successful because the instance crashed and Galaxy can't find the job in SLURM's history (so it assumes it completed successfully).
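As a rough illustration of that request, the sketch below generates the slurm.conf lines that point job accounting at a SlurmDBD daemon. The helper name `accounting_conf_lines` and the `dbd_host` default are assumptions for illustration only; this is not what configure_slurm.py currently does, and it assumes slurmdbd is already running with its own slurmdbd.conf and a backing database.

```python
def accounting_conf_lines(dbd_host="localhost"):
    """Return slurm.conf lines that send job accounting to slurmdbd.

    Hypothetical helper: dbd_host is the machine where the slurmdbd
    daemon is assumed to be listening.
    """
    return [
        # Store accounting records through the slurmdbd daemon.
        "AccountingStorageType=accounting_storage/slurmdbd",
        "AccountingStorageHost={}".format(dbd_host),
        # Collect per-job usage data on Linux nodes.
        "JobAcctGatherType=jobacct_gather/linux",
    ]


if __name__ == "__main__":
    print("\n".join(accounting_conf_lines()))
```

With records kept in a database, a job that vanished from the controller's in-memory state after a crash could still be looked up (for example with sacct) rather than assumed to have succeeded.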