bjpop / rubra

Infrastructure code to support DNA pipeline
MIT License
38 stars 18 forks

Support SLURM job scheduler #17

Open bjpop opened 11 years ago

bjpop commented 11 years ago

Add support for SLURM job scheduler.

nschiraldi commented 5 years ago

I was looking for something similar. Looking through https://github.com/bjpop/rubra/blob/master/rubra/cluster_job.py it doesn't seem like it would be difficult to add SLURM support. I'd be happy to contribute this, but I'm not sure if it's necessary.
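A SLURM backend would presumably work the same way as the existing PBS support in cluster_job.py: render a scheduler header plus the stage command into a submittable script. A minimal sketch of that idea (the function name and parameters here are hypothetical, not rubra's actual API):

```python
# Hypothetical sketch of a SLURM script generator. This mirrors the general
# pattern of rendering a scheduler header plus a command, but none of these
# names come from rubra itself.

def make_slurm_script(command, walltime="01:00:00", mem_gb=50,
                      cpus=1, queue=None, modules=()):
    """Render an sbatch-ready shell script as a single string."""
    lines = [
        "#!/bin/bash",
        "#SBATCH --time=%s" % walltime,
        "#SBATCH --mem=%dG" % mem_gb,
        "#SBATCH --cpus-per-task=%d" % cpus,
    ]
    if queue:
        # SLURM's closest equivalent of a PBS queue is a partition
        lines.append("#SBATCH --partition=%s" % queue)
    for mod in modules:
        lines.append("module load %s" % mod)
    lines.append(command)
    return "\n".join(lines) + "\n"

print(make_slurm_script("bwa mem ref.fa reads.fq > out.sam",
                        cpus=8, modules=["bwa-intel/0.6.2"]))
```

The resulting string could then be written to a temporary file and handed to sbatch, analogously to how PBS scripts are submitted with qsub.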

Would using distributed: True with a SLURM flag improve performance, compared to submitting a single SBATCH script with distributed: False and a rubra configuration that matches the SBATCH settings?

Right now I'm essentially running a bash script via sbatch run.sh:

#!/bin/bash
#SBATCH -n 1
#SBATCH --cpus-per-task=40
#SBATCH --mem=50000 # memory pool for all cores, in MB

rubra RedDog2 --config RedDog_config --style run

With the following config:

pipeline = {
    "logDir": "log",
    "logFile": "All_pipeline.log",
    "style": "print",
    "procs": 40,
    "paired": True,
    "verbose": 1,
    "end": ["deleteDir"],
    "force": [],
    "rebuild": "fromstart"
}
stageDefaults = {
    "distributed":    False,
    "walltime":    "01:00:00",
    "memInGB":    50,
    "queue":    None,
    "modules": [
        # Note that these are for Barcoo at VLSCI
        # You will need to change these for distributed (queuing) installation
        "python-gcc/2.7.5",
        "bwa-intel/0.6.2",
        "samtools-intel/1.3.1",
        "bcftools-intel/1.2",
        "eautils-gcc/1.1.2",
        "bowtie2-gcc/2.2.9",
        "fasttree-gcc/2.1.7dp"
    ]
}

This appears to be spawning parallel processes correctly.

d-j-e commented 5 years ago

See https://github.com/katholt/RedDog/issues/58 for the answer.