WatsonLab / MAGpy

Snakemake pipeline for downstream analysis of metagenome-assembled genomes (MAGs) (pronounced mag-pie)

SLURM submission? #13

Closed · GeoMicroSoares closed this issue 5 years ago

GeoMicroSoares commented 5 years ago

Hi there,

Has anyone ever tried running MAGpy in a SLURM environment? I'm getting a lot of Error: Snakefile "Snakefile" not present. and wondering if it's just me not knowing how to submit this, or something else.

Last command I tried: $ snakemake --use-conda --cluster-config MAGpy.json --cluster "sbatch -n {core} -t {time} --mem={vmem} -P {proj} -D /scratch/a.ans74/MAGpy/" --jobs 100.

All tests for the installation run well btw.

Cheerio!

mw55309 commented 5 years ago

You need to add

-s MAGpy
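
Snakemake looks for a file literally called Snakefile by default, and the MAGpy pipeline file is just called MAGpy, hence the "not present" error. So something along these lines (keeping your own cluster arguments) should get past it:

    snakemake --use-conda -s MAGpy --cluster-config MAGpy.json \
        --cluster "sbatch ..." --jobs 100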
GeoMicroSoares commented 5 years ago

Thanks @mw55309, that starts things off! It does, however, then show that somehow the variables in the .json aren't being properly imported?

$ snakemake --use-conda -s MAGpy --cluster-config MAGpy.json --cluster "sbatch -n {core} -t {time} --mem={vmem} -P {proj} -D /scratch/a.ans74/MAGpy/" --jobs 100

Building DAG of jobs...
Using shell: /bin/bash
Provided cluster nodes: 100
Job counts:
    count   jobs
    1   all
    1   checkm
    1   checkm_plus
    39  diamond
    1   diamond_bin_summary
    1   diamond_bin_summary_plus
    39  diamond_report
    39  pfam_scan
    1   phylophlan_link
    82  sourmash_gather
    1   sourmash_report
    1   sourmash_sig
    207

[Mon Jun  3 14:25:57 2019]
rule diamond:
    input: proteins/changed_Bin_4_7_1-contigs.faa
    output: diamond/changed_Bin_4_7_1-contigs.diamond.tsv
    jobid: 427
    wildcards: id=changed_Bin_4_7_1-contigs
    threads: 16

RuleException in line 42 of /scratch/a.ans74/MAGpy/MAGpy:
NameError: The name 'core' is unknown in this context. Please make sure that you defined that variable. Also note that braces not used for variable access have to be escaped by repeating them, i.e. {{print $1}}
mw55309 commented 5 years ago

I think you need {cluster.core}, {cluster.time}, etc. in your sbatch arguments.
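
With a --cluster-config file the placeholders are resolved from the cluster namespace rather than from the rule itself, so the submit string should look roughly like:

    --cluster "sbatch -n {cluster.core} -t {cluster.time} --mem={cluster.vmem} -P {cluster.proj} -D /scratch/a.ans74/MAGpy/"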

GeoMicroSoares commented 5 years ago

Getting a new one now... Could this have to do with calling sbatch from a conda environment?

$ snakemake --use-conda -s MAGpy --cluster-config MAGpy.json --cluster "sbatch -n {cluster.core} -t {cluster.time} --mem {cluster.vmem} -P {cluster.proj} -D /scratch/a.ans74/MAGpy" --jobs 100
Building DAG of jobs...
Using shell: /bin/bash
Provided cluster nodes: 100
Job counts:
    count   jobs
    1   all
    1   checkm
    1   checkm_plus
    39  diamond
    1   diamond_bin_summary
    1   diamond_bin_summary_plus
    39  diamond_report
    39  pfam_scan
    1   phylophlan_link
    82  sourmash_gather
    1   sourmash_report
    1   sourmash_sig
    207

[Mon Jun  3 14:45:13 2019]
rule diamond:
    input: proteins/changed_Bin_39_5_1-contigs.faa
    output: diamond/changed_Bin_39_5_1-contigs.diamond.tsv
    jobid: 478
    wildcards: id=changed_Bin_39_5_1-contigs
    threads: 16

sbatch: error: Batch job submission failed: Job dependency problem
Error submitting jobscript (exit code 1):

[Mon Jun  3 14:45:13 2019]
rule diamond:
    input: proteins/Bin_43_2_1-contigs.faa
    output: diamond/Bin_43_2_1-contigs.diamond.tsv
    jobid: 419
    wildcards: id=Bin_43_2_1-contigs
    threads: 16

sbatch: error: Batch job submission failed: Job dependency problem
Error submitting jobscript (exit code 1):
mw55309 commented 5 years ago

Hmmm, my first guess would be that your cluster has no nodes that can satisfy the 16 cores * 16GB configuration that MAGpy.json specifies for diamond jobs.
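
One quick way to sanity-check that from a login node (assuming plain sinfo access) is to list what each node can actually offer:

    # node name, CPUs and memory (MB) per node
    sinfo -N -o "%N %c %m"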

GeoMicroSoares commented 5 years ago

I've partially modified MAGpy.json to what's below, but I'm still getting the same errors... Are there any dependencies/modules you can think of that I would need to import?

    "diamond" :
    {
        "core" : "4",
        "time" : "24:00:00",
        "vmem" : "8G",
        "proj" : "myqueue"
    },
    "checkm" :
    {
        "core" : "4",
        "time" : "24:00:00",
        "vmem" : "8G",
        "proj" : "myqueue"
    },
mw55309 commented 5 years ago

Belt and braces approach: have you changed the threads: directive within MAGpy to match the cores in MAGpy.json?

GeoMicroSoares commented 5 years ago

I have not - should I? It seemed like something not worth messing with unless you recommend it in this case.

mw55309 commented 5 years ago

I would in this case just to be sure

GeoMicroSoares commented 5 years ago

Uff - doesn't like it...

SyntaxError in line 16 of /scratch/a.ans74/MAGpy/MAGpy:
Unexpected keyword cores in rule definition (MAGpy, line 16)
mw55309 commented 5 years ago

What's in /scratch/a.ans74/MAGpy/MAGpy?

GeoMicroSoares commented 5 years ago

/scratch/a.ans74/MAGpy/ is where MAGpy (the file in which I changed threads --> cores) lives.

mw55309 commented 5 years ago

Sorry I should have been clearer.

So in MAGpy where we had

threads: 16

change this to

threads: 4

i.e. so the threads: directive matches the core: parameter in the JSON file.
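
So, as a rough sketch (the real rule in MAGpy has more to it; only the threads: line needs changing):

    rule diamond:
        input: "proteins/{id}.faa"
        output: "diamond/{id}.diamond.tsv"
        threads: 4   # was 16; now matches "core" : "4" in MAGpy.json
        # params/shell etc. stay as they are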

mw55309 commented 5 years ago

Oh also I see you're using

-P {cluster.proj}

What is cluster.proj set to in your JSON, and do you have a project/queue with that name on your cluster?
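
If you're not sure, you can usually check from a login node what you're allowed to submit under (assuming sacctmgr is exposed to ordinary users on your cluster):

    # partitions visible to you, and the accounts/partitions you may submit under
    sinfo -s
    sacctmgr show associations user=$USER format=Account,Partition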

GeoMicroSoares commented 5 years ago

There we go - removed -P and we have liftoff! Batch submission is now being aborted due to user job submission limits, but I'm talking to my sysadmin about that. I'll close the issue as soon as we get it going properly and provide the final script for it.

Just for future reference, the command I'm using at the moment in the SLURM environment: $ snakemake --use-conda -s MAGpy --cluster-config MAGpy.json --cluster "sbatch -n {cluster.core} -t {cluster.time} --mem {cluster.vmem} -D /scratch/a.ans74/MAGpy" --jobs 100

GeoMicroSoares commented 5 years ago

Hi there,

If it's of any use to future SLURM MAGpy users, I got everything running and avoided job submission mayhem with the following, wrapped up in a quick bash script (salloc takes its time to run sometimes):

salloc -n 1000
snakemake --use-conda -s MAGpy --cluster-config MAGpy.json --cluster "srun -n {cluster.core} -t {cluster.time} --mem {cluster.vmem} -D /scratch/a.ans74/MAGpy" --jobs 100
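
One way to fold those two steps into a single script is to hand the snakemake call to salloc directly, so everything runs inside the allocation (the allocation size and paths are just the ones above):

    #!/bin/bash
    # grab an allocation, then let snakemake dispatch its srun steps inside it
    salloc -n 1000 snakemake --use-conda -s MAGpy --cluster-config MAGpy.json \
        --cluster "srun -n {cluster.core} -t {cluster.time} --mem {cluster.vmem} -D /scratch/a.ans74/MAGpy" \
        --jobs 100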

Thanks again @mw55309 and all the best!