WatsonLab / MAGpy

Snakemake pipeline for downstream analysis of metagenome-assembled genomes (MAGs) (pronounced mag-pie)

Argument list too long for rule `diamond_bin_summary` #11

Closed by halexand 5 years ago

halexand commented 5 years ago

Hello,

I am trying to run MAGpy with ~3000 MAGs. With this many MAGs, I run into the following error in the rule diamond_bin_summary:

RuleException:
OSError in line 67 of /vortexfs1/omics/alexander/Alexander-MAGpy/MAGpy:
[Errno 7] Argument list too long: '/bin/bash'
  File "/vortexfs1/omics/alexander/Alexander-MAGpy/MAGpy", line 67, in __rule_diamond_bin_summary
  File "/vortexfs1/home/halexander/.conda/envs/snakemake/lib/python3.6/subprocess.py", line 709, in __init__
  File "/vortexfs1/home/halexander/.conda/envs/snakemake/lib/python3.6/subprocess.py", line 1344, in _execute_child
  File "/vortexfs1/home/halexander/.conda/envs/snakemake/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

It appears that concatenating the output from all the diamond results in a single command is too much for my system. I wonder whether breaking it into a for loop would make the pipeline more scalable.
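To make the failure mode concrete, here is a rough sketch of how one might estimate the size of a single shell command that lists every report path. It assumes the command reaches bash as one argument string and that the Linux per-argument limit of 131072 bytes (`MAX_ARG_STRLEN` with 4 KiB pages) applies. The sample names, the `cat` prefix, and the helper names are hypothetical and not taken from MAGpy.

```python
# Assumption: Linux caps a single argument string at 32 pages of 4 KiB.
# Other systems (or other page sizes) may differ.
MAX_ARG_STRLEN = 131072


def command_length(paths, prefix="cat ", suffix=" >> diamond_bin_report.tsv"):
    """Return the byte length of a shell command that names every path."""
    command = prefix + " ".join(paths) + suffix
    return len(command.encode())


def fits_in_one_argument(paths):
    """True if the whole command stays under the assumed per-argument limit."""
    return command_length(paths) < MAX_ARG_STRLEN


# Hypothetical sample IDs; real MAG names (and so real paths) may be much longer.
paths = [f"diamond_report/bin.sample_{i:04d}.tsv" for i in range(3000)]
print(command_length(paths), fits_in_one_argument(paths))
```

The estimate grows linearly with both the number of MAGs and the length of each file name, so long bin names can push a few thousand inputs over the limit.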

halexand commented 5 years ago

For me the following worked:


rule diamond_bin_summary:
        input: expand("diamond_report/bin.{sample}.tsv", sample=IDS)
        output: "diamond_bin_report.tsv"
        shell:
            """
            # Write the header row, then append each per-bin report one file at a time
            echo -e 'name\tnprots\tnhits\tnfull\tgenus\tngenus\tspecies\tnspecies\tavgpid' > {output}
            for x in {input}
            do
                cat "$x" >> {output}
            done
            """
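An alternative that keeps the file list out of the shell command entirely is to do the concatenation in Python, for example from a Snakemake `run:` block. This is only a sketch of that idea, not something MAGpy ships. The function name `concatenate_reports` and the constant `HEADER` are hypothetical; the column names are copied from the rule above.

```python
import shutil

# Column names copied from the header line in the rule above.
HEADER = "name\tnprots\tnhits\tnfull\tgenus\tngenus\tspecies\tnspecies\tavgpid\n"


def concatenate_reports(inputs, output, header=HEADER):
    """Write the header, then append each input report in order.

    The file paths stay inside Python, so they never need to fit
    on a shell command line.
    """
    with open(output, "w") as out:
        out.write(header)
        for path in inputs:
            with open(path) as handle:
                # Stream the file in chunks rather than reading it whole.
                shutil.copyfileobj(handle, out)
```

Inside a rule this could be called as `concatenate_reports(input, output[0])` from a `run:` block, assuming the same `input` and `output` declarations as above.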

Happy to turn it into a PR if you like.

mw55309 commented 5 years ago

Hello, thanks for this. Yes, please submit a PR; that looks about right :)