franciscozorrilla / metaGEM

:gem: An easy-to-use workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data
https://franciscozorrilla.github.io/metaGEM/
MIT License
203 stars 42 forks source link

Error in rule binRefine (one of the commands exited with non-zero exit code) #74

Closed slambrechts closed 1 year ago

slambrechts commented 3 years ago

Hi Francisco,

When running bash metaGEM.sh -t binRefine --local with the metagem env activated, I seem to get a "(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)" error for each job (both the binRefine jobs and metabatCross jobs). I can't seem to find which command is causing this though. Do you have any idea? Complete log for the first job:

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job             count    min threads    max threads
------------  -------  -------------  -------------
all                 1              1              1
binRefine          43              1              1
metabatCross        4              1              1
total              48              1              1

Select jobs to execute...

[Tue Aug 10 15:28:21 2021]
rule binRefine:
    input: /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/concoct/UT_6/UT_6.concoct-bins, /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/metabat/UT_6/UT_6.metabat-bins, /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/maxbin/UT_6/UT_6.maxbin-bins
    output: /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/refined_bins/UT_6
    jobid: 253
    benchmark: /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/benchmarks/UT_6.binRefine.benchmark.txt
    wildcards: IDs=UT_6
    resources: tmpdir=/tmp

[Tue Aug 10 15:28:32 2021]
Error in rule binRefine:
    jobid: 253
    output: /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/refined_bins/UT_6
    shell:

        # Activate metawrap environment
        set +u;source activate envs/metawrap;set -u;

        # Create output folder
        mkdir -p /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/refined_bins/UT_6

        # Make job specific scratch dir
        fsampleID=$(echo $(basename $(dirname /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/concoct/UT_6/UT_6.concoct-bins)))
        echo -e "
Creating temporary directory /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/intermediate_files/refined_bins/${fsampleID} ... "
        mkdir -p /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/intermediate_files/refined_bins/${fsampleID}

        # Move into scratch dir
        cd /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/intermediate_files/refined_bins/${fsampleID}

        # Copy files to tmp
        echo "Copying bins from CONCOCT, metabat2, and maxbin2 to /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/intermediate_files ... "
        cp -r /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/concoct/UT_6/UT_6.concoct-bins /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/metabat/UT_6/UT_6.metabat-bins /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/maxbin/UT_6/UT_6.maxbin-bins .

        echo "Renaming bin folders to avoid errors with metaWRAP ... "
        mv $(basename /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/concoct/UT_6/UT_6.concoct-bins) $(echo $(basename /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/concoct/UT_6/UT_6.concoct-bins)|sed 's/-bins//g')
        mv $(basename /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/metabat/UT_6/UT_6.metabat-bins) $(echo $(basename /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/metabat/UT_6/UT_6.metabat-bins)|sed 's/-bins//g')
        mv $(basename /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/maxbin/UT_6/UT_6.maxbin-bins) $(echo $(basename /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/maxbin/UT_6/UT_6.maxbin-bins)|sed 's/-bins//g')

        echo "Running metaWRAP bin refinement module ... "
        metaWRAP bin_refinement -o .             -A $(echo $(basename /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/concoct/UT_6/UT_6.concoct-bins)|sed 's/-bins//g')             -B $(echo $(basename /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/metabat/UT_6/UT_6.metabat-bins)|sed 's/-bins//g')             -C $(echo $(basename /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/maxbin/UT_6/UT_6.maxbin-bins)|sed 's/-bins//g')             -t 48             -m 1600             -c 50             -x 10

        rm -r $(echo $(basename /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/concoct/UT_6/UT_6.concoct-bins)|sed 's/-bins//g') $(echo $(basename /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/metabat/UT_6/UT_6.metabat-bins)|sed 's/-bins//g') $(echo $(basename /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/maxbin/UT_6/UT_6.maxbin-bins)|sed 's/-bins//g') work_files
        mv * /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/refined_bins/UT_6

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
franciscozorrilla commented 3 years ago

Hey Sam,

It is hard to say what is going wrong since your jobs do not seem to produce log files, likely due to the --local flag. When I run metaGEM on the cluster (i.e. not using --local mode), my logs/ folder is populated with a file for each individual job, and this is where I can always see what is going wrong if my jobs fail. I'm not sure why these detailed logs dont simply get printed onto your terminal screen when running --local mode, which is what I have previously experienced. Perhaps we could force these logs to be created using some Snakemake functionalities. Could you please try modifying the following rule by adding the log parameter?

https://github.com/franciscozorrilla/metaGEM/blob/f8822a054f39494418ddf758167909a9468d1944/Snakefile#L879-L887

 rule binRefine: 
     input: 
         concoct = f'{config["path"]["root"]}/{config["folder"]["concoct"]}/{{IDs}}/{{IDs}}.concoct-bins', 
         metabat = f'{config["path"]["root"]}/{config["folder"]["metabat"]}/{{IDs}}/{{IDs}}.metabat-bins', 
         maxbin = f'{config["path"]["root"]}/{config["folder"]["maxbin"]}/{{IDs}}/{{IDs}}.maxbin-bins' 
     output: 
         directory(f'{config["path"]["root"]}/{config["folder"]["refined"]}/{{IDs}}')
     log:
         "logs/binRefine.{{IDs}}.log"
     benchmark: 
         f'{config["path"]["root"]}/{config["folder"]["benchmarks"]}/{{IDs}}.binRefine.benchmark.txt' 

Also make sure that the logs/ folder exists prior to running the jobs.

Regarding the failed metabat jobs, it may be possible that for your remaining 4 samples were not able to produce any bins so the job fails. You can similarly try adding the log parameter to the metabat Snakefile rule and rerun to see if you can produce logs with more info.

Please let me know if this helps. Best, Francisco

slambrechts commented 3 years ago

Hi Francisco,

Ok interesting. I added the log parameter to both the binRefine and metabatCross rules, which produces the log files, but they are all empty.

Best, Sam

franciscozorrilla commented 3 years ago

That is quite strange. I wonder if it may have to do with the Snakemake version you are using? I have been using v6.2.1 or older. For example, I do not recognize the job summary table printed out by Snakemake in your original post:

Job stats:
job             count    min threads    max threads
------------  -------  -------------  -------------
all                 1              1              1
binRefine          43              1              1
metabatCross        4              1              1
total              48              1              1

To debug your specific problems I would probably try "manually" running a metabatCross or binRefine job. That is, refer to the Snakefile rule of interest and run the job yourself for one sample. Then you should be able to tell if the issue is Snakemake, or one of the conda envs, or some specific package.

slambrechts commented 3 years ago

Ok when I manually put in the commands for a specific sample, I get an error at the following step:

(metawrap) [vsc42339@gligar08 UT_6]$ mv $(basename /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/concoct/UT_6/UT_6.concoct-bins) $(echo $(basename /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/concoct/UT_6/UT_6.concoct-bins)|sed 's/-bins//g')
mv: cannot stat 'UT_6.concoct-bins': No such file or directory

Which is strange, because when I check, the folder 'UT_6.concoct-bins' does exist

franciscozorrilla commented 3 years ago

That line is simply supposed to rename the concoct folder from UT_6.concoct-bins to UT_6.concoct to avoid errors with metaWRAP. The basename command simply removes all filepath info to just keep the base name. Based on your error message I suspect that you skipped some steps at the start of the rule.

More specifically, can you confirm that you created the job specific tmp dir:

https://github.com/franciscozorrilla/metaGEM/blob/f8822a054f39494418ddf758167909a9468d1944/Snakefile#L896-L899

changed directory the newly created dir:

https://github.com/franciscozorrilla/metaGEM/blob/f8822a054f39494418ddf758167909a9468d1944/Snakefile#L901-L902

and copied the files to the tmpdir:

https://github.com/franciscozorrilla/metaGEM/blob/f8822a054f39494418ddf758167909a9468d1944/Snakefile#L904-L906

Let me know if this helps!

slambrechts commented 3 years ago

Yes, I did:

(base) [vsc42339@gligar08 metaGEM]$ set +u;source activate envs/metawrap;set -u;
(metawrap) [vsc42339@gligar08 metaGEM]$ mkdir -p /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/refined_bins/UT_6
(metawrap) [vsc42339@gligar08 metaGEM]$ fsampleID=$(echo $(basename $(dirname /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/concoct/UT_6/UT_6.concoct-bins)))

(metawrap) [vsc42339@gligar08 metaGEM]$ mkdir -p /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/intermediate_files/refined_bins/${fsampleID}
(metawrap) [vsc42339@gligar08 metaGEM]$ cd /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/intermediate_files/refined_bins/${fsampleID}
(metawrap) [vsc42339@gligar08 UT_6]$ cp -r /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/concoct/UT_6/UT_6.concoct-bins /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/metabat/UT_6/UT_6.metabat-bins /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/maxbin/UT_6/UT_6.maxbin-bins
(metawrap) [vsc42339@gligar08 UT_6]$ mv $(basename /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/concoct/UT_6/UT_6.concoct-bins) $(echo $(basename /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/concoct/UT_6/UT_6.concoct-bins)|sed 's/-bins//g')
mv: cannot stat 'UT_6.concoct-bins': No such file or directory

But I notice my job specific tmp dir is empty, and it appears the bins were not copied to the tmpdir. So I have to figure out why

slambrechts commented 3 years ago

Ok sorry I forgot the dot at the end of cp -r /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/concoct/UT_6/UT_6.concoct-bins /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/metabat/UT_6/UT_6.metabat-bins /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/maxbin/UT_6/UT_6.maxbin-bins . to specify the location where they need to be copied. So anyway, this command is not the cause of the problem

slambrechts commented 3 years ago

Ok so also when running the final command (below) manually, everything seems to work fine. So I guess it's down to snakemake then

(metawrap) [vsc42339@gligar08 UT_6]$ metaWRAP bin_refinement -o . -A $(echo $(basename /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/concoct/UT_6/UT_6.concoct-bins)|sed 's/-bins//g') -B $(echo $(basename /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/metabat/UT_6/UT_6.metabat-bins)|sed 's/-bins//g') -C $(echo $(basename /scratch/gent/vo/000/gvo00043/vsc42339/MICROBIAN/metaGEM/maxbin/UT_6/UT_6.maxbin-bins)|sed 's/-bins//g') -t 48 -m 1250 -c 50 -x 10

snakemake version that got installed together with the metagem env seems to be 6.5.0. Not sure whether it is a good idea to try and replace it with snakemake to v6.2.1 or older? I would like to try that to see if that solves the problem, but I would not want to loose the other metagem functionalities that have been working fine, so not sure...

franciscozorrilla commented 3 years ago

Hey Sam, sorry for the delayed response!

You could try setting up a parallel metaGEM environment with a lower Snakemake version by generating a modified version of this file:

https://github.com/franciscozorrilla/metaGEM/blob/f8822a054f39494418ddf758167909a9468d1944/envs/metaGEM_env.yml#L1-L23

Simply change the env name on line 1 to something descriptive like metagem_oldsnake, and line 20 to snakemake<=6.2.1.

Assuming that you name this modified file something like metaGEM_oldsnake_env.yml and place it in your envs/ subfolder, then you can run:

conda env create --prefix ./envs/metagem_oldsnake -f envs/metaGEM_oldsnake_env.yml

This should set up a conda env named metagem_oldsnake in your envs/ folder. You can now test to see if the jobs run successfully with a lower snakemake version. If this works out for your, then you can simply modify your config.yaml file to use your new modified metaGEM env.

Best wishes, Francisco

slambrechts commented 3 years ago

Hi Francisco,

Thank you for the detailed explanation! I tried the above as you suggested, and checked whether snakemake was idd v6.2.1 with snakemake -v in this new metagem_oldsnake env.

However, when running bash metaGEM.sh -t binRefine --local with the metagem_oldsnake env activated, I get the same "(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)" error for each job. And also, the log files in the logs/ directory are again all just empty files.

So snakemake should not be the cause of this problem. Do you perhaps know if anyone else succeeded in running bash metaGEM.sh -t binRefine in local mode?

For now I can run the binRefine and remaining metabatCross jobs manually, but feel free to let me know if I can test anything else to try and resolve this.

Best wishes, Sam

franciscozorrilla commented 3 years ago

Hi Sam,

Thanks for reporting back, now we know that the Snakemake version was not the source of your problem. Perhaps the problem is specific to the HPC that you are using. Unfortunately, I am not aware of anyone else running metaGEM in --local mode on the cluster as you are doing. If possible, I would encourage you to try running metaGEM on cluster mode, i.e. without the --local flag.

Apologies for the late response.

Best wishes, Francisco

LiZhihua1982 commented 1 year ago

Hi, I meet the same error, I think the error caused by "bash metaGEM.sh -t maxbin -j 2 -c 24 -m 80 -h 10 --local" step,which need mv *.fasta GJ1.maxbin-bins/ . if not, it will cause binsC 0 bins. I think the correspongding snakefile may need revise: The original snakefile for maxbin is below: rule maxbin: input: assembly = rules.megahit.output, R1 = rules.qfilter.output.R1, R2 = rules.qfilter.output.R2 output:

directory(f'{config["path"]["root"]}/{config["folder"]["maxbin"]}/{{IDs}}/{{IDs}}.maxbin-bins')

benchmark:
    f'{config["path"]["root"]}/{config["folder"]["benchmarks"]}/{{IDs}}.maxbin.benchmark.txt'
    # Create output folder
    mkdir -p $(dirname {output})
   .......
   .......
    mkdir $(basename {output})
    mv *.fasta *.summary *.abundance *.abund1 *.abund2 $(basename {output})
    mv $(basename {output}) $(dirname {output})

I want to change the last two line as mv *.fasta $(dirname {output})

franciscozorrilla commented 1 year ago

Hi @LiZhihua1982 , thanks for pointing out this bug, I was unable to reproduce it until just now. Indeed, it looks like the fasta files were being placed outside of the expected bins folder, resulting in metaWRAP then being unable to locate binsC. Addressing this in #126