Issue closed by flefler 1 year ago
I ran createFolders before this. After running

(metagem) dail@swamp:~/metaGEM/workflow$ bash /home/dail/metaGEM/workflow/metaGEM.sh -t fastp -l

the qfiltered folder disappears.
Hi Forrest,
Thanks for the extensive report. I think I have an idea of what might be going on; it looks like the important error message is this:
/usr/bin/bash: line 2: activate: No such file or directory
This means you probably have the metagem environment set up, but perhaps it is not under the envs/ subdirectory in the root folder. Please run conda env list to see the path to your metagem env, and then make sure to update the default envs/metagem field in the config.yaml file to point to the appropriate environment path:
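For illustration, the envs block of config.yaml might then look something like this (the miniconda path below is only a placeholder; substitute whatever path conda env list reports for your environments):

```yaml
envs:
  metagem: /home/youruser/miniconda3/envs/metagem        # replace with your actual env path
  metawrap: /home/youruser/miniconda3/envs/metawrap
  prokkaroary: /home/youruser/miniconda3/envs/prokkaroary
```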
Hope this helps and please let me know if you have further questions.
Best, Francisco
p.s. if possible, it's recommended to run metaGEM on a high performance computing cluster rather than a local workstation
p.p.s. this is expected behavior, by the way: since the qfilter rule fails, Snakemake deletes the rule's output folder as a precaution.
Hi Francisco,
Thanks for getting back to me so quickly; this resolved my problems with fastp and crossMapSeries. However, I have a similar problem with the binRefine step, see the error below. I am using a server that will be able to handle my data (not that many samples), but it does not have Slurm set up, hence the -l usage. I tried to run without the -l flag to no avail.
I silenced the lines that activate the conda env, as suggested in a previous issue: https://github.com/franciscozorrilla/metaGEM/issues/104#issuecomment-1117576796. Silenced or not, this step gives an error.
I did configure GTDB-tk and CheckM.
(metagem) dail@swamp:~/metaGEM/workflow$ bash metaGEM.sh -t binRefine -l
Version: 1.0.5
Setting current directory to root in config.yaml file ...
Parsing Snakefile to target rule: binRefine ...
Do you wish to continue with these parameters? (y/n)y
Proceeding with binRefine job(s) ...
Please verify parameters set in the config.yaml file:
path:
  root: /home/dail/metaGEM/workflow
  scratch: /home/dail/metaGEM/workflow/tmp
folder:
  data: dataset
  logs: logs
  assemblies: assemblies
  scripts: scripts
  crossMap: crossMap
  concoct: concoct
  maxbin: maxbin
  metabat: metabat
  refined: refined_bins
  reassembled: reassembled_bins
  classification: GTDBTk
  abundance: abundance
  GRiD: GRiD
  GEMs: GEMs
  SMETANA: SMETANA
  memote: memote
  qfiltered: qfiltered
  stats: stats
  proteinBins: protein_bins
  dnaBins: dna_bins
  pangenome: pangenome
  kallisto: kallisto
  kallistoIndex: kallistoIndex
  benchmarks: benchmarks
  prodigal: prodigal
  blastp: blastp
  blastp_db: blastp_db
scripts:
  kallisto2concoct: kallisto2concoct.py
  prepRoary: prepareRoaryInput.R
  binFilter: binFilter.py
  qfilterVis: qfilterVis.R
  assemblyVis: assemblyVis.R
  binningVis: binningVis.R
  modelVis: modelVis.R
  compositionVis: compositionVis.R
  taxonomyVis: taxonomyVis.R
  carveme: media_db.tsv
  toy: download_toydata.txt
  GTDBtkVis:
cores:
  fastp: 4
  megahit: 12
  crossMap: 12
  concoct: 12
  metabat: 12
  maxbin: 12
  refine: 12
  reassemble: 12
  classify: 2
  gtdbtk: 12
  abundance: 12
  carveme: 4
  smetana: 12
  memote: 4
  grid: 12
  prokka: 2
  roary: 12
  diamond: 12
params:
  cutfasta: 10000
  assemblyPreset: meta-sensitive
  assemblyMin: 1000
  concoct: 800
  metabatMin: 50000
  seed: 420
  minBin: 1500
  refineMem: 1600
  refineComp: 50
  refineCont: 10
  reassembleMem: 1600
  reassembleComp: 50
  reassembleCont: 10
  carveMedia: M8
  smetanaMedia: M1,M2,M3,M4,M5,M7,M8,M9,M10,M11,M13,M14,M15A,M15B,M16
  smetanaSolver: CPLEX
  roaryI: 90
  roaryCD: 90
envs:
  metagem: /home/dail/metaGEM/workflow/envs/metagem
  metawrap: /home/dail/metaGEM/workflow/envs/metawrap
  prokkaroary: /home/dail/metaGEM/workflow/envs/prokkaroary
Please pay close attention to make sure that your paths are properly configured! Do you wish to proceed with this config.yaml file? (y/n)y
Unlocking snakemake ... Unlocking working directory.
Dry-running snakemake jobs ...
Building DAG of jobs...
Job counts:
    count   jobs
    1       all
    1       binRefine
    1       concoct
    1       maxbinCross
    4
[Tue Aug 29 21:21:56 2023] rule maxbinCross: input: /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz, /home/dail/metaGEM/workflow/maxbin/sample1/cov output: /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins jobid: 7 benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.maxbin.benchmark.txt wildcards: IDs=sample1
[Tue Aug 29 21:21:56 2023] rule concoct: input: /home/dail/metaGEM/workflow/concoct/sample1/cov/coverage_table.tsv, /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz output: /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins jobid: 2 benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.concoct.benchmark.txt wildcards: IDs=sample1
[Tue Aug 29 21:21:56 2023] rule binRefine: input: /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins, /home/dail/metaGEM/workflow/metabat/sample1/sample1.metabat-bins, /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins output: /home/dail/metaGEM/workflow/refined_bins/sample1 jobid: 1 benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.binRefine.benchmark.txt wildcards: IDs=sample1
[Tue Aug 29 21:21:56 2023] Job 0: WARNING: Be very careful when adding/removing any lines above this message. The metaGEM.sh parser is presently hardcoded to edit line 22 of this Snakefile to expand target rules accordingly, therefore adding/removing any lines before this message will likely result in parser malfunction.
Job counts:
    count   jobs
    1       all
    1       binRefine
    1       concoct
    1       maxbinCross
    4
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
Do you wish to submit this batch of jobs on your local machine? (y/n)y
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Conda environments: ignored
Job counts:
    count   jobs
    1       all
    1       binRefine
    1       concoct
    1       maxbinCross
    4
Select jobs to execute...
Failed to solve scheduling problem with ILP solver. Falling back to greedy solver.
Run Snakemake with --verbose to see the full solver output for debugging the problem.
[Tue Aug 29 21:21:58 2023] rule maxbinCross: input: /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz, /home/dail/metaGEM/workflow/maxbin/sample1/cov output: /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins jobid: 7 benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.maxbin.benchmark.txt wildcards: IDs=sample1
[Tue Aug 29 21:21:58 2023] rule concoct: input: /home/dail/metaGEM/workflow/concoct/sample1/cov/coverage_table.tsv, /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz output: /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins jobid: 2 benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.concoct.benchmark.txt wildcards: IDs=sample1
Creating temporary directory /home/dail/metaGEM/workflow/tmp/maxbin/sample1 ...
Creating temporary directory /home/dail/metaGEM/workflow/tmp/concoct/sample1 ...
Unzipping assembly ...
Unzipping assembly ...
gzip: contigs.fasta already exists; not overwritten
gzip: contigs.fasta already exists; not overwritten
[Tue Aug 29 21:21:58 2023]
Error in rule maxbinCross:
    jobid: 7
    output: /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins
    shell:
# Activate metagem environment
#set +u;source activate /home/dail/metaGEM/workflow/envs/metagem;set -u;
# Create output folder
mkdir -p $(dirname /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)
# Make job specific scratch dir
fsampleID=$(echo $(basename $(dirname /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)))
echo -e "
Creating temporary directory /home/dail/metaGEM/workflow/tmp/maxbin/${fsampleID} ... "
mkdir -p /home/dail/metaGEM/workflow/tmp/maxbin/${fsampleID}
# Move into scratch dir
cd /home/dail/metaGEM/workflow/tmp/maxbin/${fsampleID}
# Copy files to tmp
cp -r /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz /home/dail/metaGEM/workflow/maxbin/sample1/cov/*.depth .
echo -e "
Unzipping assembly ... "
gunzip $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)
echo -e "
Generating list of depth files based on crossMapSeries rule output ... "
find . -name "*.depth" > abund.list
echo -e "
Running maxbin2 ... "
run_MaxBin.pl -thread 12 -contig contigs.fasta -out $(basename $(dirname /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)) -abund_list abund.list
# Clean up un-needed files
rm *.depth abund.list contigs.fasta
# Move files into output dir
mkdir -p $(basename /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)
while read bin;do mv $bin $(basename /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins);done< <(ls|grep fasta)
mv * $(dirname /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Job failed, going on with independent jobs.
[Tue Aug 29 21:21:58 2023]
Error in rule concoct:
    jobid: 2
    output: /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins
    shell:
# Activate metagem environment
#set +u;source activate /home/dail/metaGEM/workflow/envs/metagem;set -u;
# Create output folder
mkdir -p $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)
# Make job specific scratch dir
sampleID=$(echo $(basename $(dirname /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)))
echo -e "
Creating temporary directory /home/dail/metaGEM/workflow/tmp/concoct/${sampleID} ... "
mkdir -p /home/dail/metaGEM/workflow/tmp/concoct/${sampleID}
# Move into scratch dir
cd /home/dail/metaGEM/workflow/tmp/concoct/${sampleID}
# Copy files
cp /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz /home/dail/metaGEM/workflow/concoct/sample1/cov/coverage_table.tsv .
echo "Unzipping assembly ... "
gunzip $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)
echo -e "Done.
Cutting up contigs (default 10kbp chunks) ... "
cut_up_fasta.py -c 10000 -o 0 -m $(echo $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)|sed 's/.gz//') > assembly_c10k.fa
echo -e "
Running CONCOCT ... "
concoct --coverage_file $(basename /home/dail/metaGEM/workflow/concoct/sample1/cov/coverage_table.tsv) --composition_file assembly_c10k.fa -b $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)) -t 12 -c 800
echo -e "
Merging clustering results into original contigs ... "
merge_cutup_clustering.py $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins))_clustering_gt1000.csv > $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins))_clustering_merged.csv
echo -e "
Extracting bins ... "
mkdir -p $(basename /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)
extract_fasta_bins.py $(echo $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)|sed 's/.gz//') $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins))_clustering_merged.csv --output_path $(basename /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)
# Move final result files to output folder
mv $(basename /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins) *.txt *.csv $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Job failed, going on with independent jobs.
Exiting because a job execution failed. Look above for error message
Complete log: /home/dail/metaGEM/workflow/.snakemake/log/2023-08-29T212157.875193.snakemake.log
(metagem) dail@swamp:~/metaGEM/workflow$
Hi Forrest,
Based on your error messages, it looks like concoct and maxbin fail before even getting to the binRefine stage. It is possible that tmp files from a previous run are getting in the way:
gzip: contigs.fasta already exists; not overwritten
gzip: contigs.fasta already exists; not overwritten
Could you try deleting everything inside the tmp folder, and then re-running the jobs? e.g. rm -r tmp/*
Also, could you check the log files for those specific concoct/maxbin job runs to see if there is a more specific error message? Normally these would be generated inside your logs/ folder, but I am not sure if that is still the case with local flag usage.
Sorry you are having trouble with this. For some background: the workflow was tested and designed to run on Slurm-based HPC infrastructure, with some users even extending it to qsub-based infrastructure. However, since I do not have access to a local-workstation setup, I have not developed support for that kind of infrastructure. The local command/flag is mostly meant for troubleshooting, debugging, testing, etc. I am happy, however, to help you to the best of my ability to get metaGEM running on your workstation 💎
Hi Francisco,
I really appreciate your help! I deleted all files and started from dataset. I ran bash metaGEM.sh -t fastp -l followed by bash metaGEM.sh -t crossMapSeries -l with no errors. Then, with bash metaGEM.sh -t binRefine -l, I get a different (maybe more helpful?) error, see below. Just FYI, running with -l does produce a log file, but it is much shorter than this output.
(metagem) dail@swamp:~/metaGEM/workflow$ bash metaGEM.sh -t binRefine -l
=================================================================================================================================
[metaGEM ASCII art banner]
Developed by: Francisco Zorrilla, Kiran R. Patil, and Aleksej Zelezniak
Publication: doi.org/10.1101/2020.12.31.424982
A Snakemake-based pipeline designed to predict metabolic interactions directly from metagenomics data using high performance computer clusters
Version: 1.0.5
Setting current directory to root in config.yaml file ...
Parsing Snakefile to target rule: binRefine ...
Do you wish to continue with these parameters? (y/n)y
Proceeding with binRefine job(s) ...
Please verify parameters set in the config.yaml file:
(config.yaml parameter dump identical to the one printed for the previous binRefine run)
Please pay close attention to make sure that your paths are properly configured! Do you wish to proceed with this config.yaml file? (y/n)y
Unlocking snakemake ... Unlocking working directory.
Dry-running snakemake jobs ...
Building DAG of jobs...
Job counts:
    count   jobs
    1       all
    1       binRefine
    1       concoct
    1       maxbinCross
    1       metabatCross
    5
[Wed Aug 30 09:50:51 2023] rule maxbinCross: input: /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz, /home/dail/metaGEM/workflow/maxbin/sample1/cov output: /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins jobid: 7 benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.maxbin.benchmark.txt wildcards: IDs=sample1
[Wed Aug 30 09:50:51 2023] rule concoct: input: /home/dail/metaGEM/workflow/concoct/sample1/cov/coverage_table.tsv, /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz output: /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins jobid: 2 benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.concoct.benchmark.txt wildcards: IDs=sample1
[Wed Aug 30 09:50:51 2023] rule metabatCross: input: /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz, /home/dail/metaGEM/workflow/metabat/sample1/cov output: /home/dail/metaGEM/workflow/metabat/sample1/sample1.metabat-bins jobid: 5 benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.metabat.benchmark.txt wildcards: IDs=sample1
[Wed Aug 30 09:50:51 2023] rule binRefine: input: /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins, /home/dail/metaGEM/workflow/metabat/sample1/sample1.metabat-bins, /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins output: /home/dail/metaGEM/workflow/refined_bins/sample1 jobid: 1 benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.binRefine.benchmark.txt wildcards: IDs=sample1
[Wed Aug 30 09:50:51 2023] Job 0: WARNING: Be very careful when adding/removing any lines above this message. The metaGEM.sh parser is presently hardcoded to edit line 22 of this Snakefile to expand target rules accordingly, therefore adding/removing any lines before this message will likely result in parser malfunction.
Job counts:
    count   jobs
    1       all
    1       binRefine
    1       concoct
    1       maxbinCross
    1       metabatCross
    5
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
Do you wish to submit this batch of jobs on your local machine? (y/n)y
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Conda environments: ignored
Job counts:
    count   jobs
    1       all
    1       binRefine
    1       concoct
    1       maxbinCross
    1       metabatCross
    5
Select jobs to execute...
Failed to solve scheduling problem with ILP solver. Falling back to greedy solver.
Run Snakemake with --verbose to see the full solver output for debugging the problem.
[Wed Aug 30 09:50:53 2023] rule maxbinCross: input: /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz, /home/dail/metaGEM/workflow/maxbin/sample1/cov output: /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins jobid: 7 benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.maxbin.benchmark.txt wildcards: IDs=sample1
[Wed Aug 30 09:50:53 2023] rule concoct: input: /home/dail/metaGEM/workflow/concoct/sample1/cov/coverage_table.tsv, /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz output: /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins jobid: 2 benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.concoct.benchmark.txt wildcards: IDs=sample1
[Wed Aug 30 09:50:53 2023] rule metabatCross: input: /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz, /home/dail/metaGEM/workflow/metabat/sample1/cov output: /home/dail/metaGEM/workflow/metabat/sample1/sample1.metabat-bins jobid: 5 benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.metabat.benchmark.txt wildcards: IDs=sample1
Creating temporary directory /home/dail/metaGEM/workflow/tmp/maxbin/sample1 ...
Creating temporary directory /home/dail/metaGEM/workflow/tmp/concoct/sample1 ...
Creating temporary directory /home/dail/metaGEM/workflow/tmp/metabat/sample1 ...
Unzipping assembly ...
Unzipping assembly ...
Done.
Cutting up contigs (default 10kbp chunks) ...
Running metabat2 ...
MetaBAT 2 (2.15 (Bioconda)) using minContig 1500, minCV 1.0, minCVSum 1.0, maxP 95%, minS 60, maxEdges 200 and minClsSize 50000. with random seed=420
[00:00:00] Executing with 1 threads
[00:00:00] Parsing abundance file
[00:00:00] Parsing assembly file
Generating list of depth files based on crossMapSeries rule output ...
Running maxbin2 ...
Can't locate LWP/Simple.pm in @INC (you may need to install the LWP::Simple module) (@INC contains: /home/linuxbrew/.linuxbrew/opt/perl/lib/perl5/site_perl/5.36/x86_64-linux-thread-multi /home/linuxbrew/.linuxbrew/opt/perl/lib/perl5/site_perl/5.36 /home/linuxbrew/.linuxbrew/opt/perl/lib/perl5/5.36/x86_64-linux-thread-multi /home/linuxbrew/.linuxbrew/opt/perl/lib/perl5/5.36 /home/linuxbrew/.linuxbrew/lib/perl5/site_perl/5.36/x86_64-linux-thread-multi /home/linuxbrew/.linuxbrew/lib/perl5/site_perl/5.36) at /home/dail/metaGEM/workflow/envs/metagem/bin/run_MaxBin.pl line 4.
BEGIN failed--compilation aborted at /home/dail/metaGEM/workflow/envs/metagem/bin/run_MaxBin.pl line 4.
[00:00:00] Number of large contigs >= 1500 are 3667.
[00:00:00] Reading abundance file
[Wed Aug 30 09:50:53 2023]
Error in rule maxbinCross:
    jobid: 7
    output: /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins
    shell:
# Activate metagem environment
#set +u;source activate /home/dail/metaGEM/workflow/envs/metagem;set -u;
# Create output folder
mkdir -p $(dirname /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)
# Make job specific scratch dir
fsampleID=$(echo $(basename $(dirname /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)))
echo -e "
Creating temporary directory /home/dail/metaGEM/workflow/tmp/maxbin/${fsampleID} ... "
mkdir -p /home/dail/metaGEM/workflow/tmp/maxbin/${fsampleID}
# Move into scratch dir
cd /home/dail/metaGEM/workflow/tmp/maxbin/${fsampleID}
# Copy files to tmp
cp -r /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz /home/dail/metaGEM/workflow/maxbin/sample1/cov/*.depth .
echo -e "
Unzipping assembly ... "
gunzip $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)
echo -e "
Generating list of depth files based on crossMapSeries rule output ... "
find . -name "*.depth" > abund.list
echo -e "
Running maxbin2 ... "
run_MaxBin.pl -thread 12 -contig contigs.fasta -out $(basename $(dirname /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)) -abund_list abund.list
# Clean up un-needed files
rm *.depth abund.list contigs.fasta
# Move files into output dir
mkdir -p $(basename /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)
while read bin;do mv $bin $(basename /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins);done< <(ls|grep fasta)
mv * $(dirname /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Job failed, going on with independent jobs.
[00:00:00] Finished reading 7479 contigs and 1 coverages from sample1.all.depth
[00:00:00] Number of target contigs: 3667 of large (>= 1500) and 3812 of small ones (>=1000 & <1500).
[00:00:00] Start TNF calculation. nobs = 3667
[00:00:00] Finished TNF calculation.
Running CONCOCT ...
Up and running. Check /home/dail/metaGEM/workflow/tmp/concoct/sample1/sample1_log.txt for progress
[... MetaBAT 2 "Preparing TNF Graph Building" progress lines (rounds 1-50) omitted ...]
[00:00:03] Finished Preparing TNF Graph Building [pTNF = 69.20]
[... MetaBAT 2 "Building TNF Graph" progress lines omitted ...]
Traceback (most recent call last):
  File "/home/dail/metaGEM/workflow/envs/metagem/bin/concoct", line 90, in <module>
    results = main(args)
  File "/home/dail/metaGEM/workflow/envs/metagem/bin/concoct", line 37, in main
    transform_filter, pca = perform_pca(
  File "/home/dail/metaGEM/workflow/envs/metagem/lib/python3.8/site-packages/concoct/transform.py", line 5, in perform_pca
    pca_object = PCA(n_components=nc, random_state=seed).fit(d)
  File "/home/dail/metaGEM/workflow/envs/metagem/lib/python3.8/site-packages/sklearn/base.py", line 1151, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/dail/metaGEM/workflow/envs/metagem/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 434, in fit
    self._fit(X)
  File "/home/dail/metaGEM/workflow/envs/metagem/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 483, in _fit
    X = self._validate_data(
  File "/home/dail/metaGEM/workflow/envs/metagem/lib/python3.8/site-packages/sklearn/base.py", line 579, in _validate_data
    self._check_feature_names(X, reset=reset)
  File "/home/dail/metaGEM/workflow/envs/metagem/lib/python3.8/site-packages/sklearn/base.py", line 440, in _check_feature_names
    feature_names_in_ = _get_feature_names(X)
  File "/home/dail/metaGEM/workflow/envs/metagem/lib/python3.8/site-packages/sklearn/utils/validation.py", line 2021, in _get_feature_names
    raise TypeError(
TypeError: Feature names are only supported if all input features have string names, but your input has ['int', 'str'] as feature name / column name types. If you want feature names to be stored and validated, you must convert them all to strings, by using X.columns = X.columns.astype(str) for example. Otherwise you can remove feature / column names from your input data, or convert them all to a non-string data type.
[Wed Aug 30 09:50:58 2023]
Error in rule concoct:
    jobid: 2
    output: /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins
    shell:
# Activate metagem environment
#set +u;source activate /home/dail/metaGEM/workflow/envs/metagem;set -u;
# Create output folder
mkdir -p $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)
# Make job specific scratch dir
sampleID=$(echo $(basename $(dirname /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)))
echo -e "
Creating temporary directory /home/dail/metaGEM/workflow/tmp/concoct/${sampleID} ... "
mkdir -p /home/dail/metaGEM/workflow/tmp/concoct/${sampleID}
# Move into scratch dir
cd /home/dail/metaGEM/workflow/tmp/concoct/${sampleID}
# Copy files
cp /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz /home/dail/metaGEM/workflow/concoct/sample1/cov/coverage_table.tsv .
echo "Unzipping assembly ... "
gunzip $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)
echo -e "Done.
Cutting up contigs (default 10kbp chunks) ... " cut_up_fasta.py -c 10000 -o 0 -m $(echo $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)|sed 's/.gz//') > assembly_c10k.fa
echo -e "
Running CONCOCT ... " concoct --coverage_file $(basename /home/dail/metaGEM/workflow/concoct/sample1/cov/coverage_table.tsv) --composition_file assembly_c10k.fa -b $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)) -t 12 -c 800
echo -e "
Merging clustering results into original contigs ... " merge_cutup_clustering.py $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins))_clustering_gt1000.csv > $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins))_clustering_merged.csv
echo -e "
Extracting bins ... " mkdir -p $(basename /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins) extract_fasta_bins.py $(echo $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)|sed 's/.gz//') $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins))_clustering_merged.csv --output_path $(basename /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)
# Move final result files to output folder mv $(basename /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins) *.txt *.csv $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins) (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Job failed, going on with independent jobs. [00:00:05] Building TNF Graph 87.4% (3204 of 3667), ETA 0:00:00 [63.7Gb / 125.5Gb] [00:00:05] Building TNF Graph 92.2% (3382 of 3667), ETA 0:00:00 [63.7Gb / 125.5Gb] [00:00:05] Building TNF Graph 97.1% (3560 of 3667), ETA 0:00:00 [63.7Gb / 125.5Gb] [00:00:05] Building TNF Graph 101.9% (3738 of 3667), ETA 0:00:00 [63.7Gb / 125.5Gb] [00:00:05] Finished Building TNF Graph (81835 edges) [63.7Gb / 125.5Gb]
[00:00:05] Applying coverage correlations to TNF graph with 81835 edges [00:00:05] Traversing graph with 3667 nodes and 81835 edges [00:00:05] Building SCR Graph and Binning (349 vertices and 964 edges) [P = 9.50%; 63.7Gb / 125.5Gb]
[00:00:05] Building SCR Graph and Binning (697 vertices and 2061 edges) [P = 19.00%; 63.7Gb / 125.5Gb]
[00:00:05] Building SCR Graph and Binning (1046 vertices and 2821 edges) [P = 28.50%; 63.7Gb / 125.5Gb]
[00:00:05] Building SCR Graph and Binning (1394 vertices and 3865 edges) [P = 38.00%; 63.7Gb / 125.5Gb]
[00:00:05] Building SCR Graph and Binning (1742 vertices and 5005 edges) [P = 47.50%; 63.7Gb / 125.5Gb]
[00:00:05] Building SCR Graph and Binning (2091 vertices and 6557 edges) [P = 57.00%; 63.7Gb / 125.5Gb]
[00:00:05] Building SCR Graph and Binning (2439 vertices and 8265 edges) [P = 66.50%; 63.7Gb / 125.5Gb]
[00:00:05] Building SCR Graph and Binning (2787 vertices and 11709 edges) [P = 76.00%; 63.7Gb / 125.5Gb]
[00:00:05] Building SCR Graph and Binning (2988 vertices and 16894 edges) [P = 85.50%; 63.7Gb / 125.5Gb]
[00:00:05] Rescuing singleton large contigs
[00:00:05] There are 16 bins already
[00:00:05] Outputting bins
[00:00:05] 79.68% (11949444 bases) of large (>=1500) and 0.00% (0 bases) of small (<1500) contigs were binned. 16 bins (11949444 bases in total) formed.
[00:00:05] Finished
[Wed Aug 30 09:50:59 2023] Finished job 5.
1 of 5 steps (20%) done
Exiting because a job execution failed. Look above for error message
Complete log: /home/dail/metaGEM/workflow/.snakemake/log/2023-08-30T095053.149690.snakemake.log
This is the log file:
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Conda environments: ignored
Job counts:
    count    jobs
    1    all
    1    binRefine
    1    concoct
    1    maxbinCross
    1    metabatCross
    5
Select jobs to execute...
Failed to solve scheduling problem with ILP solver. Falling back to greedy solver. Run Snakemake with --verbose to see the full solver output for debugging the problem.
[Wed Aug 30 09:50:53 2023]
rule maxbinCross:
    input: /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz, /home/dail/metaGEM/workflow/maxbin/sample1/cov
    output: /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins
    jobid: 7
    benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.maxbin.benchmark.txt
    wildcards: IDs=sample1

[Wed Aug 30 09:50:53 2023]
rule concoct:
    input: /home/dail/metaGEM/workflow/concoct/sample1/cov/coverage_table.tsv, /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz
    output: /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins
    jobid: 2
    benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.concoct.benchmark.txt
    wildcards: IDs=sample1

[Wed Aug 30 09:50:53 2023]
rule metabatCross:
    input: /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz, /home/dail/metaGEM/workflow/metabat/sample1/cov
    output: /home/dail/metaGEM/workflow/metabat/sample1/sample1.metabat-bins
    jobid: 5
    benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.metabat.benchmark.txt
    wildcards: IDs=sample1
[Wed Aug 30 09:50:53 2023] Error in rule maxbinCross:
    jobid: 7
    output: /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins
    shell:
        # Activate metagem environment
        #set +u;source activate /home/dail/metaGEM/workflow/envs/metagem;set -u;
        # Create output folder
        mkdir -p $(dirname /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)
        # Make job specific scratch dir
        fsampleID=$(echo $(basename $(dirname /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)))
        echo -e "Creating temporary directory /home/dail/metaGEM/workflow/tmp/maxbin/${fsampleID} ... "
        mkdir -p /home/dail/metaGEM/workflow/tmp/maxbin/${fsampleID}
        # Move into scratch dir
        cd /home/dail/metaGEM/workflow/tmp/maxbin/${fsampleID}
        # Copy files to tmp
        cp -r /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz /home/dail/metaGEM/workflow/maxbin/sample1/cov/*.depth .
        echo -e "Unzipping assembly ... "
        gunzip $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)
        echo -e "Generating list of depth files based on crossMapSeries rule output ... "
        find . -name "*.depth" > abund.list
        echo -e "Running maxbin2 ... "
        run_MaxBin.pl -thread 12 -contig contigs.fasta -out $(basename $(dirname /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)) -abund_list abund.list
        # Clean up un-needed files
        rm *.depth abund.list contigs.fasta
        # Move files into output dir
        mkdir -p $(basename /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)
        while read bin;do mv $bin $(basename /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins);done< <(ls|grep fasta)
        mv * $(dirname /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Job failed, going on with independent jobs.
[Wed Aug 30 09:50:58 2023] Error in rule concoct:
    jobid: 2
    output: /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins
    shell:
        # Activate metagem environment
        #set +u;source activate /home/dail/metaGEM/workflow/envs/metagem;set -u;
        # Create output folder
        mkdir -p $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)
        # Make job specific scratch dir
        sampleID=$(echo $(basename $(dirname /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)))
        echo -e "Creating temporary directory /home/dail/metaGEM/workflow/tmp/concoct/${sampleID} ... "
        mkdir -p /home/dail/metaGEM/workflow/tmp/concoct/${sampleID}
        # Move into scratch dir
        cd /home/dail/metaGEM/workflow/tmp/concoct/${sampleID}
        # Copy files
        cp /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz /home/dail/metaGEM/workflow/concoct/sample1/cov/coverage_table.tsv .
        echo "Unzipping assembly ... "
        gunzip $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)
        echo -e "Done. Cutting up contigs (default 10kbp chunks) ... "
        cut_up_fasta.py -c 10000 -o 0 -m $(echo $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)|sed 's/.gz//') > assembly_c10k.fa
        echo -e "Running CONCOCT ... "
        concoct --coverage_file $(basename /home/dail/metaGEM/workflow/concoct/sample1/cov/coverage_table.tsv) --composition_file assembly_c10k.fa -b $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)) -t 12 -c 800
        echo -e "Merging clustering results into original contigs ... "
        merge_cutup_clustering.py $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins))_clustering_gt1000.csv > $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins))_clustering_merged.csv
        echo -e "Extracting bins ... "
        mkdir -p $(basename /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)
        extract_fasta_bins.py $(echo $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)|sed 's/.gz//') $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins))_clustering_merged.csv --output_path $(basename /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)
        # Move final result files to output folder
        mv $(basename /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins) *.txt *.csv $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Job failed, going on with independent jobs.
[Wed Aug 30 09:50:59 2023] Finished job 5.
1 of 5 steps (20%) done
Exiting because a job execution failed. Look above for error message
Complete log: /home/dail/metaGEM/workflow/.snakemake/log/2023-08-30T095053.149690.snakemake.log
Can't locate LWP/Simple.pm in @inc (you may need to install the LWP::Simple module) (@inc contains: /home/linuxbrew/.linuxbrew/opt/perl/lib/perl5/site_perl/5.36/x86_64-linux-thread-multi /home/linuxbrew/.linuxbrew/opt/perl/lib/perl5/site_perl/5.36 /home/linuxbrew/.linuxbrew/opt/perl/lib/perl5/5.36/x86_64-linux-thread-multi /home/linuxbrew/.linuxbrew/opt/perl/lib/perl5/5.36 /home/linuxbrew/.linuxbrew/lib/perl5/site_perl/5.36/x86_64-linux-thread-multi /home/linuxbrew/.linuxbrew/lib/perl5/site_perl/5.36) at /home/dail/metaGEM/workflow/envs/metagem/bin/run_MaxBin.pl line 4.
Regarding the above maxbin errors, it looks like you are missing a Perl library. You could try installing it manually. Others have reported this error message; it may be worth reading through this issue: https://github.com/metagenome-atlas/atlas/issues/328
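For example, something along these lines should check for and install the missing module (the bioconda package name below is an assumption; verify it with conda search first):

```shell
# Check whether the Perl that runs run_MaxBin.pl can see LWP::Simple
perl -MLWP::Simple -e 'print "LWP::Simple found\n"'

# Option 1: install into the metagem conda env (package name assumed)
conda install -n metagem -c bioconda perl-lwp-simple

# Option 2: install via cpan
cpan LWP::Simple
```

Note that your error shows @INC pointing at a linuxbrew Perl, so also make sure the conda environment's Perl (not the linuxbrew one) is first on your PATH when the rule runs.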
raise TypeError( TypeError: Feature names are only supported if all input features have string names, but your input has ['int', 'str'] as feature name / column name types.
Regarding the above concoct errors, this looks like a scikit-learn version incompatibility; see these issues for more details on how to solve it: https://github.com/BinPro/CONCOCT/issues/323 https://github.com/BinPro/CONCOCT/issues/322
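For illustration, here is a minimal reproduction of the underlying problem and the workaround suggested by the error message itself (this sketch is illustrative, not CONCOCT's actual code):

```python
import pandas as pd
from sklearn.decomposition import PCA

# Newer scikit-learn rejects DataFrames whose column names mix types,
# e.g. integer indices alongside string labels, raising the TypeError seen above
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], columns=[0, "cov"])

# Workaround from the error message: make all column names strings
df.columns = df.columns.astype(str)

pca = PCA(n_components=1).fit(df)  # fits without the TypeError
print(pca.n_components_)  # -> 1
```

Alternatively, pinning scikit-learn to an older release inside the metagem environment sidesteps the check entirely, as discussed in the linked CONCOCT issues.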
Regarding metabat2, from the logs it looks like it may have successfully generated some draft bins.
After fixing the issues with concoct and maxbin, you should be able to run the refinement and reassembly modules. Remember to clean up the tmp dir between runs to avoid problems caused by leftover intermediate files. Let me know if this helps.
Best, Francisco
Hi, I'm using PBS to submit jobs to the server, but just as on the command line alone, I have to use the "--local" flag; otherwise I get "This was a dry-run (flag -n). The order of jobs does not reflect the order of execution." and nothing runs. Here is my PBS script; I have to specify the "--local" flag:
projectname="metaGEM_comand"
project="/public/home/wangjj/WP/metaGEM/workflow"

cd $project

echo -e ""
echo -e ""
source activate metagem

yes | bash metaGEM.sh -t fastp -j 2 -c 120 -m 500 -h 24 --local
echo -e "****"
echo -e "**** fastp Finish...!!! ****"
echo -e "****"
yes | bash metaGEM.sh -t megahit -j 2 -c 120 -m 500 -h 24 --local
echo -e "****"
echo -e "**** megahit Finish...!!! ****"
echo -e "****"
yes | bash metaGEM.sh -t crossMapSeries -j 2 -c 120 -m 500 -h 24 --local
echo -e "****"
echo -e "**** crossMapSeries Finish...!!! ****"
echo -e "****"
yes | bash metaGEM.sh -t concoct -j 2 -c 120 -m 500 -h 24 --local
In addition, I found that when using "--local", no matter how many nodes I request from the server, the program only runs on a single node rather than on multiple nodes in parallel. I ran GTDB-Tk with "--local" after requesting 10 nodes with 24 cores and 96 GB each, and still got an out-of-memory error, which should not have happened:

[2023-10-17 15:35:05] INFO: Masked bacterial alignment from 41,084 to 5,037 AAs.
[2023-10-17 15:35:05] INFO: 0 bacterial user genomes have amino acids in <10.0% of columns in filtered MSA.
[2023-10-17 15:35:05] INFO: Creating concatenated alignment for 45,560 bacterial GTDB and user genomes.
[2023-10-17 15:35:06] INFO: Creating concatenated alignment for 5 bacterial user genomes.
[2023-10-17 15:35:06] INFO: Done.
[2023-10-17 15:35:06] WARNING: pplacer requires ~215 GB of RAM to fully load the bacterial tree into memory. However, 65.44 GB was detected. This may affect pplacer performance, or fail if there is insufficient swap space.
[2023-10-17 15:35:06] TASK: Placing 5 bacterial genomes into reference tree with pplacer using 48 CPUs (be patient).
[2023-10-17 15:35:06] INFO: pplacer version: v1.1.alpha19-0-g807f6f3
@franciscozorrilla Do you know how to solve this problem?
@franciscozorrilla I think it might be a problem with the conditional logic here, but I don't know why I get this message when I don't set the "--local" flag: "This was a dry-run (flag -n). The order of jobs does not reflect the order of execution."
In metaGEM.sh:
elif [ $task == "fastp" ]; then
    string='expand(config["path"]["root"]+"/"+config["folder"]["qfiltered"]+"/{IDs}/{IDs}_R1.fastq.gz", IDs = IDs)'
    if [ $local == "true" ]; then
        submitLocal
    else
        submitCluster
    fi
Hey Wupeng,
but just as on the command line alone, I have to use the "--local" flag
I do not understand the reasoning here. Why are you trying to use the local flag when submitting jobs to the cluster? In general, you should never run jobs locally on the cluster; the local flag is only meant for workstations, where there is no scheduler or other users.
In addition, I found that when using "--local", no matter how many nodes I request from the server, the program only runs on a single node rather than on multiple nodes in parallel. I ran GTDB-Tk with "--local" after requesting 10 nodes with 24 cores and 96 GB each, and still got an out-of-memory error, which should not have happened:
This behavior is exactly as expected. To submit jobs to the cluster instead of running them locally, remove the --local flag. With the local flag, Snakemake launches the jobs directly on the node you are logged into, which is why you are running out of memory.
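Conceptually, the difference is whether Snakemake wraps each job in a scheduler submission. The generic Snakemake invocations below sketch this (they are illustrative, not metaGEM's exact wrapper commands):

```shell
# Local: all jobs share the cores of the machine you launched from
snakemake --cores 24 mytarget

# Cluster: each job is handed to the scheduler and can land on its own node
snakemake --jobs 10 --cluster "qsub -l nodes=1:ppn=24,mem=96gb" mytarget
```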
I'm using PBS to submit jobs to the server
If you are using a PBS cluster then have a look at this discussion and this fork. @fbartusch modified the metaGEM.sh wrapper file and cluster config file to allow submission via qsub; if I were you, I would look at those modifications and apply them to your files as well.
Hope this helps and let me know if you have further questions! Best, Francisco
p.s. feel free to open up a new issue/discussion :)
Thanks for your reply, I did find this problem too: when I removed "--local", the nohup.out file mentions that the "sbatch" command can't be found, so as you mentioned, it is indeed a qsub problem! I will take the next step as you suggested, thanks again!
Hello, when I set up qsub like this, the following error occurred:

[Tue Oct 24 23:18:39 2023]
rule binReassemble:
    input: /public/home/wangjj/WP/metaGEM/workflow/qfiltered/mergedYL/mergedYL_R1.fastq.gz, /public/home/wangjj/WP/metaGEM/workflow/qfiltered/mergedYL/mergedYL_R2.fastq.gz, /public/home/wangjj/WP/metaGEM/workflow/refined_bins/mergedYL
    output: /public/home/wangjj/WP/metaGEM/workflow/reassembled_bins/mergedYL
    jobid: 1
    benchmark: /public/home/wangjj/WP/metaGEM/workflow/benchmarks/mergedYL.binReassemble.benchmark.txt
    wildcards: IDs=mergedYL

RuleException in line 791 of /public/home/wangjj/WP/metaGEM/workflow/Snakefile:
IndexError: tuple index out of range
  File "/public/home/wangjj/WP/metaGEM/workflow/envs/metagem/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 136, in run_jobs
  File "/public/home/wangjj/WP/metaGEM/workflow/envs/metagem/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 969, in run
Hi Wupeng, it sounds like your issues are no longer related to the original post. Please feel free to open up a new issue and provide more details. Based on the error message, it seems like Snakemake is not properly communicating with your cluster. First make sure that Snakemake is working properly: you should be able to submit and run simple Snakemake jobs on your cluster before trying to use metaGEM. For reference, please see the Snakemake docs and tutorial.
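As a sanity check, a minimal Snakefile along these lines (a generic Snakemake sketch, not part of metaGEM) should run both locally and through your scheduler before you attempt the full workflow:

```
# Minimal Snakefile to verify Snakemake + scheduler communication
rule all:
    input:
        "hello.txt"

rule hello:
    output:
        "hello.txt"
    shell:
        "echo hello > {output}"
```

If this runs locally but fails when submitted through qsub, the problem is in the cluster submission setup rather than in metaGEM.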
OK, I understand what you mean. But Snakemake works well when I use the --local flag, so I don't think Snakemake itself is the problem.
Please, do not run jobs on the login node of your cluster using the --local flag. This is improper and harmful usage of the cluster, and you will probably get complaints from your HPC admins. Snakemake is supposed to communicate with your HPC job scheduler and submit jobs to compute nodes, as opposed to launching them on the login node which is what you are doing with the --local flag.
Hi Francisco,
I recently downloaded metaGEM, following the manual installation guidelines. I'm trying to run fastp but keep getting this error, and I am unsure what could be causing it. I was able to run createFolders, downloadToy, organizeData, and check with no problems. I am using sample1 from the toy dataset.
(metagem) dail@swamp:~/metaGEM/workflow$ bash /home/dail/metaGEM/workflow/metaGEM.sh -t fastp -l
=================================================================================================================================
Developed by: Francisco Zorrilla, Kiran R. Patil, and Aleksej Zelezniak
Publication: doi.org/10.1101/2020.12.31.424982
[metaGEM ASCII art banner]
A Snakemake-based pipeline designed to predict metabolic interactions directly from metagenomics data using high performance computer clusters
Version: 1.0.5
Setting current directory to root in config.yaml file ...
Parsing Snakefile to target rule: fastp ...
Do you wish to continue with these parameters? (y/n) y
Proceeding with fastp job(s) ...
Please verify parameters set in the config.yaml file:
path:
  root: /home/dail/metaGEM/workflow
  scratch: $TMP
folder:
  data: dataset
  logs: logs
  assemblies: assemblies
  scripts: scripts
  crossMap: crossMap
  concoct: concoct
  maxbin: maxbin
  metabat: metabat
  refined: refined_bins
  reassembled: reassembled_bins
  classification: GTDBTk
  abundance: abundance
  GRiD: GRiD
  GEMs: GEMs
  SMETANA: SMETANA
  memote: memote
  qfiltered: qfiltered
  stats: stats
  proteinBins: protein_bins
  dnaBins: dna_bins
  pangenome: pangenome
  kallisto: kallisto
  kallistoIndex: kallistoIndex
  benchmarks: benchmarks
  prodigal: prodigal
  blastp: blastp
  blastp_db: blastp_db
scripts:
  kallisto2concoct: kallisto2concoct.py
  prepRoary: prepareRoaryInput.R
  binFilter: binFilter.py
  qfilterVis: qfilterVis.R
  assemblyVis: assemblyVis.R
  binningVis: binningVis.R
  modelVis: modelVis.R
  compositionVis: compositionVis.R
  taxonomyVis: taxonomyVis.R
  carveme: media_db.tsv
  toy: download_toydata.txt
  GTDBtkVis:
cores:
  fastp: 4
  megahit: 48
  crossMap: 48
  concoct: 48
  metabat: 48
  maxbin: 48
  refine: 48
  reassemble: 48
  classify: 2
  gtdbtk: 48
  abundance: 16
  carveme: 4
  smetana: 12
  memote: 4
  grid: 24
  prokka: 2
  roary: 12
  diamond: 12
params:
  cutfasta: 10000
  assemblyPreset: meta-sensitive
  assemblyMin: 1000
  concoct: 800
  metabatMin: 50000
  seed: 420
  minBin: 1500
  refineMem: 1600
  refineComp: 50
  refineCont: 10
  reassembleMem: 1600
  reassembleComp: 50
  reassembleCont: 10
  carveMedia: M8
  smetanaMedia: M1,M2,M3,M4,M5,M7,M8,M9,M10,M11,M13,M14,M15A,M15B,M16
  smetanaSolver: CPLEX
  roaryI: 90
  roaryCD: 90
envs:
  metagem: envs/metagem
  metawrap: envs/metawrap
  prokkaroary: envs/prokkaroary
Please pay close attention to make sure that your paths are properly configured!
Do you wish to proceed with this config.yaml file? (y/n) y

Unlocking snakemake ...
Unlocking working directory.

Dry-running snakemake jobs ...
Building DAG of jobs...
Job counts:
    count    jobs
    1    all
    1    qfilter
    2

[Mon Aug 28 20:08:14 2023]
rule qfilter:
    input: /home/dail/metaGEM/workflow/dataset/sample1/sample1_R1.fastq.gz, /home/dail/metaGEM/workflow/dataset/sample1/sample1_R2.fastq.gz
    output: /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R1.fastq.gz, /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R2.fastq.gz
    jobid: 1
    wildcards: IDs=sample1

[Mon Aug 28 20:08:14 2023]
Job 0: WARNING: Be very careful when adding/removing any lines above this message. The metaGEM.sh parser is presently hardcoded to edit line 22 of this Snakefile to expand target rules accordingly, therefore adding/removing any lines before this message will likely result in parser malfunction.

Job counts:
    count    jobs
    1    all
    1    qfilter
    2
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
Do you wish to submit this batch of jobs on your local machine? (y/n) y
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Conda environments: ignored
Job counts:
    count    jobs
    1    all
    1    qfilter
    2
Select jobs to execute...

[Mon Aug 28 20:08:16 2023]
rule qfilter:
    input: /home/dail/metaGEM/workflow/dataset/sample1/sample1_R1.fastq.gz, /home/dail/metaGEM/workflow/dataset/sample1/sample1_R2.fastq.gz
    output: /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R1.fastq.gz, /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R2.fastq.gz
    jobid: 1
    wildcards: IDs=sample1

Activating envs/metagem conda environment ...
/usr/bin/bash: line 2: activate: No such file or directory
[Mon Aug 28 20:08:16 2023] Error in rule qfilter:
    jobid: 1
    output: /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R1.fastq.gz, /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R2.fastq.gz
    shell:
        echo -e "Creating temporary directory $TMP/qfiltered/${idvar} ... "
        mkdir -p $TMP/qfiltered/${idvar}
Job failed, going on with independent jobs.
Exiting because a job execution failed. Look above for error message
Complete log: /home/dail/metaGEM/workflow/.snakemake/log/2023-08-28T200816.260324.snakemake.log