nalbright opened this issue 6 years ago
I have a question about the assembly workflow in the Really Quick Copy-and-Paste Quick Start:
I noticed that the input for the assembly is the same raw data that is called for in the read filtering workflow. Shouldn't the input for the assembly workflow be the output of the read filtering workflow (i.e., filtered/trimmed reads rather than the raw reads)? The same raw data are also given as the inputs for the next workflows (comparison, taxonomic classification). Is this something that the user should be updating with each successive workflow? Could this be a possible source of the error I am seeing below for the assembly?
I appreciate any clarification you can provide! Thanks, Nicolette
The input files are the same, but the trimming workflow is run on them prior to assembly. (If trimming has already happened, it is not re-run.)
You can kind of infer this from the `qual` argument in the assembly workflow JSON, and also by watching what snakemake does, but we will make sure to add this to the docs.
Snakemake says:
```
Job 1: --- Assembling quality trimmed reads with Megahit
```
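If you want to check this yourself before committing to a full run, Snakemake's dry-run mode prints the planned jobs and their shell commands without executing anything. A minimal sketch, using the same config file and target as the quick start:

```bash
# -n plans without executing; -p prints each job's shell command.
# If the .trim2.fq.gz files already exist, no trimming jobs show up in the
# plan; if they don't, the trimming rules are scheduled ahead of assembly.
snakemake -n -p --configfile=config/custom_assembly_workflow.json assembly_workflow_all
```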
To diagnose the error, we might need the file `data/SRR606249_subset25.trim2_megahit/SRR606249_subset25.trim2_megahit.log`. Could you paste that here? Thanks!
Thanks for the explanation! Before seeing your response I went ahead and executed the assembly, but changed the input files from the copy-and-paste to the outputs of read trimming. It has been running successfully overnight and is still going!
Here is the .log file that you requested above:
```
MEGAHIT v1.1.2
--- [Thu Jul 19 10:06:16 2018] Start assembly. Number of CPU threads 1 ---
--- [Thu Jul 19 10:06:16 2018] Available memory: 8365150208, used: 1673030041
--- [Thu Jul 19 10:06:16 2018] Converting reads to binaries ---
/usr/local/bin/megahit_asm_core buildlib /data/SRR606249_subset25.trim2_megahit/tmp/reads.lib /data/SRR606249_subset25.trim2_megahit/tmp/reads.lib
b' [read_lib_functions-inl.h : 209] Lib 0 (/data/SRR606249_subset25_1.trim2.fq.gz,/data/SRR606249_subset25_2.trim2.fq.gz): pe, 26715952 reads, 101 max length'
b' [utils.h : 126] Real: 72.5175\tuser: 25.3008\tsys: 9.7853\tmaxrss: 155044'
--- [Thu Jul 19 10:07:28 2018] k-max reset to: 119 ---
--- [Thu Jul 19 10:07:28 2018] k list: 21,29,39,59,79,99,119 ---
--- [Thu Jul 19 10:07:28 2018] Extracting solid (k+1)-mers for k = 21 ---
cmd: /usr/local/bin/megahit_sdbg_build count -k 21 -m 2 --host_mem 1673030041 --mem_flag 1 --gpu_mem 0 --output_prefix /data/SRR606249_subset25.trim2_megahit/tmp/k21/21 --num_cpu_threads 1 --num_output_threads 1 --read_lib_file /data/SRR606249_subset25.trim2_megahit/tmp/reads.lib
b' [sdbg_builder.cpp : 112] Host memory to be used: 1673030041'
b' [sdbg_builder.cpp : 113] Number CPU threads: 1'
b' [cx1.h : 450] Preparing data...'
b' [read_lib_functions-inl.h : 256] Before reading, sizeof seq_package: 885946936'
b' [read_lib_functions-inl.h : 260] After reading, sizeof seq_package: 885946936'
b' [cx1_kmer_count.cpp : 136] 26715952 reads, 101 max read length'
b' [cx1.h : 457] Preparing data... Done. Time elapsed: 8.0859'
b' [cx1.h : 464] Preparing partitions and initialing global data...'
b' [cx1_kmer_count.cpp : 227] 2 words per substring, 2 words per edge'
b' [cx1_kmer_count.cpp : 322] Set: 1145227092, 708974305'
b' [cx1.h : 171] Adjusting memory layout: max_lv1_items=283796391, num_sorting_items=418397, mem_sorting_items=10041528, mem_avail=708974305'
b' [cx1_kmer_count.cpp : 356] 174733194, 418397 708974304 708974305'
b' [cx1_kmer_count.cpp : 363] Memory for reads: 906953816'
b' [cx1_kmer_count.cpp : 364] max # lv.1 items = 174733194'
b' [cx1.h : 480] Preparing partitions and initialing global data... Done. Time elapsed: 31.2946'
b' [cx1.h : 486] Start main loop...'
b' [cx1.h : 515] Lv1 scanning from bucket 0 to 1180'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 41.0832'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 58.6954'
b' [cx1.h : 515] Lv1 scanning from bucket 1180 to 2972'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 40.1835'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 57.0502'
b' [cx1.h : 515] Lv1 scanning from bucket 2972 to 5165'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 48.7228'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 63.3104'
b' [cx1.h : 515] Lv1 scanning from bucket 5165 to 7735'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 40.4177'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 66.7352'
b' [cx1.h : 515] Lv1 scanning from bucket 7735 to 10699'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 43.5897'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 68.2777'
b' [cx1.h : 515] Lv1 scanning from bucket 10699 to 14101'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 46.8261'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 65.2864'
b' [cx1.h : 515] Lv1 scanning from bucket 14101 to 18006'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 41.7479'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 66.8297'
b' [cx1.h : 515] Lv1 scanning from bucket 18006 to 22532'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 43.2049'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 68.1576'
b' [cx1.h : 515] Lv1 scanning from bucket 22532 to 27895'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 48.6676'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 70.1291'
b' [cx1.h : 515] Lv1 scanning from bucket 27895 to 34492'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 44.5378'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 69.6958'
b' [cx1.h : 515] Lv1 scanning from bucket 34492 to 43306'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 56.5795'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 77.2814'
b' [cx1.h : 515] Lv1 scanning from bucket 43306 to 61704'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 59.7476'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 81.8533'
b' [cx1.h : 515] Lv1 scanning from bucket 61704 to 65536'
b' [cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 32.5052'
b' [cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 4.0216'
b' [cx1.h : 607] Main loop done. Time elapsed: 1405.1385'
b' [cx1.h : 613] Postprocessing...'
b' [cx1_kmer_count.cpp : 860] Total number of candidate reads: 311050(531785)'
b' [cx1_kmer_count.cpp : 871] Total number of solid edges: 183916390'
b' [cx1.h : 621] Postprocess done. Time elapsed: 0.3870'
b' [utils.h : 126] Real: 1444.9483\tuser: 1424.9294\tsys: 8.4105\tmaxrss: 1794920'
--- [Thu Jul 19 10:31:33 2018] Building graph for k = 21 ---
/usr/local/bin/megahit_sdbg_build seq2sdbg --host_mem 1673030041 --mem_flag 1 --gpu_mem 0 --output_prefix /data/SRR606249_subset25.trim2_megahit/tmp/k21/21 --num_cpu_threads 1 -k 21 --kmer_from 0 --num_edge_files 1 --input_prefix /data/SRR606249_subset25.trim2_megahit/tmp/k21/21 --need_mercy
b' [sdbg_builder.cpp : 339] Host memory to be used: 1673030041'
b' [sdbg_builder.cpp : 340] Number CPU threads: 1'
b' [cx1.h : 450] Preparing data...'
b' [cx1_seq2sdbg.cpp : 394] Number edges: 183916390'
b' [cx1_seq2sdbg.cpp : 434] Bases to reserve: 5057700714, number contigs: 0, number multiplicity: 229895487'
b' [cx1_seq2sdbg.cpp : 440] Before reading, sizeof seq_package: 1264425188, multiplicity vector: 229895487'
b' [cx1_seq2sdbg.cpp : 455] Adding mercy edges...'
b' [cx1_seq2sdbg.cpp : 373] Number of reads: 311050, Number of mercy edges: 4387638'
b' [cx1_seq2sdbg.cpp : 462] Done. Time elapsed: 39.0692'
b' [cx1_seq2sdbg.cpp : 529] After reading, sizeof seq_package: 1264425188, multiplicity vector: 229895487'
b' [ERROR] [cx1_seq2sdbg.cpp : 540]: 1673030041 bytes is not enough for CX1 sorting, please set -m parameter to at least 1759269939'
Error occurs when running "builder build" for k = 21; please refer to /data/SRR606249_subset25.trim2_megahit/SRR606249_subset25.trim2_megahit.log for detail
[Exit code 1]
```
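The decisive line is the `[ERROR]` near the end: the run was capped at 1673030041 bytes (that is the `--memory 0.20` fraction of this ~8 GB machine), but the k=21 graph build asked for at least 1759269939 bytes. As a hedged sketch of a manual workaround outside the workflow: MEGAHIT's `-m/--memory` flag takes either a fraction of total RAM (between 0 and 1) or, above 1, an absolute byte count, so raising it gives the CX1 sorter enough headroom. The `_retry` output directory below is a made-up name to avoid clobbering the failed run's directory.

```bash
# Rerun the failing assembly by hand with a larger memory cap; 0.5 of 8 GB
# (~4 GB) comfortably exceeds the ~1.76 GB the CX1 sorter asked for.
# Input paths are the ones from the log above; the -o path is hypothetical,
# since megahit refuses to reuse an existing output directory.
megahit -t 1 --memory 0.5 \
  -1 /data/SRR606249_subset25_1.trim2.fq.gz \
  -2 /data/SRR606249_subset25_2.trim2.fq.gz \
  --out-prefix=SRR606249_subset25.trim2_megahit \
  -o /data/SRR606249_subset25.trim2_megahit_retry
```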
Cool, https://github.com/dahak-metagenomics/dahak/pull/115 fixes this! (I had the branch ready but had forgotten to make the PR!)
Re-kicked this off this morning and it seems to be running fine with no errors so far! :) (It's already past the point where it errored out above.)
As per the assembly workflow in the Really Quick Copy-And-Paste Quick Start, I copied the JSON file specified and executed the following, which gave me the error below:
```
$ export SINGULARITY_BINDPATH="data:/data"
$ snakemake -p --use-singularity --configfile=config/custom_assembly_workflow.json assembly_workflow_all
```
Error:
```
Building DAG of jobs...
Pulling singularity image docker://quay.io/biocontainers/spades:3.11.1--py27_zlib1.2.8_0.
Pulling singularity image docker://quay.io/biocontainers/megahit:1.1.2--py35_0.
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        4       assembly_megahit
        4       assembly_metaspades
        1       assembly_workflow_all
        9

Job 6: --- Assembling quality trimmed reads with Megahit

rm -rf data/SRR606249_subset25.trim2_megahit && megahit -t 1 --memory 0.20 -1 /data/SRR606249_subset25_1.trim2.fq.gz -2 /data/SRR606249_subset25_2.trim2.fq.gz --out-prefix=SRR606249_subset25.trim2_megahit -o /data/SRR606249_subset25.trim2_megahit && mv /data/SRR606249_subset25.trim2_megahit/SRR606249_subset25.trim2_megahit.contigs.fa /data/SRR606249_subset25.trim2_megahit.contigs.fa
Activating singularity image /home/user/dahak_2018/dahak/workflows/.snakemake/singularity/bfd669a63b585d366276296fdcd11501.simg
7.791Gb memory in total. Using: 1.558Gb.
MEGAHIT v1.1.2
--- [Tue Jul 17 21:49:41 2018] Start assembly. Number of CPU threads 1 ---
--- [Tue Jul 17 21:49:41 2018] Available memory: 8365150208, used: 1673030041
--- [Tue Jul 17 21:49:41 2018] Converting reads to binaries ---
b' [read_lib_functions-inl.h : 209] Lib 0 (/data/SRR606249_subset25_1.trim2.fq.gz,/data/SRR606249_subset25_2.trim2.fq.gz): pe, 26715952 reads, 101 max length'
b' [utils.h : 126] Real: 71.9618\tuser: 23.4359\tsys: 10.3257\tmaxrss: 155080'
--- [Tue Jul 17 21:50:53 2018] k-max reset to: 119 ---
--- [Tue Jul 17 21:50:53 2018] k list: 21,29,39,59,79,99,119 ---
--- [Tue Jul 17 21:50:53 2018] Extracting solid (k+1)-mers for k = 21 ---
--- [Tue Jul 17 22:13:00 2018] Building graph for k = 21 ---
Error occurs when running "builder build" for k = 21; please refer to /data/SRR606249_subset25.trim2_megahit/SRR606249_subset25.trim2_megahit.log for detail
[Exit code 1]
Error in rule assembly_megahit:
    jobid: 6
    output: data/SRR606249_subset25.trim2_megahit.contigs.fa
    log: data/SRR606249_subset25.trim2_megahit.log

RuleException:
CalledProcessError in line 148 of /home/user/dahak_2018/dahak/workflows/assembly/Snakefile:
Command 'singularity exec --home /home/user/dahak_2018/dahak/workflows /home/user/dahak_2018/dahak/workflows/.snakemake/singularity/bfd669a63b585d366276296fdcd11501.simg bash -c ' set -euo pipefail; rm -rf data/SRR606249_subset25.trim2_megahit && megahit -t 1 --memory 0.20 -1 /data/SRR606249_subset25_1.trim2.fq.gz -2 /data/SRR606249_subset25_2.trim2.fq.gz --out-prefix=SRR606249_subset25.trim2_megahit -o /data/SRR606249_subset25.trim2_megahit && mv /data/SRR606249_subset25.trim2_megahit/SRR606249_subset25.trim2_megahit.contigs.fa /data/SRR606249_subset25.trim2_megahit.contigs.fa '' returned non-zero exit status 1.
  File "/home/user/dahak_2018/dahak/workflows/assembly/Snakefile", line 148, in __rule_assembly_megahit
  File "/home/user/miniconda3/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/user/dahak_2018/dahak/workflows/.snakemake/log/2018-07-17T214845.783217.snakemake.log
```
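For completeness: when a rule dies inside a container like this, one way to investigate is to open a shell in the exact image Snakemake pulled (the hash path is in the traceback above) with the same bind mount, and rerun the failing command by hand. A minimal sketch, assuming you are in the `workflows/` directory:

```bash
# Open an interactive shell in the cached container image, with data/ bound
# to /data just as the workflow does; from there, megahit flags (e.g. the
# memory cap) can be experimented with directly.
export SINGULARITY_BINDPATH="data:/data"
singularity shell .snakemake/singularity/bfd669a63b585d366276296fdcd11501.simg
```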