I'm trying to assemble a yeast genome (roughly 13 Mb) with Illumina PE reads, and everything seems to work fine until the gap-closing step:
[jue jul 25 15:45:38 CEST 2019] Processing pe library reads
[jue jul 25 15:45:47 CEST 2019] Average PE read length 96
[jue jul 25 15:45:47 CEST 2019] Using kmer size of 31 for the graph
[jue jul 25 15:45:47 CEST 2019] MIN_Q_CHAR: 33
[jue jul 25 15:45:47 CEST 2019] Creating mer database for Quorum
[jue jul 25 15:46:00 CEST 2019] Error correct PE
[jue jul 25 15:46:49 CEST 2019] Estimating genome size
[jue jul 25 15:46:58 CEST 2019] Estimated genome size: 13966616
[jue jul 25 15:46:58 CEST 2019] Creating k-unitigs with k=31
[jue jul 25 15:47:59 CEST 2019] Computing super reads from PE
[jue jul 25 15:49:03 CEST 2019] Using linking mates
[jue jul 25 15:49:04 CEST 2019] Celera Assembler
[jue jul 25 15:58:30 CEST 2019] Overlap/unitig success
[jue jul 25 15:58:30 CEST 2019] Recomputing A-stat for super-reads
[jue jul 25 15:59:04 CEST 2019] Filtering overlaps
[jue jul 25 16:07:09 CEST 2019] Recomputing A-stat for super-reads
[jue jul 25 16:11:09 CEST 2019] CA success
[jue jul 25 16:11:09 CEST 2019] Gap closing
[jue jul 25 16:11:32 CEST 2019] Gap close failed, you can still use pre-gap close files under CA/9-terminator/. Check gapClose.err for problems.
When checking the gapClose.err file, I see this:
mkdir CA/10-gapclose
outputDirectory = CA/10-gapclose
/home/puratos-l1/Bioinformatic_softwares/Assembly_Tools/MaSuRCA-3.3.3/bin/getEndSequencesOfContigs.perl /home/puratos-l1/Assemblies/SunUP/Assemblies/Masurca_3/CA/9-terminator 95 200
/home/puratos-l1/Bioinformatic_softwares/Assembly_Tools/MaSuRCA-3.3.3/bin/create_end_pairs.perl /home/puratos-l1/Assemblies/SunUP/Assemblies/Masurca_3/CA/9-terminator 95 > /home/puratos-l1/Assemblies/SunUP/Assemblies/Masurca_3/CA/10-gapclose/contig_end_pairs.95.fa
/home/puratos-l1/Bioinformatic_softwares/Assembly_Tools/MaSuRCA-3.3.3/bin/create_end_pairs.perl /home/puratos-l1/Assemblies/SunUP/Assemblies/Masurca_3/CA/9-terminator 200 > /home/puratos-l1/Assemblies/SunUP/Assemblies/Masurca_3/CA/10-gapclose/contig_end_pairs.200.fa
/home/puratos-l1/Bioinformatic_softwares/Assembly_Tools/MaSuRCA-3.3.3/bin/getMeanAndStdevForGapsByGapNumUsingCeleraAsmFile.perl /home/puratos-l1/Assemblies/SunUP/Assemblies/Masurca_3/CA/9-terminator --contig-end-seq-file /home/puratos-l1/Assemblies/SunUP/Assemblies/Masurca_3/CA/10-gapclose/contig_end_pairs.95.fa > gap.insertMeanAndStdev.txt
echo "cc 600 200" > meanAndStdevByPrefix.cc.txt
jellyfish count -s 200000000 -C -t 16 -m 21 -L 100 -o restrictKmers.jf /home/puratos-l1/Assemblies/SunUP/Assemblies/Masurca_3/pe.renamed.fastq
touch restrictKmers.success
jellyfish dump -L 1000 restrictKmers.jf -c > highCountKmers.txt
touch highCountKmers.success
jellyfish count -s 330819 -C -t 16 -m 21 -o fishingAll.jf /home/puratos-l1/Assemblies/SunUP/Assemblies/Masurca_3/CA/10-gapclose/contig_end_pairs.200.fa
jellyfish dump -c k_u_hash_localReadsFile_21_2_faux_reads.jf | awk 'BEGIN{n=0}{n++;print ">"n" length:21\n"$1}' > k_unitigs_localReadsFile_21_2_faux_reads.fa
touch highCountKmers_elim.success
/home/puratos-l1/Bioinformatic_softwares/Assembly_Tools/MaSuRCA-3.3.3/bin/createSuperReadsForDirectory.perl -mikedebug -noreduce -mean-and-stdev-by-prefix-file meanAndStdevByPrefix.cc.txt -minreadsinsuperread 1 -kunitigsfile k_unitigs_localReadsFile_21_2_faux_reads.fa -s 7023775 -low-memory -l 21 --stopAfter findReadKUnitigMatches -t 16 -mkudisr 0 workFauxVsFaux /home/puratos-l1/Assemblies/SunUP/Assemblies/Masurca_3/CA/10-gapclose/contig_end_pairs.200.fa 1>>out.localReadsFile_21_2_workFauxVsFaux 2>>out.localReadsFile_21_2_workFauxVsFaux
/home/puratos-l1/Bioinformatic_softwares/Assembly_Tools/MaSuRCA-3.3.3/bin/createSuperReadsForDirectory.perl -mikedebug -noreduce -mean-and-stdev-by-prefix-file meanAndStdevByPrefix.cc.txt -minreadsinsuperread 1 -kunitigsfile k_unitigs_localReadsFile_21_2_faux_reads.fa -s 7023775 -low-memory -l 21 --stopAfter findReadKUnitigMatches -t 16 -mkudisr 0 workReadsVsFaux /home/puratos-l1/Assemblies/SunUP/Assemblies/Masurca_3/pe.renamed.fastq 1>>out.localReadsFile_21_2_workReadsVsFaux 2>>out.localReadsFile_21_2_workReadsVsFaux
/home/puratos-l1/Bioinformatic_softwares/Assembly_Tools/MaSuRCA-3.3.3/bin/collectReadSequencesForLocalGapClosing --faux-reads-file /home/puratos-l1/Assemblies/SunUP/Assemblies/Masurca_3/CA/10-gapclose/contig_end_pairs.200.fa --faux-read-matches-to-kunis-file workFauxVsFaux/newTestOutput.nucmerLinesOnly --read-matches-to-kunis-file workReadsVsFaux/newTestOutput.nucmerLinesOnly --reads-file /home/puratos-l1/Assemblies/SunUP/Assemblies/Masurca_3/pe.renamed.fastq --max-reads-in-memory 1000000000 --dir-for-gaps .
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
child died with signal 6, with coredump
I've googled "std::bad_alloc" and it seems to be a memory problem. My computer runs Ubuntu 18 and has 64 GB of RAM, a 1 TB SSD and an Intel® Xeon(R) Silver 4114 CPU @ 2.20GHz × 20, so I'm not sure why it would run out of memory.
Any ideas on how to proceed?
Thanks
You can regenerate the assemble.sh file and edit the line that starts with closeGapsLocally.perl: change --max-reads-in-memory 1000000000 to --max-reads-in-memory 100000000 (this is the value that gets passed on to collectReadSequencesForLocalGapClosing, the step that crashed above), then run assemble.sh.
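If you would rather not edit assemble.sh by hand, the change to --max-reads-in-memory can be scripted. Below is a minimal Python sketch, assuming the regenerated assemble.sh has a line whose first token is closeGapsLocally.perl (possibly with a directory prefix) carrying that option. The helper names estimate_gb, lower_max_reads and patch_assemble_sh are hypothetical. The memory estimate is purely illustrative: it assumes, without confirmation from MaSuRCA itself, that about one read length of bytes (96, the average PE read length in the log) is reserved per read slot up front. Under that assumption, 1000000000 slots would need about 96 GB, more than the 64 GB available, while 100000000 slots would need about 9.6 GB.

```python
import re
from pathlib import Path

# Hypothetical helper (illustration only): decimal gigabytes needed if a buffer
# of `max_reads` slots were reserved up front at `bytes_per_read` bytes each.
# ASSUMPTION: MaSuRCA's real per-read memory cost is not known here.
def estimate_gb(max_reads: int, bytes_per_read: int = 96) -> float:
    return max_reads * bytes_per_read / 1e9


# Captures the option name plus whitespace so only the number is replaced.
MAX_READS_RE = re.compile(r"(--max-reads-in-memory\s+)\d+")


def lower_max_reads(script_text: str, new_value: int = 100_000_000) -> str:
    """Lower --max-reads-in-memory on closeGapsLocally.perl lines only."""
    patched = []
    for line in script_text.splitlines(keepends=True):
        tokens = line.split()
        # ASSUMPTION: the first token is the script name, bare or as a full path.
        if tokens and tokens[0].endswith("closeGapsLocally.perl"):
            line = MAX_READS_RE.sub(lambda m: m.group(1) + str(new_value), line)
        patched.append(line)
    return "".join(patched)


def patch_assemble_sh(path: Path, new_value: int = 100_000_000) -> None:
    """Save a .bak copy of the script, then overwrite it with the patched text."""
    original = path.read_text()
    path.with_name(path.name + ".bak").write_text(original)
    path.write_text(lower_max_reads(original, new_value))
```

After regenerating assemble.sh, call patch_assemble_sh(Path("assemble.sh")) from the assembly directory and run assemble.sh again. Only the closeGapsLocally.perl line is touched, so any other occurrence of the option stays as it was, and the .bak copy makes the change easy to revert.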