alekseyzimin / masurca

GNU General Public License v3.0

Gap consensus error, buffer overflow #87

Closed (matthewmoscou closed this issue 5 years ago)

matthewmoscou commented 5 years ago

Dear @alekseyzimin,

We are assembling a heterozygous 800 Mb plant genome with 100x Illumina PE + 40x ONT data using MaSuRCA 3.3.0 on Ubuntu. We encountered the following error during the Gap consensus step. Having looked through issues raised by others, I believe this is a different error from the one described in Issue #58, as mummer appears to be installed correctly. Applying the PATH specification suggested by JFsanchezherrero in install.sh did not correct the problem. The server it is running on was a completely clean machine.

[Tue Jan  1 20:48:32 UTC 2019] Processing pe library reads
[Tue Jan  1 20:48:32 UTC 2019] Average PE read length 150
[Tue Jan  1 20:48:34 UTC 2019] Using kmer size of 99 for the graph
[Tue Jan  1 20:48:34 UTC 2019] MIN_Q_CHAR: 33
[Tue Jan  1 20:48:34 UTC 2019] Estimated genome size: 1165707713
[Tue Jan  1 20:48:34 UTC 2019] Creating k-unitigs with k=99
[Tue Jan  1 23:53:51 UTC 2019] Computing super reads from PE
[Wed Jan  2 04:29:27 UTC 2019] Using CABOG from /home/ubuntu/canu/src/MaSuRCA-3.3.0/bin/../CA8/Linux-amd64/bin
[Wed Jan  2 04:29:27 UTC 2019] Running mega-reads correction/assembly
[Wed Jan  2 04:29:27 UTC 2019] Using mer size 15 for mapping, B=15, d=0.02
[Wed Jan  2 04:29:27 UTC 2019] Estimated Genome Size 1165707713
[Wed Jan  2 04:29:27 UTC 2019] Estimated Ploidy 1
[Wed Jan  2 04:29:27 UTC 2019] Using 96 threads
[Wed Jan  2 04:29:27 UTC 2019] Output prefix mr.41.15.15.0.02
[Wed Jan  2 04:29:27 UTC 2019] Using 25x of the longest ONT reads
[Wed Jan  2 04:37:33 UTC 2019] Reducing super-read k-mer size
[Wed Jan  2 06:26:55 UTC 2019] Mega-reads pass 1
[Wed Jan  2 06:26:55 UTC 2019] Running locally in 1 batch
Processed 500000 super reads, irreducible 423797, processing 264 super reads per second
Processed 1000000 super reads, irreducible 813442, processing 478 super reads per second
Processed 1500000 super reads, irreducible 1174760, processing 531 super reads per second
Processed 2000000 super reads, irreducible 1503316, processing 627 super reads per second
Processed 2500000 super reads, irreducible 1801805, processing 632 super reads per second
Processed 3000000 super reads, irreducible 2084976, processing 580 super reads per second
Processed 3500000 super reads, irreducible 2356970, processing 618 super reads per second
Processed 4000000 super reads, irreducible 2677953, processing 570 super reads per second
[Fri Jan  4 13:43:38 UTC 2019] Mega-reads pass 2
[Fri Jan  4 13:43:38 UTC 2019] Running locally in 1 batch
[Thu Jan 10 21:42:47 UTC 2019] Refining alignments
[Thu Jan 10 22:13:57 UTC 2019] Joining
[Thu Jan 10 23:37:37 UTC 2019] Gap consensus
*** buffer overflow detected ***: /home/ubuntu/canu/src/MaSuRCA-3.3.0/bin/ufasta terminated
/home/ubuntu/canu/src/MaSuRCA-3.3.0/bin/mega_reads_assemble_cluster.sh: line 659:  1657 Aborted                 (core dumped) $MYPATH/ufasta split -i refs.renamed.fa ${ref_names[@]}
Error: Failed to open reference file 'to_join.3.fa'
Usage: nucmer [options] ref:path qry:path+
Use --help for more information
Error: Failed to open reference file 'to_join.1.fa'

alekseyzimin commented 5 years ago

Hi,

This is a buffer overflow in the ufasta split command. Can you check whether everything compiled properly? Can you also test the "ufasta split" command on its own? It simply splits a FASTA file into several files.

Aleksey
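
A standalone check of "ufasta split" along the lines suggested above could be sketched as follows. This is only an illustration, not part of MaSuRCA: the helper names, the toy sequences, and the file names are made up, and the only thing taken from the thread is the call shape `ufasta split -i <input> <output>...`. It assumes `ufasta` is on the PATH and skips the run otherwise.

```python
import shutil
import subprocess
from pathlib import Path


def write_test_fasta(path, n_records=6, seq_len=120):
    """Write a tiny FASTA file with n_records made-up sequences (hypothetical data)."""
    bases = "ACGT"
    with open(path, "w") as fh:
        for i in range(n_records):
            seq = "".join(bases[(i + j) % 4] for j in range(seq_len))
            fh.write(f">seq{i}\n{seq}\n")
    return path


def build_split_command(fasta, outputs, ufasta="ufasta"):
    """Same call shape as in the failing script: ufasta split -i <in> <out>..."""
    return [ufasta, "split", "-i", str(fasta), *map(str, outputs)]


def run_split_test(workdir="."):
    """Run ufasta split on the toy file.

    Returns the exit code, or None if ufasta is not on the PATH.
    """
    workdir = Path(workdir)
    exe = shutil.which("ufasta")
    if exe is None:
        return None
    fasta = write_test_fasta(workdir / "test_input.fa")
    outputs = [workdir / f"out{i}.fa" for i in (1, 2, 3)]
    result = subprocess.run(build_split_command(fasta, outputs, exe))
    # On POSIX a negative return code means the process was killed by a signal,
    # e.g. -6 for SIGABRT, which the shell reports as "Aborted (core dumped)".
    return result.returncode


if __name__ == "__main__":
    print("ufasta split exit code:", run_split_test())
```

An exit code of 0 with non-empty output files suggests the binary itself is fine; a negative code points back at the build rather than at the assembly data.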

matthewmoscou commented 5 years ago

Aleksey,

You are right, there is something wrong with ufasta. When I compile it independently from https://github.com/gmarcais/ufasta, I get the same buffer overflow error. I am not sure what is causing this. How stable is ufasta?

Matt

alekseyzimin commented 5 years ago

Hi,

I will meet with the ufasta developer tomorrow and hopefully we will be able to resolve this problem. Can you send me some details about your Linux kernel and g++ compiler versions?

Thanks, Aleksey

matthewmoscou commented 5 years ago

Linux 4.15.0-1021-aws #21-Ubuntu SMP Tue Aug 28 10:23:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04)
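
For anyone reporting a similar build problem, the kernel and compiler details requested above can be gathered in one go. The following is a minimal sketch; the function names and the choice of fields are assumptions for illustration, and it simply reports None for a compiler that is not on the PATH.

```python
import platform
import shutil
import subprocess


def compiler_version(compiler="g++"):
    """Return the first line of `<compiler> --version`, or None if it is not on the PATH."""
    exe = shutil.which(compiler)
    if exe is None:
        return None
    out = subprocess.run([exe, "--version"], capture_output=True, text=True)
    lines = out.stdout.splitlines()
    return lines[0] if lines else None


def build_environment():
    """Collect the kernel and compiler details relevant to a compile-time bug."""
    u = platform.uname()
    return {
        "system": u.system,
        "kernel_release": u.release,
        "machine": u.machine,
        "g++": compiler_version("g++"),
        "gcc": compiler_version("gcc"),
    }


if __name__ == "__main__":
    for key, value in build_environment().items():
        print(f"{key}: {value}")
```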

alekseyzimin commented 5 years ago

Thank you. I think this could be related to the newer gcc. We tested masurca with g++ 4.9, but not with anything higher than that. I am reasonably confident we can fix this quickly.

gmarcais commented 5 years ago

@matthewmoscou Can you get us some extra information:

matthewmoscou commented 5 years ago

file_directory.txt

refs.renamed.fa size is 370900644. Can you e-mail me (matthew.moscou@tsl.ac.uk) so that I can send you a link to the file?

The issue, though, appears to be ufasta itself, as my attempts with a small test FASTA file also failed. This was the case with both the MaSuRCA-compiled version and a manually compiled version of ufasta.

gmarcais commented 5 years ago

Can you try the latest version from my GitHub tree?

This is frustrating, as I cannot reproduce your bug. I made a few modifications to avoid compilation warnings and to avoid closing a file descriptor of -1. Now it runs clean under valgrind (as it should):

> valgrind ../build-gcc/default/inst/bin/ufasta split -i refs.renamed.fa out1.fa out2.fa out3.fa out4.fa out5.fa out6.fa out7.fa out8.fa out9.fa out10.fa
==17060== Memcheck, a memory error detector
==17060== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==17060== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==17060== Command: ../build-gcc/default/inst/bin/ufasta split -i refs.renamed.fa out1.fa out2.fa out3.fa out4.fa out5.fa out6.fa out7.fa out8.fa out9.fa out10.fa
==17060== 
==17060== 
==17060== HEAP SUMMARY:
==17060==     in use at exit: 0 bytes in 0 blocks
==17060==   total heap usage: 42,590 allocs, 42,590 frees, 986,072,125 bytes allocated
==17060== 
==17060== All heap blocks were freed -- no leaks are possible
==17060== 
==17060== For counts of detected and suppressed errors, rerun with: -v
==17060== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
valgrind ../build-gcc/default/inst/bin/ufasta split -i refs.renamed.fa out1.f  8.09s user 0.44s system 99% cpu 8.552 total

If you still get a crash, can you run it using valgrind as well?
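
Running a command under valgrind and pulling out the result can be scripted. The sketch below is an assumption-laden illustration rather than anything from ufasta: the function names are hypothetical, and the regular expression is written against the "ERROR SUMMARY" line format shown in the Memcheck output above.

```python
import re
import shutil
import subprocess

# Matches e.g. "ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)"
SUMMARY_RE = re.compile(r"ERROR SUMMARY: ([\d,]+) errors from ([\d,]+) contexts")


def parse_error_summary(valgrind_output):
    """Return (errors, contexts) from the ERROR SUMMARY line, or None if it is absent."""
    match = SUMMARY_RE.search(valgrind_output)
    if match is None:
        return None
    errors, contexts = (int(g.replace(",", "")) for g in match.groups())
    return errors, contexts


def run_under_valgrind(command):
    """Run command under valgrind.

    Returns (exit_code, summary), or None if valgrind is not on the PATH.
    """
    if shutil.which("valgrind") is None:
        return None
    result = subprocess.run(["valgrind", *command], capture_output=True, text=True)
    # Valgrind writes its report to stderr by default.
    return result.returncode, parse_error_summary(result.stderr)


if __name__ == "__main__":
    cmd = ["ufasta", "split", "-i", "refs.renamed.fa", "out1.fa", "out2.fa"]
    print(run_under_valgrind(cmd))
```

A summary of (0, 0) corresponds to the clean run shown above; anything else, or a missing summary after a crash, is worth attaching to the report in full.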

matthewmoscou commented 5 years ago

This has been partly resolved. Researchers using Amazon AWS can use SUSE Linux Enterprise Server 11 SP4 instead of Ubuntu 18.04. That image worked without any issue for the problem raised above. @gmarcais will continue to work on solving this strange buffer overflow error on Ubuntu.

estolle commented 5 years ago

Hi there,

Just FYI, I tried the newly fixed ufasta (thanks @gmarcais) on a small FASTA file (ufasta split -i input.fasta out1.fa out2.fa) and it now runs on my Ubuntu 16.04 without the buffer overflow errors. In both cases (the earlier version with buffer overflow errors and the new version without them) the output (split) FASTA files are produced and are identical.

Thanks for fixing!
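
A check that two ufasta builds produce identical split files, as described above, could look like the following sketch. The directory layout and function names are hypothetical; the comparison is a plain byte-level SHA-256 check and knows nothing about FASTA.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large FASTA files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_split_outputs(dir_old, dir_new, names):
    """Map each output file name to True if both directories hold byte-identical copies."""
    return {
        name: sha256_of(Path(dir_old) / name) == sha256_of(Path(dir_new) / name)
        for name in names
    }


if __name__ == "__main__":
    # Hypothetical layout: each build's outputs were written to its own directory.
    print(compare_split_outputs("old_build", "new_build", ["out1.fa", "out2.fa"]))
```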