alekseyzimin / masurca

GNU General Public License v3.0

MaSuRCA 3.2.8: sometimes problems with Illumina+ONT assembly #58

Open estolle opened 6 years ago

estolle commented 6 years ago

Hi there

I recently ran into some issues when I tried to assemble a small insect genome (250 Mb) with Illumina data (120x, PE 150 bp) and ca. 25x ONT 1D data. I am getting buffer overflow issues and errors at the Refining alignments step, but the script proceeds into the assembly and eventually finishes. With another individual's Illumina data as input I sometimes get problems at the gap-filling step, yet the process eventually finishes. Any ideas what could cause this or how it might impact the final result?

[Sa 8. Sep 12:49:07 CEST 2018] Processing pe library reads
[Sa 8. Sep 12:59:41 CEST 2018] Average PE read length 149
[Sa 8. Sep 12:59:41 CEST 2018] Using kmer size of 99 for the graph
[Sa 8. Sep 12:59:42 CEST 2018] MIN_Q_CHAR: 33
WARNING: JF_SIZE set too low, increasing JF_SIZE to at least 1314280647, this automatic increase may be not enough!
[Sa 8. Sep 12:59:42 CEST 2018] Creating mer database for Quorum
[Sa 8. Sep 13:10:32 CEST 2018] Error correct PE
[Sa 8. Sep 13:36:07 CEST 2018] Estimating genome size
[Sa 8. Sep 13:41:08 CEST 2018] Estimated genome size: 273786976
[Sa 8. Sep 13:41:08 CEST 2018] Creating k-unitigs with k=99
[Sa 8. Sep 13:56:11 CEST 2018] Computing super reads from PE
[Sa 8. Sep 14:36:19 CEST 2018] Using linking mates
[Sa 8. Sep 14:36:19 CEST 2018] Using CABOG from /opt/MaSuRCA-3.2.8/bin/../CA8/Linux-amd64/bin
[Sa 8. Sep 14:36:19 CEST 2018] Running mega-reads correction/assembly
[Sa 8. Sep 14:36:19 CEST 2018] Using mer size 15 for mapping, B=15, d=0.02
[Sa 8. Sep 14:36:19 CEST 2018] Estimated Genome Size 273786976
[Sa 8. Sep 14:36:19 CEST 2018] Estimated Ploidy 1
[Sa 8. Sep 14:36:19 CEST 2018] Using 100 threads
[Sa 8. Sep 14:36:19 CEST 2018] Output prefix mr.41.15.15.0.02
[Sa 8. Sep 14:36:19 CEST 2018] Using 25x of the longest ONT reads
[Sa 8. Sep 14:40:22 CEST 2018] Reducing super-read k-mer size
[Sa 8. Sep 14:47:19 CEST 2018] Mega-reads pass 1
[Sa 8. Sep 14:47:19 CEST 2018] Running locally in 1 batch
[Sa 8. Sep 19:05:51 CEST 2018] Mega-reads pass 2
[Sa 8. Sep 19:05:51 CEST 2018] Running locally in 1 batch
[Sa 8. Sep 20:02:50 CEST 2018] Refining alignments
ERROR: failed to merge alignments at position 273
Please file a bug report
ERROR: Could not parse delta file, /dev/stdin error no: 402
ERROR: Could not parse delta file, /dev/stdin error no: 402
[Sa 8. Sep 20:10:48 CEST 2018] Joining
[Sa 8. Sep 20:16:07 CEST 2018] Gap consensus
*** buffer overflow detected ***: ufasta terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f72f58d77e5]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7f72f597915c]
/lib/x86_64-linux-gnu/libc.so.6(+0x117160)[0x7f72f5977160]
/lib/x86_64-linux-gnu/libc.so.6(+0x1190a7)[0x7f72f59790a7]
ufasta[0x41bd6e]
ufasta[0x403d96]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f72f5880830]
ufasta[0x404199]
======= Memory map: ========
00400000-00429000 r-xp 00000000 08:12 3539149 /opt/MaSuRCA-3.2.8/bin/ufasta
00628000-00629000 r--p 00028000 08:12 3539149 /opt/MaSuRCA-3.2.8/bin/ufasta
00629000-0062b000 rw-p 00029000 08:12 3539149 /opt/MaSuRCA-3.2.8/bin/ufasta
01739000-01901000 rw-p 00000000 00:00 0 [heap]
...... [more entries here]
/opt/MaSuRCA-3.2.8/bin/mega_reads_assemble_cluster.sh: line 620: 49949 Aborted (core dumped) ufasta split -i refs.renamed.fa ${ref_names[@]}
xargs: invalid number "-I" for -P option

[more text of input parameters for the script here]
xargs: ./do_consensus.sh: No such file or directory
cat: 'merges.[0-9]*.txt': No such file or directory
[Sa 8. Sep 20:16:12 CEST 2018] Warning! Some or all gap consensus jobs failed, see files in mr.41.15.15.0.02.join_consensus.tmp, proceeding anyway, to rerun gap consensus erase mr.41.15.15.0.02.1.fa and re-run assemble.sh
[Sa 8. Sep 20:16:15 CEST 2018] Generating assembly input files
[Sa 8. Sep 20:30:35 CEST 2018] Coverage threshold for splitting unitigs is 27 minimum ovl 98
[Sa 8. Sep 20:30:35 CEST 2018] Running assembly
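The warning in the log above says how to redo the gap consensus step: erase the `<prefix>.1.fa` file and re-run `assemble.sh`. A minimal sketch of that, assuming it is run from the assembly directory; the function name `rerun_gap_consensus` is hypothetical and not part of MaSuRCA:

```shell
# Hypothetical helper: remove the stale joined output named in the warning,
# then re-run the assemble.sh script found in the current directory.
rerun_gap_consensus() {
  local prefix="$1"            # output prefix from the log, e.g. mr.41.15.15.0.02
  local stale="${prefix}.1.fa"

  if [ -e "$stale" ]; then
    rm -f -- "$stale"
    echo "removed $stale"
  else
    echo "nothing to remove: $stale not found"
  fi

  if [ -x ./assemble.sh ]; then
    ./assemble.sh
  else
    echo "assemble.sh not found or not executable in $(pwd)" >&2
    return 1
  fi
}
```

Usage would be `rerun_gap_consensus mr.41.15.15.0.02`. If `ufasta` still aborts, the re-run will most likely stop at the same place, so this only helps once the underlying failure has been addressed.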

estolle commented 5 years ago

I ran a few more assemblies (now with more ONT data) and I keep running into problems in later stages of the assembly: a buffer overflow during the overlap correction step (correct-olaps). So far, 6 such overflow errors have been detected.

[Do 13. Sep 22:57:55 CEST 2018] Processing pe library reads
[Do 13. Sep 23:03:54 CEST 2018] Average PE read length 149
[Do 13. Sep 23:03:55 CEST 2018] Using kmer size of 99 for the graph
[Do 13. Sep 23:03:55 CEST 2018] MIN_Q_CHAR: 33
WARNING: JF_SIZE set too low, increasing JF_SIZE to at least 521214080, this automatic increase may be not enough!
[Do 13. Sep 23:03:55 CEST 2018] Creating mer database for Quorum
[Do 13. Sep 23:10:20 CEST 2018] Error correct PE
[Do 13. Sep 23:25:45 CEST 2018] Estimating genome size
[Do 13. Sep 23:28:03 CEST 2018] Estimated genome size: 267732689
[Do 13. Sep 23:28:03 CEST 2018] Creating k-unitigs with k=99
[Do 13. Sep 23:37:54 CEST 2018] Computing super reads from PE
[Do 13. Sep 23:55:47 CEST 2018] Using CABOG from /opt/MaSuRCA-3.2.8/bin/../CA8/Linux-amd64/bin
[Do 13. Sep 23:55:47 CEST 2018] Running mega-reads correction/assembly
[Do 13. Sep 23:55:47 CEST 2018] Using mer size 15 for mapping, B=15, d=0.02
[Do 13. Sep 23:55:47 CEST 2018] Estimated Genome Size 267732689
[Do 13. Sep 23:55:47 CEST 2018] Estimated Ploidy 1
[Do 13. Sep 23:55:47 CEST 2018] Using 100 threads
[Do 13. Sep 23:55:47 CEST 2018] Output prefix mr.41.15.15.0.02
[Do 13. Sep 23:55:47 CEST 2018] Using 25x of the longest ONT reads
[Fr 14. Sep 00:04:55 CEST 2018] Reducing super-read k-mer size
[Fr 14. Sep 00:13:23 CEST 2018] Mega-reads pass 1
[Fr 14. Sep 00:13:23 CEST 2018] Running locally in 1 batch
Processed 500000 super reads, irreducible 299941, processing 4854 super reads per second
[Sa 15. Sep 00:23:23 CEST 2018] Mega-reads pass 2
[Sa 15. Sep 00:23:23 CEST 2018] Running locally in 1 batch
[Sa 15. Sep 09:51:35 CEST 2018] Refining alignments
[Sa 15. Sep 10:03:55 CEST 2018] Joining
[Sa 15. Sep 10:13:50 CEST 2018] Gap consensus
[Sa 15. Sep 11:40:10 CEST 2018] Generating assembly input files
[Sa 15. Sep 12:00:33 CEST 2018] Coverage threshold for splitting unitigs is 22 minimum ovl 250
[Sa 15. Sep 12:00:33 CEST 2018] Running assembly
*** buffer overflow detected ***: /opt/MaSuRCA-3.2.8/CA8/Linux-amd64/bin/correct-olaps terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fef907657e5]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7fef9080715c]
/lib/x86_64-linux-gnu/libc.so.6(+0x117160)[0x7fef90805160]
/lib/x86_64-linux-gnu/libc.so.6(+0x1164b2)[0x7fef908044b2]
/opt/MaSuRCA-3.2.8/CA8/Linux-amd64/bin/correct-olaps[0x403b12]
/opt/MaSuRCA-3.2.8/CA8/Linux-amd64/bin/correct-olaps[0x4026d4]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fef9070e830]
/opt/MaSuRCA-3.2.8/CA8/Linux-amd64/bin/correct-olaps[0x402c09]
======= Memory map: ========
00400000-00424000 r-xp 00000000 08:12 3539523 /opt/MaSuRCA-3.2.8/CA8/Linux-amd64/bin/correct-olaps
00623000-00624000 r--p 00023000 08:12 3539523 /opt/MaSuRCA-3.2.8/CA8/Linux-amd64/bin/correct-olaps
00624000-00625000 rw-p 00024000 08:12 3539523 /opt/MaSuRCA-3.2.8/CA8/Linux-amd64/bin/correct-olaps
00625000-007b7000 rw-p 00000000 00:00 0
0159d000-02281000 rw-p 00000000 00:00 0 [heap]
7fef85c3c000-7fef906ee000 rw-p 00000000 00:00 0
7fef906ee000-7fef908ae000 r-xp 00000000 08:12 9443904 /lib/x86_64-linux-gnu/libc-2.23.so
7fef908ae000-7fef90aae000 ---p 001c0000 08:12 9443904 /lib/x86_64-linux-gnu/libc-2.23.so
7fef90aae000-7fef90ab2000 r--p 001c0000 08:12 9443904

estolle commented 5 years ago

I tried different parameters and subsets of the input ONT data without success. I think the biggest issue is at the Gap consensus step with ufasta (see below). I am running out of ideas about what to change or try. Any advice would be greatly appreciated!

/opt/MaSuRCA-3.2.8/bin/mega_reads_assemble_cluster.sh: line 620: 52887 Aborted (core dumped) ufasta split -i refs.renamed.fa ${ref_names[@]}
xargs: invalid number "-I" for -P option
Usage: xargs [OPTION]... COMMAND [INITIAL-ARGS]...
Run COMMAND with arguments INITIAL-ARGS and more arguments read from input.

[So 23. Sep 22:05:12 CEST 2018] Gap consensus
*** buffer overflow detected ***: ufasta terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fbe70f1f7e5]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7fbe70fc115c]
/lib/x86_64-linux-gnu/libc.so.6(+0x117160)[0x7fbe70fbf160]
/lib/x86_64-linux-gnu/libc.so.6(+0x1190a7)[0x7fbe70fc10a7]
ufasta[0x41bd6e]
ufasta[0x403d96]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fbe70ec8830]
ufasta[0x404199]
======= Memory map: ========
00400000-00429000 r-xp 00000000 08:12 3539149 /opt/MaSuRCA-3.2.8/bin/ufasta
00628000-00629000 r--p 00028000 08:12 3539149 /opt/MaSuRCA-3.2.8/bin/ufasta
00629000-0062b000 rw-p 00029000 08:12 3539149 /opt/MaSuRCA-3.2.8/bin/ufasta
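One plausible reading of the `xargs: invalid number "-I" for -P option` message is that the value meant for `-P` expanded to nothing, so `xargs` took the next word, `-I`, as the process count. That is an assumption about the failing script, not something confirmed from the MaSuRCA source. The sketch below shows a guard against that failure mode; `safe_threads` and `run_parallel` are hypothetical names:

```shell
# Return a usable process count: the argument if it is a positive integer,
# otherwise 1. An empty or non-numeric value would otherwise make
# "xargs -P $N -I % cmd" expand to "xargs -P -I % cmd".
safe_threads() {
  case "$1" in
    ''|*[!0-9]*|0) echo 1 ;;
    *) echo "$1" ;;
  esac
}

# Run a command once per input line with a guarded -P value.
# First argument: requested thread count; remaining arguments: the command,
# with % standing for the input line.
run_parallel() {
  local threads
  threads=$(safe_threads "$1")
  shift
  xargs -P "$threads" -I % "$@"
}
```

For example, `printf 'a\nb\n' | run_parallel "$NUM_THREADS" echo job %` still runs with one process when `NUM_THREADS` is unset, instead of aborting with the message above.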

jnarayan81 commented 5 years ago

It seems MaSuRCA has some issues at the consensus calling stage. I got the same error with Illumina PE + ONT reads.

➜  AssembleAvaga git:(master) ✗ ./assemble.sh 
[Mi Okt 24 18:27:42 CEST 2018] Processing pe library reads
[Mi Okt 24 18:37:15 CEST 2018] Average PE read length 250
[Mi Okt 24 18:37:15 CEST 2018] Using kmer size of 83 for the graph
[Mi Okt 24 18:37:16 CEST 2018] MIN_Q_CHAR: 33
[Mi Okt 24 18:37:16 CEST 2018] Creating mer database for Quorum
[Mi Okt 24 18:47:35 CEST 2018] Error correct PE.
[Mi Okt 24 20:16:27 CEST 2018] Estimating genome size.
[Mi Okt 24 20:28:00 CEST 2018] Estimated genome size: 191455832
[Mi Okt 24 20:28:00 CEST 2018] Creating k-unitigs with k=83
[Mi Okt 24 21:18:59 CEST 2018] Computing super reads from PE 
Using CABOG from is /home/urbe/Tools/masurca/MaSuRCA-3.2.8/bin/../CA8/Linux-amd64/bin
Running mega-reads correction/assembly
Using mer size 15 for mapping, B=15, d=0.02
Estimated Genome Size 191455832
Estimated Ploidy 1
Using 16 threads
Output prefix mr.41.15.15.0.02
Using 30x of the longest ONT reads
Reducing super-read k-mer size
Mega-reads pass 1
Running locally in 1 batch
compute_psa 1041354 960227835
Mega-reads pass 2
Running locally in 1 batch
compute_psa 132642 3312138260
Refining alignments
Joining
Gap consensus
*** buffer overflow detected ***: ufasta terminated
/home/urbe/Tools/masurca/MaSuRCA-3.2.8/bin/mega_reads_assemble_cluster.sh: line 602:  5606 Aborted                 (core dumped) ufasta split -i refs.renamed.fa ${ref_names[@]}
xargs: invalid number "-I" for -P option
Try 'xargs --help' for more information.
xargs: do_consensus.sh: No such file or directory
cat: 'merges.[0-9]*.txt': No such file or directory
Warning! Some or all gap consensus jobs failed, see files in mr.41.15.15.0.02.join_consensus.tmp, proceeding anyway, to rerun gap consensus erase mr.41.15.15.0.02.1.fa and re-run assemble.sh
Generating assembly input files
Coverage threshold for splitting unitigs is 31 minimum ovl 250
Running assembly

AntoineHo commented 5 years ago

Hello, I have the same problem when running MaSuRCA with Illumina+ONT reads. Genome size should be 150Mb. Looking at CA.mr.41.15.15.0.02.log I find:

runCA failed
5 overlap correction jobs failed;

Here is the log file: CA.mr.41.15.15.0.02.log

AH

JFsanchezherrero commented 5 years ago

Dear @alekseyzimin,

We are also having the same problem in the gap consensus step when using MaSuRCA 3.2.8 for Illumina PE + ONT data.

xargs: invalid number "-I" for -P option
Try 'xargs --help' for more information.
xargs: ./do_consensus.sh: No such file or directory
cat: 'merges.[0-9]*.txt': No such file or directory
[dom nov 4 16:14:43 CET 2018] Gap consensus failed
Warning! Some or all gap consensus jobs failed, see files in mr.41.15.15.0.02.join_consensus.tmp

We tested two different data sets from different species and got the same error log. Thank you,

JFsanchezherrero commented 5 years ago

Dear @alekseyzimin, I have found that during the Refining alignments step, the files $COORDS.matches*.all.txt.tmp are empty, containing only ">". As a result, the following steps fail, which leads to the final error in the gap consensus step that we have reported here.

I have been checking the code and I would guess that the problem might be in line 504 of mega_reads_assemble_cluster.sh:

cat <(ufasta extract -f $COORDS.single.txt $COORDS.txt) <(ufasta extract -v -f $COORDS.single.txt $COORDS.mr.txt)| awk '{if($0~/^>/){pb=substr($1,2);print $0} else { print $3" "$4" "$5" "$6" "$10" "pb" "$11" "$9}}' | add_pb_seq.pl $LONGREADS1 | split_matches_file.pl $NUM_LONGREADS_READS_PER_BATCH .matches && ls .matches.* | xargs -P $NUM_THREADS -I % refine.sh $COORDS % $KMER && cat $COORDS.matches*.all.txt.tmp > $COORDS.all.txt && rm .matches.* && rm $COORDS.matches*.all.txt.tmp

I wonder if other users who are also experiencing this issue could tell us whether they are seeing the same problem.

Thanks, P.S. We really appreciate your work on MaSuRCA and we are all waiting to resolve this problem and obtain a draft assembly for our data.
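The empty-file symptom described above can be checked with a few lines of shell. This is a minimal sketch; `list_header_only_files` is a hypothetical helper, and the glob in the usage line assumes the temporary files are still present in the working directory, which may not hold if the pipeline's final `rm` has already run:

```shell
# Print each given file that carries no alignment data: it is either empty
# or every non-blank line starts with ">".
list_header_only_files() {
  local f
  for f in "$@"; do
    [ -f "$f" ] || continue
    # grep -v with several -e patterns selects lines matching none of them;
    # if no such line exists, the file holds only headers or blanks.
    if ! grep -q -v -e '^>' -e '^[[:space:]]*$' "$f"; then
      echo "$f"
    fi
  done
}
```

A call such as `list_header_only_files ./*.matches*.all.txt.tmp` would list the affected batches. If every batch is listed, the `refine.sh` jobs produced no output at all, which points at the tools they call rather than at the input reads.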

alekseyzimin commented 5 years ago

Hi,

The most likely cause of this problem is incomplete/failed MaSuRCA installation. Please re-run install.sh, and capture the output and look for errors. I guess Mummer did not compile properly. Please let me know what you find out.

--Aleksey

-- Dr. Alexey V. Zimin Associate Research Scientist Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA (301)-437-6260 http://www.genome.umd.edu http://masurca.blogspot.com
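Capturing the installer output and searching it for problems, as suggested in this reply, could look like the sketch below. The function name `scan_install_log` and the list of search terms are assumptions chosen for illustration; they are not an official list of MaSuRCA build errors.

```shell
# Print, with line numbers, every line of a saved build log that contains
# a term commonly seen when a build or install step goes wrong.
scan_install_log() {
  local log="$1"
  grep -n -i -E 'error|warning|not been installed|cannot find|undefined reference' "$log"
}
```

The log itself can be captured with `./install.sh > install.log 2>&1`, followed by `scan_install_log install.log`. Many warnings in a build log are harmless, so the matches are a starting point for reading the log, not a verdict.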

AntoineHo commented 5 years ago

Hello, After a quick check I find in the logs: libtool: install: warning: 'libumdmummer' has not been installed in lib

alekseyzimin commented 5 years ago

Yep, that is the problem. Can you see why? Is there an error during compilation of libumdmummer?

AntoineHo commented 5 years ago

I checked the lib directory and found the libumdmummer.la file in it. In the bin directory, mummer seems to be working. Is it possible that another mummer in $PATH is causing trouble?

I quickly checked the install.sh log but couldn't find anything. Here are the log files:

installsh.cout.log https://github.com/alekseyzimin/masurca/files/2586790/installsh.cout.log
install.log https://github.com/alekseyzimin/masurca/files/2586791/install.log

Thanks for your help!

alekseyzimin commented 5 years ago

Yes, another installation of mummer (especially from miniconda) will cause exactly that problem. Mummer is installed along with MaSuRCA. Can you remove mummer from the path and try to reinstall?
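A rough way to find a competing mummer on the path before reinstalling is sketched below. Both function names are hypothetical, and the tool names passed in the usage line (`mummer`, `nucmer`) are examples; adjust them to whatever your other installation provides.

```shell
# List every executable named like one of the given tools that lives in a
# PATH directory other than the MaSuRCA bin directory.
find_foreign_tools() {
  local masurca_bin="$1"
  shift
  local dir tool
  local IFS=:
  for dir in $PATH; do
    [ -n "$dir" ] || continue
    [ "$dir" = "$masurca_bin" ] && continue
    for tool in "$@"; do
      if [ -x "$dir/$tool" ]; then
        echo "$dir/$tool"
      fi
    done
  done
}

# Print the current PATH with one directory removed.
path_without() {
  local drop="$1" dir out=""
  local IFS=:
  for dir in $PATH; do
    [ "$dir" = "$drop" ] || out="${out:+$out:}$dir"
  done
  echo "$out"
}
```

For example, `find_foreign_tools /opt/MaSuRCA-3.2.8/bin mummer nucmer` lists other copies, and `PATH=$(path_without /path/to/miniconda/bin)` (a placeholder path) drops one directory for the current shell only, so the change can be tested before editing any startup files.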

JFsanchezherrero commented 5 years ago

Hi there, I removed all the mummer libs I could find from my path and environment, then downloaded and reinstalled MaSuRCA 3.2.8, but I still got the same problem as I stated before:

Dear @alekseyzimin I have found that during the Refining of alignments, files $COORDS.matches*.all.txt.tmp are empty, only containing ">". That generates that the following steps result in errors leading to the final error in gap consensus step that we have reported here. I have been checking the code and I would guess that the problem might be in line 504 from file mega_reads_assemble_cluster.sh

`cat <(ufasta extract -f $COORDS.single.txt $COORDS.txt) <(ufasta extract -v -f $COORDS.single.txt $COORDS.mr.txt)| awk '{if($0~/^>/){pb=substr($1,2);print $0} else { print $3" "$4" "$5" "$6" "$10" "pb" "$11" "$9}}' | add_pb_seq.pl $LONGREADS1 | split_matches_file.pl $NUM_LONGREADS_READS_PER_BATCH .matches && ls .matches.* | xargs -P $NUM_THREADS -I % refine.sh $COORDS % $KMER && cat $COORDS.matches*.all.txt.tmp > $COORDS.all.txt && rm .matches.* && rm $COORDS.matches*.all.txt.tmp`

I wonder if other users have experienced the same after reinstalling, or if you did solve the problem and got the assembler running successfully.

alekseyzimin commented 5 years ago

No, I have not heard that this problem is common. Can you delete MaSuRCA, re-untar/unzip it, and re-run install.sh on a clean configuration to see if there are any errors?

--Aleksey

JFsanchezherrero commented 5 years ago

Yes I did. It was already done in a clean and new folder, downloading the tar file again and installing from scratch with no previous mummer dependencies.

JFsanchezherrero commented 5 years ago

The thing is that I managed to finish the first test I did with Illumina and Nanopore data a couple of months ago with no errors, so I don't know exactly what is going on.

LironShv commented 5 years ago

Hello,

I am assembling a genome of approximately 150 Mb using 25x PacBio data and ~70x Illumina data, and I believe that I am running into the same problem as mentioned here before; I have been following this thread for a while. I have run the assembly a couple of times, and even after reinstalling MaSuRCA I get the following issues:

[Mon 19 Nov 12:15:37 GMT 2018] Processing pe library reads
[Mon 19 Nov 12:23:24 GMT 2018] Average PE read length 150
[Mon 19 Nov 12:23:24 GMT 2018] Using kmer size of 99 for the graph
[Mon 19 Nov 12:23:25 GMT 2018] MIN_Q_CHAR: 33
[Mon 19 Nov 12:23:25 GMT 2018] Creating mer database for Quorum
[Mon 19 Nov 12:35:57 GMT 2018] Error correct PE
[Mon 19 Nov 13:12:00 GMT 2018] Estimating genome size
[Mon 19 Nov 13:17:56 GMT 2018] Estimated genome size: 160727761
[Mon 19 Nov 13:17:56 GMT 2018] Creating k-unitigs with k=99
[Mon 19 Nov 13:29:51 GMT 2018] Computing super reads from PE
[Mon 19 Nov 14:37:22 GMT 2018] Using CABOG from /home/ls001/genomes/software/MaSuRCA-3.2.8/bin/../CA8/Linux-amd64/bin
[Mon 19 Nov 14:37:22 GMT 2018] Running mega-reads correction/assembly
[Mon 19 Nov 14:37:22 GMT 2018] Using mer size 15 for mapping, B=17, d=0.029
[Mon 19 Nov 14:37:22 GMT 2018] Estimated Genome Size 160727761
[Mon 19 Nov 14:37:22 GMT 2018] Estimated Ploidy 1
[Mon 19 Nov 14:37:22 GMT 2018] Using 48 threads
[Mon 19 Nov 14:37:23 GMT 2018] Output prefix mr.41.15.17.0.029
[Mon 19 Nov 14:37:23 GMT 2018] Pacbio coverage >25x, using 25x of the longest reads
[Mon 19 Nov 14:38:18 GMT 2018] Reducing super-read k-mer size
[Mon 19 Nov 14:44:41 GMT 2018] Mega-reads pass 1
[Mon 19 Nov 14:44:41 GMT 2018] Running locally in 1 batch
[Tue 20 Nov 09:09:30 GMT 2018] Mega-reads pass 2
[Tue 20 Nov 09:09:30 GMT 2018] Running locally in 1 batch
[Tue 20 Nov 11:08:27 GMT 2018] Refining alignments
[Tue 20 Nov 11:17:44 GMT 2018] Joining
[Tue 20 Nov 11:22:27 GMT 2018] Gap consensus
[Tue 20 Nov 11:22:30 GMT 2018] Warning! Some or all gap consensus jobs failed, see files in mr.41.15.17.0.029.join_consensus.tmp, proceeding anyway, to rerun gap consensus erase mr.41.15.17.0.029.1.fa and re-run assemble.sh
[Tue 20 Nov 11:22:49 GMT 2018] Generating assembly input files
[Tue 20 Nov 11:54:47 GMT 2018] Coverage threshold for splitting unitigs is 35 minimum ovl 250
[Tue 20 Nov 11:54:47 GMT 2018] Running assembly

Processed 500000 super reads, irreducible 399967, processing 1043 super reads per second
*** buffer overflow detected ***: ufasta terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f253f2d37e5]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7f253f37515c]
/lib/x86_64-linux-gnu/libc.so.6(+0x117160)[0x7f253f373160]
/lib/x86_64-linux-gnu/libc.so.6(+0x1190a7)[0x7f253f3750a7]
ufasta[0x41bd6e]
ufasta[0x403d96]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f253f27c830]
ufasta[0x404199]
======= Memory map: ========
00400000-00429000 r-xp 00000000 00:2c 425594311 /home/ls001/genomes/software/MaSuRCA-3.2.8/bin/ufasta
00628000-00629000 r--p 00028000 00:2c 425594311 /home/ls001/genomes/software/MaSuRCA-3.2.8/bin/ufasta
00629000-0062b000 rw-p 00029000 00:2c 425594311 /home/ls001/genomes/software/MaSuRCA-3.2.8/bin/ufasta
00bd6000-00c7f000 rw-p 00000000 00:00 0 [heap]
7f253f25c000-7f253f41c000 r-xp 00000000 08:02 23075723 /lib/x86_64-linux-gnu/libc-2.23.so
7f253f41c000-7f253f61c000 ---p 001c0000 08:02 23075723 /lib/x86_64-linux-gnu/libc-2.23.so
7f253f61c000-7f253f620000 r--p 001c0000 08:02 23075723 /lib/x86_64-linux-gnu/libc-2.23.so
7f253f620000-7f253f622000 rw-p 001c4000 08:02 23075723 /lib/x86_64-linux-gnu/libc-2.23.so
etc.
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
/home/ls001/genomes/software/MaSuRCA-3.2.8/bin/mega_reads_assemble_cluster.sh: line 620: 33945 Aborted (core dumped) ufasta split -i refs.renamed.fa ${ref_names[@]}
xargs: invalid number "-I" for -P option
Usage: xargs [OPTION]... COMMAND [INITIAL-ARGS]...
Run COMMAND with arguments INITIAL-ARGS and more arguments read from input.

Thanks for helping and providing this assembler!

alekseyzimin commented 5 years ago

Hi, this is likely not the same problem. Can you send the output of "ls -lth" on your assembly folder to aleksey.zimin@gmail.com?

JFsanchezherrero commented 5 years ago

Dear @alekseyzimin, I have seen that there is a new version, MaSuRCA 3.2.9, which is stated to fix the gap consensus step.

I downloaded it and installed it from scratch (again taking the mummer dependencies into account), but I got the same error at the Refining alignments step.

I have found that during the Refining alignments step, the files $COORDS.matches*.all.txt.tmp are empty, containing only ">". As a result, the following steps fail, which leads to the final error in the gap consensus step that we have reported here.

As stated before, I got thousands of error messages saying:

ERROR: Could not parse delta file, /dev/stdin error no: 402

I have noticed that now join_consensus folder contains some files, although most of them are empty.

ls mr.41.15.15.0.02.join_consensus.tmp/
-rw-r--r-- 1 jfsanchez 44 nov 21 13:19 blasr.err
-rw-r--r-- 1 jfsanchez 0 nov 21 13:19 coords.1
-rw------- 1 jfsanchez 21487616 nov 21 13:19 core
-rwxr-xr-x 1 jfsanchez 983 nov 21 13:19 do_consensus.sh*
-rw-r--r-- 1 jfsanchez 0 nov 21 13:19 join_consensus.1.fasta
-rw-r--r-- 1 jfsanchez 0 nov 21 13:19 merges.best.txt
-rw-r--r-- 1 jfsanchez 82 nov 21 13:19 pbdagcon.err
-rw-r--r-- 1 jfsanchez 0 nov 21 13:19 refs.txt

I have also attached a txt file with "ls -lth" of my assembly folder. Masurca_assembly_folder.txt

I still think that the error comes from the Refining alignments step, where all the $COORDS.matches*.all.txt.tmp files contain only ">" and no sequence at all.

Thank you very much

estolle commented 5 years ago

I think my problem is also different. The MUMmer library was present in my previous install (which caused problems). After installing the new 3.2.9 version, there was no ufasta-related gap-consensus error anymore, so that's good. In the assembly step I still get buffer overflow errors:

Running assembly
*** buffer overflow detected ***: /opt/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/correct-olaps terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7ff08df2a7e5]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7ff08dfcc15c]
/lib/x86_64-linux-gnu/libc.so.6(+0x117160)[0x7ff08dfca160]
/lib/x86_64-linux-gnu/libc.so.6(+0x1164b2)[0x7ff08dfc94b2]
/opt/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/correct-olaps[0x403b12]
/opt/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/correct-olaps[0x4026d4]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7ff08ded3830]
/opt/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/correct-olaps[0x402c09]
======= Memory map: ========
00400000-00424000 r-xp 00000000 08:12 3935483 /opt/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/correct-olaps
00623000-00624000 r--p 00023000 08:12 3935483 /opt/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/correct-olaps
00624000-00625000 rw-p 00024000 08:12 3935483 /opt/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/correct-olaps

With the previous install I sometimes had error-free assembly runs with the same Illumina input data and settings but with a much smaller subset of ONT reads. That's why I initially suspected that the ONT input was the problem. However, I can't find any obvious problem with it (re-basecalling and checks for proper formatting did not reveal anything problematic).

sunnycqcn commented 5 years ago

I get a different error. The run stops at 4-unitigger.

Fragment correction job 1089 failed.
Fragment correction job 1090 failed.

runCA failed.


Stack trace:

at /depot/bioinfo/apps/apps/MaSuRCA-3.2.9/bin/../CA8/Linux-amd64/bin/runCA line 1613.
main::caFailure('1090 overlap jobs failed; remove /scratch/snyder/f/fu115/Geno...', undef) called at /depot/bioinfo/apps/apps/MaSuRCA-3.2.9/bin/../CA8/Linux-amd64/bin/runCA line 4372
main::overlapCorrection() called at /depot/bioinfo/apps/apps/MaSuRCA-3.2.9/bin/../CA8/Linux-amd64/bin/runCA line 6526


Failure message:

1090 overlap jobs failed; remove /scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.mr.41.15.17.0.029/3-overlapcorrection/frgcorr.sh to try again

----------------------------------------START Tue Nov 20 17:31:47 2018
/depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart -O /scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.mr.41.15.17.0.029/genome.ovlStore -G /scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.mr.41.15.17.0.029/genome.gkpStore -T /scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.mr.41.15.17.0.029/genome.tigStore -B 1702554 -eg 0.03 -Eg 1000 -em 0.03 -Em 1000 -repeatdetect 76 76 76 -el 63 -RS -o /scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.mr.41.15.17.0.029/4-unitigger/genome > /scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.mr.41.15.17.0.029/4-unitigger/unitigger.err 2>&1
sh: line 1: 27889 Segmentation fault /depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart -O /scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.mr.41.15.17.0.029/genome.ovlStore -G /scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.mr.41.15.17.0.029/genome.gkpStore -T /scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.mr.41.15.17.0.029/genome.tigStore -B 1702554 -eg 0.03 -Eg 1000 -em 0.03 -Em 1000 -repeatdetect 76 76 76 -el 63 -RS -o /scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.mr.41.15.17.0.029/4-unitigger/genome > /scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.mr.41.15.17.0.029/4-unitigger/unitigger.err 2>&1
----------------------------------------END Tue Nov 20 18:10:42 2018 (2335 seconds)
ERROR: Failed with signal SEGV (11)

runCA failed.


Stack trace:

at /depot/bioinfo/apps/apps/MaSuRCA-3.2.9/bin/../CA8/Linux-amd64/bin/runCA line 1613.
main::caFailure('failed to unitig', '/scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.m...') called at /depot/bioinfo/apps/apps/MaSuRCA-3.2.9/bin/../CA8/Linux-amd64/bin/runCA line 4795
main::unitigger() called at /depot/bioinfo/apps/apps/MaSuRCA-3.2.9/bin/../CA8/Linux-amd64/bin/runCA line 6528


Last few lines of the relevant log file (/scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.mr.41.15.17.0.029/4-unitigger/unitigger.err):

WARNING: bogus overlap found for A=653754368 B=570347520
WARNING: A len=74 hang=14 ovl=-42671
WARNING: B len=72 hang=-42731 ovl=72
WARNING: bogus overlap found for A=1257537536 B=530341888
WARNING: A len=-389244560 hang=2713 ovl=-389275920
WARNING: B len=71 hang=-28647 ovl=71
WARNING: bogus overlap found for A=1257701376 B=526004224
WARNING: A len=-1933301984 hang=3180 ovl=-1933305164
WARNING: B len=71 hang=11298 ovl=-11227
WARNING: bogus overlap found for A=1257734144 B=526061568
WARNING: A len=-1393298272 hang=2744 ovl=-1393301016
WARNING: B len=71 hang=21029 ovl=-20958
WARNING: bogus overlap found for A=1257734144 B=526077952
WARNING: A len=-1393298272 hang=5968 ovl=-1393363583
WARNING: B len=71 hang=-59343 ovl=71
WARNING: bogus overlap found for A=1257799680 B=525946880
WARNING: A len=-296162976 hang=283 ovl=-296163259
WARNING: B len=71 hang=30471 ovl=-30400
WARNING: bogus overlap found for A=1257799680 B=526090240
WARNING: A len=-296162976 hang=291 ovl=-296163267
WARNING: B len=71 hang=46379 ovl=-46308
WARNING: bogus overlap found for A=1257799680 B=531431424
WARNING: A len=-296162976 hang=4821 ovl=-296230469
WARNING: B len=71 hang=-62672 ovl=71

Failed with 'Segmentation fault'

Backtrace (mangled):

/depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart[0x477426]
/usr/lib64/libpthread.so.0(+0xf5e0)[0x2b58994d45e0]
/depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart[0x45d756]
/depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart[0x45e0f4]
/depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart[0x4031b7]
/usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b589a1cac05]
/depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart[0x404724]

Backtrace (demangled):

[0] /depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart() [0x477426]
[1] /usr/lib64/libpthread.so.0::(null) + 0xf5e0 [0x2b58994d45e0]
[2] /depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart() [0x45d756]
[3] /depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart() [0x45e0f4]
[4] /depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart() [0x4031b7]
[5] /usr/lib64/libc.so.6::(null) + 0xf5 [0x2b589a1cac05]
[6] /depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart() [0x404724]

GDB:


Failure message:

failed to unitig

----------------------------------------START Tue Nov 20 18:10:43 2018
/depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart -O /scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.mr.41.15.17.0.029/genome.ovlStore -G /scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.mr.41.15.17.0.029/genome.gkpStore -T /scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.mr.41.15.17.0.029/genome.tigStore -B 1702554 -eg 0.03 -Eg 1000 -em 0.03 -Em 1000 -repeatdetect 76 76 76 -el 63 -RS -o /scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.mr.41.15.17.0.029/4-unitigger/genome > /scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.mr.41.15.17.0.029/4-unitigger/unitigger.err 2>&1
sh: line 1: 35180 Segmentation fault /depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart -O /scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.mr.41.15.17.0.029/genome.ovlStore -G /scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.mr.41.15.17.0.029/genome.gkpStore -T /scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.mr.41.15.17.0.029/genome.tigStore -B 1702554 -eg 0.03 -Eg 1000 -em 0.03 -Em 1000 -repeatdetect 76 76 76 -el 63 -RS -o /scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.mr.41.15.17.0.029/4-unitigger/genome > /scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.mr.41.15.17.0.029/4-unitigger/unitigger.err 2>&1
----------------------------------------END Tue Nov 20 18:46:32 2018 (2149 seconds)
ERROR: Failed with signal SEGV (11)

runCA failed.


Stack trace:

at /depot/bioinfo/apps/apps/MaSuRCA-3.2.9/bin/../CA8/Linux-amd64/bin/runCA line 1613.
main::caFailure('failed to unitig', '/scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.m...') called at /depot/bioinfo/apps/apps/MaSuRCA-3.2.9/bin/../CA8/Linux-amd64/bin/runCA line 4795
main::unitigger() called at /depot/bioinfo/apps/apps/MaSuRCA-3.2.9/bin/../CA8/Linux-amd64/bin/runCA line 6528


Last few lines of the relevant log file (/scratch/snyder/f/fu115/Genome_assembly/masurca/Assembly/CA.mr.41.15.17.0.029/4-unitigger/unitigger.err):

WARNING: bogus overlap found for A=653754368 B=570347520
WARNING: A len=74 hang=14 ovl=-42671
WARNING: B len=72 hang=-42731 ovl=72
WARNING: bogus overlap found for A=1257635840 B=525258752
WARNING: A len=-1775698480 hang=4303 ovl=-1775707105
WARNING: B len=71 hang=-4322 ovl=71
WARNING: bogus overlap found for A=1257668608 B=526073856
WARNING: A len=-1238679648 hang=495 ovl=-1238680143
WARNING: B len=71 hang=0 ovl=71
WARNING: bogus overlap found for A=1257701376 B=526004224
WARNING: A len=-687012064 hang=3180 ovl=-687015244
WARNING: B len=71 hang=11298 ovl=-11227
WARNING: bogus overlap found for A=1257734144 B=526061568
WARNING: A len=-147008352 hang=2744 ovl=-147011096
WARNING: B len=71 hang=21029 ovl=-20958
WARNING: bogus overlap found for A=1257734144 B=526077952
WARNING: A len=-147008352 hang=5968 ovl=-147073663
WARNING: B len=71 hang=-59343 ovl=71
WARNING: bogus overlap found for A=1257799680 B=525946880
WARNING: A len=950126944 hang=283 ovl=950126661
WARNING: B len=71 hang=30471 ovl=-30400
WARNING: bogus overlap found for A=1257799680 B=526090240
WARNING: A len=950126944 hang=291 ovl=950126653
WARNING: B len=71 hang=46379 ovl=-46308

Failed with 'Segmentation fault'

Backtrace (mangled):

/depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart[0x477426]
/usr/lib64/libpthread.so.0(+0xf5e0)[0x2afde39625e0]
/depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart[0x45d756]
/depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart[0x45e0f4]
/depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart[0x4031b7]
/usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2afde4658c05]
/depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart[0x404724]

Backtrace (demangled):

[0] /depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart() [0x477426]
[1] /usr/lib64/libpthread.so.0::(null) + 0xf5e0 [0x2afde39625e0]
[2] /depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart() [0x45d756]
[3] /depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart() [0x45e0f4]
[4] /depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart() [0x4031b7]
[5] /usr/lib64/libc.so.6::(null) + 0xf5 [0x2afde4658c05]
[6] /depot/bioinfo/apps/apps/MaSuRCA-3.2.9/CA8/Linux-amd64/bin/bogart() [0x404724]

GDB:


Failure message:

failed to unitig

The runSpec is:

batOptions=-repeatdetect 76 76 76 -el 63 -RS
useGrid=0
cnsOnGrid=0
gridSubmitCommand=qsub
cnsConcurrency=20
cnsMinFrags=10000
obtMerSize=22
ovlMerSize=22
unitigger=bogart
merylMemory=65536
ovlStoreMemory=65536
utgGraphErrorLimit=1000
utgMergeErrorLimit=1000
utgGraphErrorRate=0.03
utgMergeErrorRate=0.03
ovlCorrBatchSize=100000
ovlCorrConcurrency=8
frgCorrThreads=20
frgCorrConcurrency=6
mbtThreads=20
ovlThreads=2
ovlHashBlockLength=10000000
ovlRefBlockSize=197194010
ovlConcurrency=20
doOverlapBasedTrimming=1
doUnitigSplitting=0
doChimeraDetection=normal
merylThreads=20
stoneLevel=0
doExtendClearRanges=0
computeInsertSize=0
maxRepeatLength=12000
ovlErrorRate=0.1
cnsErrorRate=0.1
cgwErrorRate=0.1
cgwMergeMissingThreshold=-1
cgwMergeFilterLevel=1
cgwDemoteRBP=0
cnsReuseUnitigs=1
doFragmentCorrection=0

-- Fuyou Fu, Ph.D. Department of Botany and Plant Pathology Purdue University USA
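
The unitigger.err excerpts above contain "bogus overlap" warnings in which some reads are reported with negative lengths, which cannot be real read lengths. A minimal sketch for counting such lines in a log follows; the function name is hypothetical, and the patterns are assumptions based only on the excerpts quoted in this comment.

```shell
# Hypothetical helper (not part of MaSuRCA or CABOG): summarise the
# "bogus overlap" warnings in a bogart log such as unitigger.err.
summarise_bogus_overlaps() {
    log=$1
    # One header line per reported overlap.
    total=$(grep -c 'bogus overlap found' "$log")
    # Detail lines where a read length is negative.
    negative=$(grep -c 'len=-[0-9]' "$log")
    printf 'bogus overlaps: %s\nlines with negative len: %s\n' "$total" "$negative"
}

# Example call (path shortened; use the unitigger.err path from the log):
# summarise_bogus_overlaps 4-unitigger/unitigger.err
```

A large share of negative lengths may be useful detail for a bug report, since it would suggest corrupted store contents rather than a problem with individual reads; that interpretation is a guess, not something confirmed in this thread.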

AntoineHo commented 5 years ago

Hello, I get the same error after upgrading to Masurca 3.2.9

Overlap correction job 1 (/mnt/sda1/organism/masurca3.2.9/CA.mr.41.15.15.0.02/3-overlapcorrection/0001) failed.
Overlap correction job 2 (/mnt/sda1/organism/masurca3.2.9/CA.mr.41.15.15.0.02/3-overlapcorrection/0002) failed.

runCA failed.

Stack trace:

at /home/lege/.conda/envs/masurca/bin/../CA8/Linux-amd64/bin/runCA line 1613.
main::caFailure("2 overlap correction jobs failed; remove /mnt/sda1/organism/ma"..., undef) called at /home/lege/.conda/envs/masurca/bin/../CA8/Linux-amd64/bin/runCA line 4514
main::overlapCorrection() called at /home/lege/.conda/envs/masurca/bin/../CA8/Linux-amd64/bin/runCA line 6526

Failure message:

2 overlap correction jobs failed; remove /mnt/sda1/organism/masurca3.2.9/CA.mr.41.15.15.0.02/3-overlapcorrection/ovlcorr.sh (or run by hand) to try again

Further on in the log I also find:

/mnt/sda1/ricciae/masurca3.2.9/CA.mr.41.15.15.0.02/8-consensus/genome_007 failed -- no .success.
/mnt/sda1/ricciae/masurca3.2.9/CA.mr.41.15.15.0.02/8-consensus/genome_008 failed -- no .success.

runCA failed.

Stack trace:

at /home/lege/.conda/envs/masurca/bin/../CA8/Linux-amd64/bin/runCA line 1613.
main::caFailure("2 consensusAfterScaffolder jobs failed; remove /mnt/sda1/ricc"..., undef) called at /home/lege/.conda/envs/masurca/bin/../CA8/Linux-amd64/bin/runCA line 5756
main::postScaffolderConsensus() called at /home/lege/.conda/envs/masurca/bin/../CA8/Linux-amd64/bin/runCA line 6531

Failure message:

2 consensusAfterScaffolder jobs failed; remove /mnt/sda1/ricciae/masurca3.2.9/CA.mr.41.15.15.0.02/8-consensus/consensus.sh to try again

When I go to the 3-overlapcorrection folder and look at the numbered .err files (000N.err), I find:

Starting Redo_Olaps ()
... lines with ERROR ...
ERROR: Bad alignment ends a_end = 0 b_end = 0 a_iid = 2823 b_iid = 245 errors = 0
Failed with 'Segmentation fault'

Backtrace (mangled):
/home/lege/.conda/envs/masurca/CA8/Linux-amd64/bin/correct-olaps[0x40db92]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12dd0)[0x7fe922d95dd0]
/home/lege/.conda/envs/masurca/CA8/Linux-amd64/bin/correct-olaps[0x404ed1]
/home/lege/.conda/envs/masurca/CA8/Linux-amd64/bin/correct-olaps[0x408f12]
/home/lege/.conda/envs/masurca/CA8/Linux-amd64/bin/correct-olaps[0x404688]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7fe9222bf09b]
/home/lege/.conda/envs/masurca/CA8/Linux-amd64/bin/correct-olaps[0x402209]

Backtrace (demangled):
[0] /home/lege/.conda/envs/masurca/CA8/Linux-amd64/bin/correct-olaps() [0x40db92]
[1] /lib/x86_64-linux-gnu/libpthread.so.0::(null) + 0x12dd0 [0x7fe922d95dd0]
[2] /home/lege/.conda/envs/masurca/CA8/Linux-amd64/bin/correct-olaps() [0x404ed1]
[3] /home/lege/.conda/envs/masurca/CA8/Linux-amd64/bin/correct-olaps() [0x408f12]
[4] /home/lege/.conda/envs/masurca/CA8/Linux-amd64/bin/correct-olaps() [0x404688]
[5] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0xeb [0x7fe9222bf09b]
[6] /home/lege/.conda/envs/masurca/CA8/Linux-amd64/bin/correct-olaps() [0x402209]

GDB: Segmentation fault (core dumped)

When I look into the .err files in the 8-consensus folder, I cannot find anything that would cause an error. All genome_00X.err files seem fine and say that everything is already computed and skipped.

Any idea? Thanks!

JFsanchezherrero commented 5 years ago

Dear @alekseyzimin,

After some debugging and installation problems, we finally sorted it out and managed to install MaSuRCA without dependencies on other libraries previously installed on the server.

The ufasta error caused by the MUMmer library problem was solved, and MaSuRCA finished successfully and generated a nice and long assembly! Thanks!

Thank you very much

SvitlanaLukicheva commented 5 years ago

Hello, I have a similar problem with the gap consensus step in my assembly (80X Illumina PE 250 + 35X ONT). First I tried to assemble with v3.2.8 and got the following errors:

xargs: invalid number for -P option
Usage: xargs [OPTION]... COMMAND INITIAL-ARGS...
Run COMMAND with arguments INITIAL-ARGS and more arguments read from input.
...
xargs: ./do_consensus.sh: No such file or directory
cat: merges.[0-9]*.txt: No such file or directory

Warning! Some or all gap consensus jobs failed, see files in mr.41.15.15.0.02.join_consensus.tmp, proceeding anyway, to rerun gap consensus erase mr.41.15.15.0.02.1.fa and re-run assemble.sh

The folder mr.41.15.15.0.02.join_consensus.tmp contained two empty files:

-rw-r--r-- 1 slukiche fi010 0 Nov 24 11:20 merges.best.txt
-rw-r--r-- 1 slukiche fi010 0 Nov 24 11:20 refs.txt

I then tried v3.2.9 (from bioconda) and got the following errors:

Error: Failed to open reference file 'to_join.1.fa'
Usage: nucmer [options] ref:path qry:path+
Use --help for more information
ERROR: Could not parse delta file, /dev/stdin
error no: 402
xargs: ./do_consensus.sh: exited with status 255; aborting
Error: Failed to open reference file 'to_join.1.fa'
Usage: nucmer [options] ref:path qry:path+
Use --help for more information
ERROR: Could not parse delta file, /dev/stdin
error no: 402
xargs: ./do_consensus.sh: exited with status 255; aborting
cat: merges.[0-9]*.txt: No such file or directory

Warning! Some or all gap consensus jobs failed, see files in mr.41.15.15.0.02.join_consensus.tmp, proceeding anyway, to rerun gap consensus erase mr.41.15.15.0.02.1.fa and re-run assemble.sh

Now the folder mr.41.15.15.0.02.join_consensus.tmp contains more files, but I couldn't find any error there:

-rw-r--r-- 1 slukiche fi010       44 Dec  3 03:14 blasr.err
-rw-r--r-- 1 slukiche fi010        0 Dec  3 03:14 coords.1
-rw------- 1 slukiche fi010 21594112 Dec  3 03:14 core.39345
-rw------- 1 slukiche fi010 21729280 Dec  3 03:14 core.39365
-rwxr-xr-x 1 slukiche fi010      905 Dec  3 03:14 do_consensus.sh
-rw-r--r-- 1 slukiche fi010        0 Dec  3 03:14 join_consensus.1.fasta
-rw-r--r-- 1 slukiche fi010        0 Dec  3 03:14 merges.best.txt
-rw-r--r-- 1 slukiche fi010       82 Dec  3 03:14 pbdagcon.err
-rw-r--r-- 1 slukiche fi010        0 Dec  3 03:14 refs.txt

The assembly is still running. I don't know whether it will succeed and, if so, whether the result will be affected by these errors.
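
The listing above shows several zero-byte outputs (coords.1, join_consensus.1.fasta, merges.best.txt, refs.txt). A minimal sketch for finding such files in a join_consensus.tmp folder is given below; the function name is hypothetical, and `find -size 0` is used only to select empty regular files.

```shell
# Hypothetical helper (not part of MaSuRCA): print every zero-byte
# regular file below the given directory, sorted by path.
list_empty_files() {
    find "$1" -type f -size 0 | sort
}

# Example call; the directory name comes from the warning message above.
# list_empty_files mr.41.15.15.0.02.join_consensus.tmp
```

The non-empty blasr.err and pbdagcon.err files are the ones most likely to say why the outputs are empty, so they are worth reading first.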

alekseyzimin commented 5 years ago

Excellent! Can you tell me what was the problem and how you've addressed it?

JFsanchezherrero commented 5 years ago

Was this comment meant for me?

Excellent! Can you tell me what was the problem and how you've addressed it?

I am afraid the problem originated with ufasta during the Refining alignments step, although the error only surfaced later, in the consensus step.

To avoid problems with MUMmer or any other libraries and install MaSuRCA successfully, I had to do the following:

  1. Download the 3.2.9 release from GitHub.
  2. Unpack it into a folder with tar -zxvf.
  3. Modify install.sh. At the beginning of the script I added a line that replaces the PATH variable from my environment with one containing only basic directories:

export PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/opt/openmpi-184/bin:/usr/lib/jvm/java-8-oracle/bin:/usr/lib/jvm/java-8-oracle/db/bin:/usr/lib/jvm/java-8-oracle/jre/bin

  4. Because of the specific configuration of my server, connect to the node where the job will run and install MaSuRCA there (sh install.sh).

  5. Generate assemble.sh from the configuration file.

  6. Go back to the master node and submit the MaSuRCA run through the grid engine queues to the node where it was installed.

With this approach I avoided installing MaSuRCA on the whole server and installed it only on the node that would run the job. By discarding environment variables before installation, I also avoided misconfiguration of third-party libraries such as MUMmer.
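
An alternative to editing install.sh is to start the installer from an environment that has already been stripped down. This is a minimal sketch and not the method used above: the `run_clean` wrapper and the `CLEAN_PATH` value are assumptions, and the PATH will probably need extra directories (compilers, Java, MPI) on a real system.

```shell
# Assumption: a minimal PATH; extend it with whatever your build really needs.
CLEAN_PATH=/usr/local/bin:/usr/bin:/bin

# Hypothetical wrapper: run a command with only HOME and PATH set, so that
# variables pointing at other installations are not inherited.
run_clean() {
    env -i HOME="${HOME:-/}" PATH="$CLEAN_PATH" "$@"
}

# Intended use, from inside the unpacked MaSuRCA-3.2.9 directory:
# run_clean sh install.sh
```

Because `env -i` clears every inherited variable, settings such as LD_LIBRARY_PATH from other software on the server are dropped as well, which is the effect the steps above aim for.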

Thanks!

alekseyzimin commented 5 years ago

Thank you for letting me know! Having a clean environment is essential for MaSuRCA installation.

All the Best, Aleksey

AntoineHo commented 5 years ago

Hello, I installed with the added line export PATH=... I could not find errors in the installation log, so I went on. Then, during the assembly, I ran into the following issue:

[Fri Dec  7 00:06:00 CET 2018] Processing pe library reads
[Fri Dec  7 01:17:57 CET 2018] Average PE read length 250
[Fri Dec  7 01:17:57 CET 2018] Using kmer size of 127 for the graph
[Fri Dec  7 01:17:57 CET 2018] MIN_Q_CHAR: 33
[Fri Dec  7 01:17:57 CET 2018] Creating mer database for Quorum
[Fri Dec  7 01:37:04 CET 2018] Error correct PE
[Fri Dec  7 05:07:24 CET 2018] Estimating genome size
[Fri Dec  7 05:23:34 CET 2018] Estimated genome size: 187883011
[Fri Dec  7 05:23:34 CET 2018] Creating k-unitigs with k=127
[Fri Dec  7 06:44:43 CET 2018] Computing super reads from PE 
[Fri Dec  7 08:40:48 CET 2018] Using CABOG from /home/lege/tools/MaSuRCA-3.2.9/bin/../CA8/Linux-amd64/bin
[Fri Dec  7 08:40:48 CET 2018] Running mega-reads correction/assembly
[Fri Dec  7 08:40:48 CET 2018] Using mer size 15 for mapping, B=15, d=0.02
[Fri Dec  7 08:40:48 CET 2018] Estimated Genome Size 187883011
[Fri Dec  7 08:40:48 CET 2018] Estimated Ploidy 1
[Fri Dec  7 08:40:48 CET 2018] Using 44 threads
[Fri Dec  7 08:40:48 CET 2018] Output prefix mr.41.15.15.0.02
[Fri Dec  7 08:40:48 CET 2018] Using 25x of the longest ONT reads
[Fri Dec  7 08:51:43 CET 2018] Reducing super-read k-mer size
[Fri Dec  7 09:05:35 CET 2018] Mega-reads pass 1
[Fri Dec  7 09:05:35 CET 2018] Running locally in 1 batch
[Fri Dec  7 11:31:42 CET 2018] Mega-reads pass 2
[Fri Dec  7 11:31:42 CET 2018] Running locally in 1 batch
[Fri Dec  7 14:31:26 CET 2018] Refining alignments
[Fri Dec  7 14:44:17 CET 2018] Joining
[Fri Dec  7 14:53:43 CET 2018] Gap consensus
*** buffer overflow detected ***: /home/lege/tools/MaSuRCA-3.2.9/bin/ufasta terminated
/home/lege/tools/MaSuRCA-3.2.9/bin/mega_reads_assemble_cluster.sh: line 624:  1742 Aborted                 (core dumped) $MYPATH/ufasta split -i refs.renamed.fa ${ref_names[@]}
Error: Failed to open reference file 'to_join.1.fa'
Usage: nucmer [options] ref:path qry:path+
Use --help for more information
ERROR: Could not parse delta file, /dev/stdin
error no: 402
Error: Failed to open reference file 'to_join.2.fa'
Usage: nucmer [options] ref:path qry:path+
Use --help for more information
ERROR: Could not parse delta file, /dev/stdin
error no: 402
Error: Failed to open reference file 'to_join.3.fa'
Usage: nucmer [options] ref:path qry:path+
Use --help for more information
ERROR: Could not parse delta file, /dev/stdin
error no: 402
Error: Failed to open reference file 'to_join.5.fa'
Usage: nucmer [options] ref:path qry:path+
Use --help for more information
Error: Failed to open reference file 'to_join.4.fa'
Usage: nucmer [options] ref:path qry:path+
Use --help for more information
ERROR: Could not parse delta file, /dev/stdin
error no: 402
ERROR: Could not parse delta file, /dev/stdin
error no: 402
Error: Failed to open reference file 'to_join.6.fa'
Usage: nucmer [options] ref:path qry:path+
Use --help for more information
ERROR: Could not parse delta file, /dev/stdin
error no: 402
xargs: ./do_consensus.sh: exited with status 255; aborting
xargs: ./do_consensus.sh: exited with status 255; aborting
Error: Failed to open reference file 'to_join.1.fa'
Usage: nucmer [options] ref:path qry:path+
Use --help for more information
ERROR: Could not parse delta file, /dev/stdin
error no: 402
Error: Failed to open reference file 'to_join.2.fa'
Usage: nucmer [options] ref:path qry:path+
Use --help for more information
ERROR: Could not parse delta file, /dev/stdin
error no: 402
xargs: ./do_consensus.sh: exited with status 255; aborting
xargs: ./do_consensus.sh: exited with status 255; aborting
cat: 'merges.[0-9]*.txt': No such file or directory
[Fri Dec  7 14:53:49 CET 2018] Warning! Some or all gap consensus jobs failed, see files in mr.41.15.15.0.02.join_consensus.tmp, proceeding anyway, to rerun gap consensus erase mr.41.15.15.0.02.1.fa and re-run assemble.sh
[Fri Dec  7 14:53:55 CET 2018] Generating assembly input files
[Fri Dec  7 15:15:02 CET 2018] Coverage threshold for splitting unitigs is 20 minimum ovl 250
[Fri Dec  7 15:15:02 CET 2018] Running assembly

I think the error comes from something around line 582 of mega_reads_assemble_cluster.sh. MaSuRCA cannot find the to_join.*.fa files because of a buffer overflow in ufasta split. It then calls nucmer, which seems to be compiled correctly. Any ideas?

Thanks! AH

alekseyzimin commented 5 years ago

Hi,

Can you test ufasta split separately? That is, run

ufasta split -i file.fasta file1 file2 file3 file4

This should split the FASTA file into 4 files. I have not yet seen a buffer overflow error in ufasta split, but I can change the code if it causes problems.

--Aleksey
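
The standalone test suggested above can be run on a small generated input. The following is a minimal sketch: the helper functions and the record contents are made up for illustration, and only the `ufasta split -i` invocation itself is taken from this comment.

```shell
# Hypothetical helper: write a FASTA file with N short records.
make_test_fasta() {
    n=$1
    out=$2
    : > "$out"
    i=1
    while [ "$i" -le "$n" ]; do
        printf '>seq%s\nACGTACGTACGT\n' "$i" >> "$out"
        i=$((i + 1))
    done
}

# Count FASTA records by counting header lines.
count_records() {
    grep -c '^>' "$1"
}

workdir=$(mktemp -d)
make_test_fasta 8 "$workdir/file.fasta"

if command -v ufasta >/dev/null 2>&1; then
    # Same invocation as suggested above, with paths inside a scratch directory.
    ufasta split -i "$workdir/file.fasta" \
        "$workdir/file1" "$workdir/file2" "$workdir/file3" "$workdir/file4"
    for f in file1 file2 file3 file4; do
        printf '%s: %s records\n' "$f" "$(count_records "$workdir/$f")"
    done
else
    echo "ufasta not found in PATH; add the MaSuRCA bin directory to PATH first"
fi
```

A tiny input like this may not trigger the overflow. If it passes, repeating the test with the real refs.renamed.fa and the same number of output names as in the failing run would be the closer reproduction.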

On Fri, Dec 7, 2018 at 12:14 PM Antoine Houtain notifications@github.com wrote:

Hello, I installed with the added line export PATH=... I could not find errors in the log of installation so I went on. Then during the assembly I have the following issue:

[Fri Dec 7 00:06:00 CET 2018] Processing pe library reads
[Fri Dec 7 01:17:57 CET 2018] Average PE read length 250
[Fri Dec 7 01:17:57 CET 2018] Using kmer size of 127 for the graph
[Fri Dec 7 01:17:57 CET 2018] MIN_Q_CHAR: 33
[Fri Dec 7 01:17:57 CET 2018] Creating mer database for Quorum
[Fri Dec 7 01:37:04 CET 2018] Error correct PE
[Fri Dec 7 05:07:24 CET 2018] Estimating genome size
[Fri Dec 7 05:23:34 CET 2018] Estimated genome size: 187883011
[Fri Dec 7 05:23:34 CET 2018] Creating k-unitigs with k=127
[Fri Dec 7 06:44:43 CET 2018] Computing super reads from PE
[Fri Dec 7 08:40:48 CET 2018] Using CABOG from /home/lege/tools/MaSuRCA-3.2.9/bin/../CA8/Linux-amd64/bin
[Fri Dec 7 08:40:48 CET 2018] Running mega-reads correction/assembly
[Fri Dec 7 08:40:48 CET 2018] Using mer size 15 for mapping, B=15, d=0.02
[Fri Dec 7 08:40:48 CET 2018] Estimated Genome Size 187883011
[Fri Dec 7 08:40:48 CET 2018] Estimated Ploidy 1
[Fri Dec 7 08:40:48 CET 2018] Using 44 threads
[Fri Dec 7 08:40:48 CET 2018] Output prefix mr.41.15.15.0.02
[Fri Dec 7 08:40:48 CET 2018] Using 25x of the longest ONT reads
[Fri Dec 7 08:51:43 CET 2018] Reducing super-read k-mer size
[Fri Dec 7 09:05:35 CET 2018] Mega-reads pass 1
[Fri Dec 7 09:05:35 CET 2018] Running locally in 1 batch
[Fri Dec 7 11:31:42 CET 2018] Mega-reads pass 2
[Fri Dec 7 11:31:42 CET 2018] Running locally in 1 batch
[Fri Dec 7 14:31:26 CET 2018] Refining alignments
[Fri Dec 7 14:44:17 CET 2018] Joining
[Fri Dec 7 14:53:43 CET 2018] Gap consensus
buffer overflow detected : /home/lege/tools/MaSuRCA-3.2.9/bin/ufasta terminated
/home/lege/tools/MaSuRCA-3.2.9/bin/mega_reads_assemble_cluster.sh: line 624: 1742 Aborted (core dumped) $MYPATH/ufasta split -i refs.renamed.fa ${ref_names[@]}
Error: Failed to open reference file 'to_join.1.fa'
Usage: nucmer [options] ref:path qry:path+
Use --help for more information
ERROR: Could not parse delta file, /dev/stdin error no: 402
Error: Failed to open reference file 'to_join.2.fa'
Usage: nucmer [options] ref:path qry:path+
Use --help for more information
ERROR: Could not parse delta file, /dev/stdin error no: 402
Error: Failed to open reference file 'to_join.3.fa'
Usage: nucmer [options] ref:path qry:path+
Use --help for more information
ERROR: Could not parse delta file, /dev/stdin error no: 402
Error: Failed to open reference file 'to_join.5.fa'
Usage: nucmer [options] ref:path qry:path+
Use --help for more information
Error: Failed to open reference file 'to_join.4.fa'
Usage: nucmer [options] ref:path qry:path+
Use --help for more information
ERROR: Could not parse delta file, /dev/stdin error no: 402
ERROR: Could not parse delta file, /dev/stdin error no: 402
Error: Failed to open reference file 'to_join.6.fa'
Usage: nucmer [options] ref:path qry:path+
Use --help for more information
ERROR: Could not parse delta file, /dev/stdin error no: 402
xargs: ./do_consensus.sh: exited with status 255; aborting
xargs: ./do_consensus.sh: exited with status 255; aborting
Error: Failed to open reference file 'to_join.1.fa'
Usage: nucmer [options] ref:path qry:path+
Use --help for more information
ERROR: Could not parse delta file, /dev/stdin error no: 402
Error: Failed to open reference file 'to_join.2.fa'
Usage: nucmer [options] ref:path qry:path+
Use --help for more information
ERROR: Could not parse delta file, /dev/stdin error no: 402
xargs: ./do_consensus.sh: exited with status 255; aborting
xargs: ./do_consensus.sh: exited with status 255; aborting
cat: 'merges.[0-9]*.txt': No such file or directory
[Fri Dec 7 14:53:49 CET 2018] Warning! Some or all gap consensus jobs failed, see files in mr.41.15.15.0.02.join_consensus.tmp, proceeding anyway, to rerun gap consensus erase mr.41.15.15.0.02.1.fa and re-run assemble.sh
[Fri Dec 7 14:53:55 CET 2018] Generating assembly input files
[Fri Dec 7 15:15:02 CET 2018] Coverage threshold for splitting unitigs is 20 minimum ovl 250
[Fri Dec 7 15:15:02 CET 2018] Running assembly

I think the error originates somewhere around line 582 of mega_reads_assemble_cluster.sh: because of the buffer overflow in ufasta split, MaSuRCA cannot find the to_join.*.fa files. It then calls nucmer, which itself seems to be compiled correctly. Any ideas?

Thanks! AH


-- Dr. Alexey V. Zimin Associate Research Scientist Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA (301)-437-6260 http://www.genome.umd.edu http://masurca.blogspot.com

AntoineHo commented 5 years ago

I tried it and it did not fail, so I am now looking for what caused the buffer overflow error. The join consensus files in the directory are empty, and so are coords.1, coords.2, etc. However, ref.1.fa, ref.2.fa, etc. are not.
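For anyone repeating this check, the following is a minimal sketch of how the empty and non-empty files in the consensus directory could be listed. The function names `list_empty_files` and `count_nonempty` are hypothetical helpers, not part of MaSuRCA, and the sketch assumes a `find` that supports `-maxdepth` and `-empty` (GNU and BSD versions do).

```shell
# Hypothetical helper: print zero-byte regular files located directly in directory $1.
list_empty_files() {
    find "$1" -maxdepth 1 -type f -empty | sort
}

# Hypothetical helper: count non-empty regular files in directory $1
# whose names match the shell glob $2 (quote the glob when calling).
count_nonempty() {
    find "$1" -maxdepth 1 -type f -name "$2" ! -empty | wc -l | tr -d ' '
}
```

With the directory name from the log above, this might be called as `list_empty_files mr.41.15.15.0.02.join_consensus.tmp` and `count_nonempty mr.41.15.15.0.02.join_consensus.tmp 'ref.*.fa'`.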

alekseyzimin commented 5 years ago

You have to execute that command inside the mr......join_consensus.tmp folder.
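The working directory matters because the command refers to its input through a relative path starting with `../`, which only resolves when the shell is one level below the file. A small sketch of that check follows; `resolves_from` is a hypothetical helper introduced here for illustration only.

```shell
# Hypothetical helper: report whether the relative path $2 exists when the
# working directory is $1. The cd happens in a subshell, so the caller's
# own working directory is left unchanged.
resolves_from() {
    ( cd "$1" 2>/dev/null && [ -e "$2" ] ) && echo yes || echo no
}
```

Called with the consensus folder as the first argument and the `../...to_join.fa.tmp` path as the second, it should print `yes` if the file sits in the folder's parent directory.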

On Fri, Dec 7, 2018 at 5:31 PM Antoine Houtain notifications@github.com wrote:

I found that when I do: perl -ane '{if($F[0] =~ /^>/){$rn=$F[0];}else{$seq=$F[0]; $seq=~ tr/a-zA-Z//s; print "$rn\n$F[0]\n" if(length($seq)>length($F[0])*0.1);}}' ../mr.41.15.0.0.02.1.to_join.fa.tmp | ~/tools/MaSuRCA-3.2.9/bin/split_reads_to_join.pl qrys.txt to_join (list of ref.fa files)

I get: Can't open ../mr.41.15.0.0.02.1.to_join.fa.tmp: No such file or directory. Yet I do find a 4 GB mr.41.15.0.0.02.1.to_join.fa.tmp file in the directory.
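As a reading aid, the Perl one-liner quoted above appears to act as a low-complexity filter: it remembers each header line, squeezes runs of identical letters in the sequence line, and prints the record only if the squeezed sequence is longer than 10% of the original. The sketch below restates that logic in shell under two assumptions: each sequence occupies a single line, and the whole header line is kept (the Perl version keeps only the first whitespace-separated field). `filter_low_complexity` is a hypothetical name, and the loop is far slower than the Perl original, so it is meant for small test inputs.

```shell
# Hypothetical re-statement of the Perl filter: read FASTA (one sequence
# per line) on stdin and print only records that are not low-complexity.
filter_low_complexity() {
    name=""
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            '>'*)
                # Header line: remember it for the sequence that follows.
                name=$line
                ;;
            *)
                # Collapse runs of identical letters, e.g. AAAACC -> AC.
                squeezed=$(printf '%s' "$line" | tr -s 'a-zA-Z')
                # Integer form of: length(squeezed) > 0.1 * length(original)
                if [ $(( ${#squeezed} * 10 )) -gt "${#line}" ]; then
                    printf '%s\n%s\n' "$name" "$line"
                fi
                ;;
        esac
    done
}
```

A sequence made of one repeated base collapses to a single character and is dropped once it is 10 bases or longer, while a mixed sequence passes through unchanged.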

