dhlbh / BASE

a fast and accurate de novo genome assembler for longer NGS reads
GNU General Public License v2.0

contig.fa file is empty! #1

Open faguil opened 7 years ago

faguil commented 7 years ago

Hi,

I have been trying to use the BASE assembler to get contigs from some 251 bp HiSeq Illumina data. I prepared the 2BWT files using 2bwt_builder_cpu and AsiicBWT2BWT, and made a 2bwt_index file following the instructions at this link: https://sourceforge.net/p/baseassembler/discussion/general/thread/3ef793cf/

Finally, I ran base using the following command:

/usr/local/src/BASE-master/src/base -l Mstichopi-BASE2.txt -o Mstichopi-BASE_testE30 -E 30 -P 16

BASE ran without any warnings or errors; however, the final contig.fa output is EMPTY!!!

Version 1.00: Released at 28/1/2016 and Compilied at Jul 1 2017 11:56:57

Main arguments:
Expect depth: 30
Low depth: 3
Infer threshold: 1.20
Solve branches: 0
Solve heterozygosis: 0

[Main] invoked at Sat Jul 1 15:48:55 2017

Loading index ...
lib number is: 1
Loading index: /sysdev/s9/felipe/Mstichopi_genome/Mstichopi-BASE...
Read length 251
Loading 1 libraries.
Loading index DONE at Sat Jul 1 17:21:57 2017

Assembly contig with library number 1 and generating contigs with Output prefix Mstichopi-BASE_testE30
There are totally 701296508 read number for contig assembly
Checking window for termination: 4675310 and perthread 292206 and buffer size 4870
After initial, check mem: contigs.c: time = 0.639902 (usr) + 3956.190567 (sys) = 3956.830469 (sec) maxrss = 210214408
Initial time: 8.360000 seconds
16 thread(s) initialized.
Thread6 buffer full
Thread5 buffer full
Thread14 buffer full
Thread9 buffer full
Thread4 buffer full
Thread10 buffer full
Thread12 buffer full
Thread7 buffer full
Thread1 buffer full
Thread0 buffer full
Thread2 buffer full
Thread3 buffer full
Thread15 buffer full
Thread13 buffer full
Thread11 buffer full
Thread8 buffer full
Stat: 4675312 0 0 3116880 0 0 1.000000 0.000000 0.000000
Extension finished, ReadUsedRate 0.000000, TotalLength 0, CPU time 484.390000, Realtime 34.90 s
layout unique contig number: 0
Layout all contigs finished
Assembly DONE at Sat Jul 1 17:22:41 2017

Free index ... ALL DONE. THANK YOU!

As there is no manual for running this assembler, I followed the instructions in the INSTALL file bundled with the software. However, I am not sure whether what I have done is OK, and I don't know why my contig output file is EMPTY.

Could you please give me some standard commands for running this assembler, so I can compare them with what I have done, and also some clues about why my output is empty? I really want to use this assembler, but I haven't been able to run it properly.

Thanks

dhlbh commented 7 years ago

Hi Faguil, can you post the config file for 2bwt_builder_cpu and the sizes of the "/sysdev/s9/felipe/Mstichopi_genome/Mstichopi-BASE*" files?

I guess your 2bwt preparation finished successfully, but the seeds for contig extension failed to be generated. From the parameters you've set, is the expected genome size near 6G?

Best, bh

faguil commented 7 years ago

Hi bh,

Thanks for your email. This is the config file that I used for running 2bwt_builder_cpu:

max_rd_len=251
[LIB]
avg_ins=546
q1=/path/to/files/Mstichopi-third_1.fq
q2=/path/to/files/Mstichopi-third_2.fq
q1=/path/to/files/Mstichopi-fourth_1.fq
q1=/path/to/files/Mstichopi-fourth_2.fq
qual_cutoff=74

The file sizes are:

Mstichopi-BASE.f.bwt.ascii 165G
Mstichopi-BASE.r.bwt.ascii 165G
Mstichopi-BASE.qual.bit 21G
Mstichopi-BASE.ridt 2.7G
Mstichopi-BASE.bwt 83G
Mstichopi-BASE.fmv 11G
Mstichopi-BASE.rev.bwt 83G
Mstichopi-BASE.rev.fmv 11G

Since I am using HiSeq reads with phred33 qualities, I am running 2bwt_builder_cpu again because I am not sure the qual_cutoff parameter is correct. I have changed qual_cutoff to 40.

The genome size of my species is ~1.2 Gb (calculated by flow cytometry) or 980 Mb (calculated by k-mer analysis), so I am expecting an assembly size within this range. I have run BASE with different values of E; the command I showed before was just one of them.

Cheers,

dhlbh commented 7 years ago

OK, I'll wait for your results with qual_cutoff=40 and E=total_base_num/1.2G.

faguil commented 7 years ago

Hi bh,

I am now facing another problem. After rebuilding all the files with qual_cutoff=40, I am getting a Segmentation fault error. I tried E values of 180, 150, 120, 90, 60 and 30, and I always got this segmentation fault. The best E value should be 180 according to the formula E=total_base_num/1.2G.
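
For reference, the E=total_base_num/1.2G arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and it assumes that total_base_num equals the read count printed in the BASE log (701296508) times the read length (251). The thread does not say how total_base_num was actually obtained, so the 180 quoted above may rest on a different base count. Under that assumption the formula gives an E of about 147 for a 1.2G genome, and E=30 corresponds to a genome of about 5.9G, which is in line with the "near 6G" remark earlier in the thread.

```python
# Hypothetical helper names; not part of BASE.

def expected_depth(total_bases, genome_size):
    """E = total_base_num / genome_size, as given in the thread."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def implied_genome_size(total_bases, expect_depth):
    """Genome size that a given -E value implies for the same data."""
    if expect_depth <= 0:
        raise ValueError("expect_depth must be positive")
    return total_bases / expect_depth


# Assumption: every read counted in the log is 251 bp and contributes
# to total_base_num.
READS = 701_296_508
READ_LENGTH = 251
total_bases = READS * READ_LENGTH

print("total bases:", total_bases)
print("E for a 1.2G genome:", round(expected_depth(total_bases, 1.2e9), 1))
print("genome size implied by E=30:",
      round(implied_genome_size(total_bases, 30) / 1e9, 2), "G")
```

If the real total_base_num differs (for example because it was counted before quality filtering), substitute that number for `total_bases`.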

The machine I am using has 1 TB of RAM, so I don't think it is a memory issue with the server. I also tried running BASE with sudo and as root, and I still got the same error.

Here is one of the commands that I have tried:

/usr/local/src/BASE-master/src/base -l Mstichopi_BASE.txt -o testE180 -E 180 -P 16

Version 1.00: Released at 28/1/2016 and Compilied at Jul 1 2017 11:56:57

Main arguments:
Expect depth: 180
Low depth: 3
Infer threshold: 1.20
Solve branches: 0
Solve heterozygosis: 0

[Main] invoked at Wed Jul 5 15:36:43 2017

Loading index ...
lib number is: 1
Loading index: /sysdev/s9/felipe/Mstichopi_genome/Mstichopi-BASE...
Read length 251
Loading 1 libraries.
Loading index DONE at Wed Jul 5 15:39:18 2017

Assembly contig with library number 1 and generating contigs with Output prefix testE180 There are totally 701296508 read number for contig assembly Checking window for termination: 779218 and perthread 48701 and buffer size 811 After initial, check mem: contigs.c: time = 0.347947 (usr) + 156.307237 (sys) = 156.655184 (sec) maxrss = 198668356 Initial time: 1.120000 seconds 16 thread(s) initialized. T1: , 0; new seq TTTTTTTTTTTTTTTTTTTTTT; iter seq TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTG; newlen 22, seedlen 218, end 239; iter 2 T1: , 0; new seq TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT; iter seq TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTG; newlen 47, seedlen 193, end 239; iter 5 T2: , 0 207; iter 3 T1: , 0; new seq TTTTTTTTTTTTTTTTTTTTTT; iter seq TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTG; newlen 22, seedlen 218, end 239; iter 2 T1: , 0; new seq TTTTTTTTTTTTTTTTTTTTTT; iter seq TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTG; newlen 22, seedlen 218, end 239; iter 2 T2: , 0 239; iter 3 T1: , 0; new seq C; iter seq CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCACCCCCCCA; newlen 1, seedlen 223, end 222; iter 3 T1: , 0; new seq 
TTTTTTTTTTTTTTTTTTTTTT; iter seq TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTG; newlen 22, seedlen 218, end 239; iter 2 T2: , 0 159; iter 1 T2: , 0 179; iter 1 T2: , 0 239; iter 3 T2: , 0 128; iter 4 T1: , 0; new seq GGGGGGGGGGGGGGGGGGGGGGGGGGGGG; iter seq GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGTGGGGGGGGGGT; newlen 29, seedlen 211, end 239; iter 3 T2: , 0 18; iter 8 T2: , 0 223; iter 8 T2: , 0 18; iter 13 T1: , 0; new seq AT; iter seq CTTATAAACAGGCAGCTGAGGAGCACGTTTATCATCTGTTTCCCTTGCATCAGTGTTAGATTCATTACCTCTGATAACCGCTTGGCGGTTTCTCAGAGGGAACATCAGACTGGGGTAGTCTTTTGGTCCTTCCAGCCATCATAATATAATATCAAACTCGTCAATATCAGCAATCGTCAATCAAACTGTATAGTTTAAGCTATATAGATTAAAACAAATTTTAAAATAAGTTGAAATA; newlen 2, seedlen 238, end 238; iter 53 T2: , 0 34; iter 7 T2: , 0 186; iter 5 T1: , 0; new seq AAATGTGACTATTGCGATTA; iter seq TAGAGCCTCTCAAAAAGGTGCGTTACTATCACATGTAAGAACACATACAGGAGAAAAACCATTTAAATGTGACATTTGTGATTTTAGGACAGGTCTAAAGAGGACTTTGTTAATACATGTAAGAACACACACTGGAGAAAAACCATTTAAATGTGACATTTGTGATTATAGAGCCTCTCGAAAAGATTATTTACTATCACATGTAAGAACACACACTGGA; newlen 20, seedlen 220, end 193; iter 6 T2: , 0 38; iter 9 T2: , 0 19; iter 1 T2: , 0 175; iter 7 T1: , 0; new seq A; iter seq TGGCCTATCGATCCTTTTGACTTTTGGGAGTTTCAAGCAAGAGGTGTCAGAAAAGTTACCACAGGGATAACTGGCTTGTGGCGGCCAAGCGTTCATAGCGACGTCGCTTTTTGATCCTTCGATGTCGGCTCTTCCTATCATTGTGAAGCAGAATTCACCAAGCGTTGGATTGTTCACCCACTAATAGGGAATGTGAGCTGGGTTTAG; newlen 1, seedlen 207, end -1; iter 20 T2: , 0 30; iter 2 T2: , 0 18; iter 1 T1: , 0; new seq A; iter seq 
ACAGTGTCCTATATGGTGAAGCTAATCCTCTAGAATGAACCCAGTGTACCCCGTGTATGTTTGCGGTTGCTTCGTTGGTGGAGTTAATTCCATAGGGTTTCAATGTCTGTAGGGTGCACTGATAGAAGCCTTCCCTTTGTCAAAGTTTAATTTTGTTTTCTTTACCTCCAGCTGCTTGTATACTCTGAGTTGGTGTTATTTAGGAATCAGCCCAGGTGTGCTC; newlen 1, seedlen 223, end -1; iter 29 T2: , 0 18; iter 11 T2: , 0 110; iter 5 T2: , 0 45; iter 18 T2: , 0 79; iter 3 T2: , 0 18; iter 0 T2: , 0 79; iter 25 T1: , 0; new seq AT; iter seq ATATATATATATATATATATATATATATATATATATATAATATTGA; newlen 2, seedlen 46, end 46; iter 22 T1: , 0; new seq CT; iter seq GGAAAGGTTGAAGCTTGTGTCACGGCACCCATAGTGAGAGGTCTCAAAAGGTTTTTAGGCTTGAGATCTGCACGAATGGAGTTAATCACGCTCCAATAAGCATTCGTGTGGTGACGCGAGGCTCTAGGTCTATCGTACTGGAATAGATCTGAGCGCATATCATCTAAGTTTAGATAAGCACGTTTGTACTTAGTTCTAACTTCAGAAATCTTCGATGCAGGGAAAACCTGTTTCAAAG; newlen 2, seedlen 238, end 237; iter 25 T1: , 0; new seq ATAGG; iter seq ATAGAAAAATTGTCCTTCTCCTACGTATACAGACAAAGGATAACCCCATGTTGCTCTAATTTTATTCACGTCTCTAGCACAAATTTGTGCTCTGGCAAAACAACATGCATCGGGGAACTCGTCGCTCCTGCCAAATACATAGTCGTACCATTCTCGTTGAATATGCTCCAAAGCTCCTGAATCAAGAGCCTCTCTCTTAGTCCTACATCCGTCGATCTTATAAGGAAGACCGGCT; newlen 5, seedlen 235, end 234; iter 1 T2: , 0 18; iter 12 T1: , 0; new seq A; iter seq CGATAGACCTAGAGCCTCGCGTCACCACACGAATGCTTATTGGAGCGTGATTAACTCCATTCGTGCAGATCTCAAGCCTAAAAACCTTTTGAGACCTCTCACTATGGGTGCCGTGACACAAGCTTCAACCTTTCCAGGTTCAAAATCAGCCGGTCTTCCTTATAAGATCGACGGATGTAGGACTAAGAGAGAGGCTCTTGATTCAGGAGCTTTGGAGCATATTCAACGAGAATGGTACG; newlen 1, seedlen 239, end 239; iter 20 T2: , 0 148; iter 3 T2: , 0 26; iter 0 T2: , 0 45; iter 0 T2: , 0 40; iter 3 T2: , 0 222; iter 3 T2: , 0 27; iter 2 T1: , 0; new seq CAAAATTATTATTGTTGTTAATTATTATTATTGTCGTTGTTTATATGTGTAACTTGTATCCCGCTGCCCGCGATGCTTTACTAATTCTTTTCACTCTACCACCAGCAACTACATGTACTGTAGAAAGATCGTCTAGCACACTACGAAGAGTTAAAACATGGCTAAGATCGACTATGTCAGATGAACGACTGACTCTCTGGCTT; iter seq ATGTATACTGAGCGTGCATCG; newlen 203, seedlen 21, end -1; iter 1 T2: , 0 33; iter 9 T2: , 0 84; iter 2 T2: , 0 20; iter 1 T2: , 0 31; iter 3 T2: , 0 27; iter 29 T2: , 0 66; iter 1 T1: , 0; new seq 
AGTGCTCAATCAT; iter seq GAGAGAGCAACATTTAATT; newlen 13, seedlen 19, end -1; iter 6 T2: , 0 143; iter 3 T2: , 0 210; iter 10 T1: , 0; new seq AGCTCGGGTGG; iter seq CTACGCGTCGCCAACGTCGCCCGGTTACGGGTAGCGTGCATCGCGTACCAAGCGGCGCATGGACAGTACGGATCAGATCTGCAAG; newlen 11, seedlen 85, end 91; iter 1 T2: , 0 18; iter 12 T2: , 0 127; iter 6 T2: , 0 18; iter 9 T2: , 0 223; iter 3 T1: , 0; new seq CAAAATTATTATTGTTGTTAATTATTATTATTGTCGTTGTTTATATGTGTAACTTGTATCCCGCTGCCCGCGATGCTTTACTAATTCTTTTCACTCTACCACCAGCAACTACATGTACTGTAGAAAGATCGTCTAGCACACTACGAAGAGTTAAAACATGGCTAAGATCGACTATGTCAGATGAACGACTGACTCTCTGGCTT; iter seq ATGTATACTGAGCGTGCATCG; newlen 203, seedlen 21, end -1; iter 1 T2: , 0 47; iter 1 T1: , 0; new seq GTACATTTGGATACGATAACAGGTGTATTGT; iter seq CTAAGATTGTAAGTTCTAATTTGGAGTCGGGATATTTTAATGTGACGGACCAAAACTTTACGACACTCCAAATTGCTGAAAAACTAAAAGAAATTTATCCCAAAATGGAAATGATTTTTACAGATCATCATATTAATCATGGGGACAGTCGAGTCGACAGAGACAGCAGGTTAGAAAAAATAATTTCTCCCAGTATGGAATTAACAAAA; newlen 31, seedlen 209, end 237; iter 5 T2: , 0 18; iter 9 T2: , 0 131; iter 13 T2: , 0 58; iter 4 T2: , 0 53; iter 10 T2: , 0 19; iter 17 T2: , 0 19; iter 9 T2: , 0 239; iter 8 T2: , 0 24; iter 18 T2: , 0 18; iter 12 T2: , 0 41; iter 11 T2: , 0 35; iter 11 T1: , 0; new seq ACTGATCGAACGGG; iter seq AGATAAATGATCTCGTCCTCCGGCTGAAGATGCACGGTTAAAGGCTGCTCCGTTTCGAACCTCTGAAGAACAACCAGGCGTACTGAAGACTCGACTGCGTTTATTCCGCAAACTCGCTTTCGAACCGGACTGATGCGTTTTCGCTATCATCTTATCGGTGCGGTGCAATTTACTGCCTTCATCCTCGATTATCT; newlen 14, seedlen 194, end 207; iter 2 T2: , 0 151; iter 15 T2: , 0 18; iter 2 T2: , 0 18; iter 15 T2: , 0 73; iter 5 T2: , 0 108; iter 4 T2: , 0 27; iter 12 T2: , 0 25; iter 0 T2: , 0 74; iter 1 T2: , 0 73; iter 0 T2: , 0 87; iter 37 T2: , 0 120; iter 8 T1: , 0; new seq ATACTCTGTATAAGAAGCTGGAAGAAT; iter seq TTGCAATCCACTTGAATAATTTGCAATGAAACAAATTAATAAAATCAAGAACTGATCACCAGTTATCACACTCTTGTATTAATTTTACAACTTTGATAATCCGAAGTGTGTTGTATATACTCGAGTAACACTGGAATACTCTAAACGGTACAGTACGTCCAATCAGTGACATGAACATGCGCCGGTTAGTTCGATTTTATATTTTATTGTCAC; newlen 27, seedlen 213, end 
238; iter 0 T2: , 0 24; iter 5 T2: , 0 94; iter 0 T2: , 0 18; iter 0 T2: , 0 37; iter 0 T2: , 0 18; iter 9 T2: , 0 18; iter 14 T1: , 0; new seq ACGTATTAGCTTGAAAGACAATTTAGT; iter seq CCTGAACCAGTTTATGACCTGGCTGTATGGCAAGTCTCGTTTGGTTGCTACTTTTGCGCAAAGAGTTCTGATAACGTGTTCTGTCTCCCGGCTAGCACCTCCATACGGCGTGAATACGAGTGGACTGAACGTTCCGTGCTCCAGTTGGATGACTCTCTGGTTGTATTCTCGCTTCTTCTCTCGTTCCTTCGAAGCAAATGCTTTCTCGAGTGA; newlen 27, seedlen 213, end 239; iter 4 T2: , 0 20; iter 6 T2: , 0 50; iter 21 T2: , 0 18; iter 1 T2: , 0 125; iter 2 T2: , 0 26; iter 1 T2: , 0 27; iter 2 T2: , 0 60; iter 6 T2: , 0 51; iter 19 T2: , 0 18; iter 5 T2: , 0 18; iter 17 T2: , 0 192; iter 0 T2: , 0 94; iter 2 T2: , 0 26; iter 2 T2: , 0 126; iter 48 T2: , 0 18; iter 23 T2: , 0 209; iter 3 T2: , 0 223; iter 4 T1: , 0; new seq ACTTTGATAGT; iter seq TCCTGCTCTTCAGGAAAACGCTCGTATAACATATCAGCCGCCAACTGCACTGTATCACAGAGTGGAACGTTAGTGAAGAGACTAACTACATCCAGCGAATACATGCGGTGGTGCGGAGTTACGCCCTGCAGCTCAGCAAGGTCTCTGGCAATTTCACGCGAAGAGAACGGGATCCTTGATTCAGGCAGTGTCTCTAGCTTGCCAGCTATGTAGTCAGCTAGCGGATGGT; newlen 11, seedlen 229, end 239; iter 9 T2: , 0 18; iter 9 T2: , 0 80; iter 0 T2: , 0 62; iter 10 T2: , 0 235; iter 30 T1: , 0; new seq ACCTTTGGCTATAGTATTGCCTGCCACAATGATTGTCGCACCA; iter seq CCATTTACTTTCAATCGCATTATTTGAGAACAGAGAAGTCGGTAATTTGCGCTTGGATCGTCATTTACAACAAACGCCAGAGGTTGAGAATGTAGATAGGCAGCTAAAAAGCTAACTATGTTTTCCACTGGCGGAGCCACTTCAGTTGAACTCAATATGTACTTATCTTCACATTGCTGCTTCAGTCTGTTATTACC; newlen 43, seedlen 197, end 237; iter 21 T2: , 0 239; iter 5 T2: , 0 71; iter 0 T2: , 0 56; iter 7 T2: , 0 27; iter 1 T2: , 0 220; iter 10 T2: , 0 80; iter 12 T2: , 0 25; iter 0 T2: , 0 55; iter 0 T2: , 0 44; iter 1 T2: , 0 18; iter 9 T2: , 0 26; iter 0 T2: , 0 92; iter 8 T2: , 0 159; iter 11 T2: , 0 25; iter 3 T2: , 0 187; iter 5 T2: , 0 18; iter 19 T2: , 0 239; iter 9 T2: , 0 21; iter 13 T2: , 0 41; iter 22 T2: , 0 239; iter 7 T2: , 0 18; iter 7 T2: , 0 175; iter 1 T1: , 0; new seq 
GTATTACAAACTGCAACTCATTAACATAGGGTTTTCCTAGTATAACGGTCCAGGCCAACGATTATTCAGTGTAATCCTGATTGTTGGCAGAGAGCAGGCACTTTATGTGGTTTTCCAGGTATAGTAATAGATATACCCGGATTAAGTCCC; iter seq CCTGTGTAGGTTCTTGTCAATTTGCCGGTGTTATTGGCTGACTTGCCCTTTATATATA; newlen 150, seedlen 58, end -1; iter 22 T2: , 0 22; iter 6 T2: , 0 223; iter 8 T2: , 0 18; iter 15 T2: , 0 99; iter 3 T2: , 0 28; iter 0 T1: , 0; new seq AAGTACAAACTATT; iter seq GCTATTAGACAGCCAATAATAGGAAATTGTATTTCTCAAGAAAAACAACAATATATACTAAAACACATGATAAGAATTACATAAACACTCATAATTTCAACAGAAAGAAGAAATAGAATAACACGTATACGAAAAATTACAAAACCGTAAAAAACAAAAATAAAACTAATAAAAAATCCCCAAAAAATAAACAA; newlen 14, seedlen 194, end 207; iter 1 T1: , 0; new seq A; iter seq CTTTTAGTAAAATCCAAAAATACACGAAATTGGAGAGCAATAAAAATAACGTATTGACCCTTGGAGAAAAATTGTCAGAATAACGAAATTAATTAAACATGGAAGGTTTCGGCGTCAAGCGCGCCTTTTTCATAGACAATGCACAACGTTTCACAATAATCCCCTAATTAGTAGCACAGGATTGCAAACTAAGTAATTGGATGACGTAGCAGCTGGTGACGTAAGAACTGGGGAAAATT; newlen 1, seedlen 239, end -1; iter 6 T2: , 0 151; iter 10 T2: , 0 112; iter 0 T2: , 0 239; iter 3 T2: , 0 223; iter 3 T2: , 0 23; iter 5 T2: , 0 18; iter 1 T1: , 0; new seq TT; iter seq TTTTTATAAGGAATTTAAAAGCAAAAAAAATTCCTGTAAAAAAGAGGCATGATTAGCACTTACGGTAATCTCCAAGTTTACGACCTCAACGAATAAAACCTCGGACACTGTAACTGAAATATGTCAATATTGAATAAACGTAATCAAATAAACATTCTTTTTAGATCAATCCAATTTCTATATTTTGTACAACTAATGAGCGTAATAATTCATGTATGTAATATAAAAAACAAATTTA; newlen 2, seedlen 238, end 239; iter 6 T2: , 0 18; iter 8 T1: , 0; new seq T; iter seq TTTGTACCGAATAATCGAGATCTATGTACTGATCGGACGAGCAACAAAAATCCGTTAACTAGTCTTTTCTTATTCGGTACCGAATTCCGCCATACAAGCTCATGCATGTACGAAGTGAAAAGGGTCTCCGAAGTACCAAATTGGCGTTTGAGTTTCCTTTGGCCCGCAGTGTGAATGGCTGGGTCATTCGGATCGACAAAATGTTCCTTATGTATAATCACAGCGTTACTATACATACC; newlen 1, seedlen 239, end 239; iter 6 T2: , 0 33; iter 12 T2: , 0 25; iter 0 T1: , 0; new seq AC; iter seq 
TATACTGGGGTTAGTTTATCTCGAAATTTATTCGTTTAAGCCCCAGTTTTATAGTAGTTAATCTAAACCTGCAGTAATACAATTGTTTTATAACTTGCTATCGCGTAGAATCCGGCGAAAATTACACGTTTGCTTTATCCGCAGCTAGCTATATTCGGTTACTCAAATTTATCGGATAGCTCAGAAAAATGTAGGCTATTGTTCTAACTTCGAACACATTAATTCGCTTCTCTGAGCG; newlen 2, seedlen 238, end 238; iter 6 T2: , 0 237; iter 5 T1: , 0; new seq AGCTCGGGTGG; iter seq CTACGCGTCGCCAACGTCGCCCGGTTACGGGTAGCGTGCATCGCGTACCAAGCGGCGCATGGACAGTACGGATCAGATCTGCAAG; newlen 11, seedlen 85, end 91; iter 1 T2: , 0 39; iter 5 T2: , 0 239; iter 15 T2: , 0 116; iter 2 T1: , 0; new seq AG; iter seq AAGCGAATTAATGTGTTCGAAGTTAGAACAATAGCCTACATTTTTCTGAGCTATCCGATAAATTTGAGTAACCGAATATAGCTAGCTGCGGATAAAGCAAACGTGTAATTTTCGCCGGATTCTACGCGATAGCAAGTTATAAAACAATTGTATTACTGCAGGTTTAGATTAACTACTATAAAACTGGGGCTTAAACGAATAAATTTCGAGATAAACTAACCCCAGTATAGTTAATAAT; newlen 2, seedlen 238, end 237; iter 50 T2: , 0 72; iter 9 T2: , 0 112; iter 8 T2: , 0 41; iter 1 T1: , 0; new seq AGA; iter seq TAGGAATTTCTACCAGCTCAAAATACCCAGAGATAGATTCTGAGGGAGAGCGAAATGAAGACAGAGAGCCCGAAGAATTCAGCCTATTTTCTGGAACGATACATAACCCCGTTACGAAGGAGACAGTAGCGGAGGTCAAAGTTTCAGCAATGGAGCGAAGGGATAGGGAGAGAGTCGAGGAGGAAGACAGTAAATTGCACCGCACCGATAAGACGATAGCGAAAACGCATCAGTCCA; newlen 3, seedlen 237, end 239; iter 17 T2: , 0 51; iter 34 T2: , 0 26; iter 0 T2: , 0 79; iter 1 T2: , 0 24; iter 16 T2: , 0 18; iter 6 T1: , 0; new seq ATGTTAAT; iter seq GTCGTCTGGTCCTTTTGATTGTTTGTTGGGTGTCTTTTTAATGGCTGTTGTGACCTCTTCGTCTGTTAGTGTGATTGTACTATTTGGGAGATTTTTTATGTGTTCTTGGATTTTTCGGTTGTTTGTGTTCGTTTTGTGTTGTATTATGTTCGTGTATTGTTTGGTAAAAGCTTGGGCTGTGCCGTGTGGCGTGGTGCTGATTTTGTTTAGGAATGTGATGGATCTGTTTTGA; newlen 8, seedlen 232, end 136; iter 24 T2: , 0 41; iter 1 T2: , 0 18; iter 15 T2: , 0 195; iter 15 T2: , 0 119; iter 2 T1: , 0; new seq TATTGT; iter seq GCGATTTCTTTGTATTCTTTATCAAATATCGACAAAGTAGTAGCGCTTTGACGAGTAGGTTTAGCCATGGTCACGAGACAGTATAGTTAGGACAGTACATCGACAATATAGACAGTCTAGACATTTAATACACAGTTAAATATATGGTTAGTTCGGTTAGTTTATGCCGGTATTGTAATGTTAGATAGTTACCTGGAACGATACGAAAGTCACACACATTAAATTAAATTGGCA; newlen 6, seedlen 234, end 232; iter 5 
T1: , 0; new seq TGGAAAATACTACAAAGCCCTGTGGTT; iter seq TCTATGGCCATTTTTGATGCTGATCACGAATCTGGCAAAATGTTTGACCTCCGATGACCTTTGACCCCCGTAACGCTCACTTTTTCAAAATGGCCGCCAAAATTGTTGATTTTGCCAGATTTCGTGTTTCTGACGCAATGATATGGGTTGTAATACCTCTATTTATAGGTTTTCTAGCCCTCTAAATGCAAATTTGACATCATTTTACAGATC; newlen 27, seedlen 213, end 226; iter 7 T2: , 0 137; iter 2 T2: , 0 43; iter 9 T2: , 0 48; iter 3 T2: , 0 94; iter 2 T2: , 0 135; iter 7 T2: , 0 47; iter 3 T2: , 0 84; iter 5 T2: , 0 239; iter 2 T2: , 0 18; iter 15 T1: , 0; new seq T; iter seq CAGGAAGAAGAATCTTCCTTAGCCGTCTTCCCCTCCCACTGACACCCAATAGCTTGTAAAGTTCACTAGCAATGCTCTTAGCCATCCTTTCATAATGATCCAAGCATTCAGGTCCCTGCGCACCCTTGAAATCCTTTGCCAAACAAAATTTACGCGAATCACCCAAGATGTCATTGATCTTACTTAAATATGACTCCCGCCGCATAACACAATACCCGGTTCCTTTGTCAAAAGGAACC; newlen 1, seedlen 239, end -1; iter 3 T2: , 0 36; iter 9 T2: , 0 77; iter 14 T2: , 0 127; iter 1 T2: , 0 47; iter 27 T2: , 0 32; iter 4 T1: , 0; new seq AATAAGAAATTAC; iter seq TCAAATTCAGGAACGGCATTCAAGTAACTTAAATATGGCACATAGTAGGAGCGAAGTAGATAGCTGCGTGGAGACCTTTTTGGATAACCCTCAGGTAAGTAGTCTGGGATCCTACAAAAAGGATATACTAATAGCTATAGCAGCTAAGCTAAATATACCCCTAGTTAAAACTAACACTAAGGCGGATATAATACAGCTAATAGTAACGGAGGTAGAAAAGGGTAATC; newlen 13, seedlen 227, end 237; iter 20 T1: , 0; new seq A; iter seq TTGGGTCTTTAATCCAAGTGATGTGAATAAAGACTGCATTTTCTCAGTCAGGCCATTATCCATCCATGTAGAACTCAGGTTGAACGCATCCGAGATTGCCGAACATCCAAACAATATTATAAACCAAATAAATTAAAATAAATATAAGGTTATTGCACAATATAATCACTAAGGAAAAATACCCTAATCGCTCTAGGTTCCTTATGATAGACAAATTGCACATTGCCGTTTTCAGTTAT; newlen 1, seedlen 239, end 228; iter 12 T2: , 0 18; iter 3 T1: , 0; new seq TAATAAGAGAGGGG; iter seq ATATTGAAACTGGACCATACTACGAGGCTAGGTGGTAGAACCAAAGTGGTATGCATATGCAGAGACAACTCCGAGGTGCAAAGGCACAAAATTAAGATAGAGGTACTAGGGGAAGTGAAAACGATGGAAGTAGGTATTATGAAACCGCTCCACCCCAGCTATGATCTAATCTTATCAGATGATATTCTCGGCAAAAGATTGCAAGATTTCCTTCTAGGAAGAAAGG; newlen 14, seedlen 226, end 215; iter 8 T2: , 0 110; iter 39 T2: , 0 18; iter 2 T2: , 0 18; iter 11 T2: , 0 104; iter 12 T1: , 0; new seq ACAAACATTCAACG; iter seq 
TGTCATGTGACATGTATTAAAACTTGTTATTGCAATAAGGAAGTGAAAATGTTTTTGTGGTTTTGTTGAATACATTATACCATCATTAGACGTCATCTGTGTGTAAAAACAAAACGTTTAAAAACACGTAACGCATTAAAAATTTGTTACCACATAAAGTAATACATTGAAAACATGTTTATCGAAGTGTTATGTTTGTCTATATTGGGTTTTATAAAAATATTTT; newlen 14, seedlen 226, end 239; iter 11 T2: , 0 239; iter 14 T2: , 0 164; iter 0 T2: , 0 222; iter 2 T2: , 0 18; iter 23 T2: , 0 101; iter 14 T2: , 0 223; iter 1 T2: , 0 137; iter 3 T2: , 0 111; iter 21 T2: , 0 42; iter 14 T1: , 0; new seq ACCGAAGGATTATCCGCGAGAAGGCGAACCTGATTCGCAGTTGCATTATTGCTACACCTTCT; iter seq AGCAATAGCGGTTATAATACGGCTATAAGCCAAAGTAGTTTTTTTTTTGTTAGTTTGTTCCCTTTTGCTAAAAAGCTTAAATTTGAGTAACCGAATATAGCTAGCTGCGGATAAAGAAGACGCGTAATTATACGCGATAACAAATCATAAACGATTATATTACCGCAAATTGAGATCG; newlen 62, seedlen 178, end 235; iter 17 T2: , 0 18; iter 6 T2: , 0 34; iter 4 Segmentation fault

Any help is welcome,

Thanks

dhlbh commented 7 years ago

faguil, I need to make sure of two things:

  1. All the reads have the same length.
  2. There are no "N" or other non-"ACGT" bases in the reads.
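
Both checks can be run directly on the FASTQ input. The snippet below is a minimal sketch, not something shipped with BASE: the function names are hypothetical, and it assumes plain, uncompressed FASTQ with exactly four lines per record (no wrapped sequences).

```python
from collections import Counter


def iter_fastq_sequences(lines):
    """Yield the sequence line (2nd of every 4 lines) of each record."""
    for index, line in enumerate(lines):
        if index % 4 == 1:
            yield line.strip()


def summarize_reads(lines):
    """Return (read-length histogram, number of reads with non-ACGT bases)."""
    lengths = Counter()
    non_acgt_reads = 0
    allowed = set("ACGT")
    for seq in iter_fastq_sequences(lines):
        lengths[len(seq)] += 1
        # Any character outside A/C/G/T (for example N) flags the read.
        if set(seq.upper()) - allowed:
            non_acgt_reads += 1
    return lengths, non_acgt_reads


# Tiny in-memory example: three records, one with an N, one shorter.
example = [
    "@r1\n", "ACGT\n", "+\n", "IIII\n",
    "@r2\n", "ACNT\n", "+\n", "IIII\n",
    "@r3\n", "ACG\n", "+\n", "III\n",
]
lengths, non_acgt_reads = summarize_reads(example)
print("length histogram:", dict(lengths))
print("reads with non-ACGT bases:", non_acgt_reads)
```

For a real file, pass an open file handle instead of the list, e.g. `with open(path) as handle: summarize_reads(handle)`. More than one key in the histogram means the reads are not all the same length.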

faguil commented 7 years ago

Hi bh,

My answers below:

  1. All reads are 251 bp long.
  2. According to the FastQC results, some reads contain "N"s.

Is the second point an issue for BASE assembler?
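
The thread does not record an answer to this question. If reads containing "N" do turn out to be a problem, one possible workaround is to drop every read pair in which either mate has a non-ACGT base before rebuilding the 2BWT files. The sketch below is an assumption-laden illustration, not a BASE feature: the function names are hypothetical, both files are assumed to be plain four-line FASTQ, and mates are assumed to appear in the same order in the two files.

```python
ACGT = set("ACGT")


def read_fastq(lines):
    """Group an iterable of lines into 4-line FASTQ records (tuples)."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield tuple(record)
            record = []


def drop_pairs_with_non_acgt(mate1_lines, mate2_lines):
    """Yield (record1, record2) only when both sequences are pure ACGT."""
    pairs = zip(read_fastq(mate1_lines), read_fastq(mate2_lines))
    for rec1, rec2 in pairs:
        if set(rec1[1].upper()) <= ACGT and set(rec2[1].upper()) <= ACGT:
            yield rec1, rec2


# In-memory example: the second pair is dropped because mate 2 has an N.
mate1 = ["@p1/1\n", "ACGT\n", "+\n", "IIII\n",
         "@p2/1\n", "GGCC\n", "+\n", "IIII\n"]
mate2 = ["@p1/2\n", "TTAA\n", "+\n", "IIII\n",
         "@p2/2\n", "GNCC\n", "+\n", "IIII\n"]
kept = list(drop_pairs_with_non_acgt(mate1, mate2))
print("pairs kept:", len(kept))
```

Writing the kept records back out as two new FASTQ files keeps the pairing intact, since a pair is either kept or dropped as a whole.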