Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads
GNU General Public License v3.0

Genome assembly of the autotetraploid plant #136

Status: Open. fengyuanli304 opened this issue 2 years ago.

fengyuanli304 commented 2 years ago

Question or expected behavior

Hi, I recently assembled the genome of an autotetraploid plant from Nanopore reads, but the results are not good. This may be caused by its autotetraploidy. Could you please give me some suggestions? Thank you very much.

1) nextdenovo + nextpolish

Type    Length (bp)  Count (#)
N10     3398628      14
N20     2568560      32
N30     1871279      58
N40     1356392      91
N50     939240       138
N60     640978       207
N70     451803       308
N80     316553       449
N90     218120       654
Min.    33288        -
Max.    5440223      -
Ave.    528507       -
Total   538020557    1018

C:96.7%[S:72.4%,D:24.3%],F:0.7%,M:2.6%,n:1375
1329 Complete BUSCOs (C)
995 Complete and single-copy BUSCOs (S)
334 Complete and duplicated BUSCOs (D)
10 Fragmented BUSCOs (F)
36 Missing BUSCOs (M)
1375 Total BUSCO groups searched

2) wtdbg2 + nextpolish

Type    Length (bp)  Count (#)
N10     7117562      7
N20     5320460      16
N30     3664794      29
N40     1774681      51
N50     816375       95
N60     316343       206
N70     151485       470
N80     79657        986
N90     35990        2038
Min.    2066         -
Max.    11154505     -
Ave.    101844       -
Total   560244256    5501

C:90.9%[S:80.6%,D:10.3%],F:1.7%,M:7.4%,n:1375
1250 Complete BUSCOs (C)
1108 Complete and single-copy BUSCOs (S)
142 Complete and duplicated BUSCOs (D)
24 Fragmented BUSCOs (F)
101 Missing BUSCOs (M)
1375 Total BUSCO groups searched

3) necat4 + nextpolish

Type    Length (bp)  Count (#)
N10     1255211      77
N20     940996       191
N30     758574       333
N40     617986       509
N50     508059       724
N60     416166       985
N70     319953       1315
N80     236195       1754
N90     146935       2391
Min.    504          -
Max.    2499020      -
Ave.    288546       -
Total   1204105988   4173

C:97.0%[S:27.5%,D:69.5%],F:0.9%,M:2.1%,n:1375
1333 Complete BUSCOs (C)
378 Complete and single-copy BUSCOs (S)
955 Complete and duplicated BUSCOs (D)
13 Fragmented BUSCOs (F)
29 Missing BUSCOs (M)
1375 Total BUSCO groups searched

4) canu (corrected) + smartdenovo + nextpolish

Type    Length (bp)  Count (#)
N10     2180472      18
N20     1526697      47
N30     1236412      85
N40     941656       133
N50     780034       193
N60     584817       269
N70     442020       371
N80     288149       518
N90     150110       760
Min.    9906         -
Max.    4119799      -
Ave.    367003       -
Total   515272840    1404

C:96.9%[S:73.6%,D:23.3%],F:0.5%,M:2.6%,n:1375
1332 Complete BUSCOs (C)
1012 Complete and single-copy BUSCOs (S)
320 Complete and duplicated BUSCOs (D)
7 Fragmented BUSCOs (F)
36 Missing BUSCOs (M)
1375 Total BUSCO groups searched

Which of these assemblies is most suitable for this autotetraploid plant? I chose the nextdenovo + nextpolish result for downstream analyses. After removing haplotigs and contig overlaps with purge_dups, I got a smaller genome size.
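Each one-line BUSCO summary above can be rebuilt from the counts listed under it (n = 1375 groups). The Python sketch below does this. The function name `busco_summary` is hypothetical, and the rounding scheme it uses (round S, D and F to one decimal place independently, then derive C = S + D and M = 100 - C - F) is an assumption inferred from the numbers in this thread, not taken from BUSCO's source. It reproduces all five summary lines here, whereas rounding C and M directly from their own counts would give, for example, 96.9% instead of 97.0% for the necat4 assembly.

```python
def busco_summary(single: int, duplicated: int, fragmented: int, total: int) -> str:
    """Rebuild a BUSCO one-line summary from raw counts.

    Assumption (inferred from this thread, not from BUSCO's code):
    S, D and F are rounded to one decimal place independently, then
    C = S + D and M = 100 - C - F are derived from the rounded values.
    """
    # Work in integer tenths of a percent to avoid floating-point drift.
    def tenths(count: int) -> int:
        return round(count * 1000 / total)

    s = tenths(single)
    d = tenths(duplicated)
    f = tenths(fragmented)
    c = s + d          # complete = single-copy + duplicated
    m = 1000 - c - f   # missing = remainder of 100%

    def pct(t: int) -> str:
        return f"{t / 10:.1f}%"

    return f"C:{pct(c)}[S:{pct(s)},D:{pct(d)}],F:{pct(f)},M:{pct(m)},n:{total}"


# Counts for the nextdenovo + nextpolish assembly (S=995, D=334, F=10, n=1375).
print(busco_summary(995, 334, 10, 1375))
# prints C:96.7%[S:72.4%,D:24.3%],F:0.7%,M:2.6%,n:1375
```

Calling it with the post-purge_dups counts (1116, 159, 25, 1375) returns C:92.8%[S:81.2%,D:11.6%],F:1.8%,M:5.4%,n:1375.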

Contigs             598
Largest contig      5440223
Total length        389747736
GC (%)              34.54
N50                 1536005
N75                 638325
L50                 77
L75                 176
N's per 100 kbp     0.22
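The N50/N75 and L50/L75 values above follow the usual definitions: sort contigs from longest to shortest; Nx is the length of the contig at which the running total first reaches x% of the total assembly length, and Lx is the number of contigs needed to get there. The Count (#) column in the earlier N10 to N90 tables appears to be the same Lx value (for example, 138 contigs at N50 for nextdenovo + nextpolish). Below is a minimal Python sketch of that definition; the function name `nx_lx` and the toy lengths are hypothetical, and this is not the code used by any tool in this thread.

```python
from typing import List, Tuple


def nx_lx(lengths: List[int], x: int) -> Tuple[int, int]:
    """Return (Nx, Lx) for a list of contig lengths, with x given in percent."""
    if not lengths or not 0 < x <= 100:
        raise ValueError("need at least one contig and 0 < x <= 100")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until x% of the total is covered.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length, count
    raise AssertionError("unreachable: the running sum always reaches the total")


# Hypothetical toy assembly: nine contigs, 54 bp in total.
toy = [2, 3, 4, 5, 6, 7, 8, 9, 10]
for x in (10, 50, 90):
    n, l = nx_lx(toy, x)
    print(f"N{x}\t{n}\tL{x}\t{l}")
# prints (tab-separated): N10 10 L10 1, then N50 8 L50 3, then N90 4 L90 7
```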

C:92.8%[S:81.2%,D:11.6%],F:1.8%,M:5.4%,n:1375
1275 Complete BUSCOs (C)
1116 Complete and single-copy BUSCOs (S)
159 Complete and duplicated BUSCOs (D)
25 Fragmented BUSCOs (F)
75 Missing BUSCOs (M)
1375 Total BUSCO groups searched

How can I improve this result? I look forward to your reply.

Operating system: CentOS Linux release 7.6.1810

GCC: 4.8.5 20150623 (Red Hat 4.8.5-36)

Python: 2.7.18

NextDenovo: 2.4


moold commented 2 years ago

What is your question? Is it about the smaller assembly size? If so, see the FAQ.