Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads
GNU General Public License v3.0

assemble result #44

Closed shehongbing closed 4 years ago

shehongbing commented 4 years ago

When I used raw Nanopore data, the N50 of assemble.fa was about 39 Mb, but the assembly aligned poorly to the published genome. When I used data corrected by Canu instead, the N50 of assemble.fa was about 2 Mb, but the assembly aligned well to the published genome. I do not understand why. Does this suggest that I should use corrected data rather than raw data? And why do the two approaches differ so much in N50?
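For readers comparing the two N50 values: N50 only measures contiguity, so a larger value does not by itself mean a more correct assembly (misjoined contigs raise N50 as well). A minimal sketch of the standard N50 definition follows; the function name `n50` and the toy contig lengths are illustrative assumptions, not taken from NextDenovo or from this dataset.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L
    together cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


# Hypothetical toy assemblies with the same total length (100 units).
contiguous = [60, 30, 10]
fragmented = [20, 20, 20, 20, 10, 10]

print(n50(contiguous))
print(n50(fragmented))
```

Both toy assemblies contain the same number of bases, yet their N50 values differ by a factor of three, which is why N50 should be read together with an alignment against a reference.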

moold commented 4 years ago

Which NextDenovo version did you use? What are the genome size, heterozygosity rate, and repeat content? I do not recommend using corrected data, because NextDenovo corrects the raw data itself and filters out low-quality or unusable seeds. By the way, could you provide the collinearity plots?

moold commented 4 years ago

What does the assembly result from Canu look like?

shehongbing commented 4 years ago
  1. I used NextDenovo v2.1-beta.0.
  2. The genome size is about 980 Mb, and the heterozygosity rate is 0.119%.

Figure 1 is from the raw data; Figure 2 is from the corrected data.

Figure 1

Figure 2


moold commented 4 years ago

I cannot see the figures. If you have trouble uploading them to GitHub, you can send them to my email: huj_at_grandomics.com. By the way, could you also provide the config file and the assembly log (the log from the last step)?

shehongbing commented 4 years ago

Hi, Dr. Hu

I sent them to you a few minutes ago.


moold commented 4 years ago

Your seed_cutoff looks very short. How much data did you use for the assembly?

shehongbing commented 4 years ago

My raw data is about 40 Gb.

[Read length stat]
Types    Count (#)    Length (bp)
N10      47170        73938
N20      110964       59801
N30      187662       50562
N40      277752       43236
N50      382891       37077
N60      505656       31670
N70      650141       26749
N80      823069       22065
N90      1037363      17256

Types       Count (#)    Bases (bp)      Depth (X)
Raw         1503768      42060488746     42.92
Filtered    0            0               0.00
Clean       1503768      42060488746     42.92

*Suggested length cutoff of reads (genome size: 980000000, expected seed depth: 40) to be corrected: 15170 bp
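The numbers in this log can be checked with simple arithmetic. The sketch below recomputes the reported depth and shows one plausible way a length cutoff could be derived for a target seed depth: take the longest reads until they sum to genome size times seed depth. This is an assumption for illustration, not NextDenovo's actual implementation, and the names `depth` and `seed_cutoff` are hypothetical.

```python
def depth(total_bases, genome_size):
    """Sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def seed_cutoff(read_lengths, genome_size, seed_depth):
    """Shortest read length such that all reads at least that long
    together reach genome_size * seed_depth bases.

    Returns None when the data cannot reach the target depth.
    Illustrative only; the real tool may compute this differently.
    """
    target = genome_size * seed_depth
    running = 0
    for length in sorted(read_lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


# Values reported in the log above.
total_bases = 42060488746
genome_size = 980000000
print(round(depth(total_bases, genome_size), 2))  # 42.92

# Fraction of all bases needed to reach an expected seed depth of 40.
seed_fraction = genome_size * 40 / total_bases
print(round(seed_fraction, 2))

# Hypothetical toy reads: genome of 10 bases, target seed depth 2.
print(seed_cutoff([10, 8, 6, 4, 2], genome_size=10, seed_depth=2))
```

With about 42.9x of data and an expected seed depth of 40, roughly 93% of all bases have to serve as seeds, which would explain why the suggested cutoff ends up so low.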


moold commented 4 years ago

Your data is not sufficient for assembly with the current version of NextDenovo using default options, because all default options are optimized for 60-100x Nanopore data, so the result is unpredictable. If you still want to use NextDenovo, you can try setting correction_options = -b and changing -k to 30 in sort_options, and then rerun the whole pipeline, although I cannot guarantee a good result. You can also try other assemblers. I will release a set of preset parameters for low-depth data in the future.

shehongbing commented 4 years ago

Thank you
