Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads

GNU General Public License v3.0

More raw data used for assembly #60

Closed: DengHAU closed this issue 4 years ago

DengHAU commented 4 years ago

Hi, Dr. Hu. If I want to use more raw data in a genome assembly, is it right to set `seq_stat -d 90` (from the help text: `-d expected seed depth, used to be corrected, default: 45`) so that 90X genome coverage of raw data is used for the assembly? Or should I use another parameter, such as `--asm_coverage`? I already got a genome assembly using NextDenovo with the default parameters, and it did a great job!

Thanks !

moold commented 4 years ago

Hi, yes, but generally speaking, a 45x seed depth is enough. BTW, all of the data will be used for the assembly, but only the longest 45x of the data will be selected as seeds.
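To make the distinction between "seeds" and "all data" concrete, here is a minimal sketch of how a seed length cutoff could be derived from a target seed depth. It is only an illustration of the idea described in the reply above, not NextDenovo's actual `seq_stat` implementation; the function names `seed_length_cutoff` and `split_seeds` and the toy read lengths are hypothetical.

```python
def seed_length_cutoff(read_lengths, genome_size, seed_depth=45):
    """Return the shortest read length that still counts as a seed.

    Reads are taken longest first until their total length reaches
    genome_size * seed_depth. (Illustrative assumption, not the real
    seq_stat algorithm.)
    """
    target_bases = genome_size * seed_depth
    total = 0
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= target_bases:
            return length
    # Not enough data to reach the target depth: every read is a seed.
    return min(read_lengths) if read_lengths else 0


def split_seeds(read_lengths, cutoff):
    """Split reads into seeds (>= cutoff) and the remaining reads."""
    seeds = [n for n in read_lengths if n >= cutoff]
    others = [n for n in read_lengths if n < cutoff]
    return seeds, others


# Toy example: a 100 bp "genome" and six reads.
lengths = [90, 80, 60, 50, 40, 30]
cutoff = seed_length_cutoff(lengths, genome_size=100, seed_depth=2)
seeds, others = split_seeds(lengths, cutoff)
print(cutoff, seeds, others)
```

In this sketch, raising the seed depth only lowers the cutoff, so more (and shorter) reads become seeds; the reads below the cutoff are not discarded, which matches the point that all data is still used for the assembly.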

DengHAU commented 4 years ago

Thanks very much for your kind reply! I want to use NextDenovo to assemble test data from a dikaryotic fungus genome. Since a 45X seed depth is generally enough for a homozygous genome, maybe it would help to increase the default seed coverage in this case. Any suggestions? BTW, since NextDenovo works so well, is there any plan to add a "haplotype phasing" mode to lay out a haplotype-based genome? It would be a wonderful feature!

moold commented 4 years ago

Hi, I'm not sure; it depends on the heterozygosity rate, so maybe you can try it. At present, we are working on improving the assembly accuracy. As for haplotype assembly, if the raw reads are not first divided into two haplotypes, as canu-trio does, it is a tricky problem. When the assembly uses only mixed long reads, even if we get two haplotypes, each one is still a mixed assembly (containing sequences from both haplotypes), and I am not sure whether that makes sense. Anyway, we plan to use other data, such as Hi-C, to lay out the hap-based genome rather than using long reads only, and this needs more time.

DengHAU commented 4 years ago

Got it. Thanks a lot!