bcgsc / ntEdit

✏️ Genome assembly polishing & SNV detection
GNU General Public License v3.0

Can ntEdit remove haplotype redundancy? #15

Closed: xiekunwhy closed this issue 4 years ago

xiekunwhy commented 4 years ago

Hi,

"ntEdit simplifies polishing and haploidization": what does haploidization mean here? Can ntEdit be used to remove haplotype redundancy? I used wtdbg2 to assemble a plant genome with a high heterozygosity rate (possibly >1.5%). The expected genome size is ~3.2G, but I got ~4.1G of assembled sequence, so there may be a lot of haplotype redundancy in it. I am looking for tools that can remove this redundancy.

Best wishes, Kun

warrenlr commented 4 years ago

Hi Kun,

Thank you for your message. ntEdit is not the tool you are looking for.

It is a sequence assembly polishing tool and, as described in our manuscript, can be used to turn a pseudohaploid genome assembly into its haploid counterpart (if you have a haploid tissue source, like seeds). We now also support SNV reporting, but ntEdit does not collapse haplotype redundancy.

If you had a haploid read source, you could try turning your redbean/wtdbg2 assembly into a haploid one and then use tools like cdHit to cluster similar regions. Alternatively, run cdHit (or a similar tool) on the assembly as-is, with low-stringency parameters.

Best, Rene

lcoombe commented 4 years ago

Hi Kun, Another suggestion for a tool to remove redundancy due to high heterozygosity is purge_haplotigs.

xiekunwhy commented 4 years ago

Hi Rene, I will try purge_haplotigs and some other tools. Thank you for your suggestion. Best, Kun
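
As a rough sanity check on the numbers quoted in the question, the short sketch below estimates how much of the assembly might be redundant. It assumes that "G" means gigabases and that the ~3.2G figure is the expected haploid genome size. The function name `excess_assembly` is hypothetical and is not part of ntEdit or any tool mentioned in this thread.

```python
def excess_assembly(assembled_bp, expected_bp):
    """Return (excess bases, excess as a fraction of the expected size).

    Assumption: expected_bp is the expected haploid genome size, so any
    surplus is a crude upper bound on uncollapsed haplotype sequence.
    """
    if expected_bp <= 0:
        raise ValueError("expected_bp must be positive")
    excess = assembled_bp - expected_bp
    return excess, excess / expected_bp


# Figures taken from the question: ~4.1 Gb assembled vs ~3.2 Gb expected.
excess_bp, excess_frac = excess_assembly(4.1e9, 3.2e9)
print(f"{excess_bp / 1e9:.1f} Gb excess ({excess_frac:.0%} over expected size)")
```

With these inputs the surplus is about 0.9 Gb, roughly 28% above the expected size. This is only a back-of-the-envelope figure; it says nothing about which contigs are redundant, which is what the tools suggested above are meant to determine.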