a-ludi / dentist

Close assembly gaps using long-reads at high accuracy.
https://a-ludi.github.io/dentist/
MIT License

Adding haplotigs back for gap filling? #20

Closed: rotoke closed this issue 3 years ago

rotoke commented 3 years ago

Dear Arne,

Thank you again for your help with setting up dentist on our cluster. Before giving it a try, I have one more conceptual question, which may be better asked in a separate thread:

I am trying to use dentist to fill gaps in a PacBio assembly after Hi-C scaffolding. The sequenced plant was highly heterozygous, and the initial assembly contained both contigs and haplotigs, which represent alternate haplotypes at heterozygous loci. I purged all the haplotigs before scaffolding, so the current assembly is a haploid representation, while the raw PacBio reads to be mapped for gap filling are obviously from the diploid genome.

I am thus wondering whether this could lead to incorrect gap filling if reads from both haplotypes map to the same assembly gap. Would it be best to add the haplotigs again for gap filling?

I assume that the haplotigs could be removed again after gap filling as long as the option to merge contigs is not enabled?

Best, Roman

a-ludi commented 3 years ago

Hi Roman,

You can safely keep the haplotigs out of the assembly. DENTIST does not rely on having a single best mapping for every read but rather considers every chain of local alignments as a possibly true origin. Including the haplotigs could still help avoid haplotype errors, but at the cost of closing fewer gaps.

In any case, the closed gaps may contain a mix of both (or several) alleles because DENTIST cannot distinguish between them. I have ideas on how to improve this but sadly not the time to implement them. However, the inserted sequence will likely contain only base-pair-level errors, which are expected to vanish after polishing, although the resulting contig may jump between alleles.

I think you will be able to manually inspect the result because, in my experience, only a handful of gaps can be closed in de novo assemblies. The locations of the gaps are noted in the generated BED and AGP files.
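For the manual inspection, something like the following might help turn the BED entries into regions to look at in a genome browser. This is only a minimal sketch: it relies on nothing but the three mandatory BED columns (sequence name, 0-based start, exclusive end) and ignores any further columns. The names `read_bed` and `region_string`, the flank size, and the file name in the usage comment are made up for illustration and are not part of DENTIST.

```python
from typing import Iterable, Iterator, NamedTuple


class BedInterval(NamedTuple):
    contig: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)

    @property
    def length(self) -> int:
        return self.end - self.start


def read_bed(lines: Iterable[str]) -> Iterator[BedInterval]:
    """Yield intervals from BED lines, using only the first three columns."""
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments, and optional track/browser header lines.
        if not fields or fields[0].startswith("#") or fields[0] in ("track", "browser"):
            continue
        yield BedInterval(fields[0], int(fields[1]), int(fields[2]))


def region_string(interval: BedInterval, flank: int = 1000) -> str:
    """Format a 1-based, inclusive 'contig:start-end' region with flanks.

    The flank is an arbitrary illustrative default; the end is not clipped
    to the sequence length because that is not known from the BED file.
    """
    start = max(1, interval.start + 1 - flank)
    end = interval.end + flank
    return f"{interval.contig}:{start}-{end}"


# Hypothetical usage (the file name is an assumption):
# with open("closed-gaps.bed") as handle:
#     for interval in read_bed(handle):
#         print(region_string(interval), interval.length)
```

The region strings use the common `contig:start-end` notation, so they can be pasted into a viewer together with the read alignments to check whether a filled gap switches between alleles.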

> I assume that the haplotigs could be removed again after gap filling as long as the option to merge contigs is not enabled?

Yes, that would be possible.
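Removing the haplotigs afterwards could be done with a small FASTA filter along these lines. This is a rough sketch, not something DENTIST provides: it assumes you have a list of the haplotig sequence names and that those names are still the first word of the FASTA headers in the gap-closed output (please verify this on your data). All function names here are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def record_id(header):
    """Return the first whitespace-delimited word of a header ('' if empty)."""
    parts = header.split()
    return parts[0] if parts else ""


def drop_records(records, ids_to_drop):
    """Keep only records whose ID is not in ids_to_drop (e.g. haplotig names)."""
    for header, seq in records:
        if record_id(header) not in ids_to_drop:
            yield header, seq


def format_fasta(records, width=80):
    """Return FASTA text with sequences wrapped at `width` characters."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)
```

In practice one would read the gap-closed assembly, pass the set of haplotig names as `ids_to_drop`, and write the result of `format_fasta` to a new file. The filter only makes sense if each haplotig is still a separate record, which is why the merge option matters.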

a-ludi commented 3 years ago

PS: you can always try both options and choose later. :smiley:

rotoke commented 3 years ago

Super, thanks for the insightful answer! The haplotype jumps are unfortunate, but the assembly itself is a mosaic of the two haplotypes anyway due to the high heterozygosity. I will try both options and let you know the results.