pb-jchin closed this 6 years ago
DBdust could be used to mask low entropy sections, so at least they would not contribute to daligner overlaps.
TANmask could be used to mask tandem repeats. I've added DAMASKER to FALCON-integrate and made it available in our internal mobs build too.
DBdust won't help in the short term, as we need the SAM/BAM infrastructure for the phasing work. Where is the TANmask code? I need to take a look before concluding whether it could help or not.
OK. The problem is that we can't use Daligner and DAZZ_DB for this. Yes, we do need something like raw dust masking code for quick (on-the-fly) detection. I will need to go through it with you some time to explain the problem better.
Any updates on this? Or could ngmlr be used instead, since it might be faster?
We always run DBdust now. That might help. If you go to the dazzlerblog, you can learn how to analyze the "dust track" to see how much has been masked.
We are also replacing blasr, hopefully within a few weeks.
Some contigs are mostly simple repeats. The seeding and filling algorithm used in BLASR has trouble aligning reads to those contigs efficiently. It might make sense to detect shorter contigs with low sequence entropy and not try to do phasing on them.
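To make the suggestion concrete, here is a minimal sketch of one way such a pre-filter could look. It is not FALCON, DBdust, or BLASR code: the function names `kmer_entropy` and `should_skip_phasing`, the k-mer size, and both thresholds (`max_len`, `min_entropy`) are assumptions chosen for illustration and would need tuning on real contigs. It uses the Shannon entropy of the k-mer distribution as the low-complexity measure.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 3) -> float:
    """Shannon entropy, in bits, of the k-mer distribution of seq.

    With k=3 the maximum is 6 bits (all 64 k-mers equally frequent).
    Simple repeats score far lower, e.g. a pure "AC" repeat scores 1 bit.
    """
    seq = seq.upper()
    n = len(seq) - k + 1  # number of overlapping k-mers
    if n <= 0:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(n))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def should_skip_phasing(seq: str, k: int = 3,
                        max_len: int = 50_000,
                        min_entropy: float = 3.0) -> bool:
    """Flag a contig that is both short and low in sequence entropy.

    max_len and min_entropy are hypothetical cutoffs, not values
    taken from any existing pipeline.
    """
    return len(seq) <= max_len and kmer_entropy(seq, k) < min_entropy
```

A contig for which `should_skip_phasing` returns `True` would be excluded before read alignment, so the aligner never has to seed against it. A whole-contig score will miss contigs that are only partly repetitive; a windowed version of the same measure would be closer to what dust-style masking does.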