Open davidbenjamin opened 5 years ago
I will start working on this, building on the work that currently resides in #6034. The proposal is to run KBestHaplotype finding over multiple source/sink vertices and then run Smith-Waterman on the resulting "dangling" haplotypes in order to recover the probable dangling sequence. Hopefully the number of haplotypes will have been reduced enough by that point that the cost of this operation is tolerable.
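To make the multi-source/sink idea concrete, here is a minimal sketch of a best-first search that enumerates the k best paths from every source (no incoming edges) to every sink (no outgoing edges) of an acyclic graph. It is not the existing finder: the function name `k_best_paths`, the dict-based graph, and the scoring (negative log of each edge's share of the outgoing multiplicity) are all assumptions made for illustration.

```python
import heapq
import math
from itertools import count


def k_best_paths(edges, k):
    """Return up to k (cost, path) pairs, lowest cost first.

    edges: dict mapping vertex -> list of (next_vertex, multiplicity).
    Assumes the graph is acyclic. Cost is the sum of -log(branch fraction),
    an illustrative scoring choice.
    """
    vertices = set(edges)
    has_incoming = set()
    for outs in edges.values():
        for nxt, _ in outs:
            vertices.add(nxt)
            has_incoming.add(nxt)
    sources = sorted(vertices - has_incoming)

    tie = count()  # tie-breaker so the heap never compares paths
    heap = [(0.0, next(tie), [s]) for s in sources]
    heapq.heapify(heap)

    results = []
    while heap and len(results) < k:
        cost, _, path = heapq.heappop(heap)
        outs = edges.get(path[-1], [])
        if not outs:  # reached a sink: the path is complete
            results.append((cost, path))
            continue
        total = sum(mult for _, mult in outs)
        for nxt, mult in outs:
            step = -math.log(mult / total)  # non-negative step cost
            heapq.heappush(heap, (cost + step, next(tie), path + [nxt]))
    return results


# Toy graph: reference path A->B->C->D, a dangling tail B->X,
# and a dangling head Y->C.
toy_graph = {
    "A": [("B", 10)],
    "B": [("C", 8), ("X", 2)],
    "C": [("D", 10)],
    "Y": [("C", 3)],
}
best = k_best_paths(toy_graph, 3)
```

Because every step cost is non-negative, complete paths come off the heap in order of increasing cost. In this toy graph the short dangling-head path `Y -> C -> D` ranks first, since it passes through no branching vertex, which is the kind of effect a real scoring scheme would have to account for.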
Currently we have fancy code that adds artificial edges to the assembly graph in order to merge dangling paths back into the reference. This requires a lot of code and is hard to understand. It may be better to find haplotypes in an unmodified graph (we would need to be sure that the best-haplotype finder doesn't reward dangling paths just for being short) and then pad the discovered haplotypes so that they all occupy the same reference span.
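As a rough illustration of the padding step, the sketch below handles a dangling tail: a haplotype that starts at the beginning of the reference but ends early. It uses a plain Smith-Waterman local alignment with a linear gap penalty to find where the haplotype stops matching the reference, then appends the remaining reference bases. The function names, the scoring parameters, and the choice to drop any haplotype bases past the end of the best local alignment are assumptions for this example, not the aligner or parameters used in the codebase.

```python
def best_local_alignment_end(ref, hap, match=2, mismatch=-3, gap=-4):
    """Smith-Waterman with a linear gap penalty (illustrative parameters).

    Returns (score, ref_end, hap_end) for the best local alignment,
    where both end coordinates are exclusive.
    """
    prev = [0] * (len(hap) + 1)
    best = (0, 0, 0)
    for i in range(1, len(ref) + 1):
        cur = [0] * (len(hap) + 1)
        for j in range(1, len(hap) + 1):
            sub = match if ref[i - 1] == hap[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
            if cur[j] > best[0]:
                best = (cur[j], i, j)
        prev = cur
    return best


def pad_dangling_tail(ref, hap):
    """Extend a dangling-tail haplotype to the end of the reference.

    Assumes hap begins at the start of ref. Haplotype bases beyond the
    end of the best local alignment are dropped (a simplification).
    """
    _, ref_end, hap_end = best_local_alignment_end(ref, hap)
    return hap[:hap_end] + ref[ref_end:]


ref = "ACGTACGGTTCAGGCA"
dangling = "ACGTACCGTTC"  # reference prefix with one SNP, ending early
padded = pad_dangling_tail(ref, dangling)
```

Here `padded` keeps the SNP carried by the dangling haplotype and spans the whole reference. A dangling head could be treated symmetrically by aligning the reversed sequences and prepending the leading reference bases.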