Open ppericard opened 7 years ago
One additional heuristic: deal with reads sorted by position and for each read only consider the next x (50, 100, 1000) reads to be paired with. This reduce the complexity from an amortized quadratic to a linear one with a capped factor O(xn) with n the nb of reads.
However, this could drastically reduce the number of edges in the overlap graph. Therefore, the impact of this heuristic has to be evaluated.
Change how reads pairs are sorted through by reading SAM file reference by reference and using a positions window gliding on each reference.
One major problem: How to store read pairs on which we already know they overlap so we dont deal with them again later ?
Parallelize treatment: at the reference level, but also at the reads pair level.