bonsai-team / matam

Mapping-Assisted Targeted-Assembly for Metagenomics
GNU Affero General Public License v3.0
19 stars 9 forks source link

Ovgraphbuild major overhaul: Improve implementation speed ++ #10

Open ppericard opened 7 years ago

ppericard commented 7 years ago

Change how reads pairs are sorted through by reading SAM file reference by reference and using a positions window gliding on each reference.

One major problem: How to store read pairs on which we already know they overlap so we dont deal with them again later ?

Parallelize treatment: at the reference level, but also at the reads pair level.

ppericard commented 7 years ago

One additional heuristic: deal with reads sorted by position and for each read only consider the next x (50, 100, 1000) reads to be paired with. This reduce the complexity from an amortized quadratic to a linear one with a capped factor O(xn) with n the nb of reads.

However, this could drastically reduce the number of edges in the overlap graph. Therefore, the impact of this heuristic has to be evaluated.