ekg / edyeet

base-accurate DNA sequence alignments using edlib and mashmap2
MIT License

wflign for long global sequence alignment #7

Closed (ekg closed this 3 years ago)

ekg commented 3 years ago

This update changes the alignment model in edyeet to wflign. At a low level, edlib is still used to derive base-level alignments.

As a result, sequences many megabases long can now be aligned with low memory and time requirements.
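To illustrate why splitting the problem keeps memory low, here is a toy sketch. It assumes a simplified scheme that aligns query and target as consecutive fixed-size segments along the main diagonal, with a plain Levenshtein distance standing in for edlib. This is not wflign's implementation: wflign guides the search over segments with a wavefront and handles indels that shift the diagonal, both of which this sketch omits. The names `edit_distance` and `chunked_alignment` and the `segment` size are hypothetical.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance; a stand-in for edlib's base-level alignment."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # deletion from a
                curr[j - 1] + 1,           # insertion into a
                prev[j - 1] + (ca != cb),  # match (0) or mismatch (1)
            )
        prev = curr
    return prev[-1]


def chunked_alignment(query: str, target: str,
                      segment: int = 8) -> List[Tuple[int, int, int]]:
    """Hypothetical: align query vs target as consecutive fixed-size segments
    on the main diagonal. Returns (start, segment_length, edit_distance) per chunk.
    Working memory per chunk is O(segment), independent of total sequence length."""
    chunks = []
    n = min(len(query), len(target))
    for start in range(0, n, segment):
        q = query[start:start + segment]
        t = target[start:start + segment]
        chunks.append((start, len(q), edit_distance(q, t)))
    return chunks


if __name__ == "__main__":
    print(chunked_alignment("ACGTACGTACGTACGT", "ACGTACGAACGTACGT"))
```

Because each chunk only needs a small dynamic-programming table, the cost of one chunk does not grow with the length of the full sequences, which is the intuition behind aligning megabase-scale inputs in small pieces.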

wflign tolerates relatively high divergence over the full sequence length, so more alignments survive: for a given setting of -p and -a, we obtain more final alignments at the chosen -p threshold.

The meaning of -a[%], --align-pct-id=[%] changes: it is now a simple filter on the output alignments. The alignment itself is structured by the -p[%], --map-pct-id=[%] parameter and the new wflign reduction parameters. By default, -a is still set equal to -p, which may be overly restrictive.
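Since -a now acts only as a post-hoc filter, its effect can be sketched as below. This assumes alignments carry an extended CIGAR using `=`, `X`, `I`, and `D`, and that identity is computed as matches divided by all aligned columns (matches, mismatches, and inserted and deleted bases). That definition is an assumption, not necessarily the one edyeet uses. The `Alignment` class and the function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# Extended CIGAR operations assumed here: '=' match, 'X' mismatch, 'I'/'D' indels.
CIGAR_OP = re.compile(r"(\d+)([=XID])")


@dataclass
class Alignment:
    query: str
    target: str
    cigar: str  # hypothetical record; assumed extended CIGAR string


def percent_identity(cigar: str) -> float:
    """Matches / (matches + mismatches + insertions + deletions) * 100 (assumed definition)."""
    matches = aligned = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        aligned += n
        if op == "=":
            matches += n
    return 100.0 * matches / aligned if aligned else 0.0


def filter_by_identity(alns: List[Alignment], min_pct: float) -> List[Alignment]:
    """Keep only alignments at or above min_pct, like a post-hoc -a filter."""
    return [a for a in alns if percent_identity(a.cigar) >= min_pct]
```

In this sketch, lowering `min_pct` below the mapping threshold keeps more of the already-computed alignments without changing how they were computed, which mirrors the point that defaulting -a to -p may be overly restrictive.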

todo:

subwaystation commented 3 years ago

oh yeah :fire: looking forward to seeing this in pggb :)

ekg commented 3 years ago

There is another change worth noting: alignments are now emitted in small chunks. This is a property of the wflign algorithm.
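If a downstream step expects one summary per sequence pair, the chunked records can be combined. The sketch below assumes the output is PAF, where column 10 is the number of matching bases and column 11 is the alignment block length, and it simply sums those over all chunks sharing the same query, target, and strand. It ignores any overlap between chunks, and `aggregate_chunks` is a hypothetical helper, not part of edyeet.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate_chunks(paf_lines: Iterable[str]) -> Dict[Tuple[str, str, str], float]:
    """Hypothetical: sum PAF matches (col 10) and block lengths (col 11)
    over all chunk records with the same (query, target, strand) and
    return an overall percent identity per pair."""
    matches = defaultdict(int)
    blocks = defaultdict(int)
    for line in paf_lines:
        if not line.strip():
            continue  # skip blank lines
        f = line.rstrip("\n").split("\t")
        key = (f[0], f[5], f[4])  # query name, target name, strand
        matches[key] += int(f[9])
        blocks[key] += int(f[10])
    return {k: 100.0 * matches[k] / blocks[k] for k in matches if blocks[k]}
```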

ekg commented 3 years ago

Also, we now build with CMake (see the README).