baku4 / sigalign

A Similarity-Guided Alignment Algorithm
MIT License

question on align identity > 90% #12

Open jianshu93 opened 2 months ago

jianshu93 commented 2 months ago

Hi Sigalign team,

Thanks for the amazing library for sequence alignment. I have a question about aligning highly similar sequences against a reference. If a read is always above a certain identity threshold (e.g., 90%) to the reference genomes/sequences, would SigAlign still be accurate compared to heuristic aligners such as minimap2? In other words, what is the minimum identity threshold at which SigAlign is accurate? A second question is whether it provides overlap alignment (that is, gaps at both ends of the query and reference are not penalized), like the semi-global alignment in usearch/vsearch.

Thanks,

Jianshu

baku4 commented 2 months ago

Hi Jianshu! Thank you for your interest in SigAlign.

Question 1: accuracy in highly similar sequences

1) On "high" similarity

For SigAlign, "high" similarity generally means around 98-99%, not >90%. SigAlign is particularly suited for sequences with high accuracy, such as those from Illumina NovaSeq (99.86%) and MiSeq (98.79%) (calculated from "read mapping" tests in SigAlign's paper).

This doesn't mean SigAlign only finds sequences with >98% identity. In the paper's tests, SigAlign can detect alignments with <97% identity in MiSeq data, but it generally performs best with targets in the >98% range.
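To make the identity figures above concrete, here is a minimal Python sketch of how percent identity could be computed from a gapped pairwise alignment, and how many differing columns a given threshold tolerates. It assumes identity is defined as matching columns divided by total alignment columns (other tools use other denominators), and the function names `percent_identity` and `difference_budget` are made up for illustration. It is not part of SigAlign.

```python
def percent_identity(aligned_query: str, aligned_target: str) -> float:
    """Percent identity of two gapped, equal-length alignment rows.

    Assumption: identity = matching columns / all alignment columns,
    with '-' marking a gap. Other definitions exist.
    """
    if len(aligned_query) != len(aligned_target):
        raise ValueError("alignment rows must have the same length")
    if not aligned_query:
        raise ValueError("alignment must not be empty")
    matches = sum(
        q == t and q != "-"
        for q, t in zip(aligned_query, aligned_target)
    )
    return 100.0 * matches / len(aligned_query)


def difference_budget(alignment_length: int, min_identity_pct: int) -> int:
    """Largest number of non-matching columns that still meets the threshold.

    Uses integer arithmetic to avoid floating-point rounding surprises.
    """
    return alignment_length * (100 - min_identity_pct) // 100
```

Under this definition, a 150-column alignment tolerates 3 non-matching columns at a 98% threshold and 15 at 90%, which illustrates how much looser a >90% cutoff is than the 98-99% range mentioned above.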

2) On accuracy

SigAlign is non-heuristic, meaning it:

However, SigAlign doesn’t guarantee finding the "biological truth" (e.g., exact origin of a read), so we can't claim it is more accurate than other aligners like minimap2.

That said, our paper shows SigAlign demonstrates high sensitivity and precision (when filtering for the highest-scoring alignments) with simulated MiSeq data.

3) Use case suggestions

SigAlign is expected to perform well with:

Question 2: Semi-global alignment

Yes, SigAlign supports semi-global alignment. Semi-global mode may also use less memory and run faster than local alignment.
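For readers unfamiliar with the term, the overlap alignment described in the question (end gaps on both query and reference are free) can be sketched as a textbook dynamic program. The Python below is a generic illustration only: it is not SigAlign's algorithm or API, SigAlign's exact definition of semi-global mode may differ, and the function name and the linear scoring values (`match`, `mismatch`, `gap`) are assumptions chosen for the example.

```python
def overlap_alignment_score(
    query: str,
    target: str,
    match: int = 1,
    mismatch: int = -1,
    gap: int = -2,
) -> int:
    """Best overlap-alignment score with free end gaps on both sequences.

    Generic textbook sketch with a linear gap penalty (illustrative values).
    """
    m = len(target)
    # Row 0 is all zeros: skipping a prefix of the target costs nothing.
    prev = [0] * (m + 1)
    # Best score seen in the last column (target fully consumed).
    best_last_col = prev[m]

    for i in range(1, len(query) + 1):
        # Column 0 stays zero: skipping a prefix of the query costs nothing.
        curr = [0] * (m + 1)
        for j in range(1, m + 1):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + sub,   # align query[i-1] with target[j-1]
                prev[j] + gap,       # gap in target
                curr[j - 1] + gap,   # gap in query
            )
        best_last_col = max(best_last_col, curr[m])
        prev = curr

    # Free trailing gaps: the alignment may end anywhere in the last row
    # (query fully consumed) or the last column (target fully consumed).
    return max(best_last_col, max(prev))
```

For example, `overlap_alignment_score("ACGTACGT", "TTACGTACGTTT")` scores the query as fully contained in the target without charging for the unaligned flanks, whereas a global alignment would penalize those four extra target bases.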