Martinsos / edlib

Lightweight, super fast C/C++ (& Python) library for sequence alignment using edit (Levenshtein) distance.
http://martinsos.github.io/edlib
MIT License

finding all alignments within the string #202

Open mehdiborji opened 2 years ago

mehdiborji commented 2 years ago

Is there a way to report all alignments up to the requested distance? For example, if I have an alignment with distance 1 near the beginning of the target, I might prefer it to an alignment with distance 0 somewhere else.

The solution I can think of is to slide a moving window across the target; however, I believe these suboptimal alignments are already found but just not reported. Something like this would be very useful for trimming leading or trailing strings from a longer one.

ewachtel commented 2 years ago

All suboptimal alignments are reported up to the specified distance criteria in the arguments described in the help section. Note that if a query is very long, shorter alignments that have a high probability of occurring randomly will not be reported.

Regards,

Ed

mehdiborji commented 2 years ago

Thank you @ewachtel for your response. Is this feature also part of the Python wrapper? Here's an example where I put a copy of the query with one mismatch into the target, and edlib does not report it because a perfect alignment exists elsewhere:

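```python
import edlib

C = 'GTGTGCTCTTCCGATCT'
V = 'TCTTCAGCGTTCCCGAGA'
# target: the exact query V, plus a copy of V whose first base is changed to 'A'
my_string = 2*C + V + C + 'A' + V[1:] + C
edlib.align(V, my_string, mode='HW', task='locations', k=1)
# {'editDistance': 0,
#  'alphabetLength': 4,
#  'locations': [(34, 51)],
#  'cigar': None}
```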
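For context, edlib's `locations` task returns only the locations that achieve the best (minimal) edit distance; `k` is an upper bound, not a "report everything within k" switch. That is why the distance-1 copy is missing. The HW (infix) mode is semi-global edit distance: the query may start and end anywhere in the target for free. The sketch below is a plain dynamic program for that setting (Sellers' algorithm). It is not edlib's bit-parallel Myers implementation and is far slower, O(len(query) × len(target)). It simply records every end position whose distance is within `k`. The name `infix_end_distances` is hypothetical and not part of edlib's API.

```python
def infix_end_distances(query, target, k):
    """Hypothetical helper (not part of edlib): return (end, distance) pairs for
    every 0-based, inclusive end position in `target` where some substring ending
    there aligns to `query` with edit distance <= k (infix / semi-global mode)."""
    m = len(query)
    # col[i] = edit distance between query[:i] and the best substring of target
    # ending at the current position; initially the empty-target column D[i][0] = i.
    col = list(range(m + 1))
    hits = []
    for j, tc in enumerate(target):
        prev_diag = col[0]  # D[0][j] from the previous column
        col[0] = 0          # the alignment may start anywhere in target for free
        for i in range(1, m + 1):
            cost = 0 if query[i - 1] == tc else 1
            new = min(prev_diag + cost,  # match / mismatch
                      col[i] + 1,        # target char tc left unaligned
                      col[i - 1] + 1)    # query char query[i-1] left unaligned
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:  # report every end within k, not only the optimal ones
            hits.append((j, col[m]))
    return hits


C = 'GTGTGCTCTTCCGATCT'
V = 'TCTTCAGCGTTCCCGAGA'
my_string = 2*C + V + C + 'A' + V[1:] + C
print(infix_end_distances(V, my_string, 1))
```

On the example above, the result includes end 51 with distance 0 and end 86 (the `'A' + V[1:]` copy) with distance 1. It also includes ends 50 and 52 with distance 1, one indel away from the perfect hit. A single true alignment usually shows up as a small cluster of adjacent ends, so in practice you would keep the minimum of each cluster.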
ewachtel commented 1 year ago

I was thinking of a different project. Actually, I am not the right person to discuss edlib (I did not author it).

cherryamme commented 1 year ago

Did you find any way to solve this problem, @mehdiborji? I would also like the results to report all possible sites.

Martinsos commented 1 year ago

Hey @cherryamme @mehdiborji! Edlib (the C version) does return multiple locations: either just their end positions, or their ends and starts, depending on what you ask it to do. When returning an alignment (the path), however, it computes it for only one of those locations. It has been some time since I worked on edlib, so I can't remember 100%, but I believe alignments for the other end locations could also be obtained. Edlib doesn't do that at the moment, though, and it would require some changes to the codebase. I currently don't have time to work on this, but if somebody wants to take it on, I can try to help.
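The idea that an alignment can be recovered for any reported end location can be illustrated outside edlib. Once an end position is known, a standard traceback over a window of at most `len(query) + k` target characters ending there yields a start and a path. An alignment with at most `k` edits cannot span more characters than that. The following is a minimal pure-Python sketch, not how edlib does it internally. `align_at_end` is a hypothetical helper. Its CIGAR uses extended `=`/`X`/`I`/`D` operations: `I` marks a query character absent from the target and `D` a target character absent from the query. This follows the SAM convention, assumed here to match edlib's extended CIGAR. The sketch returns one optimal alignment per end, preferring diagonal moves on ties.

```python
from itertools import groupby


def align_at_end(query, target, end, k):
    """Hypothetical helper (not part of edlib): return (start, cigar, distance)
    for one best infix alignment of `query` ending at inclusive position `end`
    of `target`, or None if that alignment needs more than k edits."""
    m = len(query)
    # An alignment with <= k edits spans at most m + k target characters.
    lo = max(0, end + 1 - (m + k))
    window = target[lo:end + 1]
    n = len(window)
    # D[i][j]: edit distance of query[:i] vs the best substring of window ending
    # just before j; row 0 is all zeros because the start position is free.
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i
        for j in range(1, n + 1):
            cost = 0 if query[i - 1] == window[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + cost, D[i - 1][j] + 1, D[i][j - 1] + 1)
    if D[m][n] > k:
        return None
    # Trace back from the fixed end until the whole query has been consumed.
    ops = []
    i, j = m, n
    while i > 0:
        cost = 0 if j > 0 and query[i - 1] == window[j - 1] else 1
        if j > 0 and D[i][j] == D[i - 1][j - 1] + cost:
            ops.append('=' if cost == 0 else 'X')
            i, j = i - 1, j - 1
        elif D[i][j] == D[i - 1][j] + 1:
            ops.append('I')  # query char with no target counterpart
            i -= 1
        else:
            ops.append('D')  # target char with no query counterpart
            j -= 1
    ops.reverse()
    cigar = "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))
    return lo + j, cigar, D[m][n]
```

For example, with `C`, `V` and `my_string` from the example earlier in this thread, `align_at_end(V, my_string, 86, 1)` gives start 69, CIGAR `1X17=` and distance 1. An equally good alignment starts at 68 (`1=1D17=`, treating the `A` as an extra target character), and the diagonal-first tie-break picks the former.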