Martinsos / edlib

Lightweight, super fast C/C++ (& Python) library for sequence alignment using edit (Levenshtein) distance.
http://martinsos.github.io/edlib
MIT License

Add random tests for custom equality #88

Open Martinsos opened 6 years ago

Martinsos commented 6 years ago

I implemented custom equality but did not have time to write full-blown random tests for it. I should extend the brute-force implementation so that it also supports custom equality, and then generate many random tests to verify that custom equality works correctly.
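
One way the plan above could look is sketched below in Python. This is not edlib's actual test code: `make_equal`, `brute_force_distance`, `random_case`, and `run_random_tests` are hypothetical names, and the sketch assumes that each additional equality is symmetric and not transitive. The reference implementation is a plain dynamic-programming edit distance for the NW, SHW, and HW modes.

```python
import random


def make_equal(additional_equalities=()):
    """Build eq(a, b); listed pairs match in both directions (assumed symmetric)."""
    pairs = set()
    for a, b in additional_equalities:
        pairs.add((a, b))
        pairs.add((b, a))
    return lambda a, b: a == b or (a, b) in pairs


def brute_force_distance(query, target, mode="NW", additional_equalities=()):
    """Reference edit distance via O(len(query) * len(target)) dynamic programming."""
    eq = make_equal(additional_equalities)
    free_start = mode == "HW"          # infix: alignment may start anywhere in target
    free_end = mode in ("HW", "SHW")   # infix/prefix: alignment may end anywhere
    # prev[j] = cost of aligning the query prefix seen so far against target[:j]
    prev = [0 if free_start else j for j in range(len(target) + 1)]
    for i, qc in enumerate(query, start=1):
        cur = [i]
        for j, tc in enumerate(target, start=1):
            cost = 0 if eq(qc, tc) else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return min(prev) if free_end else prev[-1]


def random_case(rng, alphabet="ACGTRN", max_len=12):
    """Generate a random query, target, and set of extra equalities."""
    query = "".join(rng.choice(alphabet) for _ in range(rng.randint(1, max_len)))
    target = "".join(rng.choice(alphabet) for _ in range(rng.randint(1, max_len)))
    all_pairs = [(a, b) for a in alphabet for b in alphabet if a < b]
    equalities = rng.sample(all_pairs, rng.randint(0, len(all_pairs)))
    return query, target, equalities


def run_random_tests(align_fn, n=1000, seed=0):
    """Compare align_fn(query, target, mode, equalities) with the reference."""
    rng = random.Random(seed)
    for _ in range(n):
        query, target, equalities = random_case(rng)
        for mode in ("NW", "SHW", "HW"):
            expected = brute_force_distance(query, target, mode, equalities)
            got = align_fn(query, target, mode, equalities)
            assert got == expected, (query, target, mode, equalities, got, expected)


# Hypothetical adapter, assuming the edlib Python package is installed:
#   import edlib
#   run_random_tests(lambda q, t, m, e: edlib.align(
#       q, t, mode=m, additionalEqualities=e)["editDistance"])
```

A fixed seed keeps failures reproducible, and the assertion message carries the full failing case so it can be turned into a regression test.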

nextgenusfs commented 6 years ago

For degenerate nucleotide alignment, I'm doing the following in Python, which seems to work. Thanks for implementing the feature. I'm posting this in case it helps you with any tests.

import edlib

# Create a list of (degenerate code, base) tuples that should count as matches
degenNuc = [("R", "A"), ("R", "G"),
            ("M", "A"), ("M", "C"),
            ("W", "A"), ("W", "T"),
            ("S", "C"), ("S", "G"),
            ("Y", "C"), ("Y", "T"),
            ("K", "G"), ("K", "T"),
            ("V", "A"), ("V", "C"), ("V", "G"),
            ("H", "A"), ("H", "C"), ("H", "T"),
            ("D", "A"), ("D", "G"), ("D", "T"),
            ("B", "C"), ("B", "G"), ("B", "T"),
            ("N", "G"), ("N", "A"), ("N", "T"), ("N", "C"),
            ("X", "G"), ("X", "A"), ("X", "T"), ("X", "C")]

FwdPrimer = 'AGTGARTCATCGAATCTTTG'
Seq1 = 'AGTGAGTCATCGAATCTTTG'
Seq2 = 'AGTGAATCATCGAATCTTTG'
Seq3 = 'AGTGACTCATCGAATCTTTG'
# HW (infix) mode with at most k=2 edits; degenerate codes match their bases
seq1_align = edlib.align(FwdPrimer, Seq1, mode="HW", k=2, additionalEqualities=degenNuc)
seq2_align = edlib.align(FwdPrimer, Seq2, mode="HW", k=2, additionalEqualities=degenNuc)
seq3_align = edlib.align(FwdPrimer, Seq3, mode="HW", k=2, additionalEqualities=degenNuc)

>>> print(seq1_align)
{'editDistance': 0, 'cigar': None, 'locations': [(None, 19)], 'alphabetLength': 5}
>>> print(seq2_align)
{'editDistance': 0, 'cigar': None, 'locations': [(None, 19)], 'alphabetLength': 5}
>>> print(seq3_align)
{'editDistance': 1, 'cigar': None, 'locations': [(None, 19)], 'alphabetLength': 5}
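
The pair list above could also be generated from a lookup table instead of being typed out by hand. The following is a minimal sketch under a few assumptions: `IUPAC` and `degenerate_equalities` are hypothetical names, and `X` is treated as a full wildcard only because the list above does so (it is not a standard IUPAC nucleotide code).

```python
# IUPAC degenerate nucleotide codes and the bases each one stands for.
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degenerate_equalities(codes=IUPAC, extra_wildcards="X"):
    """Return (code, base) pairs suitable for an additional-equalities list."""
    table = dict(codes)
    for wildcard in extra_wildcards:
        # Assumption mirrored from the hand-written list: X matches any base.
        table[wildcard] = "ACGT"
    return [(code, base) for code, bases in table.items() for base in bases]


degenNuc = degenerate_equalities()
```

The resulting `degenNuc` holds the same 32 pairs as the hand-written list and can be passed as `additionalEqualities` in the same way. It only pairs each code with plain bases, so two degenerate codes (for example `R` and `N`) still do not match each other.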

Martinsos commented 6 years ago

Awesome, thanks :).