ArtPoon / gotoh2

Lightweight and customizable Python/C extension for pairwise alignment of genetic sequences using the Gotoh algorithm
GNU Affero General Public License v3.0
5 stars 2 forks source link

Alignment sometimes fails when one sequence contains the other #12

Closed donkirkby closed 7 years ago

donkirkby commented 7 years ago

I would expect that it should always be able to align one string against its substring. Even weirder, it's successful against a longer string.

Here's a script to reproduce the failure:

from gotoh2.aligner import Aligner

GAP_OPEN_COST = 10
V3LOOP_REF = ('TGCACAAGACCCAACAACAATACAAGAAAAAGTATAAGGATAGGACCAGGACAAGCAT'
              'TCTATGCAACAGGAGACATAGTAGGAGATATAAGACAGGCACATTGT')

def main():
    seed_ref = ''.join([
        "GTACCCCACTCTGTGTTACTCCAAACTGCACGAATGATATCCGTACTACTGCTAACAGTACTAAG",
        "AACAACAGTAGTATTAGTAAAGAAATGATGAGTTGTTCTTTCAATATGACCACAGAAGTAAGAGA",
        "TAAGAAAGAGAAGGTAAATGCACTTTTTTATAAACTTGATATAGTACCACTTAATATTAGTTCGG",
        "GTAATAATAATAGCTCTGATGATAATAACAGTTCTGGTAAATATTATAGGTTAATAAATTGTAAT",
        "ACCTCAGCCGTAACACAGGCCTGTCCAAAAGTCTCTTTTGACCCAATTCCTATACATTATTGTGC",
        "TCCAGCGGGTTATGCGATTCTAAAGTGTAATAATAAGACCTTCAATGGAACAGGACCATGCAATA",
        "ATGTCAGCACAGTACAATGTACACATGGAATTAAACCAGTGGTATCGACTCAACTACTGTTAAAT",
        "GGTAGTCTAGCAGAAGAAGAAATAATAATTAGATCTCAAAATATAACAGACAATGTCAAAACAAT",
        "AATAGTACATCTTAATGAATCTGTAGAAATTAATTGCACAAGACCCAACAACAATACAAGAAAAA",
        "GTATAAGGATAGGACCAGGACAAGCATTCTATGCAACAGGAGACATAGTAGGAGATATAAGACAG",
        "GCACATTGTAACATTAGTAGAACAGCATGGAACAAAACCTTACAAAGAGTAAGTAAAAGATTATC",
        "AGAGTACTTCCCTAATAAAACAATAAAATTTGAAAGACACTCAGGAGGAGACCTAGAAATTACAA",
        "CACATAGCTTTAATTGTAGAGGAGAATTTTTCTATTGCAATACATCAAGCCTGTTTAATAGTGAA",
        "TTAGACAGTAATGGTACATTCAAAATTAATGGGACAGAAAATGGAACTGGAACAGAAAATTCAAA",
        "CATCACACTCCAATGCAGAATAAAACAAATTATAAACATGTGGCAGGAGGTAGGACGAGCAATGT",
        "ATGCCCCTCCCATTGCAGGAGAAATAACATGTAGATCAAATATCACAGGATTACTACTAACAAGG",
        "GATGGAGGAGACACGAGTGACGAGATATTCAGGCCTGGAGGAGGAGATATGAGGGACAATTGGAG"])
    aligner = Aligner()
    aligner.gap_open_penalty = GAP_OPEN_COST
    print(V3LOOP_REF in seed_ref)
    _, _, score = aligner.align(V3LOOP_REF, seed_ref)
    print(score)
    seed_ref = seed_ref[450:]
    print(V3LOOP_REF in seed_ref)
    try:
        _, _, score = aligner.align(V3LOOP_REF, seed_ref)
        print(score)
    except RuntimeError as ex:
        print(ex.message)

main()

Results:

True
-495
True
Traceback failed, try local alignment

This reference is part of HIV1-C-BR-JX140663 that we've been using to map reads for our G2P algorithm. Related to this: the other Gotoh library we use doesn't seem to penalize big gaps at the end. I don't know if that's a problem or not.

ArtPoon commented 7 years ago

Hi @donkirkby, Thanks so much for reporting this bug and providing a minimal case to reproduce. I ran your script on the current master branch and could not reproduce the bug, and I think it is related to #6:

is2882:tests art$ python issue12.py 
True
-495
True
-45

Could you please pull and try again?

donkirkby commented 7 years ago

Yes, it works for me now. Closing the issue.