cfe-lab / MiCall

Pipeline for processing FASTQ data from an Illumina MiSeq to genotype human RNA viruses like HIV and hepatitis C
https://cfe-lab.github.io/MiCall
GNU Affero General Public License v3.0
14 stars 9 forks source link

Strange alignment result #331

Open donkirkby opened 8 years ago

donkirkby commented 8 years ago

While trying to write a test for issue #299, I discovered strange behaviour in the Gotoh alignment algorithm that we're using. Here's a small script to reproduce the problem:

import gotoh

gap_open = 40
gap_extend = 10
use_terminal_gap_penalty = 1
reference = 'EFACDE'
query =     'SFACDS'
aligned_ref, aligned_query, score = gotoh.align_it_aa(
    reference,
    query,
    gap_open,
    gap_extend,
    use_terminal_gap_penalty)
print(aligned_ref)
print(aligned_query)

The results are:

-EFACDE
S-FACDS

If I replace the E's with H's, I get this:

HFACDH
SFACDS

I understand that some substitutions are more likely than others, but why would it behave differently at the start and the end? If gap_open + gap_extend <= 10, then it opens a gap at both ends for E, and if it's less than or equal to 5, then it opens a gap for H.

donkirkby commented 7 years ago

@ArtPoon, I think you were working on a new alignment library that might replace Gotoh. Is that still happening, or should I dig into the Gotoh algorithm we have?

ArtPoon commented 7 years ago

https://github.com/ArtPoon/gotoh2

I never got around to finishing it but a lot of the implementation is there.

Best, Art

Sent from my iPhone

On Dec 14, 2016, at 6:38 PM, Don Kirkby notifications@github.com wrote:

@ArtPoon, I think you were working on a new alignment library that might replace Gotoh. Is that still happening, or should I dig into the Gotoh algorithm we have?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.