Martinsos / edlib

Lightweight, super fast C/C++ (& Python) library for sequence alignment using edit (Levenshtein) distance.
http://martinsos.github.io/edlib
MIT License

possible bugs in edlib (python version) #136

Closed bernardo1963 closed 4 years ago

bernardo1963 commented 4 years ago

Hi, I am not used to Python, so the two possible bugs described below may have been caused by some silly mistake on my part.

1) edlib.getNiceAlignment does not seem to be working. I did exactly the test mentioned in https://pypi.org/project/edlib/ and got an error message:

python

import edlib
result = edlib.align("elephant", "telephone", task="path")  ## users must use 'task="path"'
niceAlign = edlib.getNiceAlignment(result, "elephant", "telephone")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'getNiceAlignment'
print(niceAlign['query_aligned'])  # "-elephant"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'niceAlign' is not defined
print(niceAlign['matched_aligned'])  # "-|||||.|."
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'niceAlign' is not defined
print(niceAlign['target_aligned'])  # "telephone"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'niceAlign' is not defined

2) I aligned two PacBio sequence reads (one of them is contained in the other), using -m HW.
With the Windows version I can get the start locations of the alignments:

edlib-aligner 5rc.fasta 4.fasta -p -f NICE -m HW
head -12
Query #0 (15660 residues): score = 2975
T: TAATAAT-TTTTAACAAAATGTTTAAAAAATTTCAAAAAACCTTTGTTTC (1739 - 1787)

Q: CGATAATATTTTAACAAAATGTTTAAAAA-TTTCAAAAAAC-TTTGTTTC (0 - 47)

My question is: how can I get the correct start locations of the alignment (namely 1739 and 0) in the Python version?
When I ran print(result["locations"])
I got: [(0, 15854), (0, 15855), (0, 15856)]. How can I get the correct ones (1739 and 0)?

I attached below the fasta files I used. Thanks, Bernardo

4.fasta.txt 5rc.fasta.txt

evanbiederstedt commented 4 years ago

Hi @bernardo1963, CC @Martinsos

Sorry for the delays, I just saw this.

RE: 1

I was involved with creating the PR here: https://github.com/Martinsos/edlib/pull/132

I don't see the same errors as you. Is it possible you need to re-install edlib with --upgrade, e.g.

python3 -m pip install edlib --upgrade

?

That may solve the issue...let us know.
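A quick way to tell whether the installed package is simply too old is to check for the function directly. The snippet below is a minimal sketch, not part of edlib: the helper name `edlib_status` is made up for illustration, and it only reports what it finds in the current Python environment. It assumes Python 3.8+ for `importlib.metadata`.

```python
import importlib.util
from importlib import metadata


def edlib_status():
    """Describe the edlib installation visible to this interpreter.

    Hypothetical helper for illustration only; it is not part of edlib.
    """
    if importlib.util.find_spec("edlib") is None:
        return "edlib is not installed in this environment"

    import edlib

    try:
        version = metadata.version("edlib")
    except metadata.PackageNotFoundError:
        version = "unknown"

    # Older releases do not expose getNiceAlignment at all.
    has_nice = hasattr(edlib, "getNiceAlignment")
    return f"edlib {version}; getNiceAlignment available: {has_nice}"


print(edlib_status())
```

If the last part reads `False`, upgrading with the pip command above is the likely fix. Running the check with the same interpreter that produced the error matters, since `python` and `python3` can point at different environments.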

Thanks, Evan

Martinsos commented 4 years ago

Hi @bernardo1963, sorry for such a late answer!

Thanks @evanbiederstedt for picking this up :).

As @evanbiederstedt said, the first bug looks like you have the wrong version of edlib.

Second thing: [(0, 15854), (0, 15855), (0, 15856)] represents the start and end locations in the target for several optimal alignments, while the printed alignment is shown only for the first of them. Btw, I believe you swapped query and target in Python, and that is why you get these numbers: they would make sense if 5rc.fasta were the target, whereas in the C code you passed it as the query. So make sure you provide the same query and target when comparing the output of the C code with the locations reported by Python. Once you do, I believe you should get 1739 as the first element of the first pair. As for 0, that is always the start for the query.
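To make the effect of swapping query and target concrete, here is a small pure-Python sketch of infix (HW-style) alignment. It is a plain dynamic-programming illustration, not edlib's actual algorithm, and the function name `infix_locations` is made up for this example. It assumes non-empty inputs, reports 0-based inclusive (start, end) positions in the target, and returns only one start per end position. With edlib itself, the corresponding call would be `edlib.align(query, target, mode="HW", task="locations")`.

```python
def infix_locations(query, target):
    """Return (distance, [(start, end), ...]) for an infix alignment.

    The query must be aligned in full, but it may match any substring
    of the target. Positions are 0-based, inclusive, and refer to the
    target. Illustrative sketch only; assumes non-empty strings.
    """
    n = len(target)
    # Row 0: an empty query prefix matches anywhere at zero cost,
    # so the alignment may start at any target position j.
    prev_d = [0] * (n + 1)
    prev_s = list(range(n + 1))

    for i in range(1, len(query) + 1):
        # Column 0: query[:i] against an empty target prefix costs i.
        cur_d = [i] + [0] * n
        cur_s = [0] * (n + 1)
        for j in range(1, n + 1):
            cost = 0 if query[i - 1] == target[j - 1] else 1
            # Each option carries the start position of its own path.
            cur_d[j], cur_s[j] = min(
                (prev_d[j - 1] + cost, prev_s[j - 1]),  # match / mismatch
                (prev_d[j] + 1, prev_s[j]),             # extra query char
                (cur_d[j - 1] + 1, cur_s[j - 1]),       # extra target char
            )
        prev_d, prev_s = cur_d, cur_s

    best = min(prev_d[1:])
    locations = [(prev_s[j], j - 1) for j in range(1, n + 1) if prev_d[j] == best]
    return best, locations


# Short sequence as query, long one as target: the start is inside the target.
print(infix_locations("CGT", "AACGTAA"))  # (0, [(2, 4)])

# Swapped: the long sequence is now the query, so the alignment spans
# the whole (short) target and the reported start is 0.
print(infix_locations("AACGTAA", "CGT"))  # (4, [(0, 2)])
```

The second call shows the same pattern as the locations in the question: a start of 0 and an end near the end of the target, which is what one would expect when the containing sequence is passed as the query.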

I hope that helps! I will close this one for now because I believe this should be enough, but please ask if something is not clear, and if we confirm there is actually a bug, we can reopen it.