Closed evanbiederstedt closed 4 years ago
I'm actually not sure what cresult.alignment
is---I've tried to access it a few ways, and I get segmentation faults.
I get the same issue at @yech1990 in https://github.com/Martinsos/edlib/issues/127 on April 10th
I believe the conditional at if cresult.alignment:
is never hit, which is a problem (I think).
Note that I also just created more fields in the result of align()
. That appears to be a flexible solution.
@evanbiederstedt, it is nice to see that you are doing progress with this :)!
Since python library is working correctly for me, I am guessing the problem might be that you are not running it with correct input parameters: you need to set task
to "path" if you want edlib to return alignment/cigar. If it is not set to "path", ccigar and cresult.alignment will be empty.
Check the function comments for param task
and it should be clear.
Maybe we should also mention under comments for returned value that cigar will be returned only if task is set to "path".
Regarding the approach itself, while it is not bad, I am not a fan of putting more unnecessary logic into this main function. The way you did it, it is not optional to choose if NICE alignment should be constructed or not. I would highly prefer having separate function as I mentioned before, so that user can easily use if they want + it does not further complicate / slow down the main function.
Regarding the approach itself, while it is not bad, I am not a fan of putting more unnecessary logic into this main function. The way you did it, it is not optional to choose if NICE alignment should be constructed or not. I would highly prefer having separate function as I mentioned before, so that user can easily use if they want + it does not further complicate / slow down the main function.
This is a very good point. It's quite possible that my additions would slow down the method. Yes, I agree with this---I'll implement this as a separate function.
I am not sure what is it that user needs, if you think it is better to separate it upfront, we can go with that. In that case, I would explain that well in the comments and also return structure with nice property names for each of those 3 segments, not just an array/triple.
I think four fields make sense, i.e. four key-value pairs in a dictionary. I think NICE
, ref_aln
, match_aln
, query_aln
should work----naturally, I'm happy to revise the names.
Hi @Martinsos
Since python library is working correctly for me, I am guessing the problem might be that you are not running it with correct input parameters: you need to set task to "path" if you want edlib to return alignment/cigar. If it is not set to "path", ccigar and cresult.alignment will be empty. Check the function comments for param task and it should be clear.
This is 100% correct. The source code of the current PR works if task="path"
.
e.g.
>>> import edlib
>>> edlib.align(query="AAGGGGTCTCATATC", target="TAAGGATGGTCCCATTC", task="path")
## outputs
{'match_aln': ' |||| ||||.||| ||', 'ref_aln': 'TAAGGATGGTCCCAT TC',
'query_aln': ' AAGG GGTCTCATATC', 'editDistance': 5,
'cigar': u'1D4=2D4=1X3=1I2=', 'locations': [(0, 16)], 'alphabetLength': 4,
'NICE': 'TAAGGATGGTCCCAT TC\n |||| ||||.||| ||\n AAGG GGTCTCATATC'}
So, I'll revise this into a separate function cigarToNiceAlignment
, and pass it over for review. There may be issues with code duplication, etc.
Thanks, Evan
Hi @Martinsos
So, I revised the previous proposal, and implemented cigarToNiceAlignment()
.
(I'm not sure cigarToNiceAlignment()
is the best name, but I cannot think of another. Maybe niceAlign()
?)
Currently, thinking for potential use cases, I believe I would want the cigar, the NICE alignment, and each component of the NICE alignment.
I'm tempted to suggest we use -
instead of spaces to show how the two inputs align.
That is, instead of
TAAGGATGGTCCCAT TC
|||| ||||.||| ||
AAGG GGTCTCATATC
I wonder if users would prefer
TAAGGATGGTCCCAT-TC
|||| ||||.||| ||
-AAGG--GGTCTCATATC
as it's a bit more clear where the InDels are located.
I also changed target
to reference
, but I don't mind either term.
Please revise documentation/variable names as you see fit---we can then discuss updating the README. Thanks!
After thinking about it a bit, I do think that at least ref_aln
and query_aln
should use a different symbol in place of spaces.
I believe something like:
alignments = {
'cigar': cigar,
'NICE' : "{}\n{}\n{}".format(ref_aln, match_aln, query_aln), ## f-strings only available from Python 3.6+ sadly
'ref_aln' : ref_aln.replace(' ', '-'),
'match_aln': match_aln,
'query_aln': query_aln.replace(' ', '-')
}
would result in these strings having -
.
Let me know what you think :)
@evanbiederstedt I hope all is fine with you - are you still interested in finalizing this PR? I think we are close to done, would be a shame to let it drop!
Hey @Martinsos
This is on the to-do list! Apologies so much time has passed---I've been busier than expected.
Let's see if I can finish this in the next 48 hours. Thanks for checking in though! I am definitely not dropping it.
Thanks, Evan
Hey no problem, I know how it is! Take your time, just wanted to check if all is ok :).
On Thu, Jun 27, 2019 at 8:06 PM evanbiederstedt notifications@github.com wrote:
Hey @Martinsos https://github.com/Martinsos
This is on the to-do list! Apologies so much time has passed---I've been busier than expected.
Let's see if I can finish this in the next 48 hours. Thanks for checking in though! I am definitely not dropping it.
Thanks, Evan
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Martinsos/edlib/pull/132?email_source=notifications&email_token=AALXFBZTCIBQGHZEY2QV2RLP4T6TLA5CNFSM4HN5HJT2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODYX5PLA#issuecomment-506451884, or mute the thread https://github.com/notifications/unsubscribe-auth/AALXFB7XZEY33IUM73HRXW3P4T6TLANCNFSM4HN5HJTQ .
Hi @Martinsos
Sorry for the delay---it took far longer than 48 hours.
Please have a look at the recent commits.
There are a few things to discuss, and then we can finish this up:
(1) I think the function requires both the cigar and locations
from the output of align()
, right?
(2) I believe align()
only outputs a cigar if task="path"
in align()
. If the above outline makes sense, I can write a simple check for this.
(3) The description of parameters could be improved.
Let me know how you think we could proceed. Thanks!
Hi @Martinsos I think this is close to done. Do you have more recommendations for the finishing touches, so we can push this over the finish line?
Thanks!
Hi @Martinsos Please take another look. Feel free to make changes to this branch :)
However, here in python, you actually use the starting position (right?) which means we don't need to figure out the starting position, and therefore we don't need to know the mode, which is great.
I believe that is right, based on target_pos = alignResult["locations"][0][0]
, correct?
I also added a check here such that locations
exists.
The only thing I would ask of you is to confirm that everything works fine for HW by running an example and checking its output.
It looks like it works:
>>> import edlib
>>> resultNW = edlib.align(query="TAAGGATGGTCCCATTC", target="AAGGGGTCTCATATC", mode="NW", task="path")
>>> edlib.getNiceAlignmentFromAlignResult(resultNW, query="TAAGGATGGTCCCATTC", target="AAGGGGTCTCATATC")
{'query_aligned': 'TAAGGATGGTCCCAT-TC', 'matched_aligned': '-||||--||||.|||-||', 'target_aligned': '-AAGG--GGTCTCATATC'}
>>>
>>> resultHW = edlib.align(query="TAAGGATGGTCCCATTC", target="AAGGGGTCTCATATC", mode="HW", task="path")
>>> edlib.getNiceAlignmentFromAlignResult(resultHW, query="TAAGGATGGTCCCATTC", target="AAGGGGTCTCATATC")
{'query_aligned': 'TAAGGATGGTCCCAT-TC', 'matched_aligned': '-||||--||||.|||-||', 'target_aligned': '-AAGG--GGTCTCATATC'}
>>> resultSHW = edlib.align(query="TAAGGATGGTCCCATTC", target="AAGGGGTCTCATATC", mode="SHW", task="path")
>>>
>>> edlib.getNiceAlignmentFromAlignResult(resultSHW, query="TAAGGATGGTCCCATTC", target="AAGGGGTCTCATATC")
{'query_aligned': 'TAAGGATGGTCCCAT-TC', 'matched_aligned': '-||||--||||.|||-||', 'target_aligned': '-AAGG--GGTCTCATATC'}
I am wondering, does it make sense to add support for the case when query and target are not strings but bytes or iterable hashable objects? We could try to print those also I guess, but that probably does not make much sense. Ok, I am alright with going just with strings for now.
I'm tempted to just use strings for now; if others have requests, we/they can incorporate within this code via PR.
Ah yes, and one thing is left: while I don't have real tests set up, I do have "poor man's" tests in shape of a script that runs few methods and checks if they failed. So please, add an example or two there, that will test this new function:
Please add additions to the README; that is very important for the user :) I think you have a stronger notion of what should be here than me.
So please, add an example or two there, that will test this new function: https://github.com/Martinsos/edlib/blob/master/bindings/python/test.py .
Tests are now added :)
Hi @Martinsos I also added a few lines to the README
After typing this out, I do think getNiceAlignmentFromAlignResult()
is too long of method name. Perhaps getNiceAlignment()
would suffice. I don't feel strongly about this though.
Please look the changes over, and feel free to make any changes you see fit.
Thanks, Evan
Hi @Martinsos
However, please check the suggestions I made and consider approving them.
I've accepted them all!
Also, in README, line 39, it says "Edlib has only one function" -> that is not true anymore. It would be great if you could modify that so it says that there is one main function, and then there is utility function -> the one which you implemented.
Done!
Before merging, please squash all the commits you did into one commit.
Ok, I could do this.
Hi @Martinsos
Thanks for the help on this. Everything looks good from my end. I think you need to merge now, as I don't have permissions.
@evanbiederstedt awesome!! You are right, I forgot I have to merge -> done. Thanks for all the effort, and I am looking forward to future PRs :D!
@Martinsos I'm very happy to collaborate again! (Fork deleted.)
This is a (flawed) start, related to the following issue: https://github.com/Martinsos/edlib/issues/127 (and this duplicate: https://github.com/Martinsos/edlib/issues/129)
This uses @yech1990 's example to parse the extended cigar (which is just a compressed version of "alignment" from cpp edlib) from the Cython script
edlib.pyx
I didn't quite do this, but just revised the
align()
method.The problem:
As far as I can tell,
ccigar
is always empty.I've tried in the Cython code printing these variables:
It appears that
cresult.alignmentLength
is also empty.cedlib.EDLIB_CIGAR_EXTENDED
is some sort of integer?These are the issues I'm running into trying to finish this PR