nextgenusfs closed 7 years ago
I tried adding task="path", and then it works the way I expected.
>>> revalign = edlib.align('GCATATCAATAAGCGGAGGA', 'ATACCCCCCTATCTTAATCATATCAATACGCGGAGGAGTATCGGAAGCGCACCAGG', mode="HW", task="path")
>>> revalign
{'editDistance': 2, 'cigar': u'1X10=1X8=', 'locations': [(17, 36)], 'alphabetLength': 4}
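For anyone who wants to sanity-check a result like this without edlib, here is a minimal sketch in plain Python. It assumes the end index in `locations` is inclusive, which the 20-character span 17..36 suggests; the variable names are only illustrative.

```python
# Sequences copied from the call above.
query = 'GCATATCAATAAGCGGAGGA'
target = 'ATACCCCCCTATCTTAATCATATCAATACGCGGAGGAGTATCGGAAGCGCACCAGG'

# Reported location; the end index is assumed to be inclusive.
start, end = 17, 36
hit = target[start:end + 1]

# The cigar has no insertions or deletions, so query and hit line up
# position by position and every difference is a mismatch.
mismatches = sum(1 for q, t in zip(query, hit) if q != t)

print(hit)
print(mismatches)
```

The mismatch count should agree with `editDistance` whenever the cigar contains only `=` and `X`.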
Is there a speed cost associated with using task="path"?
Answered my own question in the help menu, thanks!
align(...)
Align query with target using edit distance.
@param {string} query
@param {string} target
@param {string} mode Optional. Alignment method do be used. Possible values are:
- 'NW' for global (default)
- 'HW' for infix
- 'SHW' for prefix.
@param {string} task Optional. Tells edlib what to calculate. Less there is to calculate,
faster it is. Possible value are (from fastest to slowest):
- 'distance' - find edit distance and end locations in target. Default.
- 'locations' - find edit distance, end locations and start locations.
- 'path' - find edit distance, start and end locations and alignment path.
@param {int} k Optional. Max edit distance to search for - the lower this value,
the faster is calculation. Set to -1 (default) to have no limit on edit distance.
@return Dictionary with following fields:
{int} editDistance -1 if it is larger than k.
{int} alphabetLength
{[(int, int)]} locations List of locations, in format [(start, end)].
{string} cigar Cigar is a standard format for alignment path.
Here we are using extended cigar format, which uses following symbols:
Match: '=', Insertion to target: 'I', Deletion from target: 'D', Mismatch: 'X'.
e.g. cigar of "5=1X1=1I" means "5 matches, 1 mismatch, 1 match, 1 insertion (to target)".
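To make the extended cigar format above concrete, here is a small sketch that splits a cigar string into (count, symbol) pairs. The helper names `parse_cigar` and `edit_distance_from_cigar` are hypothetical and not part of edlib; the sketch only assumes the four symbols listed in the help text.

```python
import re

def parse_cigar(cigar):
    """Split an extended cigar string into (count, symbol) pairs."""
    pairs = re.findall(r'(\d+)([=XID])', cigar)
    return [(int(count), symbol) for count, symbol in pairs]

def edit_distance_from_cigar(cigar):
    """Sum every operation that is not a match ('=')."""
    return sum(count for count, symbol in parse_cigar(cigar) if symbol != '=')

# The example from the help text: 5 matches, 1 mismatch, 1 match, 1 insertion.
print(parse_cigar('5=1X1=1I'))

# The cigar returned by the align() call earlier in this thread.
print(edit_distance_from_cigar('1X10=1X8='))
```

Counting the non-match operations this way should reproduce the `editDistance` field, since each mismatch, insertion, or deletion costs one edit.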
I am glad you managed to solve it on your own! Yes, the cost of using task="path" is not trivial, as you have probably already noticed, but it also depends on the size of the input data. If the query is small compared to the target, using task="path" should have no impact on speed.
I'm just trying the Python API and noticing that the start location is always None. Is that the intended behavior?