Martinsos / edlib

Lightweight, super fast C/C++ (& Python) library for sequence alignment using edit (Levenshtein) distance.
http://martinsos.github.io/edlib
MIT License
493 stars 162 forks source link

Aligment for the most simple sequence does not find an aligment #124

Closed Fohlen closed 5 years ago

Fohlen commented 5 years ago

I am trying to align these two sequences

TGACCTTTT
TAACCTTTC

with the following config

EdlibAlignConfig config = edlibNewAlignConfig(-1, EDLIB_MODE_NW, EDLIB_TASK_PATH, nullptr, 0);

I was initially confused by the k parameter and set it to 0. This however yielded no result. Using the -1 will return no usable aligment. All it currently does is returning the target sequence. However here we should have two equally good aligments according to NW. Either the documentation is confusing or something is deliberately wrong.

Martinsos commented 5 years ago

Hi @Fohlen, it does indeed seem that you did not understand the parameter k and/or the rest of the documentation. What do you mean by "deliberately wrong", as if edlib was on purpose made so it returns wrong results, seriously?

If you want to try to understand the documentation better, check out header file or README. Then, if you have specific suggestions for improvements, please let me know or create a PR.

In the case you provided, NW edit distance is 2. Parameter k provides speed up to edlib by telling it "Hey, I want you to find edit distance but only if it is <= k.". So setting k to zero and getting no result is exactly what should happen in this case.

Regarding running with k == -1, I ran the following code:

    EdlibAlignConfig config = edlibNewAlignConfig(-1, EDLIB_MODE_NW, EDLIB_TASK_PATH, NULL, 0);
    EdlibAlignResult result = edlibAlign("TGACCTTTT", 9, "TAACCTTTC", 9, config);
    if (result.status == EDLIB_STATUS_OK) {
        printf("%d\n", result.editDistance);
        printf("%d\n", result.alignmentLength);
        printf("%d\n", result.endLocations[0]);
    }
    char* cigar = edlibAlignmentToCigar(result.alignment, result.alignmentLength,
                                        EDLIB_CIGAR_EXTENDED);
    printf("%s", cigar);
    free(cigar);
    edlibFreeAlignResult(result);

Result was as expected:

2
9
8
1=1X6=1X

2 is edit distance, alignment spanned the whole target so it is equal to its size, it ends at last location of target which is 8 (0-indexed) and alignment is 1=1X6=1X which means "1 match, then 1 mismatch, then 6 matches and then 1 mismatch". While there normally might be many alignment paths, edlib always returns one alignment path, as is obvious from the type of EdlibAlignResult.

I don't know what code exactly have you run and how have you inspected the result. If you can post the whole code, I could probably point to what are you doing wrong.