Martinsos / edlib

Lightweight, super fast C/C++ (& Python) library for sequence alignment using edit (Levenshtein) distance.
http://martinsos.github.io/edlib
MIT License
493 stars 162 forks source link

Use smart pointers #84

Open Martinsos opened 7 years ago

Martinsos commented 7 years ago

I should consider using smart pointers, as to avoid handling all the memory myself.

GDCCP commented 2 years ago

First of all, thanks for your work.

I have removed some explicit memory management in edlib.cpp, but I do wonder if there is a real concern about using size_t as length instead of int. I also read you may want to move closer to C++. So, I hope a few trivial changes will help you to move this forward. Here is my version:

src file: edlib.cpp.txt diff output: edlib.diff-w.txt

All 3 tests are good on 2 platforms (64-bit ubuntu and cygwin).

I started checking out the source because I could not install the perl binding from CPAN. I now think it was based on an older version without the custom equality feature.

GDCCP commented 2 years ago

edlibAlign calls myserCalcEditDistanceSemiGlobal(edlib.cpp:L199-212) inside a do loop:

    do {
        if (config.mode == EDLIB_MODE_HW || config.mode == EDLIB_MODE_SHW) {
            myersCalcEditDistanceSemiGlobal(Peq, W, maxNumBlocks,
                                            queryLength, target, targetLength,
                                            k, config.mode, &(result.editDistance),
                                            &(result.endLocations), &(result.numLocations));
        } else {  // mode == EDLIB_MODE_NW
            myersCalcEditDistanceNW(Peq, W, maxNumBlocks,
                                    queryLength, target, targetLength,
                                    k, &(result.editDistance), &positionNW,
                                    false, &alignData, -1);
        }
        k *= 2;
    } while(dynamicK && result.editDistance == -1);

and myserCalcEditDistanceSemiGlobal first reset positions_ to NULL without freeing any previously allocated memory:

static int myersCalcEditDistanceSemiGlobal(
        const Word* const Peq, const int W, const int maxNumBlocks,
        const int queryLength,
        const unsigned char* const target, const int targetLength,
        int k, const EdlibAlignMode mode,
        int* const bestScore_, int** const positions_, int* const numPositions_) {
    *positions_ = NULL;
    *numPositions_ = 0;

If it is impossible for that call to be executed more than once to cause memory leak, it may be better to put it outside the do loop.

At the momemt, I don't know enough to be sure there is a bug, but I will look closer.

Martinsos commented 2 years ago

edlibAlign calls myserCalcEditDistanceSemiGlobal(edlib.cpp:L199-212) inside a do loop:

    do {
        if (config.mode == EDLIB_MODE_HW || config.mode == EDLIB_MODE_SHW) {
            myersCalcEditDistanceSemiGlobal(Peq, W, maxNumBlocks,
                                            queryLength, target, targetLength,
                                            k, config.mode, &(result.editDistance),
                                            &(result.endLocations), &(result.numLocations));
        } else {  // mode == EDLIB_MODE_NW
            myersCalcEditDistanceNW(Peq, W, maxNumBlocks,
                                    queryLength, target, targetLength,
                                    k, &(result.editDistance), &positionNW,
                                    false, &alignData, -1);
        }
        k *= 2;
    } while(dynamicK && result.editDistance == -1);

and myserCalcEditDistanceSemiGlobal first reset positions_ to NULL without freeing any previously allocated memory:

static int myersCalcEditDistanceSemiGlobal(
        const Word* const Peq, const int W, const int maxNumBlocks,
        const int queryLength,
        const unsigned char* const target, const int targetLength,
        int k, const EdlibAlignMode mode,
        int* const bestScore_, int** const positions_, int* const numPositions_) {
    *positions_ = NULL;
    *numPositions_ = 0;

If it is impossible for that call to be executed more than once to cause memory leak, it may be better to put it outside the do loop.

At the momemt, I don't know enough to be sure there is a bug, but I will look closer.

I think you are right @GDCCP , this has got to be a bug / memory leak!

So if myersCalcEditDistanceSemiGlobal or myersCalcEditDistanceNW are called for the non-first time inside that loop, we should free that memory.

Could you please create a separate issue for this and label it as a bug (if you can't label it I will)?

As for the rest of the stuff -> let's continue the conversation in the PR!