Martinsos / edlib

Lightweight, super fast C/C++ (& Python) library for sequence alignment using edit (Levenshtein) distance.
http://martinsos.github.io/edlib
MIT License

seemingly random segmentation fault #85

Closed nextgenusfs closed 6 years ago

nextgenusfs commented 7 years ago

Hi @Martinsos.

I've run into something strange. I'm getting some seg fault problems with edlib, but they are not entirely reproducible. I'm using edlib from Python (installed with pip, edlib: 1.1.2.post2).

Seq = 'CACACCGCCCGTCGCTACTACCGATTGAATGGCTCAGTGAGGCCTTGGGATTGGCCAGGGGAGGTGGGCGACCACCACCCCAGGCCGAAAACTTGGTCAAACTTGGTCATTTAGAGGAAGTAAAAGTCGTAACAAGGTTTTCGCAGACGCCTAGTCTCGAGATAAGTCCTGATGCATCCTTCTGCAGTGAACACTTATCGGAAGCCTTTAACAGCTGCCGGAAACGGTGAGGCTGCCACGACTGTAAATAAGGGCAGCCCAAGAGCTAGTGGAACGCCGGTCCCTTGCCGGTCTTTCGCGACACTGTCAAATTGCGGGAATCCCCTTAGAGCTTGCTGCTACCAAGCGTCGTCCCGAAACGGGCGACGTGGCCAGGGTAACTCCCTCGGGTACGGTCACAACGCGCAAGATTGGGTAACCTGCAGCCAAGTCCTACCGGCCTTCCAGGCCCATGGATGCTGTTCACAGACTAAATGGTAGTGGGTGACGCCCCCCTCGGTAGAAGAGGGGTAGGAGTCGCTTAAGATATAGTCGGGCCCCCGGGGAGACTCGGGGGAAAAGTTTCACGAAGAATGGCCGTCGGTCCTTTTCGACGTGCCTAACAATAACCGTTCCGTAGGTGAACCTGCGGAAGGATCATTATCGAGTTCGGGTGTGCCGTGACACGCCGCCCAACCTCCCAACCCTCTGTTTATCACACCTTCGTTGCTTCGGTGGGTCGGTCGTGACCAACTGGTCTCCGACCGCCGGCCCCTCCACGGGCTGGAGAGTTGCCCACCGATGGCCCCCCACAACACTCTTATACCGAAACCTGTCGTCTAAGCGTGATTATGAATCAAAAATTAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGCGAATTGCAGAATTTCCGTGAGTCATCGAATCTTTGAACGCACATTGCGCCCATTGGTATTCCGATGGGCATGCCTGTTCGAGCGTCATTATCCCTCTCAAACCTCGGGTTTGGTGTTGGACTCATGTCGGTCCCTCGCTCCGGCGCGGTGACCAACTGGTCTCAAAGACAATGACGGCGTCCGTGGGACGCTCTTCGCAACGAGCTTTCCAAGCACGCGTCGAGTTTGTCAAGGACCCTCGGAGCCGGTCTACCTGTCGTGGCGTTTCGGCGTCTCGTTCTCTCAAGGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGCATATCAATAAGCGGAGGAAAAGAAACCAACTGGGATTGCCTCAGTAACGGCGAGTGAAGCGGCAACAGCTCAAATTTGAAATCTGGCCCTCGCGGGTCCGAGTTGTAATTTGGAGAGGATGTTTTGGGCACCGCGGCGGTGTAAATTTCTTGGAACAGAATGTCAGAGAGGGTGAGAATCCCGTCTTGGACCGCCGTAGGACCCGTGTAAAACTCCTTCGACGAGTCGACTTGTTTGGGAATGCAGGTCAAAATGGGTGGTAAATTTCATCTAAAGCTAAATATTGGCCAGAGACCGATAGCGCACAAGTAGAGTGATCGAAAGATGAAAAGCACTTTGAAAAGAGAGTTAAACAGTATGTGAAATTGTTGAAAGGGAAGCGCTGGCAACCAGACTCGTACGCGGGGTTCCCCCTTGCTTCTGCTTGGGTTACTTCCCCGCGTCCGGGCCATCATCAGTTTTGGGGGCCGGTCAAAGGCCCCGGGAAAGTATCCTCTCTCTCGGGGGAGGACTTATAGCCCGGGGTGTCATGCGGCCTCCCGGGACTGAGGAACGCGCTTCGGCGAGGATGATGGCGTAATGGTTGTCAGCGACCCGTCTTGAAACACGGACCAAGGAGTCTAACATCTATGCGAGTGTTCGGGTGTCAAACCCTGGCGCGGAATGAAAGTGAACGGAGGTAGGAAGGCTTGAGCCTGCACTATCGACCGATCCTGATGTCTTCGGATGGATTTGAGTAAGAGCATAGCTG
TTGGGACCCGAAAGATGGTGAACTATGCGTGAATAGGGTGAAGCCAGAGGAAACTCTGGTGGAGGCTCGCAGCGGTTCTGACGTGCAAATCGATCGTCAAATTTGCGTATAGGGGCGAAAGACTAATCGAA'
>>> edlib.align('GCATATCAATAAGCGGAGGA', str(Seq), task='locations', mode="HW", k=4)
{'editDistance': 0, 'cigar': None, 'locations': [(1240, 1259)], 'alphabetLength': 4}
>>> edlib.align('GCATATCAATAAGCGGAGGA', str(Seq), task='locations', mode="HW", k=4)
{'editDistance': 0, 'cigar': None, 'locations': [(1240, 1259)], 'alphabetLength': 4}
>>> edlib.align('GCATATCAATAAGCGGAGGA', str(Seq), task='locations', mode="HW", k=4)
Segmentation fault: 11

As you can see, I ran the same command three times: the first two runs succeeded, and the third one seg faulted. Here's my system info, in case it is helpful:

Mac Sierra 10.12.5
Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:43:17) 
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)] on darwin

Is this a compilation issue? Is there a way to recompile locally?
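
Editorial sketch: because the crash described above is intermittent, one way to quantify it is to run the same call in several fresh interpreters and count how many of them die with SIGSEGV. This is only an illustration, not something from the original report. `count_crashes` is a hypothetical helper, and it assumes Python 3.7+ on a POSIX system (the reporter was on Python 2.7), where a child killed by a signal reports a negative return code.

```python
import signal
import subprocess
import sys


def count_crashes(code, runs=20):
    """Run `code` in `runs` fresh interpreters; return (crashes, last_stderr).

    Hypothetical helper for illustration. On POSIX, a child process killed
    by a signal has a negative return code, e.g. -signal.SIGSEGV for a
    segmentation fault.
    """
    crashes = 0
    last_stderr = ""
    for _ in range(runs):
        proc = subprocess.run(
            # -X faulthandler asks CPython to dump the Python traceback
            # to stderr when the process receives SIGSEGV.
            [sys.executable, "-X", "faulthandler", "-c", code],
            capture_output=True,
            text=True,
        )
        if proc.returncode == -signal.SIGSEGV:
            crashes += 1
            last_stderr = proc.stderr
    return crashes, last_stderr
```

To use it on this report, `code` would be a small snippet that imports edlib, defines `Seq` as above, and makes the same `edlib.align(...)` call. A crash count that varies between batches of runs would be consistent with the "seemingly random" behaviour described here.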

Martinsos commented 6 years ago

Thanks @nextgenusfs! I doubt it is a compilation issue, since the edlib Python package is compiled locally when installed. I will try to replicate the problem and see what is causing it!

Martinsos commented 6 years ago

Unfortunately I am not able to replicate the problem, but I am on Linux. Maybe you could try debugging it? Instructions for building the C/C++ library are in README.md, while instructions for building/developing the Python package are in README.rst. One thing you could do is use edlib with the same input data from C++ and see whether the seg fault also happens there. If it does, we know the problem is not in the Python package but in the C/C++ library.

nextgenusfs commented 6 years ago

Hi @Martinsos. Okay, so I'm still struggling with this and I can't figure it out. I'm using the Python bindings for edlib in a function that looks for a forward and a reverse primer. Some sequences cause a segmentation fault and I don't really understand why.

I have also never written anything in C++ before, but I did try to see whether this is a Python bindings problem or an edlib C++ problem. I undoubtedly had problems with the C++: I kept getting a segfault if I tried to use any number other than -1 as k in edlibNewAlignConfig.

The Python problem, which seems to segfault somewhat randomly, i.e. with the same command it sometimes runs to completion and sometimes does not:

jon@Jons-MacBook-Pro:~/amptk$ python
Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:43:17) 
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import edlib
>>> degenNucSimple = [("R", "A"), ("R", "G"), 
...             ("M", "A"), ("M", "C"),
...             ("W", "A"), ("W", "T"),
...             ("S", "C"), ("S", "G"),
...             ("Y", "C"), ("Y", "T"),
...             ("K", "G"), ("K", "T"),
...             ("V", "A"), ("V", "C"), ("V", "G"),
...             ("H", "A"), ("H", "C"), ("H", "T"),
...             ("D", "A"), ("D", "G"), ("D", "T"),
...             ("B", "C"), ("B", "G"), ("B", "T")]
>>> primer = 'GCATATCAATAAGCGGAGGA'
>>> Seq = 'TAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTATCGAATAAACTTGATGGGTTGTCGCTGGCTTCTAGGAGCATGTGCACATCCGTCATTTTTATCCATCCACCTGTGCACCTTTTGTAGTCTTTGGAGGTAATAAGCGTGAATCTATCGAGGTCCTCTGGTCCTCGGAAAGAGGTGTTTGCCATATGGCTCGCCTTTGATACTCGCGAGTTACTCTAAGACTATGTCCTTTCATATACTACGAATGTAATAGAATGTATTCATTGGGCCTCAGTGCCTATAAAACATATACAACTTTCAGCAACGGATCTCTTGGCTCTCGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACCTTGCGCTCCTTGGTATTCCGAGGAGCATGCCTGTTTGAGTGTCATTAAATTCTCAACCCCTTCCGGTTTTTTGACTGGCTTTGGGGCTTGGATGTGGGGGATTCATTTGCGGGCCTCTGTAGAGGTCGGCTCCCCTGAAATGCATTAGTGGAACCGTTTGCGGTTACCGTCGCTGGTGTGATAACTATCTATGCCAAAGACAAACTGCTCTCTGATAGTTCTGCTTCTAACCGTCCATTTATTGGACAACATTATTATGAACACTTGACCTCAAATCAGGTAGGACTACCCGCTGAACTTAAGCATATCAATAAGCGGAGGAAAAGAAACTAACAAGGATTCCCCTAGTAACTGCGAGTGAAGCGGGAAAAGCTCAAATTTAAAATCTGGCGGTCTTTGGCCGTCCGAGTTGTAATCTAGAGAAGCGACACCCGCGCTGGACCGTGTACAAGTCTCCTGGAATGGAGCGTCATAGAGGGTGAGAATCCCGTCTCTGACACGGACTACCAGGGCTTTGTGGTGCGCTCTCAAAGAGTCGAGTTGTTTGGGAATGCAGCTCTAAATGGGTGGTAAATTCCATCTAAAGCTAAATATTGGCGAGAGACCGATAGCGAACAAGTACCGTGAGGGAAAGATGAAAAGAACTTTGGAAAGAGAGTTAAACAGTACGTGAAATTGCTGAAAGGGAAACGCTTGAAGTCAGTCGCGTTGGCCGGGGATCAGCCTCGCTTTTGCGTGGTGTATTTCCTGGTTGACGGGTCAGCATCAATTTTGACCGCTGGAAAAGGACTTGGGGAATGTGGCATCTTCGGATGTGTTATAGCCCTTTGTCGCATACGGCGGTTGGGATTGAGGAACTCAGCACGCCGCAAGGCCGGGTTTCGACCACGTTCGTGCTTAGGATGCTGGCATAATGGCTTTAATCGACCCGTCTTGAAACACGGACCAAGGAGTCTAACATGCCTGCGAGTGTTTGGGTGGAAAACCCGAGCGCGTAATGAAAGTGAAAGTTGAGATCCCTGTCGTGGGGAGCATCGACGCCCGGACCAGAACTTTTGGGACGGATCTGCGGTAGAGCATGTATGTTGGGACCCGAAAGATGGTGAACTATGCCTGAATAGGGTGAAGCCAGAGGAAACTCTGGTGGAGGCTCGTAGCGATTCTGACGTGCAAATCGATCGTCAAATTTGGGTATAGGGGCGAAAGACTAATCGAACCATCTAGTAGCTGGTTCCTGCCGAAGTTTCCCTCAGGATAGCAGAAACTCATATCAGATTTATGTGGTAAAGCGAATGATTAGAGGCCTTGGGGTTGAAACAACCTTAACCTATTCTCAAACTTTAAATATGTAAGAACGAGCCGTTTCTTGATTGAACCGCTCGGCGATTGAGAGTTTCTAGTGGGCCATTTTTGGTAAGCAGAACTGGCGATGCGGGATGAACCGAACGCGAGGTTAAGGTGCCGGAATTCACGCTCATCAGACACCACAAAAGGTGTTAGTTCATCTAGACAGCAGGACGGTGGCCATGGAAGTCGGAATCCGCTAAGGAGTGTGTAACAACTCACCTGCCGAATGAACT
AGCCCTGAAAATGGATGGCGCTTAAGCGTGATACCCATACCTCGCCGTCAGCGTTGAAGTGACGCGCTGACGAGTAGGCAGGCGTGGAGGTCAGTGAAGAAGCCTTGGCAGTGATGCTGGGTGAAACGGCCTCC'
>>> edlib.align(primer, Seq, task='locations', mode="HW", k=2, additionalEqualities=degenNucSimple)
Segmentation fault: 11

I then tried to write something simple in C++ that does the same thing:

#include <cstdio>
#include <iostream>
#include <string>
#include <cstring>
#include "edlib.h"

int main(int argc, char* argv[])
{
    if ( argc != 3 ) { // expect exactly two arguments: primer and sequence
        std::cout<<"usage: "<< argv[0] <<" Primer Sequence\n";}
    else {
        // degenerate nucleotide matches
        EdlibEqualityPair additionalEqualities[24] = {{'R','A'},{'R','G'},{'M','A'},{'M','C'},{'W','A'},{'W','T'},{'S','C'},{'S','G'},{'Y','C'},{'Y','T'},{'K','G'},{'K','T'},{'V','A'},{'V','C'},{'V','G'},{'H','A'},{'H','C'},{'H','T'},{'D','A'},{'D','G'},{'D','T'},{'B','C'},{'B','G'},{'B','T'}};
        // argv[4] is mismatches allowed
        //int mismatch = 0;
        //mismatch= std::atoi(argv[4]);
        // return {'editDistance': -1, 'cigar': None, 'locations': [], 'alphabetLength': 5}                                            
        EdlibAlignResult result = edlibAlign(argv[1], std::strlen(argv[1]), argv[2], std::strlen(argv[2]), 
            edlibNewAlignConfig(-1, EDLIB_MODE_HW, EDLIB_TASK_LOC, additionalEqualities, 24));
        // startLocations/endLocations are only safe to read when a location was found
        if (result.status == EDLIB_STATUS_OK && result.numLocations > 0) {
            printf("{'editDistance': %d, ", result.editDistance);
            printf("'cigar': None, 'locations': [%d", result.startLocations[0]);
            printf(",%d],", result.endLocations[0]);
            printf(" 'alphabetLength': %d}\n", result.alphabetLength);
        }
        edlibFreeAlignResult(result);       
    }
}     

I then compiled with g++-7, like: g++-7 helloworld.cpp edlib/src/edlib.cpp -o findPrimer -I edlib/include. I then also get a segfault with this particular sequence.

jon@Jons-MacBook-Pro:~/amptk$ findPrimer GCATATCAATAAGCGGAGGA TAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTATCGAATAAACTTGATGGGTTGTCGCTGGCTTCTAGGAGCATGTGCACATCCGTCATTTTTATCCATCCACCTGTGCACCTTTTGTAGTCTTTGGAGGTAATAAGCGTGAATCTATCGAGGTCCTCTGGTCCTCGGAAAGAGGTGTTTGCCATATGGCTCGCCTTTGATACTCGCGAGTTACTCTAAGACTATGTCCTTTCATATACTACGAATGTAATAGAATGTATTCATTGGGCCTCAGTGCCTATAAAACATATACAACTTTCAGCAACGGATCTCTTGGCTCTCGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACCTTGCGCTCCTTGGTATTCCGAGGAGCATGCCTGTTTGAGTGTCATTAAATTCTCAACCCCTTCCGGTTTTTTGACTGGCTTTGGGGCTTGGATGTGGGGGATTCATTTGCGGGCCTCTGTAGAGGTCGGCTCCCCTGAAATGCATTAGTGGAACCGTTTGCGGTTACCGTCGCTGGTGTGATAACTATCTATGCCAAAGACAAACTGCTCTCTGATAGTTCTGCTTCTAACCGTCCATTTATTGGACAACATTATTATGAACACTTGACCTCAAATCAGGTAGGACTACCCGCTGAACTTAAGCATATCAATAAGCGGAGGAAAAGAAACTAACAAGGATTCCCCTAGTAACTGCGAGTGAAGCGGGAAAAGCTCAAATTTAAAATCTGGCGGTCTTTGGCCGTCCGAGTTGTAATCTAGAGAAGCGACACCCGCGCTGGACCGTGTACAAGTCTCCTGGAATGGAGCGTCATAGAGGGTGAGAATCCCGTCTCTGACACGGACTACCAGGGCTTTGTGGTGCGCTCTCAAAGAGTCGAGTTGTTTGGGAATGCAGCTCTAAATGGGTGGTAAATTCCATCTAAAGCTAAATATTGGCGAGAGACCGATAGCGAACAAGTACCGTGAGGGAAAGATGAAAAGAACTTTGGAAAGAGAGTTAAACAGTACGTGAAATTGCTGAAAGGGAAACGCTTGAAGTCAGTCGCGTTGGCCGGGGATCAGCCTCGCTTTTGCGTGGTGTATTTCCTGGTTGACGGGTCAGCATCAATTTTGACCGCTGGAAAAGGACTTGGGGAATGTGGCATCTTCGGATGTGTTATAGCCCTTTGTCGCATACGGCGGTTGGGATTGAGGAACTCAGCACGCCGCAAGGCCGGGTTTCGACCACGTTCGTGCTTAGGATGCTGGCATAATGGCTTTAATCGACCCGTCTTGAAACACGGACCAAGGAGTCTAACATGCCTGCGAGTGTTTGGGTGGAAAACCCGAGCGCGTAATGAAAGTGAAAGTTGAGATCCCTGTCGTGGGGAGCATCGACGCCCGGACCAGAACTTTTGGGACGGATCTGCGGTAGAGCATGTATGTTGGGACCCGAAAGATGGTGAACTATGCCTGAATAGGGTGAAGCCAGAGGAAACTCTGGTGGAGGCTCGTAGCGATTCTGACGTGCAAATCGATCGTCAAATTTGGGTATAGGGGCGAAAGACTAATCGAACCATCTAGTAGCTGGTTCCTGCCGAAGTTTCCCTCAGGATAGCAGAAACTCATATCAGATTTATGTGGTAAAGCGAATGATTAGAGGCCTTGGGGTTGAAACAACCTTAACCTATTCTCAAACTTTAAATATGTAAGAACGAGCCGTTTCTTGATTGAACCGCTCGGCGATTGAGAGTTTCTAGTGGGCCATTTTTGGTAAGCAGAACTGGCGATGCGGGATGAACCGAACGCGAGGTTAAGGTGCCGGAATTCACGCTCATCAGACACCACAAAAGGTGTTAGTTCATCTAGACAGCAGGACGGTGGCCATG
GAAGTCGGAATCCGCTAAGGAGTGTGTAACAACTCACCTGCCGAATGAACTAGCCCTGAAAATGGATGGCGCTTAAGCGTGATACCCATACCTCGCCGTCAGCGTTGAAGTGACGCGCTGACGAGTAGGCAGGCGTGGAGGTCAGTGAAGAAGCCTTGGCAGTGATGCTGGGTGAAACGGCCTCC
Segmentation fault: 11

But I don't get a segfault with all sequences:

jon@Jons-MacBook-Pro:~/amptk$ findPrimer GACCT TTTTTTTTGACCTAAAAAAAAA
{'editDistance': 0, 'cigar': None, 'locations': [8,12], 'alphabetLength': 4}

Martinsos commented 6 years ago

Awesome, thank you for putting all this effort into it :). I will start from what you wrote, try to replicate it in C++ and then see what has to be fixed. I should find some time this weekend, so I will probably get to it then. Thanks again!

Martinsos commented 6 years ago

I tried both the Python and C++ cases that you described on my machine, and I cannot get a segmentation fault, hm. I will try to get hold of a Mac and run it there.

nextgenusfs commented 6 years ago

Well, it is somewhat discouraging that it seems to be Mac only. Is there some way I can troubleshoot the seg fault in more depth in order to help you?

nextgenusfs commented 6 years ago

I tried to use gdb, perhaps this is helpful?

jon@Jons-MacBook-Pro:~/amptk$ gdb
GNU gdb (GDB) 8.0
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin16.6.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) file
No executable file now.
No symbol file now.
(gdb) file findPrimer
Reading symbols from findPrimer...Reading symbols from /Users/jon/amptk/findPrimer.dSYM/Contents/Resources/DWARF/findPrimer...done.
done.
(gdb) run GCATATCAATAAGCGGAGGA TAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTATCGAATAAACTTGATGGGTTGTCGCTGGCTTCTAGGAGCATGTGCACATCCGTCATTTTTATCCATCCACCTGTGCACCTTTTGTAGTCTTTGGAGGTAATAAGCGTGAATCTATCGAGGTCCTCTGGTCCTCGGAAAGAGGTGTTTGCCATATGGCTCGCCTTTGATACTCGCGAGTTACTCTAAGACTATGTCCTTTCATATACTACGAATGTAATAGAATGTATTCATTGGGCCTCAGTGCCTATAAAACATATACAACTTTCAGCAACGGATCTCTTGGCTCTCGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACCTTGCGCTCCTTGGTATTCCGAGGAGCATGCCTGTTTGAGTGTCATTAAATTCTCAACCCCTTCCGGTTTTTTGACTGGCTTTGGGGCTTGGATGTGGGGGATTCATTTGCGGGCCTCTGTAGAGGTCGGCTCCCCTGAAATGCATTAGTGGAACCGTTTGCGGTTACCGTCGCTGGTGTGATAACTATCTATGCCAAAGACAAACTGCTCTCTGATAGTTCTGCTTCTAACCGTCCATTTATTGGACAACATTATTATGAACACTTGACCTCAAATCAGGTAGGACTACCCGCTGAACTTAAGCATATCAATAAGCGGAGGAAAAGAAACTAACAAGGATTCCCCTAGTAACTGCGAGTGAAGCGGGAAAAGCTCAAATTTAAAATCTGGCGGTCTTTGGCCGTCCGAGTTGTAATCTAGAGAAGCGACACCCGCGCTGGACCGTGTACAAGTCTCCTGGAATGGAGCGTCATAGAGGGTGAGAATCCCGTCTCTGACACGGACTACCAGGGCTTTGTGGTGCGCTCTCAAAGAGTCGAGTTGTTTGGGAATGCAGCTCTAAATGGGTGGTAAATTCCATCTAAAGCTAAATATTGGCGAGAGACCGATAGCGAACAAGTACCGTGAGGGAAAGATGAAAAGAACTTTGGAAAGAGAGTTAAACAGTACGTGAAATTGCTGAAAGGGAAACGCTTGAAGTCAGTCGCGTTGGCCGGGGATCAGCCTCGCTTTTGCGTGGTGTATTTCCTGGTTGACGGGTCAGCATCAATTTTGACCGCTGGAAAAGGACTTGGGGAATGTGGCATCTTCGGATGTGTTATAGCCCTTTGTCGCATACGGCGGTTGGGATTGAGGAACTCAGCACGCCGCAAGGCCGGGTTTCGACCACGTTCGTGCTTAGGATGCTGGCATAATGGCTTTAATCGACCCGTCTTGAAACACGGACCAAGGAGTCTAACATGCCTGCGAGTGTTTGGGTGGAAAACCCGAGCGCGTAATGAAAGTGAAAGTTGAGATCCCTGTCGTGGGGAGCATCGACGCCCGGACCAGAACTTTTGGGACGGATCTGCGGTAGAGCATGTATGTTGGGACCCGAAAGATGGTGAACTATGCCTGAATAGGGTGAAGCCAGAGGAAACTCTGGTGGAGGCTCGTAGCGATTCTGACGTGCAAATCGATCGTCAAATTTGGGTATAGGGGCGAAAGACTAATCGAACCATCTAGTAGCTGGTTCCTGCCGAAGTTTCCCTCAGGATAGCAGAAACTCATATCAGATTTATGTGGTAAAGCGAATGATTAGAGGCCTTGGGGTTGAAACAACCTTAACCTATTCTCAAACTTTAAATATGTAAGAACGAGCCGTTTCTTGATTGAACCGCTCGGCGATTGAGAGTTTCTAGTGGGCCATTTTTGGTAAGCAGAACTGGCGATGCGGGATGAACCGAACGCGAGGTTAAGGTGCCGGAATTCACGCTCATCAGACACCACAAAAGGTGTTAGTTCATCTAGACAGCAGGACGGTGGCCATGGAAGTCGGAATCCGCTAAGGAGTGTGTAACA
ACTCACCTGCCGAATGAACTAGCCCTGAAAATGGATGGCGCTTAAGCGTGATACCCATACCTCGCCGTCAGCGTTGAAGTGACGCGCTGACGAGTAGGCAGGCGTGGAGGTCAGTGAAGAAGCCTTGGCAGTGATGCTGGGTGAAACGGCCTCC
Starting program: /Users/jon/amptk/findPrimer GCATATCAATAAGCGGAGGA TAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTATCGAATAAACTTGATGGGTTGTCGCTGGCTTCTAGGAGCATGTGCACATCCGTCATTTTTATCCATCCACCTGTGCACCTTTTGTAGTCTTTGGAGGTAATAAGCGTGAATCTATCGAGGTCCTCTGGTCCTCGGAAAGAGGTGTTTGCCATATGGCTCGCCTTTGATACTCGCGAGTTACTCTAAGACTATGTCCTTTCATATACTACGAATGTAATAGAATGTATTCATTGGGCCTCAGTGCCTATAAAACATATACAACTTTCAGCAACGGATCTCTTGGCTCTCGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACCTTGCGCTCCTTGGTATTCCGAGGAGCATGCCTGTTTGAGTGTCATTAAATTCTCAACCCCTTCCGGTTTTTTGACTGGCTTTGGGGCTTGGATGTGGGGGATTCATTTGCGGGCCTCTGTAGAGGTCGGCTCCCCTGAAATGCATTAGTGGAACCGTTTGCGGTTACCGTCGCTGGTGTGATAACTATCTATGCCAAAGACAAACTGCTCTCTGATAGTTCTGCTTCTAACCGTCCATTTATTGGACAACATTATTATGAACACTTGACCTCAAATCAGGTAGGACTACCCGCTGAACTTAAGCATATCAATAAGCGGAGGAAAAGAAACTAACAAGGATTCCCCTAGTAACTGCGAGTGAAGCGGGAAAAGCTCAAATTTAAAATCTGGCGGTCTTTGGCCGTCCGAGTTGTAATCTAGAGAAGCGACACCCGCGCTGGACCGTGTACAAGTCTCCTGGAATGGAGCGTCATAGAGGGTGAGAATCCCGTCTCTGACACGGACTACCAGGGCTTTGTGGTGCGCTCTCAAAGAGTCGAGTTGTTTGGGAATGCAGCTCTAAATGGGTGGTAAATTCCATCTAAAGCTAAATATTGGCGAGAGACCGATAGCGAACAAGTACCGTGAGGGAAAGATGAAAAGAACTTTGGAAAGAGAGTTAAACAGTACGTGAAATTGCTGAAAGGGAAACGCTTGAAGTCAGTCGCGTTGGCCGGGGATCAGCCTCGCTTTTGCGTGGTGTATTTCCTGGTTGACGGGTCAGCATCAATTTTGACCGCTGGAAAAGGACTTGGGGAATGTGGCATCTTCGGATGTGTTATAGCCCTTTGTCGCATACGGCGGTTGGGATTGAGGAACTCAGCACGCCGCAAGGCCGGGTTTCGACCACGTTCGTGCTTAGGATGCTGGCATAATGGCTTTAATCGACCCGTCTTGAAACACGGACCAAGGAGTCTAACATGCCTGCGAGTGTTTGGGTGGAAAACCCGAGCGCGTAATGAAAGTGAAAGTTGAGATCCCTGTCGTGGGGAGCATCGACGCCCGGACCAGAACTTTTGGGACGGATCTGCGGTAGAGCATGTATGTTGGGACCCGAAAGATGGTGAACTATGCCTGAATAGGGTGAAGCCAGAGGAAACTCTGGTGGAGGCTCGTAGCGATTCTGACGTGCAAATCGATCGTCAAATTTGGGTATAGGGGCGAAAGACTAATCGAACCATCTAGTAGCTGGTTCCTGCCGAAGTTTCCCTCAGGATAGCAGAAACTCATATCAGATTTATGTGGTAAAGCGAATGATTAGAGGCCTTGGGGTTGAAACAACCTTAACCTATTCTCAAACTTTAAATATGTAAGAACGAGCCGTTTCTTGATTGAACCGCTCGGCGATTGAGAGTTTCTAGTGGGCCATTTTTGGTAAGCAGAACTGGCGATGCGGGATGAACCGAACGCGAGGTTAAGGTGCCGGAATTCACGCTCATCAGACACCACAAAAGGTGTTAGTTCATCTAGACAGCAGGACGGTGG
CCATGGAAGTCGGAATCCGCTAAGGAGTGTGTAACAACTCACCTGCCGAATGAACTAGCCCTGAAAATGGATGGCGCTTAAGCGTGATACCCATACCTCGCCGTCAGCGTTGAAGTGACGCGCTGACGAGTAGGCAGGCGTGGAGGTCAGTGAAGAAGCCTTGGCAGTGATGCTGGGTGAAACGGCCTCC
[New Thread 0x1403 of process 7193]
warning: unhandled dyld version (15)

Thread 2 received signal SIGSEGV, Segmentation fault.
0x0000000100002120 in edlibAlign (queryOriginal=0x7fff5fbfee8c "GCATATCAATAAGCGGAGGA", queryLength=20, 
    targetOriginal=0x7fff5fbfeea1 "TAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTATCGAATAAACTTGATGGGTTGTCGCTGGCTTCTAGGAGCATGTGCACATCCGTCATTTTTATCCATCCACCTGTGCACCTTTTGTAGTCTTTGGAGGTAATAAGCGTGAATCTATCGAGGTCCTCTGGTCCTCGGAAAGAGGTGTTTGCCATATGGCTCG"..., targetLength=2123, config=...) at edlib/src/edlib.cpp:220
220                     result.startLocations[i] = endLocation - positionsSHW[numPositionsSHW - 1];
(gdb) bt
#0  0x0000000100002120 in edlibAlign (queryOriginal=0x7fff5fbfee8c "GCATATCAATAAGCGGAGGA", queryLength=20, 
    targetOriginal=0x7fff5fbfeea1 "TAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTATCGAATAAACTTGATGGGTTGTCGCTGGCTTCTAGGAGCATGTGCACATCCGTCATTTTTATCCATCCACCTGTGCACCTTTTGTAGTCTTTGGAGGTAATAAGCGTGAATCTATCGAGGTCCTCTGGTCCTCGGAAAGAGGTGTTTGCCATATGGCTCG"..., targetLength=2123, config=...) at edlib/src/edlib.cpp:220
#1  0x0000000100001804 in main (argc=3, argv=0x7fff5fbfeca8) at helloworld.cpp:19
(gdb) 

Looks like line 220 in edlib.cpp is causing the error?

If I run one that does not segfault, here is what it looks like:

Starting program: /Users/jon/amptk/findPrimer GATTTTAAAA AAAAAATTTTTTGATTTAAACCCCCCCCCCCCC
[New Thread 0x1703 of process 7199]
warning: unhandled dyld version (15)
{'editDistance': 2, 'cigar': None, 'locations': [12,19], 'alphabetLength': 4}
[Inferior 1 (process 7199) exited normally]
(gdb) bt
No stack.
(gdb) 

Martinsos commented 6 years ago

@nextgenusfs Fixed with c12b5e1159ccd7e42cbfd3fb21b43c354c2797f8!

There was a bug specific to HW mode that, in some cases, showed up on every 2048th column, when edlib tries to reduce the band aggressively in order to speed up the calculation. It was not manifesting on Linux due to random default values for uninitialized memory, while OSX seems to set those to 0 in this case. I am glad we found this! Can't believe it was never caught in random tests, hm.

Thanks a lot for reporting, it helped in finding the bug. Feel free to try it and let me know if it works OK now (it should: I tested on OSX and it works now). I released a new version of the C++ library and also published a new version of the Python package, so both should be fixed now.
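
Editorial sketch: the random tests mentioned above need something to compare against, and a slow but simple reference for HW (infix) mode is easy to write. The function below is an illustration only, not edlib's algorithm or its test suite. `hw_edit_distance` is a hypothetical name, and treating the extra equality pairs as symmetric is an assumption made for this sketch.

```python
def hw_edit_distance(query, target, equalities=()):
    """Infix ("HW") edit distance by plain O(len(query) * len(target)) DP.

    Returns (distance, end_positions): the smallest edit distance between
    `query` and any substring of `target`, plus the 0-based inclusive
    target indices at which an optimal alignment can end.
    """
    # Assumption: each extra pair is symmetric, e.g. ("R", "A") lets R match A.
    equal = {frozenset(pair) for pair in equalities}

    def match(a, b):
        return a == b or frozenset((a, b)) in equal

    # Column for the empty target prefix: i query characters against nothing.
    prev = list(range(len(query) + 1))
    best, ends = len(query), []
    for j, t in enumerate(target):
        cur = [0]  # skipping target characters before the match is free
        for i, q in enumerate(query, start=1):
            cost = 0 if match(q, t) else 1
            cur.append(min(prev[i] + 1,          # skip target character t
                           cur[i - 1] + 1,       # skip query character q
                           prev[i - 1] + cost))  # match or substitute
        # Ending the alignment at any target column is free as well.
        if cur[-1] < best:
            best, ends = cur[-1], [j]
        elif cur[-1] == best:
            ends.append(j)
        prev = cur
    return best, ends
```

On the two short inputs from this thread it agrees with the reported distances (0 and 2). Since the bug showed up around every 2048th column, a differential test would presumably want random targets longer than 2048 characters, comparing this reference with `edlib.align(..., mode="HW")`.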

nextgenusfs commented 6 years ago

Fantastic, thank you!

Martinsos commented 6 years ago

No problem, and feel free to give a github star to edlib if you find it useful :D.