althonos / pyhmmer

Cython bindings and Python interface to HMMER3.
https://pyhmmer.readthedocs.io
MIT License

search_hmm results in segmentation fault #25

Closed snerligit closed 2 years ago

snerligit commented 2 years ago

Hi,

I created an HMM file and stored it on disk. Later, I read it back to score a specific sequence using search_hmm, but that results in a segmentation fault. See the code snippet below:

`with pyhmmer.plan7.HMMFile(targetfile) as hmmfilehandler:
    hmm = next(hmmfilehandler)

    seq = pyhmmer.easel.DigitalSequence(alphabet=hmm.alphabet, name=str.encode("seq1"), sequence=str.encode(sequence))

    pipeline = pyhmmer.plan7.Pipeline(hmm.alphabet)
    hits = pipeline.search_hmm(query=hmm, sequences=[seq])     # this line results in segmentation fault

    for hit in hits:
        print("Score: ", hit.score, tag)`

Any help debugging this issue is appreciated. Thank you.

althonos commented 2 years ago

Hi @snerligit,

You're trying to create a DigitalSequence from a text sequence, so the encoding is wrong: a digital sequence must be encoded with alphabet indices (A=0, C=1, D=2, etc.), but your current code encodes it as ASCII (A=65, C=67, ...).
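To make the difference concrete, here is a minimal pure-Python sketch that does not use pyhmmer. It assumes the 20 canonical residues are ordered as `ACDEFGHIKLMNPQRSTVWY`, which matches the A=0, C=1, D=2 ordering; the names `CANONICAL_AMINO`, `digital_codes` and `ascii_codes` are hypothetical and exist only for illustration.

```python
# Canonical amino acid symbols, assumed here to be in alphabet order
# (A=0, C=1, D=2, ...). This is an illustration, not pyhmmer's own code.
CANONICAL_AMINO = "ACDEFGHIKLMNPQRSTVWY"


def digital_codes(text):
    """Map each residue to its index in the alphabet (hypothetical helper)."""
    return [CANONICAL_AMINO.index(residue) for residue in text]


def ascii_codes(text):
    """Return the raw byte values produced by str.encode()."""
    return list(text.encode("ascii"))


print(digital_codes("ACD"))  # [0, 1, 2]
print(ascii_codes("ACD"))    # [65, 67, 68]
```

The ASCII values are far larger than any valid alphabet index, so a buffer built with `str.encode` does not describe the residues the caller intended.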

To create a new DigitalSequence from a text sequence, first create a TextSequence, then call the digitize method as follows:

 seq = pyhmmer.easel.TextSequence(name=b"seq1", sequence=sequence.encode()).digitize(hmm.alphabet)

althonos commented 2 years ago

I've added a check in v0.6.3, so that the code you posted raises an exception when creating a DigitalSequence with invalid characters, instead of segfaulting later in the search pipeline.
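The idea behind such a check can be sketched in plain Python. This is only an illustration of validating codes before they reach the search pipeline, not the actual pyhmmer implementation. `ALPHABET_SIZE` and `check_digital` are hypothetical names, and the size of 29 assumes the full Easel amino alphabet (20 residues plus gap, degenerate and special symbols).

```python
# Assumed total number of symbols in the amino alphabet (illustrative value).
ALPHABET_SIZE = 29


def check_digital(codes):
    """Raise ValueError if any code is not a valid alphabet index.

    `codes` is a bytes-like object; iterating over it yields integers.
    """
    for position, code in enumerate(codes):
        if code >= ALPHABET_SIZE:
            raise ValueError(f"invalid code {code} at position {position}")
    return codes


# An ASCII-encoded string is rejected up front instead of failing later.
try:
    check_digital("ACD".encode("ascii"))
except ValueError as err:
    print("rejected:", err)
```

Failing early like this turns a hard-to-diagnose crash into an error that points at the offending position in the input.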