After getting some segfaults with PyHMMER for unknown reasons I started a round of memory check with Valgrind, and found some invalid writes in the HMMER and Easel code. I'll try to address them one at a time if I can trace them back.
==268312== Invalid write of size 1
==268312== at 0x484C3DE: strcpy (vg_replace_strmem.c:561)
==268312== by 0x6B1CCBD: UnknownInlinedFun (string_fortified.h:79)
==268312== by 0x6B1CCBD: esl_sq_Copy (esl_sq.c:339)
==268312== by 0x6B4CDFF: sqascii_Read (esl_sqio_ascii.c:772)
==268312== by 0x6AD7A22: __pyx_f_7pyhmmer_5easel_12SequenceFile_readinto.lto_priv.0 (easel.c:85494)
==268312== by 0x6AD0BF4: __pyx_f_7pyhmmer_5easel_12SequenceFile_read.lto_priv.0 (easel.c:85169)
==268312== by 0x6AC5C43: UnknownInlinedFun (easel.c:84134)
==268312== by 0x6AC5C43: __pyx_pw_7pyhmmer_5easel_12SequenceFile_17__next__.lto_priv.0 (easel.c:84112)
==268312== by 0x4AB0D14: list_extend (listobject.c:966)
==268312== by 0x4AE4151: UnknownInlinedFun (listobject.c:2790)
==268312== by 0x4AE4151: list_vectorcall (listobject.c:2815)
==268312== by 0x4A7B236: UnknownInlinedFun (pycore_call.h:92)
==268312== by 0x4A7B236: PyObject_Vectorcall (call.c:299)
==268312== by 0x4A6D6E0: _PyEval_EvalFrameDefault (ceval.c:4769)
==268312== by 0x4AB3E9E: UnknownInlinedFun (pycore_ceval.h:73)
==268312== by 0x4AB3E9E: UnknownInlinedFun (ceval.c:6434)
==268312== by 0x4AB3E9E: UnknownInlinedFun (call.c:393)
==268312== by 0x4AB3E9E: UnknownInlinedFun (pycore_call.h:92)
==268312== by 0x4AB3E9E: method_vectorcall (classobject.c:89)
==268312== by 0x4A716A2: UnknownInlinedFun (ceval.c:7352)
==268312== by 0x4A716A2: _PyEval_EvalFrameDefault (ceval.c:5376)
Here I got some invalid writes in esl_sq_Copy while parsing a file. It turns out that the allocation of extra residue tags is using the nalloc even though the tags may contain free text. This will lead to underallocation in the case were an xr_tag is longer than the name stored in the sequence. This can be seen for instance by attempting to parse stockholm.good.1: there is a sequence named seq6 with markups Invented_tag and Another_tag.
To fix this, I changed the xr_tag allocation to use strlen and allocate with the true string length.
Hi!
After getting some segfaults with PyHMMER for unknown reasons I started a round of memory check with Valgrind, and found some invalid writes in the HMMER and Easel code. I'll try to address them one at a time if I can trace them back.
Here I got some invalid writes in
esl_sq_Copy
while parsing a file. It turns out that the allocation of extra residue tags is using thenalloc
even though the tags may contain free text. This will lead to underallocation in the case were anxr_tag
is longer than thename
stored in the sequence. This can be seen for instance by attempting to parsestockholm.good.1
: there is a sequence namedseq6
with markupsInvented_tag
andAnother_tag
.To fix this, I changed the
xr_tag
allocation to usestrlen
and allocate with the true string length.