althonos / pyhmmer

Cython bindings and Python interface to HMMER3.
https://pyhmmer.readthedocs.io
MIT License

ValueError: Could not build HMM: Unable to name the HMM. #19

Closed by eloyvallinaes 2 years ago

eloyvallinaes commented 2 years ago

When trying to start a search with pyhmmer.plan7.Pipeline.search_msa I get the following error:

ValueError: Could not build HMM: Unable to name the HMM.

After some digging, I realised the pyhmmer.easel.MSA object has a name attribute that I can fill with a placeholder such as b'placeholder', which makes the error go away. I suspect this attribute should have been set by pyhmmer.easel.MSAFile, which I am using to read an alignment in Stockholm format.

Is this a bug?
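The workaround described above can be sketched as a small helper. This is only an illustration: `ensure_msa_name` is a hypothetical name, and the sketch assumes the MSA exposes a writable `name` attribute that is empty or `None` when the alignment has no identifier. A stand-in object is used so the snippet runs without pyhmmer installed.

```python
from types import SimpleNamespace


def ensure_msa_name(msa, fallback=b"placeholder"):
    """Set `msa.name` to `fallback` if it is missing or empty; return the MSA.

    Hypothetical helper: it relies only on a writable `name` attribute.
    """
    if not getattr(msa, "name", None):
        msa.name = fallback
    return msa


# Stand-in objects (assumption: a real MSA behaves the same for `name`).
unnamed = SimpleNamespace(name=None)
named = SimpleNamespace(name=b"globins")

ensure_msa_name(unnamed)  # gets the fallback name
ensure_msa_name(named)    # keeps its existing name
```

With pyhmmer, the same call would presumably be made on the MSA read from the MSAFile, before passing it to Pipeline.search_msa.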

althonos commented 2 years ago

Hi @eloyvallinaes !

This is not a bug, it's just the way HMMER works internally. An MSA in Stockholm format may not always contain a name for the whole alignment (a #=GF ID line), and in that case the MSA object returned by the MSAFile will not have a name. However, the pyhmmer.plan7.Builder used internally requires the MSA to have one.
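To check in advance whether an alignment carries a name, one can look for the #=GF ID line directly. The following is a minimal sketch in plain Python, not pyhmmer's parser: `stockholm_msa_id` is a hypothetical helper that only inspects the first alignment in the text and ignores every other annotation.

```python
def stockholm_msa_id(text):
    """Return the `#=GF ID` value of the first alignment in `text`, or None."""
    for line in text.splitlines():
        line = line.strip()
        if line == "//":
            break  # end of the first alignment
        fields = line.split(None, 2)
        if len(fields) == 3 and fields[0] == "#=GF" and fields[1] == "ID":
            return fields[2].strip()
    return None


# Two toy alignments: one with an identifier, one without.
named = "# STOCKHOLM 1.0\n#=GF ID globins\nseq1 ACDEF\nseq2 ACDEG\n//\n"
unnamed = "# STOCKHOLM 1.0\nseq1 ACDEF\nseq2 ACDEG\n//\n"

named_id = stockholm_msa_id(named)
unnamed_id = stockholm_msa_id(unnamed)
```

An alignment like `unnamed` above is still valid Stockholm, but it would leave the MSA without a name.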

I can add a disclaimer to Pipeline.search_msa and other methods to make this clearer.

althonos commented 2 years ago

(It would be a bug if your MSA file had an identifier for the whole MSA, though.)

eloyvallinaes commented 2 years ago

My MSA file didn't have a #=GF ID line, so you're right, it's not a bug. However, the #=GF ID line is not a requirement of the Stockholm format, so I thought it was weird that it threw an error. Not knowing what role the name attribute might be playing, I'd prefer MSAFile be able to set the attribute with some generic value if one is not provided, but that's just my suggestion! :smile:

Thanks for a prompt reply!

althonos commented 2 years ago

I'd rather not set a default identifier for the MSA. In the case of hmmbuild, it's always possible to use the basename of the alignment file to name the HMM, but since in pyHMMER the MSAFile may be opened from a file-like object, it's not as obvious what to use, and I'm sure having a fallback value would lead to more bugs later.
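The difficulty can be illustrated with a caller-side sketch. `guess_msa_name` is a hypothetical helper, not part of pyHMMER, and stripping the file extension is a choice made here for illustration. A path always yields a basename, while a file-like object may carry no usable name at all.

```python
import io
import os


def guess_msa_name(source):
    """Return a fallback name (bytes) for an alignment source, or None."""
    if isinstance(source, (str, os.PathLike)):
        # Paths always provide a basename to work with.
        path = os.fsdecode(source)
    else:
        # File-like objects only sometimes expose a string `name`.
        path = getattr(source, "name", None)
        if not isinstance(path, str):
            return None
    stem = os.path.splitext(os.path.basename(path))[0]
    return stem.encode() if stem else None


from_path = guess_msa_name("/data/family.sto")
from_buffer = guess_msa_name(io.BytesIO(b"# STOCKHOLM 1.0\n//\n"))
```

For the in-memory buffer there is nothing to derive a name from, so the caller has to pick one explicitly.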

I have, however, added warnings to the Builder.build_msa and Pipeline.search_msa methods, as well as a section in the MSA to HMM example, so hopefully this will be clearer to future users who run into this issue.

eloyvallinaes commented 2 years ago

Cool! Many thanks for the explanation too!