althonos / pyhmmer

Cython bindings and Python interface to HMMER3.
https://pyhmmer.readthedocs.io
MIT License

Error raised from C code: fseeko() failed, eslESYS (status code 12) #37

Closed: k-krakowski closed this issue 1 year ago

k-krakowski commented 1 year ago

Hello Martin! I've encountered an error while trying to load a pressed HMM database into memory. I'm using the current version of pyhmmer (0.7.4). To reproduce the error, follow these steps:

  1. Download and unzip http://prodata.swmed.edu/ecod/distributions/ecodf.hmm.tar.gz
  2. Press the database with the following: pyhmmer.hmmer.hmmpress(pyhmmer.plan7.HMMFile("ecodf.hmm"), "ecod")
  3. Try to load it into memory:

         import pyhmmer

         with pyhmmer.plan7.HMMFile("ecodf") as hmm_db:
             models = list(hmm_db.optimized_profiles())

The last step fails with:

    EaselError: Error raised from C code: fseeko() failed, eslESYS (status code 12)
    SystemError:  returned a result with an exception set

If I'm doing something wrong, could you please give me an example of loading a pressed database into RAM? Thanks in advance for your reply ;)

althonos commented 1 year ago

Hi Kamil, thanks for the bug report!

I traced the bug back to the Offsets class: file offsets were not recorded properly by pyhmmer.hmmer.hmmpress, which caused errors later when trying to load the pressed HMMs. I made a patch, which will be released with the upcoming v0.8.0.
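A round-trip check along these lines would exercise the code path described above: press a set of HMMs with `pyhmmer.hmmer.hmmpress`, reopen the result, and read every optimized profile back. This is a minimal sketch, not the test that ships with the patch. `check_press_roundtrip` is a hypothetical helper, and the sketch assumes that `pyhmmer.plan7.HMMFile` picks up the `.h3m`/`.h3i`/`.h3f`/`.h3p` files written next to the text HMM file, following HMMER's own naming layout. On an affected version such as 0.7.4, the reload step would be expected to fail with the `fseeko()` error from the report.

```python
import os
import shutil
import tempfile


def check_press_roundtrip(hmm_path: str) -> int:
    """Hypothetical helper: press `hmm_path` with pyhmmer, reload it, count profiles.

    On pyhmmer versions affected by this issue (< 0.8.0), reading the
    optimized profiles back is expected to raise the fseeko() EaselError.
    """
    # Imported lazily so this module can be imported without pyhmmer.
    import pyhmmer

    with pyhmmer.plan7.HMMFile(hmm_path) as hmm_file:
        hmms = list(hmm_file)

    with tempfile.TemporaryDirectory() as tmp:
        # Assumption: keep the text HMMs and the pressed files side by side
        # (<db>.hmm plus <db>.hmm.h3m/.h3i/.h3f/.h3p), as HMMER's hmmpress does.
        db_path = os.path.join(tmp, os.path.basename(hmm_path))
        shutil.copy(hmm_path, db_path)
        pyhmmer.hmmer.hmmpress(hmms, db_path)

        # Every pressed model should be readable back as an optimized profile;
        # the list is built before the temporary directory is removed.
        with pyhmmer.plan7.HMMFile(db_path) as pressed:
            profiles = list(pressed.optimized_profiles())

    if len(profiles) != len(hmms):
        raise RuntimeError(f"pressed {len(hmms)} HMMs but loaded {len(profiles)}")
    return len(profiles)
```

For the database from this issue, this would be called as `check_press_roundtrip("ecodf.hmm")`.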

In the meantime, if you really want to press your HMMs, you can use the hmmpress binary from HMMER to press the database, and then load it with PyHMMER. However, most of the PyHMMER interface should accept unpressed HMMs as well, and the gain from pressing is not as large as it is in HMMER, because PyHMMER can build the optimized profiles in parallel in different threads!
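A sketch of that workaround, assuming HMMER is installed and its `hmmpress` binary is on `PATH`: press the text database with HMMER itself, which writes `ecodf.hmm.h3m`, `.h3i`, `.h3f` and `.h3p` next to the input, then open it with PyHMMER and materialize the optimized profiles. The helper names (`pressed_files`, `press_with_hmmer`, `load_optimized_profiles`) are hypothetical, and the call to `HMMFile.is_pressed()` assumes that method is available in the installed PyHMMER version.

```python
import shutil
import subprocess
from pathlib import Path

# Extensions HMMER's hmmpress appends to the input database path.
PRESSED_SUFFIXES = (".h3m", ".h3i", ".h3f", ".h3p")


def pressed_files(hmm_path: str) -> list:
    """Hypothetical helper: paths hmmpress creates for `hmm_path`."""
    return [Path(hmm_path + suffix) for suffix in PRESSED_SUFFIXES]


def press_with_hmmer(hmm_path: str) -> None:
    """Hypothetical helper: press `hmm_path` with HMMER's own hmmpress binary."""
    if shutil.which("hmmpress") is None:
        raise FileNotFoundError("HMMER's hmmpress was not found on PATH")
    # -f overwrites pressed files left over from a previous run.
    subprocess.run(["hmmpress", "-f", hmm_path], check=True)
    missing = [str(p) for p in pressed_files(hmm_path) if not p.exists()]
    if missing:
        raise RuntimeError(f"hmmpress did not create: {missing}")


def load_optimized_profiles(hmm_path: str) -> list:
    """Hypothetical helper: load every optimized profile of a pressed database."""
    import pyhmmer  # lazy import: only this step needs PyHMMER

    with pyhmmer.plan7.HMMFile(hmm_path) as hmm_file:
        # Assumption: is_pressed() reports whether the .h3* files were found.
        if not hmm_file.is_pressed():
            raise ValueError(f"{hmm_path} has not been pressed")
        return list(hmm_file.optimized_profiles())
```

For the database from this issue, this would be `press_with_hmmer("ecodf.hmm")` followed by `models = load_optimized_profiles("ecodf.hmm")`. PyHMMER is only imported inside the loading function, so the pressing step works even where it is not installed.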