althonos / pyhmmer

Cython bindings and Python interface to HMMER3.
https://pyhmmer.readthedocs.io
MIT License

Best way to Read in/Search Over Many Many HMMs? #49

Closed gbouras13 closed 1 year ago

gbouras13 commented 1 year ago

Hi Martin,

Pyhmmer is awesome - just trying to play around with PHROGs and build it into some tooling.

Using v0.9.0.

One question - what do you think the best way is to read in lots of HMMs? Like 38000? I've made a bunch with pyhmmer really easily.

In the example (https://pyhmmer.readthedocs.io/en/stable/examples/recipes.html#Loading-multiple-HMMs) the HMM files were hardcoded. I've tried a few approaches to get around this but am running into a strange error.

For example, after tweaking the class to take a list:

import contextlib
import itertools
import os
import typing

from pyhmmer.plan7 import HMM, HMMFile

class HMMFiles(typing.ContextManager[typing.Iterable[HMM]]):
    def __init__(self, files: list['os.PathLike[bytes]']) -> None:
        # Every file is opened here and stays open until __exit__ runs
        self.stack = contextlib.ExitStack()
        self.hmmfiles = [self.stack.enter_context(HMMFile(f)) for f in files]

    def __enter__(self) -> typing.Iterable[HMM]:
        return itertools.chain.from_iterable(self.hmmfiles)

    def __exit__(self, exc_type: object, exc_value: object, traceback: object) -> None:
        self.stack.close()

Then specifying the files and reading them in:

from pathlib import Path
import pyhmmer

# MSA_Phrogs_M50_HMM is the directory in the working dir containing all the .hmm files
HMM_dir = Path("MSA_Phrogs_M50_HMM")
pattern = "*.hmm"  # Replace with your desired file pattern
files = list(HMM_dir.glob(pattern))

# `targets` (the sequences to search) is assumed to be defined earlier
with HMMFiles(files) as hmm_files:
    all_hits = list(pyhmmer.hmmsearch(hmm_files, targets))

But this throws a very weird error:

FileNotFoundError: [Errno 2] no such file or directory: PosixPath('MSA_Phrogs_M50_HMM/phrog_29267.hmm')

even though this file definitely exists.

George
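
When a file that exists cannot be opened while thousands of other files are held open, one thing worth ruling out is the per-process limit on open file descriptors. The sketch below is only a diagnostic under that assumption: `exceeds_fd_limit` is a hypothetical helper, not part of pyhmmer, and it relies on the Unix-only `resource` module from the standard library.

```python
import resource  # standard library, Unix only


def exceeds_fd_limit(n_files, soft_limit=None):
    """Return True if opening n_files at once would reach the soft limit.

    soft_limit defaults to the current RLIMIT_NOFILE soft limit of the
    process. The comparison is conservative because stdin, stdout and
    stderr already use descriptors.
    """
    if soft_limit is None:
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return False  # no limit configured
    return n_files >= soft_limit


# Hypothetical usage with the directory from the post above:
#   from pathlib import Path
#   n = sum(1 for _ in Path("MSA_Phrogs_M50_HMM").glob("*.hmm"))
#   print(exceeds_fd_limit(n))
```

If this returns True for the number of `.hmm` files, a class that opens every file up front is likely to fail partway through the list.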

althonos commented 1 year ago

Hi George, I think this may be related to #48; you could have a look at the solution there as well!

gbouras13 commented 1 year ago

Hi Martin, thanks for that, it is indeed. Closing this now.

althonos commented 1 year ago

I'll update the documentation to use the suggested solution there instead, to avoid any more issues because of file descriptors.
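
For readers who land here without #48 at hand, here is a minimal sketch of one way to avoid holding every file open at once: open each file only while its records are being read. This is not necessarily the exact solution from #48. `LazyRecords` is a hypothetical name, and the `opener` argument is an assumption made so the class can be tried with the built-in `open` as well as with `HMMFile`.

```python
class LazyRecords:
    """Iterate over the records of many files, one open file at a time.

    `opener` is any callable that takes a path and returns a context
    manager which is also iterable (the built-in `open` qualifies, and
    `pyhmmer.plan7.HMMFile` is used the same way in the post above).
    """

    def __init__(self, files, opener):
        self.files = list(files)  # store paths only, open nothing yet
        self.opener = opener

    def __iter__(self):
        for path in self.files:
            # The file is closed as soon as its last record is consumed,
            # so at most one descriptor is in use at any time.
            with self.opener(path) as handle:
                yield from handle


# Hypothetical usage with pyhmmer (not executed here; `targets` as above):
#   from pathlib import Path
#   import pyhmmer
#   from pyhmmer.plan7 import HMMFile
#   hmms = LazyRecords(Path("MSA_Phrogs_M50_HMM").glob("*.hmm"), HMMFile)
#   all_hits = list(pyhmmer.hmmsearch(hmms, targets))
```

Because the object is a plain iterable and not a context manager, it needs no `with` block, and it can be iterated more than once since the files are reopened on each pass.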