althonos / pyhmmer

Cython bindings and Python interface to HMMER3.
https://pyhmmer.readthedocs.io
MIT License

Multiple HMMFiles #24

Closed zdk123 closed 1 year ago

zdk123 commented 2 years ago

This is related to issue #23, but rather than having multiple HMMs in a single file, I'd like to treat the HMMs across multiple files as a single iterable passed to hmmsearch. This would reduce the overall memory footprint (again, similar to the motivations discussed in #23).

Here's a wrapper class that does the trick and also works as a context manager:

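from itertools import chain

import pyhmmer
from pyhmmer.plan7 import HMMFile

class HMMFiles:
    """Expose the HMMs of several files as a single iterable."""

    def __init__(self, files):
        # Open every file up front; each HMMFile yields its HMMs lazily.
        self.hmmfiles = [HMMFile(f) for f in files]

    def __enter__(self):
        # Chain the per-file iterators into one stream of HMMs.
        return chain.from_iterable(self.hmmfiles)

    def __exit__(self, *args):
        # Close all underlying files when the `with` block ends.
        for f in self.hmmfiles:
            f.close()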

Usage:

# digital=True so the sequences are loaded in the digital form hmmsearch expects
with pyhmmer.easel.SequenceFile('queries.fasta', digital=True) as seq_file, HMMFiles(['1.hmm', '2.hmm']) as hmm_file:
    hits = list(pyhmmer.hmmsearch(hmm_file, list(seq_file)))
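One caveat with the wrapper above: if opening one of the later files fails, the files already opened in `__init__` are never closed. A minimal sketch of a more defensive variant follows, using `contextlib.ExitStack` from the standard library. The class name `ChainedFiles` and its `opener` parameter are hypothetical names chosen for illustration and are not part of pyhmmer; the sketch only assumes that `opener(path)` returns an iterable object with a `close()` method, which is how `HMMFile` is used above.

```python
from contextlib import ExitStack
from itertools import chain


class ChainedFiles:
    """Chain several iterable, closeable file objects into one iterable.

    `opener` is a hypothetical parameter: any callable that takes a path
    and returns an iterable object with a close() method.
    """

    def __init__(self, paths, opener):
        self.paths = list(paths)
        self.opener = opener
        self._stack = ExitStack()

    def __enter__(self):
        # Open files inside a temporary ExitStack: if any opener call
        # raises, the files opened so far are closed before propagating.
        with ExitStack() as stack:
            files = []
            for path in self.paths:
                f = self.opener(path)
                stack.callback(f.close)
                files.append(f)
            # Everything opened fine: keep the cleanup for __exit__.
            self._stack = stack.pop_all()
        return chain.from_iterable(files)

    def __exit__(self, *exc_info):
        # Run all registered close() callbacks (in reverse order).
        self._stack.close()
        return False
```

Under those assumptions, `ChainedFiles(['1.hmm', '2.hmm'], HMMFile)` should be usable in place of `HMMFiles(['1.hmm', '2.hmm'])` in the usage snippet above. Files are opened on entering the `with` block rather than at construction time, so nothing is left open if the object is created but never entered.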

Thought I would leave this here for posterity even if it doesn't fit in the repo.

althonos commented 2 years ago

This is actually a neat trick; if I don't include it in the code, I'll definitely add it to the documentation!

althonos commented 1 year ago

This is now in the documentation since v0.7.4 :smiley: