Closed nextgenusfs closed 2 years ago
Okay, well I guess this is the way to do it.
import pyhmmer
# load sequences
alphabet = pyhmmer.easel.Alphabet.amino()
with pyhmmer.easel.SequenceFile('queries.fasta', digital=True, alphabet=alphabet) as seq_file:
sequences = list(seq_file)
# then load Pfam and search
with pyhmmer.plan7.HMMFile('/path/to/Pfam-A.hmm') as hmm_file:
hits = list(pyhmmer.hmmsearch(hmm_file, sequences))
Hi @nextgenusfs ! Happy that you figured it out. The issue was in here:
I naively assumed that the HMMFile would do the same?
with pyhmmer.plan7.HMMFile('/path/to/Pfam-A.hmm') as hmm_file: hmm = list(hmm_file.read())
If you wanted to read all the HMMs from the input file, you'd have to do:
with pyhmmer.plan7.HMMFile('/path/to/Pfam-A.hmm') as hmm_file:
hmms = list(hmm_file)
But what you end up doing in your second comment is even better, because it will stream the HMMs as they are read from the file, instead of pre-loading them all in a list (which, for a HMM database like Pfam, can load a lot of things in memory),
I'm probably missing something obvious, but I can't seem to figure out how to use pyhmmer to search against a database of HMM profiles, it Pfam. When I try to read the file, it does not seem to iterate over the individual profiles.
Loading sequences you can do this, which create a list of the digitized sequences in
I naively assumed that the HMMFile would do the same?
This errors out with
TypeError: 'pyhmmer.plan7.HMM' object is not an iterator
.If I just run this:
Then it is just the single HMM profile (the first one in the Pfam database 1-cysPrx_C). So what am I doing wrong?