althonos / pyhmmer

Cython bindings and Python interface to HMMER3.
https://pyhmmer.readthedocs.io
MIT License

pyhmmer.hmmer.hmmsearch is always slower than using hmmsearch directly #63

Closed Gmar999 closed 5 months ago

Gmar999 commented 6 months ago

Hello, Martin! Thank you for your work. I'm trying to use pyhmmer instead of the hmmsearch binary on Linux. However, I find that pyhmmer is always slower than running hmmsearch directly. My test is based on 13 gene HMMs and several FASTA files of already-identified genes.

Here is my script:

    import subprocess
    import time

    import pyhmmer

    hmm_file = "mito_CDS.hmm.h3m"
    seq_file = "ATP8.fasta"  # and others

    # count the sequences in the FASTA file
    with open(seq_file) as handle:
        seq_num = sum(1 for line in handle if line.startswith(">"))

    # search with pyhmmer
    t1 = time.time()
    with pyhmmer.easel.SequenceFile(seq_file, digital=True) as f:
        seqs = f.read_block()
    hmms = pyhmmer.plan7.HMMFile(hmm_file)
    for hits in pyhmmer.hmmer.hmmsearch(hmms, seqs, cpus=48, callback=lambda x, y: print(x, y)):
        pass
    t2 = time.time()
    print(f"pyhmmer-hmmsearch search {seq_file}({seq_num}) using: {t2 - t1}")

    # search with the hmmsearch binary
    t1 = time.time()
    cmd = f"hmmsearch {hmm_file} {seq_file}"
    subprocess.run(cmd, shell=True, capture_output=True)
    t2 = time.time()
    print(f"hmmsearch search {seq_file} using({seq_num}) : {t2 - t1}")

And here is the result:

    ; about hmms
    <HMM alphabet=Alphabet.dna() M=285 name=b'ND4L'> 13
    <HMM alphabet=Alphabet.dna() M=786 name=b'COX3'> 13
    <HMM alphabet=Alphabet.dna() M=672 name=b'ATP6'> 13
    <HMM alphabet=Alphabet.dna() M=1536 name=b'COX1'> 13
    <HMM alphabet=Alphabet.dna() M=1137 name=b'CYTB'> 13
    <HMM alphabet=Alphabet.dna() M=948 name=b'ND1'> 13
    <HMM alphabet=Alphabet.dna() M=351 name=b'ND3'> 13
    <HMM alphabet=Alphabet.dna() M=153 name=b'ATP8'> 13
    <HMM alphabet=Alphabet.dna() M=1335 name=b'ND4'> 13
    <HMM alphabet=Alphabet.dna() M=516 name=b'ND6'> 13
    <HMM alphabet=Alphabet.dna() M=1017 name=b'ND2'> 13
    <HMM alphabet=Alphabet.dna() M=1713 name=b'ND5'> 13
    <HMM alphabet=Alphabet.dna() M=681 name=b'COX2'> 13

    ; ATP8
    pyhmmer-hmmsearch search ATP8.fasta(2570) using: 4.567137002944946
    hmmsearch search ATP8.fasta using(2570) : 3.0275957584381104

    ; COX2
    pyhmmer-hmmsearch search COX2.fasta(8291) using: 538.6848599910736
    hmmsearch search COX2.fasta using(8291) : 49.312217235565186

    ; ND3
    pyhmmer-hmmsearch search ND3.fasta(2817) using: 38.611982345581055
    hmmsearch search ND3.fasta using(2817) : 19.30312705039978

    ; ATP6
    pyhmmer-hmmsearch search ATP6.fasta(2788) using: 185.83723640441895
    hmmsearch search ATP6.fasta using(2788) : 28.809940576553345

I want to know what happened. Is it because of the method I am using? Thank you very much for your help.

althonos commented 6 months ago

Hi @Gmar999,

One issue I can see with the way you benchmark: by calling hmmsearch directly without specifying --cpu, it uses its default of 2 worker threads, not the 48 you tell pyhmmer to use. Since you only have a few HMMs, there is no point in running with cpus=48; try with cpus=2 and check whether the times are still off.
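
For reference, a minimal sketch of what a like-for-like comparison could look like (the file names and explicit thread count below are just placeholders matching the script above):

    import subprocess

    import pyhmmer

    hmm_file = "mito_CDS.hmm.h3m"
    seq_file = "ATP8.fasta"
    n_threads = 2

    # hmmsearch falls back to 2 worker threads when --cpu is omitted,
    # so pass the count explicitly to keep the comparison fair
    subprocess.run(
        f"hmmsearch --cpu {n_threads} {hmm_file} {seq_file}",
        shell=True, capture_output=True,
    )

    # and ask pyhmmer for the same number of workers
    with pyhmmer.easel.SequenceFile(seq_file, digital=True) as f:
        seqs = f.read_block()
    with pyhmmer.plan7.HMMFile(hmm_file) as hmms:
        for hits in pyhmmer.hmmer.hmmsearch(hmms, seqs, cpus=n_threads):
            pass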

Gmar999 commented 5 months ago

Thank you very much for your prompt response, Martin. I changed the parameter from cpus=48 to cpus=2, and then ran today's test. Here are my test results:

    ; ND3
    pyhmmer-hmmsearch search ND3.fasta(2815 seqs) using: 70.86706924438477
    hmmsearch search ND3.fasta(2815 seqs) using : 21.94244623184204

    ; COX2
    pyhmmer-hmmsearch search COX2.fasta(8290 seqs) using: 619.4518868923187
    hmmsearch search COX2.fasta(8290 seqs) using : 68.149094581604

It seems that the slowest search is dragging down the entire process: when searching COX2.fasta, the run gets stuck on the COX2 HMM. hmmsearch also seems to be faster for individual HMM searches, but why is there such a big difference in speed?

althonos commented 5 months ago

Would you mind uploading the test files (or sending them through e-mail if that's sensitive data) so I can check how to reproduce?

hmmsearch will be faster for individual HMM searches because PyHMMER parallelizes over the HMMs rather than the sequences, so if you have a single HMM it cannot use more than one thread. That choice was made because our use case (Pfam annotation of complete genomes) always has enough HMMs to parallelize over. I documented that here: https://pyhmmer.readthedocs.io/en/stable/performance.html
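
For completeness, here is a rough sketch (not something pyhmmer.hmmer.hmmsearch does for you) of how one could instead parallelize a single-HMM search over chunks of the sequence database by driving pyhmmer.plan7.Pipeline directly; the file names, the COX2 query and the worker count are placeholders, and the per-chunk TopHits would still need to be combined afterwards:

    from concurrent.futures import ThreadPoolExecutor

    import pyhmmer

    # load one query HMM (e.g. COX2) and the target sequences
    with pyhmmer.plan7.HMMFile("mito_CDS.hmm.h3m") as hmm_file:
        hmm = next(h for h in hmm_file if h.name == b"COX2")
    with pyhmmer.easel.SequenceFile("COX2.fasta", digital=True) as seq_file:
        targets = list(seq_file.read_block())

    # split the targets into one chunk per worker
    n_workers = 4
    size = -(-len(targets) // n_workers)  # ceiling division
    chunks = [
        pyhmmer.easel.DigitalSequenceBlock(hmm.alphabet, targets[i:i + size])
        for i in range(0, len(targets), size)
    ]

    def search(block):
        # one Pipeline per worker; Z is pinned to the full database size so
        # that E-values stay comparable to a single search over all targets
        pipeline = pyhmmer.plan7.Pipeline(hmm.alphabet, Z=len(targets))
        return pipeline.search_hmm(hmm, block)

    # the compute-heavy part releases the GIL, so threads can run in parallel
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        hits_per_chunk = list(pool.map(search, chunks))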

Gmar999 commented 5 months ago

Okay, here is my data. test.tar.gz

althonos commented 5 months ago

Hi @Gmar999, thanks for the data.

I compared the runs of HMMER and PyHMMER, measuring the CPU cycles with valgrind, and I only got a marginal difference (239M cycles with PyHMMER, 236M with HMMER). What this suggests is that the problem does not come from the computation, but from the parallel-programming part which seems to deadlock for ages in recent PyHMMER versions.

This looks like a regression introduced in recent versions, because when I tested with v0.8.0 I got more reasonable timings (PyHMMER still being a little bit slower, but expected because of the parallelization model).

althonos commented 5 months ago

It turns out that, after refactoring the code to compile Easel and HMMER in https://github.com/althonos/pyhmmer/commit/b3d71d836b7312791d16c2af1b2f7b4ec43ccaa9, I forgot to include the flags to build the extension with SSE4.1, so all releases since v0.10.0 have been running without SSE4.1 enabled, hence the massive slowdown.

Please update to v0.10.11. PyHMMER will still be a tiny bit slower, as expected: most of the computation time is taken by a single HMM, and that case will always be faster in HMMER because its threading works slightly better in this situation (I'd have to implement another parallelization strategy to improve that). But now we're talking 25 s instead of 20 s, not 500 s.
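
If it helps to double-check which build is actually being imported, the installed version can be read with the standard library (a generic check, nothing pyhmmer-specific):

    import importlib.metadata

    # releases v0.10.0 through v0.10.10 were built without SSE4.1, so make
    # sure the environment really picked up v0.10.11 or later
    print(importlib.metadata.version("pyhmmer"))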

Thanks again for allowing me to find this issue!