EddyRivasLab / hmmer

HMMER: biological sequence analysis using profile HMMs
http://hmmer.org

Nhmmer: different (2x smaller) e-values with fasta and hmmerdb #319

Closed: Augustin-Zidek closed this issue 7 months ago

Augustin-Zidek commented 8 months ago

Nhmmer reports different e-values (consistently off by an exact factor of 2) depending on whether I run it against a fasta file or against a hmmerdb built from that same fasta with makehmmerdb.

Reproduction

  1. Download rfam, let's call it rfam.fasta.

  2. Produce HMMERDB from the downloaded fasta:

    makehmmerdb rfam.fasta rfam.hmmerdb

  3. Run Nhmmer against the hmmerdb:

    nhmmer --tblout /tmp/tblout.txt --rna --watson -A /tmp/output.a3m /tmp/query.a3m rfam.hmmerdb

  4. Run Nhmmer against the fasta:

    nhmmer --tblout /tmp/tblout.txt --rna --watson -A /tmp/output.a3m /tmp/query.a3m rfam.fasta

Expected result

Since I ran both searches with --watson, I would expect the e-values to match, but they are consistently off by a factor of 2, with the e-values from the hmmerdb run being 2x larger than those from the fasta run.
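One way to check the factor of 2 hit by hit is to compare the two --tblout files. The sketch below is only an illustration and rests on several assumptions. The two runs must be written to separate files (the commands above reuse /tmp/tblout.txt, so the second run would overwrite the first). The e-value is assumed to be the 13th whitespace-separated column of the nhmmer table, and a hit is assumed to be identified by its target name plus the alifrom/ali to coordinates. The function names are made up for this example.

```python
# Assumed layout of an nhmmer --tblout row (0-based column indices).
TARGET_COL = 0
ALIFROM_COL = 6
ALITO_COL = 7
EVALUE_COL = 12


def parse_tblout(lines):
    """Map (target, alifrom, alito) -> e-value, skipping '#' comment lines."""
    hits = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        key = (fields[TARGET_COL], fields[ALIFROM_COL], fields[ALITO_COL])
        hits[key] = float(fields[EVALUE_COL])
    return hits


def evalue_ratios(fasta_hits, hmmerdb_hits):
    """Return {hit: hmmerdb e-value / fasta e-value} for hits found in both runs."""
    ratios = {}
    for key, e_fasta in fasta_hits.items():
        e_db = hmmerdb_hits.get(key)
        if e_db is None or e_fasta == 0.0:
            continue  # hit missing from one run, or ratio undefined
        ratios[key] = e_db / e_fasta
    return ratios


def compare_files(fasta_tblout_path, hmmerdb_tblout_path):
    """Read two tblout files (hypothetical paths) and return the per-hit ratios."""
    with open(fasta_tblout_path) as fa, open(hmmerdb_tblout_path) as db:
        return evalue_ratios(parse_tblout(fa), parse_tblout(db))
```

Calling `compare_files("/tmp/tblout_fasta.txt", "/tmp/tblout_hmmerdb.txt")` (both paths hypothetical) should give ratios close to 2.0 for every shared hit if the discrepancy described above is present, and close to 1.0 once it is fixed.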

More observations

cryptogenomicon commented 8 months ago

Thanks (and thanks for the concise and excellent summary). I've fixed that in our develop branch, and the fix will appear in the next release.

Augustin-Zidek commented 7 months ago

Thanks for the quick fix!