Open jpjarnoux opened 2 months ago
The problem is that HMMER itself does not report all hits internally, in order to save space; that's why PyHMMER can only give you the top hits as well, because that's all it gets from the internal HMMER pipeline. By default, HMMER and PyHMMER use a reporting threshold of E=10, so you get all significant hits plus roughly 10 expected false positives.
I think the difference in E-value computation discussed in #75 may also explain the different number of reported hits you're getting. In theory you should get the same number of hits from HMMER and PyHMMER (I'm testing for that in the unit tests); however, if the E-values are not computed the same way, some hits may end up above the threshold and not get reported.
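To illustrate the point about thresholds, here is a minimal pure-Python sketch, not PyHMMER code. It assumes the usual HMMER relation where a hit's E-value is its P-value multiplied by the number of targets searched; the function names, sequence names, and P-values are made up for the example.

```python
# Hypothetical illustration: how the E-value reporting threshold decides
# which hits are reported. All names and numbers are invented.

def evalue(pvalue: float, n_targets: int) -> float:
    """E-value = P-value scaled by the number of target sequences."""
    return pvalue * n_targets


def reported(pvalues: dict, n_targets: int, threshold: float = 10.0) -> list:
    """Return the names of hits whose E-value is at or below the threshold."""
    return [
        name
        for name, p in pvalues.items()
        if evalue(p, n_targets) <= threshold
    ]


# Same hits, same P-values, but two different database sizes.
pvalues = {"seqA": 1e-30, "seqB": 2e-4, "seqC": 5e-3}

small = reported(pvalues, n_targets=1_000)
large = reported(pvalues, n_targets=100_000)

print(small)
print(large)
```

With the smaller target count all three hits stay under E=10, while with the larger one only the strongest hit does. The same effect would appear if two tools disagreed on how the E-value is scaled, which is why the number of reported hits can differ even when the underlying scores are identical.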
Hi!
Yes, I found out that it isn't possible in HMMER either. The problem came from my database, so there's no problem with pyhmmer here.
Thanks
Hi!
I suggest adding the possibility of getting all hits instead of only the TopHits. As you explained in other issues (#65 or #66), not all hits are reported. So when I compare the results with hmmsearch, some hits reported in its domtblout are missing from the pyhmmer.hmmsearch domtblout.
The best way to deal with that would be to add an argument to pyhmmer.hmmsearch that sets the number of hits to report. If left at None (the default), it would keep the current behavior; if set to 0, all hits would be reported.
Let me know if I've understood correctly how it works.
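For context, the argument proposed above does not exist. The closest existing knob is probably the reporting threshold itself, which `pyhmmer.hmmsearch` accepts through the `E` and `domE` keyword options. The sketch below is only an assumption about how one might loosen it: the helper name, the file paths, and the `1e6` value are hypothetical, and sequences discarded by the pipeline's earlier filters would still not be reported.

```python
def search_with_loose_thresholds(hmm_path, seq_path, evalue=1e6):
    """Hypothetical helper: count hits per HMM with a very permissive
    reporting threshold instead of the default E=10."""
    # Imported lazily so this sketch can be loaded without pyhmmer installed.
    import pyhmmer

    # Load every profile from the HMM file.
    with pyhmmer.plan7.HMMFile(hmm_path) as hmm_file:
        hmms = list(hmm_file)

    # Load the target sequences in digital mode, as the pipeline requires.
    with pyhmmer.easel.SequenceFile(seq_path, digital=True) as seq_file:
        sequences = seq_file.read_block()

    # E and domE raise the per-sequence and per-domain reporting thresholds;
    # one TopHits object is yielded per query, in query order.
    counts = {}
    results = pyhmmer.hmmsearch(hmms, sequences, E=evalue, domE=evalue)
    for hmm, top_hits in zip(hmms, results):
        counts[hmm.name] = len(top_hits)
    return counts
```

Calling it as `search_with_loose_thresholds("profiles.hmm", "proteins.faa")` (placeholder file names) would return a dictionary mapping each profile name to the number of hits reported under the loosened threshold.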