althonos / pyhmmer

Cython bindings and Python interface to HMMER3.
https://pyhmmer.readthedocs.io
MIT License

[Question] What do `included` and `reported` mean in pyhmmer.plan7.Hit? #66

Closed: jolespin closed this issue 7 months ago

jolespin commented 7 months ago

I'm looking through the documentation and trying to understand what included and reported mean: https://pyhmmer.readthedocs.io/en/stable/api/plan7.html#pyhmmer.plan7.Hit.included

The docs say "Whether this hit is marked as included." and "Whether this hit is marked as reported.", but I'm not sure what this means.

Does it mean that a hit is reported based on an E-value, and that if it also passes some threshold (e.g., gathering) it is marked as included? If so, how does this work when no thresholds are specified?

althonos commented 7 months ago

Hi @jolespin, included and reported correspond to the HMMER thresholds (-E specifies the reporting threshold by E-value, --incE specifies the inclusion threshold by E-value).
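To make the relationship between the two flags concrete, here is a minimal pure-Python sketch of the idea. It does not use pyhmmer: `ToyHit` and `apply_thresholds` are hypothetical names made up for illustration, and the default values assume HMMER's documented hmmsearch defaults (-E 10.0, --incE 0.01). The real pipeline can also threshold on bit scores or model-specific cutoffs, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class ToyHit:
    """Hypothetical stand-in for a search hit; not the pyhmmer Hit class."""
    name: str
    evalue: float
    reported: bool = False
    included: bool = False


def apply_thresholds(hits, E=10.0, incE=0.01):
    """Flag hits by E-value only (assumed defaults: -E 10.0, --incE 0.01)."""
    for hit in hits:
        # A hit is reported when its E-value passes the reporting threshold.
        hit.reported = hit.evalue <= E
        # A reported hit is included when it also passes the stricter
        # inclusion threshold.
        hit.included = hit.reported and hit.evalue <= incE
    return hits


hits = apply_thresholds([
    ToyHit("strong", 1e-30),
    ToyHit("weak", 0.5),
    ToyHit("noise", 50.0),
])
for hit in hits:
    print(hit.name, hit.reported, hit.included)
```

With these thresholds, only the strong hit is both reported and included; the weak hit is reported but not included, and the noise hit is neither.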

Usually in a hmmsearch run all hits that you get pass the reporting thresholds (Hit.reported == True) and you can ignore the inclusion thresholds.

Where this is actually useful is for jackhmmer, where inclusion thresholds control which hits get included to build the HMM for the next iteration.

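By default when you run hmmsearch there is always a threshold: -E 10.0 for reporting and --incE 0.01 for inclusion. So you're reporting virtually all relevant hits plus up to about 10 expected false positives, while only hits with an E-value of at most 0.01 are marked as included.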
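If you want to see the flags for your own data, something along these lines should work. This is a sketch, not a definitive recipe: `flag_rows` and `search` are hypothetical helper names, the two paths are placeholders you need to supply, and it assumes that the `E` and `incE` keyword arguments given to `pyhmmer.hmmsearch` are forwarded to the search pipeline.

```python
def flag_rows(hits):
    """Return (name, evalue, reported, included) for each hit in an iterable."""
    rows = []
    for hit in hits:
        # Hit names come back as bytes in pyhmmer; decode them for display.
        name = hit.name.decode() if isinstance(hit.name, bytes) else hit.name
        rows.append((name, hit.evalue, hit.reported, hit.included))
    return rows


def search(hmm_path, fasta_path, E=10.0, incE=0.01):
    """Run hmmsearch with explicit thresholds; both paths are placeholders."""
    import pyhmmer  # imported lazily so flag_rows works without pyhmmer

    with pyhmmer.easel.SequenceFile(fasta_path, digital=True) as seq_file:
        sequences = seq_file.read_block()
    with pyhmmer.plan7.HMMFile(hmm_path) as hmm_file:
        # One list of rows per query HMM in the file.
        return [
            flag_rows(hits)
            for hits in pyhmmer.hmmsearch(hmm_file, sequences, E=E, incE=incE)
        ]
```

Calling `search("my.hmm", "my.faa", E=1e-5)` (with your own files) would then give one list of rows per HMM, so you can check directly which hits have `reported` or `included` set.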

jolespin commented 7 months ago

Thank you, this definitely answers my question. Rewriting all my pipelines to use this. It's so much faster and I have way more control than before.