Open jpjarnoux opened 2 years ago
Hi @jpjarnoux
If I'm not mistaken, this option doesn't exist in the original HMMER either, because of the difficulty to compute coverage for an alignment which can contain more than one domain. I like this example (from the documentation examples):
Hi, Thanks, I will try to catch up why there is this value in the metadata and how they're using it.
Hi, I come back to this question. I have some new information. To compute the target and hmm coverage, they choose the best domain. I tried to reproduce, but unfortunately the length of the query and the target (HMM and sequence) are not saved and accessible in the HIT. I would to take it when I digitalized my sequences and when I'm reading my HMM, but the length is not an attribute in HMM object and DigitalSequence. For the sequence I can easily get the length by another way, but for the HMM, it could be really easier if the length was read and add as an attribute (in HMM file there is LENG key). Do you think you could add those lengths as attribute in Sequence, HMM and Hit or Alignment object ?
Hi @jpjarnoux
You can get the length of HMMs from HMM
objects:
hmm_lengths = {}
with pyhmmer.plan7.HMMFile("Pfam-A.h3m") as hmm_file:
for hmm in hmm_file:
hmm_lengths[hmm.name] = len(hmm.consensus)
Then:
n_aligned_positions = len(
hit.best_domain.alignment.hmm_sequence
) - hit.best_domain.alignment.hmm_sequence.count(".")
hmm_coverage = (
n_aligned_positions / hmm_lengths[hit.best_domain.alignment.hmm_name]
)
As @althonos said, this is an oversimplification because you can have multiple domains in the hit. But it can be pretty useful for HMMs of full-length proteins.
Sorry, I forget to reply. That works for me, thank you.
A less dumb approach:
def get_hmm_coverage(domain):
n_aligned_positions = domain.alignment.hmm_to - domain.alignment.hmm_from + 1
return n_aligned_positions / domain.alignment.hmm_length
with pyhmmer.plan7.HMMFile("Pfam-A.h3m") as hmm_file:
for hits in pyhmmer.hmmsearch(hmm_file, seqs, bit_cutoffs="gathering"):
for hit in hits:
for domain in hit.domains.included:
hmm_coverage = get_hmm_coverage(domain)
Hi !
I was searching if that's possible to report the alignment coverage on the HMM and the target. I'm using the PADLOC-DB as HMM database and I notice that they include
hmm.coverage.threshold
andtarget.coverage.threshold
to filter results in there metadata file.So I was searching if these values were in the HIT object, but they're not, and I don't find them in the documentation. Did I miss something ?
Thanks