Closed: holtgrewe closed this issue 7 months ago
After contacting the HPO author, it looks like the genes_to_phenotypes.txt file would actually be the more appropriate one for the hpo crate to import.
Depending on your use case, you can remove all non-leaf terms and compare only the leaves of HpoSets. https://docs.rs/hpo/latest/hpo/struct.HpoSet.html#method.child_nodes
let gene = ontology.gene_by_name("ARID1B").unwrap();
let set = gene.to_hpo_set(&ontology).child_nodes();
set.similarity(....)
Not sure if that helps, but it is something that I recommend for most comparisons.
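To make the leaf-only idea concrete, here is a minimal, self-contained sketch on a toy ontology. It does not use the hpo crate: the `Parents` map, `toy_ontology`, `ancestors` and `leaf_terms` are hypothetical names for illustration only, and the term hierarchy is heavily simplified (intermediate terms are left out). It only shows the principle of dropping every term that is an ancestor of another term in the same set.

```rust
use std::collections::{HashMap, HashSet};

/// Toy ontology: maps each term to its direct parents.
/// Hypothetical structure, not the hpo crate's internal representation.
type Parents = HashMap<&'static str, Vec<&'static str>>;

/// Simplified hierarchy (assumption): All > Phenotypic abnormality >
/// Abnormality of the nervous system > Seizure.
fn toy_ontology() -> Parents {
    HashMap::from([
        ("HP:0000001", vec![]),
        ("HP:0000118", vec!["HP:0000001"]),
        ("HP:0000707", vec!["HP:0000118"]),
        ("HP:0001250", vec!["HP:0000707"]),
    ])
}

/// Collect all ancestors of `term` by walking the parent links.
fn ancestors(parents: &Parents, term: &str) -> HashSet<&'static str> {
    let mut seen = HashSet::new();
    let mut stack: Vec<&'static str> = parents.get(term).cloned().unwrap_or_default();
    while let Some(t) = stack.pop() {
        if seen.insert(t) {
            if let Some(ps) = parents.get(t) {
                stack.extend(ps.iter().copied());
            }
        }
    }
    seen
}

/// Keep only the terms that are not an ancestor of another term in the set.
fn leaf_terms(parents: &Parents, set: &HashSet<&'static str>) -> HashSet<&'static str> {
    let mut covered = HashSet::new();
    for term in set {
        covered.extend(ancestors(parents, term));
    }
    set.iter().copied().filter(|t| !covered.contains(t)).collect()
}
```

With a set that contains a term together with its full chain of parents, `leaf_terms` keeps only the most specific term, which is roughly what I would expect `child_nodes()` to do for a real HpoSet.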
While we're at it: for comparisons, I usually also remove modifier terms (either into a new set or in place), so that the comparison only uses children of Phenotypic abnormality.
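As a rough illustration of what removing modifiers means, the sketch below filters a list of terms down to those under Phenotypic abnormality (HP:0000118). Again this is a toy example without the hpo crate: `toy_parents`, `descends_from` and `without_modifiers` are hypothetical helpers, the hierarchy is simplified, and keeping the root term itself is a choice made for this sketch.

```rust
use std::collections::HashMap;

const PHENOTYPIC_ABNORMALITY: &str = "HP:0000118";

/// Simplified toy hierarchy (assumption): Seizure sits directly under
/// Phenotypic abnormality, Autosomal recessive inheritance under
/// Mode of inheritance, and both branches hang off All.
fn toy_parents() -> HashMap<&'static str, Vec<&'static str>> {
    HashMap::from([
        ("HP:0000001", vec![]),             // All
        ("HP:0000118", vec!["HP:0000001"]), // Phenotypic abnormality
        ("HP:0001250", vec!["HP:0000118"]), // Seizure
        ("HP:0000005", vec!["HP:0000001"]), // Mode of inheritance
        ("HP:0000007", vec!["HP:0000005"]), // Autosomal recessive inheritance
    ])
}

/// True if `term` is `root` or has `root` among its ancestors.
fn descends_from(parents: &HashMap<&str, Vec<&str>>, term: &str, root: &str) -> bool {
    if term == root {
        return true;
    }
    parents
        .get(term)
        .map_or(false, |ps| ps.iter().any(|p| descends_from(parents, p, root)))
}

/// Drop every term that is not under Phenotypic abnormality.
fn without_modifiers<'a>(parents: &HashMap<&str, Vec<&str>>, terms: &[&'a str]) -> Vec<&'a str> {
    terms
        .iter()
        .copied()
        .filter(|t| descends_from(parents, t, PHENOTYPIC_ABNORMALITY))
        .collect()
}
```

Here an inheritance term such as HP:0000007 is dropped, while Seizure is kept.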
It took a while for me to finally grasp this issue. I never considered this to be a problem, but I have now realized that terms should not be transitively added to genes. That way they will behave the same way as diseases. This will be fixed with this pull request and updated on crates.io with the next release.
For example, ARID1B has 532 (sic!) unique pairs of HPO and ARID1B in release 2023-06-06. It looks like the full parent sub-DAG is stored for each gene, as the association includes the All term.
I would suggest pruning the imported phenotype_to_genes.txt list as follows:
Otherwise, similarity computation gets problematic for highly annotated genes such as ARID1B.
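For anyone who wants to check the number of unique HPO/gene pairs in their own copy of the file, a minimal sketch could look like the following. The column layout (hpo_id, hpo_name, ncbi_gene_id, gene_symbol, then further columns) is an assumption and should be verified against the header of the release in use; `unique_terms_per_gene` is a hypothetical helper, not part of the hpo crate.

```rust
use std::collections::{BTreeMap, HashSet};

/// Count the unique HPO term IDs per gene symbol in tab-separated text.
/// Assumed columns (check your release): hpo_id, hpo_name,
/// ncbi_gene_id, gene_symbol, ...
fn unique_terms_per_gene(tsv: &str) -> BTreeMap<String, usize> {
    let mut per_gene: BTreeMap<String, HashSet<String>> = BTreeMap::new();
    for line in tsv.lines() {
        // Skip comment lines, the header line and blank lines.
        if line.starts_with('#') || line.starts_with("hpo_id") || line.trim().is_empty() {
            continue;
        }
        let cols: Vec<&str> = line.split('\t').collect();
        if cols.len() < 4 {
            continue; // Not enough columns, ignore the line.
        }
        per_gene
            .entry(cols[3].to_string())
            .or_default()
            .insert(cols[0].to_string());
    }
    per_gene
        .into_iter()
        .map(|(gene, terms)| (gene, terms.len()))
        .collect()
}
```

The file contents can be passed in via `std::fs::read_to_string("phenotype_to_genes.txt")`; looking up "ARID1B" in the returned map then gives the number of distinct terms annotated to that gene, including any transitively added parents.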