callahantiff / SemRepRDF

Transforming SemRep Predications into an Open Biomedical Linked Data Resource

HPO vs DOID #1

Open Shicheng-Guo opened 2 years ago

Shicheng-Guo commented 2 years ago

Hi Tiffany,

I am wondering, in your experience, is either HPO or DOID much better than the other?

Thanks.

Shicheng

callahantiff commented 2 years ago

Hi @Shicheng-Guo - This is a great question. I have used both in my work, but if I had to choose only one, it would depend on my use case. If I were trying to represent data or do some task involving diseases, I would probably use DOID, because HPO doesn't explicitly represent all diseases (it does have some disease-level concepts). If you wanted to use HPO to represent a disease, you would have to identify the set of phenotypes that are associated with it. For example, if you wanted a concept for Cystic Fibrosis, in DOID this would be DOID:1485. For HPO, Cystic Fibrosis would involve all of the concepts shown below (citation: MedGen):

[Screenshot: HPO phenotype concepts associated with Cystic Fibrosis, from MedGen]

If you are open to sharing some details about your use case, I would be happy to discuss more specific pros and cons. Let me know if that would be helpful!

Although you did not ask... I suggest you also take a look at Mondo (http://mondo.monarchinitiative.org/). Depending on your use case, it might be the best option. Its scope is similar to DOID (I believe that it is more comprehensive) and it contains many references to HPO and DOID (as database cross-references). I use this ontology instead of DOID because it's incredibly comprehensive and actively maintained by the most wonderful group of people.

Shicheng-Guo commented 2 years ago

Hi Tiffany,

Thank you so much for the detailed explanation. In a project, I need to integrate different data sources (RWE, EMR, GEO, ICD, PHECODE), and they use different types of IDs to represent diseases and phenotypes (sometimes also called intermediate traits, like BMI, blood pressure, etc.). You are right: they have different aims and therefore different characteristics and patterns. GEO (Gene Expression Omnibus) usually only has the disease name, RWE has both disease and non-disease traits (like BMI and blood pressure), and ICD and PHECODE only indicate the disease name. I am trying to find the best coding system from the list below for integrating these sources:

PUBLIC_MESH_SC (MeSH)
PUBLIC_MESH (MeSH)
PUBLIC_MEDDRA (MedDRA)
SNOMED (Snomed)
PUBLIC_MONDO
HPO
INDICATIONBOOST (SciBite curated MeSH)
INDICATION (SciBite curated MeSH)
MDRAE (SciBite curated MedDRA)
MDRACUTEAE (SciBite curated MedDRA Acute Adverse Events branch)
DOID (SciBite curated DOID)
PUBLIC_DOID
MPATH (Pathology)
BAO
PUBLIC_CDISC_SEND
CLINPROC (Clinical Procedures)

Shicheng

callahantiff commented 2 years ago

Hi @Shicheng-Guo. Thanks for the additional context. Given the sources that you listed, I might use both MONDO and HPO, leveraging the mappings that MONDO has created to HPO and DOID (if you only want to use one, I would use MONDO). I would then use the UMLS to help align both MeSH terms and MedDRA concepts to MONDO via mappings to HPO and UMLS CUIs. Does that help?
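
A rough sketch of what that alignment could look like in Python, using Cystic Fibrosis as the example. Everything here is an assumption for illustration: the lookup tables are hand-written toy data (in practice they would be built from a UMLS release and from MONDO's database cross-references), `to_mondo` is a hypothetical helper name, and the MeSH, CUI, and MONDO identifiers should be verified against current releases before use.

```python
from typing import Optional

# Toy crosswalk tables, hand-written for illustration only.
# Real tables would be derived from UMLS and from MONDO cross-references.
SOURCE_TO_CUI = {
    ("MESH", "D003550"): "C0010674",  # Cystic Fibrosis (verify before use)
}
CUI_TO_MONDO = {
    "C0010674": "MONDO:0009061",  # Cystic Fibrosis (verify before use)
}
DOID_TO_MONDO = {
    "DOID:1485": "MONDO:0009061",  # DOID:1485 is the DOID concept cited above
}


def to_mondo(vocab: str, code: str) -> Optional[str]:
    """Map a (vocabulary, code) pair to a MONDO ID, or None if unmapped."""
    vocab = vocab.upper()
    if vocab == "MONDO":
        return code
    if vocab == "DOID":
        # Direct cross-reference, no CUI hop needed.
        return DOID_TO_MONDO.get(code)
    # Other vocabularies (MeSH, MedDRA, ...) go through a UMLS CUI.
    cui = SOURCE_TO_CUI.get((vocab, code))
    return CUI_TO_MONDO.get(cui) if cui is not None else None


if __name__ == "__main__":
    for vocab, code in [("MeSH", "D003550"), ("DOID", "DOID:1485"), ("MedDRA", "00000000")]:
        print(vocab, code, "->", to_mondo(vocab, code))
```

Codes that come back as `None` are the ones that would need manual curation or a fallback to HPO for non-disease traits such as BMI or blood pressure.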