DiseaseOntology / HumanDiseaseOntology

Repository for the Human Disease Ontology.
Creative Commons Zero v1.0 Universal

oral squamous cell carcinoma should be classified as a subclass of squamous cell carcinoma #197

Closed cmungall closed 7 years ago

cmungall commented 7 years ago

squamous cell carcinoma and its children:

Note that this classification is missing some terms, such as oral squamous cell cancer:

This means that if you have any data annotated to oral squamous cell carcinoma (OSCC), and a clinician who is interested in SCCs in general makes a query, the query will not find that data.

I think DO is working from the mistaken assumption that a disease hierarchy should be single-inheritance. This is not correct: disease classifications are frequently polyhierarchies.
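To make the query problem concrete, here is a minimal sketch of how a "give me everything about X" query typically works: collect the class and all of its descendants over is_a edges, then return data annotated to any of them. The two toy hierarchies, the term labels as keys, and the `dataset-1` annotation are hypothetical illustrations, not taken from any DO release.

```python
from collections import deque

# Hypothetical toy hierarchy: each term maps to its set of is_a parents.
SINGLE_PARENT = {
    "oral squamous cell carcinoma": {"oral cancer"},
    "oral cancer": {"cancer"},
    "squamous cell carcinoma": {"carcinoma"},
    "carcinoma": {"cancer"},
    "cancer": set(),
}
# Same hierarchy, but OSCC also keeps its morphological parent (a polyhierarchy).
POLY = dict(SINGLE_PARENT)
POLY["oral squamous cell carcinoma"] = {"oral cancer", "squamous cell carcinoma"}

# Hypothetical annotations: data item -> disease term it is annotated to.
ANNOTATIONS = {"dataset-1": "oral squamous cell carcinoma"}


def descendants(term, parents):
    """Return term plus every term that reaches it via one or more is_a edges."""
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(child)
    seen = {term}
    queue = deque([term])
    while queue:  # breadth-first walk down the hierarchy
        for child in children.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def query(term, parents):
    """Return data items annotated to term or to any of its descendants."""
    hits = descendants(term, parents)
    return sorted(item for item, t in ANNOTATIONS.items() if t in hits)


print(query("squamous cell carcinoma", SINGLE_PARENT))  # []
print(query("squamous cell carcinoma", POLY))           # ['dataset-1']
```

The query code is identical in both cases; only the presence of the second is_a edge decides whether the OSCC data is found.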

cmungall commented 7 years ago

Fixing this ticket would also fix issue #153.

lschriml commented 7 years ago

Thank you for the suggestion; however, this does not align with how DO classifies diseases. Organ-based and cell-type-based cancers are distinct classes in DO. I am resolving this ticket.

cmungall commented 7 years ago

I'm afraid that this renders DO unusable for most informatics applications. However, it's good to know this; I think this design decision needs to be more clearly advertised on the DO site.

lschriml commented 7 years ago

Hello Chris, that is not the case. The DO's classification provides a knowledge-rich backbone on which diseases can be further classified (for example, by anatomical location or tissue of origin). The DO provides these alternative classifications in our GitHub repository, as has been mentioned in these GitHub tickets. In that way, we are working to provide the multiple possible data connections to a disease, and this continues to be expanded upon in DO. As you are well aware, the DO is highly utilized in informatics platforms, as evidenced by the hundreds of PubMed articles citing the use of the DO. DO's design is well documented in our publications and on our website; it was first implemented in 2003. The classification of diseases in the DO follows the classification used by disease research communities. In the case of cancer, it is an important distinction to classify cancer by site (organ systems/anatomy) and by morphology (cell of origin), as has been done by the NCI (see NCI's core neoplasm hierarchy, https://evs.nci.nih.gov/ftp1/NCI_Thesaurus/Neoplasm/Neoplasm_Core_Hierarchy.html) and the WHO (http://www.who.int/classifications/icd/adaptations/oncology/en/).

Also see CANCER CLASSIFICATION (https://training.seer.cancer.gov/disease/categories/classification.html): "Cancers are classified in two ways: by the type of tissue in which the cancer originates (histological type) and by primary site, or the location in the body where the cancer first developed. This section introduces you to the first method: cancer classification based on histological type. The international standard for the classification and nomenclature of histologies is the International Classification of Diseases for Oncology, Third Edition (ICD-O-3)."

To ensure that the DO terms are classified correctly, we have added logical axioms to define the anatomical location and cell of origin for DO cancers. We continue to improve this data representation.

cmungall commented 7 years ago

There is a misunderstanding here. NCIT is released as a polyhierarchy. Even when particular linearized views are presented such as in https://evs.nci.nih.gov/ftp1/NCI_Thesaurus/Neoplasm/Neoplasm_Core_Hierarchy.html, it's still a polyhierarchy. For example, starting from either morphology or site, you can navigate to 'Breast Carcinoma'. It is classified both by site (under Breast Neoplasm) and by morphology (under Carcinoma).

In contrast, in the main obo release of DO (doid.obo, HumanDO.obo), it is only in one place:

DOID:4 ! disease
  is_a DOID:14566 ! disease of cellular proliferation
    is_a DOID:162 ! cancer
      is_a DOID:0050686 ! organ system cancer
        is_a DOID:5093 ! thoracic cancer
          is_a DOID:1612 ! breast cancer
            is_a DOID:3459 ! breast carcinoma ***

There is no classification under 'carcinoma', despite this being indisputably correct.
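The chain above can be checked mechanically. The sketch below parses a hand-written OBO fragment that reproduces only the is_a lines shown above (it is an illustration, not an excerpt of an actual release file) and follows every is_a edge upward from DOID:3459. The parser is deliberately minimal: it reads only the `id`, `name` and `is_a` tags and ignores everything else in the OBO format.

```python
# Hand-written OBO fragment mirroring the single-parent chain shown above
# (illustrative only; not copied from a DO release file).
OBO_FRAGMENT = """\
[Term]
id: DOID:4
name: disease

[Term]
id: DOID:14566
name: disease of cellular proliferation
is_a: DOID:4 ! disease

[Term]
id: DOID:162
name: cancer
is_a: DOID:14566 ! disease of cellular proliferation

[Term]
id: DOID:0050686
name: organ system cancer
is_a: DOID:162 ! cancer

[Term]
id: DOID:5093
name: thoracic cancer
is_a: DOID:0050686 ! organ system cancer

[Term]
id: DOID:1612
name: breast cancer
is_a: DOID:5093 ! thoracic cancer

[Term]
id: DOID:3459
name: breast carcinoma
is_a: DOID:1612 ! breast cancer
"""


def parse_obo(text):
    """Return (names, parents): id -> name and id -> list of is_a parent ids."""
    names, parents = {}, {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            current = None  # a new stanza begins; wait for its id
        elif line.startswith("id: "):
            current = line[4:].strip()
            parents.setdefault(current, [])
        elif current and line.startswith("name: "):
            names[current] = line[6:].strip()
        elif current and line.startswith("is_a: "):
            # Drop the trailing "! label" comment and keep only the parent id.
            parents[current].append(line[6:].split("!")[0].strip())
    return names, parents


def ancestors(term, parents):
    """All terms reachable from term by following is_a edges upward."""
    found, stack = set(), list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in found:
            found.add(t)
            stack.extend(parents.get(t, []))
    return found


names, parents = parse_obo(OBO_FRAGMENT)
labels = sorted(names[t] for t in ancestors("DOID:3459", parents))
print(labels)
print("carcinoma" in labels)  # False
```

Any tool that computes ancestors this way from the single-parent file will never place 'breast carcinoma' under 'carcinoma'.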

Most consumers of the obo-format version of DO are not aware that, in this release, DO omits the classification of 'breast carcinoma' under 'carcinoma'. I do not think this is made very clear in the documentation or in the papers. Even if it were, not every user of an ontology combs through every piece of documentation. People expect DO to behave like other OBO ontologies, which do not deliberately remove valid is_a relationships from the main release. Nobody else in OBO does this, and NCIT does not do this either.

The fact of the matter is that informatics tools that integrate the obo-format release of the DO for querying, enrichment analyses, etc. will be getting incomplete results. They may not be aware of this, but they are. For example, if a query or enrichment service integrates the DO hierarchy and we query for 'carcinoma', we don't get entities (drugs, genes, etc.) annotated to 'breast carcinoma'. I don't think you will find anyone who does not think this is a problem. The problem can be alleviated by using a non-standard release of the DO, but most informatics people are not aware that such a release exists or why they would want to use it.

I think there is a fundamental misunderstanding here. Almost all OBO ontologies have multiple axes of classification. Cancer and disease are not particularly different here.

The situation is slightly better with the doid.owl release, which includes the equivalence axioms. But consumers of this release may not know that they need to use a reasoner or axiom weakening to get the complete polyhierarchy. We can already see this in the way different tools that consume DO display it. For example, in OLS and OntoBee we see 'breast carcinoma' when we open 'carcinoma', but we don't see this in BioPortal. We also don't see it in many of the databases that use DO to power queries, or in the DO browser. Is this inconsistency in presentation and query functionality a good thing?
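Why a reasoning step matters can be illustrated with a toy model: if a class is defined by the properties it requires (for example an anatomical site or a cell of origin), then any term whose asserted properties include all of those requirements should be classified under it. The sketch below performs only that subset check. It is a drastic simplification, not OWL semantics and not a substitute for a real reasoner such as ELK or HermiT, and the property names and values are hypothetical rather than DO's actual logical definitions.

```python
# Hypothetical "definitions": defined class -> properties a subclass must have.
DEFINED_CLASSES = {
    "carcinoma": {("cell_of_origin", "epithelial cell")},
    "breast cancer": {("located_in", "breast")},
}

# Hypothetical asserted properties of a more specific term.
TERMS = {
    "breast carcinoma": {
        ("located_in", "breast"),
        ("cell_of_origin", "epithelial cell"),
    },
}


def inferred_parents(term):
    """Defined classes whose every required property the term also has."""
    props = TERMS[term]
    return sorted(
        cls for cls, required in DEFINED_CLASSES.items() if required <= props
    )


print(inferred_parents("breast carcinoma"))  # ['breast cancer', 'carcinoma']
```

A consumer that skips this classification step sees only whatever parent was asserted by hand; one that runs it recovers both the site-based and the morphology-based parent.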

The DO should behave the same as every other OBO library ontology:

lschriml commented 7 years ago

Thank you, Chris, for the detailed explanation. Let's chat about this when we are at Biocuration next month.

cmungall commented 7 years ago

I note also that MGI, one of the main contributors to DO, is using the OWL edition of DO (we can tell, as this has the correct dual parentage, in cases such as http://www.informatics.jax.org/disease/DOID:3459 ).

It seems only a minority of users are getting the single is-a parent default obo version, but these consumers are not aware that they are missing all of these correct relationships. They chose the obo format because it is easier for them to parse, not because they wanted edges removed.

sbello commented 7 years ago

Just to clarify, Chris: MGI is using an OBO version of the DO file with all the inferred is_a edges included. Sue


cmungall commented 7 years ago

Which obo version? The official OBO Library obo-format URL is http://purl.obolibrary.org/obo/doid.obo, but this is mapped to the file doid-non-classified.obo in the DO GitHub repo (i.e., it does not have CNS leukemia is-a leukemia). I suspect MGI are using a GitHub URL for one of the files in https://github.com/DiseaseOntology/HumanDiseaseOntology/tree/master/src/ontology, such as the one (confusingly) called doid.obo, which does have the inferred links asserted.

If so, I think MGI are doing the right thing in consuming this version of the DO. But other obo consumers may not be aware of the differences and may be consuming the other version. For example, the JAX cancer knowledge base consumes the single is-a parent version, so they have missing links when compared to the MGI browser. This situation needs to be addressed ASAP.
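One way a consumer could check which kind of file they are loading is to compare the is_a edges of two releases. The sketch below assumes each release has already been reduced to a set of (child, parent) pairs, for instance with a small OBO parser; the pairs shown are hypothetical placeholders written as labels, not values read from doid.obo, doid-non-classified.obo or doid-merged.obo.

```python
from collections import Counter

# Hypothetical is_a edge sets, written as (child label, parent label) pairs.
# A real comparison would build these sets by parsing each release file.
CLASSIFIED = {
    ("breast carcinoma", "breast cancer"),
    ("breast carcinoma", "carcinoma"),
    ("CNS leukemia", "leukemia"),
}
NON_CLASSIFIED = {
    ("breast carcinoma", "breast cancer"),
}


def missing_edges(reference, candidate):
    """is_a edges present in the reference release but absent from the candidate."""
    return sorted(reference - candidate)


def multi_parent_terms(edges):
    """Terms that have more than one is_a parent in a release."""
    counts = Counter(child for child, _ in edges)
    return sorted(term for term, n in counts.items() if n > 1)


print(missing_edges(CLASSIFIED, NON_CLASSIFIED))
# [('CNS leukemia', 'leukemia'), ('breast carcinoma', 'carcinoma')]
print(multi_parent_terms(CLASSIFIED))      # ['breast carcinoma']
print(multi_parent_terms(NON_CLASSIFIED))  # []
```

For an ontology with more than one axis of classification, a release in which no term has more than one is_a parent is a strong hint that the inferred links have been left out.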

sbello commented 7 years ago

We use the one called doid-merged.obo. This includes the inferred branches plus the susceptibility terms. Sue
