NCI-Thesaurus / thesaurus-obo-edition

OBO Library edition of NCIt

Details about Term properties #72

Open asoket opened 2 years ago

asoket commented 2 years ago

Hello, I am looking for detailed documentation on the NCIt Term elements (properties) used in "ncit.obo". The file uses 23 elements in total, in different combinations: "id", "name", "def", "subset", "synonym", "is_a", "relationship", "property_value", "xref", "is_obsolete", "comment", "data-version", "disjoint_from", "domain", "format-version", "intersection_of", "is_transitive", "ontology", "range", "remark", "subsetdef", "[Term]", "[Typedef]". I am looking for documentation similar to what is available for the Gene Ontology file "go.obo" at http://geneontology.org/docs/GO-term-elements.
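An inventory like the one above can be reproduced by counting stanza headers and tag names directly. This is only a minimal sketch: the sample text is a made-up excerpt (the `EX:` id and the placeholder names are hypothetical), and `tag_inventory` is an illustrative helper, not part of any OBO tooling.

```python
from collections import Counter

# Illustrative excerpt only; the real ncit.obo is far larger.
# The EX: id and the placeholder names are hypothetical.
SAMPLE_OBO = """\
format-version: 1.2
ontology: ncit

[Term]
id: NCIT:C10000
name: placeholder name
property_value: NCIT:P310 "Obsolete_Concept" xsd:string

[Typedef]
id: EX:0000001
name: placeholder relation
is_transitive: true
"""


def tag_inventory(obo_text: str) -> Counter:
    """Count stanza headers and tag names in OBO-format text."""
    counts = Counter()
    for raw in obo_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and OBO comment lines
        if line.startswith("[") and line.endswith("]"):
            counts[line] += 1  # stanza header such as [Term]
        elif ":" in line:
            # Everything before the first colon is the tag name.
            counts[line.split(":", 1)[0]] += 1
    return counts


inventory = tag_inventory(SAMPLE_OBO)
for tag, n in sorted(inventory.items()):
    print(f"{tag}\t{n}")
```

Running the same function over the full file would give the list of distinct elements and how often each one occurs.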

balhoff commented 2 years ago

@asoket it sounds like you are looking for documentation on the OBO format itself? If so, is this useful? https://owlcollab.github.io/oboformat/doc/GO.format.obo-1_4.html

asoket commented 2 years ago

@balhoff It is useful indeed. Thank you so much.

asoket commented 2 years ago

@balhoff I have a few more questions.

  1. In the ncit.obo file, I find NCIT:Pnnn and NCIT:Annn codes are used in the "property_value" element/property of many NCIT Terms. However, I do not find any Term in ncit.obo with any of these ids. At the OLS ontology search at https://www.ebi.ac.uk/ols/search?q=NCIT_P98&ontology=ncit I can find NCIT:P98 and other "P" & "A" Terms, but with only "name" and "def". Is there any location where I can find the complete ontology on all Pnnn & Annn? Or can I download all terms associated with "P" and "A"?
  2. Is there any documentation that will help me understand the functional aspect of elements/properties like "property_value", "relationship" etc. beyond what is explained in https://owlcollab.github.io/oboformat/doc/GO.format.obo-1_4.html?
  3. I am trying to build an oncology knowledge graph from the NCIt that can be easily accessed by an oncologist at the point of care, integrated with the hospital EHR.

asoket commented 2 years ago

@balhoff some more points:

  1. In "id: NCIT:C10000" of ncit.obo we have "property_value: NCIT:P310 "Obsolete_Concept" xsd:string". Does this mean that this is equivalent to "is_obsolete: true"?
  2. In "id: NCIT:C77696", we have "property_value: NCIT:P98 "Wed Mar 21 17:54:09 EDT 2012 - Obsolete term, delisted from UNII" xsd:string". Is it equivalent to "is_obsolete: true" for this term?

balhoff commented 2 years ago

Hi @asoket,

> In the ncit.obo file, I find NCIT:Pnnn and NCIT:Annn codes are used in the "property_value" element/property of many NCIT Terms. However I do not find any Term in ncit.obo with any of these ids. However, at OLS Ontology search at https://www.ebi.ac.uk/ols/search?q=NCIT_P98&ontology=ncit I can find NCIT:P98 and other "P" & "A" Terms with only "name" and "def". Is there any location where I can find the complete ontology on all Pnnn & Annn? Or can I download all terms associated with "P" and "A".

This seems like a problem with the OBO format file. These terms (e.g., NCIT:P98) are properly defined in the OWL syntax version, which is what OLS is loading:

 <!-- http://purl.obolibrary.org/obo/NCIT_P98 -->

    <owl:AnnotationProperty rdf:about="http://purl.obolibrary.org/obo/NCIT_P98">
        <obo:IAO_0000115>A property representing notations made by NCI vocabulary curators. They are intended to provide supplemental, unstructured information to the user or additional insight about the concept.</obo:IAO_0000115>
        <obo:NCIT_NHC0>P98</obo:NCIT_NHC0>
        <obo:NCIT_P106>Conceptual Entity</obo:NCIT_P106>
        <obo:NCIT_P107>DesignNote</obo:NCIT_P107>
        <obo:NCIT_P108>DesignNote</obo:NCIT_P108>
        <oboInOwl:hasExactSynonym>DesignNote</oboInOwl:hasExactSynonym>
        <rdfs:label>DesignNote</rdfs:label>
        <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
    </owl:AnnotationProperty>

Thanks for pointing this out!
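As a rough illustration of recovering the "P" and "A" property definitions from the OWL version, the sketch below reads an abbreviated copy of the snippet above with Python's standard-library XML parser. The `<rdf:RDF>` wrapper and its namespace declarations are added here so the fragment parses on its own, and `annotation_properties` is a hypothetical helper name. This only handles this particular RDF/XML layout; a proper RDF or OWL parser is the more robust choice for real work.

```python
import xml.etree.ElementTree as ET

NS = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "obo": "http://purl.obolibrary.org/obo/",
}

# Abbreviated copy of the NCIT_P98 declaration, wrapped so it is well-formed.
RDF_XML = """\
<rdf:RDF xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:obo="http://purl.obolibrary.org/obo/">
  <owl:AnnotationProperty rdf:about="http://purl.obolibrary.org/obo/NCIT_P98">
    <obo:IAO_0000115>A property representing notations made by NCI vocabulary curators.</obo:IAO_0000115>
    <obo:NCIT_NHC0>P98</obo:NCIT_NHC0>
    <rdfs:label>DesignNote</rdfs:label>
  </owl:AnnotationProperty>
</rdf:RDF>
"""


def annotation_properties(rdf_xml: str) -> dict:
    """Map CURIEs such as NCIT:P98 to their label and definition."""
    root = ET.fromstring(rdf_xml)
    about_attr = "{%s}about" % NS["rdf"]
    result = {}
    for prop in root.findall("owl:AnnotationProperty", NS):
        iri = prop.get(about_attr)
        if iri is None:
            continue  # no identifier to key on
        # .../obo/NCIT_P98 -> NCIT:P98
        curie = iri.rsplit("/", 1)[-1].replace("_", ":", 1)
        result[curie] = {
            "label": prop.findtext("rdfs:label", default="", namespaces=NS),
            "definition": prop.findtext("obo:IAO_0000115", default="", namespaces=NS),
        }
    return result


props = annotation_properties(RDF_XML)
print(props["NCIT:P98"]["label"])
```

The resulting dictionary can then be used to resolve the `NCIT:Pnnn` and `NCIT:Annn` codes that appear in `property_value` lines of the OBO file.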

> Is there any documentation that will help me understand the functional aspect of elements/properties like "property_value", "relationship" etc. beyond what is explained in https://owlcollab.github.io/oboformat/doc/GO.format.obo-1_4.html?

The OBO format documentation is primarily written in terms of how it maps to the OWL language, which is much more formally specified. A good place to start with OWL is here: https://www.w3.org/TR/owl2-primer/ If you are writing software to work with ontologies, I personally recommend using one of the OWL syntaxes rather than OBO, and using a standard RDF or OWL parser to do the parsing.

> I am trying to build an oncology knowledge graph from the ncit that can be easily accessed by an oncologist at the point-of-care integrated with the hospital EHR.

As part of this project I created a simplified graph from NCIT using OWL reasoning; you might be interested: https://github.com/NCI-Thesaurus/thesaurus-obo-edition/wiki/NCIt-graph-queries

> In "id: NCIT:C10000" of ncit.obo we have "property_value: NCIT:P310 "Obsolete_Concept" xsd:string". Does this mean that this is equivalent to "is_obsolete: true"?

I think so, but NCIT provides some other fine-grained statuses like "Retired_Concept"; I'm not sure exactly what the difference is.

> In "id: NCIT:C77696", we have "property_value: NCIT:P98 "Wed Mar 21 17:54:09 EDT 2012 - Obsolete term, delisted from UNII" xsd:string". Is it equivalent to "is_obsolete: true" for this term?

I wouldn't say it's equivalent; it just provides some extra information about the obsoletion.
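The distinction in the two answers above could be encoded roughly as follows: a term counts as inactive if it carries `is_obsolete: true` or a P310 status from a chosen set, while a P98 note on its own does not. This is only a sketch. Treating "Retired_Concept" the same as "Obsolete_Concept" is an assumption (the difference between the two is not settled in this thread), `is_inactive` is a hypothetical helper, and the stanzas are reduced to the lines quoted in the question.

```python
import re

# Matches lines such as: property_value: NCIT:P310 "Obsolete_Concept" xsd:string
PROPERTY_VALUE = re.compile(r'^property_value:\s+(\S+)\s+"((?:[^"\\]|\\.)*)"')

# Assumption: both statuses are treated as "no longer active".
INACTIVE_STATUSES = {"Obsolete_Concept", "Retired_Concept"}


def is_inactive(stanza_lines) -> bool:
    """Return True if the stanza is flagged obsolete or has an inactive P310 status."""
    for raw in stanza_lines:
        line = raw.strip()
        if line == "is_obsolete: true":
            return True
        match = PROPERTY_VALUE.match(line)
        if match and match.group(1) == "NCIT:P310" and match.group(2) in INACTIVE_STATUSES:
            return True
    return False


# Only the lines quoted in the question; the full stanzas contain more.
c10000 = [
    "id: NCIT:C10000",
    'property_value: NCIT:P310 "Obsolete_Concept" xsd:string',
]
design_note_only = [
    "id: NCIT:C77696",
    'property_value: NCIT:P98 "Wed Mar 21 17:54:09 EDT 2012 - Obsolete term, delisted from UNII" xsd:string',
]

print(is_inactive(c10000))            # True
print(is_inactive(design_note_only))  # False
```

The second result reflects only the single P98 line shown here; whether the full NCIT:C77696 stanza also carries a P310 status would need to be checked against the file itself.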

asoket commented 2 years ago

Thank you so much for your help.

Kind Regards & Thank You in Advance. Asoke K Talukder, Ph.D

On Tue, Jul 13, 2021 at 11:42 PM Jim Balhoff @.***> wrote:
