RTXteam / RTX-KG2

Build system for the RTX-KG2 biomedical knowledge graph, part of the ARAX reasoning system (https://github.com/RTXTeam/RTX)
MIT License

Turn Provided By Into a Compliant Field #70

Open · ecwood opened this issue 3 years ago

ecwood commented 3 years ago

According to the Biolink model, it appears that provided_by should be a list.
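A minimal sketch of what "make provided_by a list" could look like when building a node, assuming provided_by currently holds either a single CURIE string or an iterable of them. The helper name `normalize_provided_by` and the example node are hypothetical and are not taken from the RTX-KG2 code.

```python
from typing import Iterable, List, Optional, Union


def normalize_provided_by(value: Optional[Union[str, Iterable[str]]]) -> List[str]:
    """Return provided_by as a list of CURIE strings (hypothetical helper)."""
    if value is None:
        return []
    if isinstance(value, str):
        # A single CURIE such as "UMLS_STY:" becomes a one-element list.
        return [value]
    # Keep the original order while dropping duplicate CURIEs.
    unique: List[str] = []
    for item in value:
        if item not in unique:
            unique.append(item)
    return unique


# Hypothetical node dict, for illustration only.
node = {"id": "UMLS_STY:T042", "provided_by": "UMLS_STY:"}
node["provided_by"] = normalize_provided_by(node["provided_by"])
print(node["provided_by"])  # ['UMLS_STY:']
```

Doing the conversion in one place means downstream code can always iterate over provided_by without checking its type.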

ecwood commented 3 years ago

Also, @saramsey and I discussed the importance of ensuring that the UMLS_STY nodes have UMLS_STY: as their provided_by value (rather than an ontology). This code https://github.com/RTXteam/RTX-KG2/blob/8d3f9b276f7486aa901f6de80f834c71c7487612/kg2_util.py#L493-L495 was included to prevent them from getting an ontology as their provided_by. However, since provided_by changed format (from an IRI to a CURIE), that code no longer serves any purpose.
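To illustrate why an IRI-era guard goes dead after the switch to CURIEs, here is a hypothetical reconstruction. It is NOT the actual code at kg2_util.py L493-L495; the function name, the prefix constant, and the example nodes are assumptions. The point is only that a condition written against an IRI prefix never matches once node IDs are CURIEs.

```python
import copy

# Assumed IRI prefix for UMLS semantic types (matches the TTL snippet in this thread).
STY_IRI_PREFIX = "http://purl.bioontology.org/ontology/STY/"


def force_sty_provided_by(node: dict) -> dict:
    """Hypothetical IRI-era guard: pin semantic-type nodes to the STY source."""
    if node["id"].startswith(STY_IRI_PREFIX):
        node["provided_by"] = STY_IRI_PREFIX
    return node


# IRI-era node: the guard fires and overwrites the ontology-derived value.
iri_node = {"id": STY_IRI_PREFIX + "T042",
            "provided_by": "http://example.org/some-ontology"}  # hypothetical
# CURIE-era node: the IRI prefix never matches, so the guard is a no-op.
curie_node = {"id": "UMLS_STY:T042", "provided_by": "HYPOTHETICAL_ONT:"}

print(force_sty_provided_by(copy.deepcopy(iri_node))["provided_by"])
print(force_sty_provided_by(copy.deepcopy(curie_node))["provided_by"])
```

The first call prints the STY prefix; the second prints `HYPOTHETICAL_ONT:` unchanged, which is the "never enters the if block" behavior described later in the thread.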

Luckily, this isn't an issue because semantic types are now only brought in through one source (umls-semantictypes.ttl): https://github.com/RTXteam/RTX-KG2/blob/8d3f9b276f7486aa901f6de80f834c71c7487612/multi_ont_to_json_kg.py#L628-L630

They used to be brought in through every UMLS source, because each of those sources had a section like this:

 <http://purl.bioontology.org/ontology/STY/T042> a owl:Class ;
     skos:notation "T042"^^xsd:string ;
     skos:prefLabel "Organ or Tissue Function"@en .

 <http://purl.bioontology.org/ontology/STY/T020> a owl:Class ;
     skos:notation "T020"^^xsd:string ;
     skos:prefLabel "Acquired Abnormality"@en .

 <http://purl.bioontology.org/ontology/STY/T102> a owl:Class ;
     skos:notation "T102"^^xsd:string ;
     skos:prefLabel "Group Attribute"@en .

 <http://purl.bioontology.org/ontology/STY/T129> a owl:Class ;
     skos:notation "T129"^^xsd:string ;
     skos:prefLabel "Immunologic Factor"@en .

 <http://purl.bioontology.org/ontology/STY/T049> a owl:Class ;
     skos:notation "T049"^^xsd:string ;
     skos:prefLabel "Cell or Molecular Dysfunction"@en .

 <http://purl.bioontology.org/ontology/STY/T046> a owl:Class ;
     skos:notation "T046"^^xsd:string ;
     skos:prefLabel "Pathologic Function"@en .

 <http://purl.bioontology.org/ontology/STY/T071> a owl:Class ;
     skos:notation "T071"^^xsd:string ;
     skos:prefLabel "Entity"@en .

 <http://purl.bioontology.org/ontology/STY/T204> a owl:Class ;
     skos:notation "T204"^^xsd:string ;
     skos:prefLabel "Eukaryote"@en .
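
The effect of taking semantic types from a single file can be sketched as follows, assuming the TTL text follows exactly the layout shown above. The regex is a toy stand-in for a real RDF parser, the file name `umls-hypothetical-source.ttl` is made up, and this does not reproduce the logic in multi_ont_to_json_kg.py; only `umls-semantictypes.ttl` comes from this thread.

```python
import re
from typing import Dict, Iterable, Tuple

# Toy pattern for blocks laid out exactly like the snippet in this thread.
STY_BLOCK = re.compile(
    r'<http://purl\.bioontology\.org/ontology/STY/(T\d{3})>\s+a\s+owl:Class\s*;'
    r'\s*skos:notation\s+"[^"]*"\^\^xsd:string\s*;'
    r'\s*skos:prefLabel\s+"([^"]*)"@en\s*\.'
)

SEMANTIC_TYPES_FILE = "umls-semantictypes.ttl"


def collect_semantic_types(sources: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[str, str]]:
    """Map STY code -> (label, source file), reading STY classes from one file only."""
    types: Dict[str, Tuple[str, str]] = {}
    for file_name, ttl_text in sources:
        if file_name != SEMANTIC_TYPES_FILE:
            continue  # ignore the STY copies repeated in other UMLS sources
        for code, label in STY_BLOCK.findall(ttl_text):
            types[code] = (label, file_name)
    return types


snippet = '''<http://purl.bioontology.org/ontology/STY/T042> a owl:Class ;
     skos:notation "T042"^^xsd:string ;
     skos:prefLabel "Organ or Tissue Function"@en .
'''
sources = [("umls-hypothetical-source.ttl", snippet), (SEMANTIC_TYPES_FILE, snippet)]
print(collect_semantic_types(sources))
```

Because only one file contributes STY classes, each semantic-type node ends up with a single, predictable source, so no special-case provenance fix is needed.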

saramsey commented 3 years ago

Hi @ericawood, nice. I had forgotten about the code at L628-630. Good catch.

I concur: the special code at L494-495 in kg2_util.py no longer seems to be needed, and the if block is probably never entered.

ecwood commented 3 years ago

As it turns out, provided_by is no longer an approved Biolink field:

https://github.com/biolink/biolink-model/blob/763c53f5e656bb5c47d99fe893a2bc60bae70b8c/biolink-model.yaml#L5847-L5853 states:

  provided by:
    is_a: association slot
    deprecated: >-
      This slot is deprecated and replaced by a set of more precise slots for describing
      the source retrieval provenance of an Association.  These include 'knowledge source'
      and its descendants 'primary knowledge source', 'original knowledge source', and
      'aggregator knowledge source'.
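One possible way to migrate an edge off the deprecated slot, assuming the snake_case slot names `primary_knowledge_source` and `aggregator_knowledge_source` for the Biolink slots quoted above. The rule "first provided_by value is the primary source, the rest are aggregators" is a hypothetical choice for illustration, not something Biolink or RTX-KG2 prescribes, and the example CURIEs are made up.

```python
from typing import Dict, Optional


def migrate_provided_by(edge: Dict, primary: Optional[str] = None) -> Dict:
    """Replace the deprecated provided_by slot with knowledge-source slots.

    Assumption (hypothetical): unless `primary` is given, the first provided_by
    value is the primary source and any remaining values are aggregators.
    """
    migrated = dict(edge)  # leave the input edge untouched
    sources = migrated.pop("provided_by", None)
    if sources is None:
        return migrated
    if isinstance(sources, str):
        sources = [sources]
    if not sources:
        return migrated
    primary_source = primary if primary is not None else sources[0]
    migrated["primary_knowledge_source"] = primary_source
    aggregators = [s for s in sources if s != primary_source]
    if aggregators:
        migrated["aggregator_knowledge_source"] = aggregators
    return migrated


# Hypothetical edge, for illustration only.
edge = {"subject": "X", "predicate": "biolink:related_to", "object": "Y",
        "provided_by": ["SOURCE_A:", "SOURCE_B:"]}
print(migrate_provided_by(edge))
```

Passing `primary` explicitly is safer when the order of provided_by values carries no provenance meaning.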