RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License

Why is concept UMLS:C0178572 not in KG2? #1128

Closed saramsey closed 3 years ago

saramsey commented 3 years ago

See:

https://github.com/RTXteam/RTX/issues/1127#issuecomment-730517337

kvarforl commented 3 years ago

Okay, I started investigating this, and figured I would start by seeing if I could find the concept in any of the UMLS build files. I did so by running find . -name "umls*" -type f -exec grep C0178572 {} +; from the kg2-build directory, which returned the following:

./umls-mth.json:      "obj" : "http://purl.bioontology.org/ontology/MTH/C0178572"
./umls-mth.ttl: <http://purl.bioontology.org/ontology/MTH/RO> <http://purl.bioontology.org/ontology/MTH/C0178572> ;

All of the links lead to "the page you are looking for wasn't found". Not super sure where to go from here.

@saramsey could you walk me through how and where you found the info mentioned in this comment?
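The shell search above can also be sketched in Python, which makes it easy to report the line number of each hit. This is a minimal sketch, not part of the KG2 build code: the function name find_cui_mentions and its parameters are hypothetical, and it assumes the build files are text that can be read as UTF-8.

```python
from pathlib import Path


def find_cui_mentions(build_dir, cui, pattern="umls*"):
    """Yield (path, line_number, line) for every line that mentions `cui`.

    Walks `build_dir` recursively and only looks at regular files whose
    name matches `pattern`, similar to `find . -name "umls*" -type f`.
    """
    for path in sorted(Path(build_dir).rglob(pattern)):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a file has odd bytes.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if cui in line:
                    yield path, line_number, line.rstrip("\n")
```

Calling list(find_cui_mentions("kg2-build", "C0178572")) would then give one tuple per matching line, so a single hit in a single file is easy to spot.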

saramsey commented 3 years ago

OK, here is what KG2.3.5 knows about UMLS:C0178572:

{
  "iri": "https://identifiers.org/umls:C0178572",
  "category_label": "named_thing",
  "deprecated": "False",
  "provided_by": "umls_source:MTH",
  "id": "UMLS:C0178572",
  "category": "biolink:NamedThing",
  "update_date": "2020"
}

which I got by running this Cypher query:

match (n {id: 'UMLS:C0178572'}) return n;

Note that there is no name field, which is kind of weird (I guess it is null so not displayed by Neo4j). But going to the hyperlink https://identifiers.org/umls:C0178572, I see that this node does have a name according to Linked Life Data; its name is court:

[Screenshot, 2020-12-17, 9:58:08 PM]

So, @kvarforl can you please check umls-mth.ttl to verify that UMLS concept C0178572 has no name field, at least in that TTL file? You can just paste the TTL record for that concept, here in the issue.

kvarforl commented 3 years ago

Hmm, okay, here are the contents of the only occurrence of C0178572, from running grep -C 10 "C0178572" kg2-build/umls-mth.ttl on kg2steve:

<http://purl.bioontology.org/ontology/MTH/C0022433> a owl:Class ;
    skos:prefLabel """Principles of law and justice"""@en ;
    skos:notation """C0022433"""^^xsd:string ;
    <http://purl.bioontology.org/ontology/MTH/RO> <http://purl.bioontology.org/ontology/MTH/C0016556> ;
    <http://purl.bioontology.org/ontology/MTH/RO> <http://purl.bioontology.org/ontology/MTH/C0680513> ;
    <http://purl.bioontology.org/ontology/MTH/RO> <http://purl.bioontology.org/ontology/MTH/C0016557> ;
    <http://purl.bioontology.org/ontology/MTH/RO> <http://purl.bioontology.org/ontology/MTH/C0086530> ;
    <http://purl.bioontology.org/ontology/MTH/RO> <http://purl.bioontology.org/ontology/MTH/C0178572> ;
    <http://purl.bioontology.org/ontology/MTH/RO> <http://purl.bioontology.org/ontology/MTH/C0178675> ;
    <http://purl.bioontology.org/ontology/MTH/RO> <http://purl.bioontology.org/ontology/MTH/C0014649> ;
    <http://purl.bioontology.org/ontology/MTH/RO> <http://purl.bioontology.org/ontology/MTH/C0013277> ;
    <http://purl.bioontology.org/ontology/MTH/RO> <http://purl.bioontology.org/ontology/MTH/C0220868> ;
    <http://purl.bioontology.org/ontology/MTH/RO> <http://purl.bioontology.org/ontology/MTH/C0362060> ;
    UMLS:has_cui """C0022433"""^^xsd:string ;
    UMLS:has_tui """T064"""^^xsd:string ;
    UMLS:has_sty <http://purl.bioontology.org/ontology/STY/T064> ;
 .

saramsey commented 3 years ago

Just rejoining this thread after a month (sorry). So, it looks like the UMLS metathesaurus file umls-mth.ttl may be incomplete, since it doesn't appear to define the concept http://purl.bioontology.org/ontology/MTH/C0178572 but it does clearly cross-reference it. You could check the umls-mth.ttl file to see if it appears to be truncated (IIRC, it should end with a bunch of turtle statements about semantic types, if it is complete). Other options include checking the UMLS MySQL database to see if there is a row in the MRCONSO table corresponding to UMLS concept C0178572.
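One way to make the "cross-referenced but not defined" observation checkable is to count, for a given IRI, the lines where it opens a class definition versus the lines where it only appears inside another statement. This is a rough sketch: the function name is hypothetical, it assumes the one-statement-per-line layout seen in the excerpt above (a definition starts with `<IRI> a owl:Class`), and it is a plain text scan rather than a real Turtle parser.

```python
def count_definitions_and_references(ttl_text, iri):
    """Return (is_defined, reference_count) for `iri` in `ttl_text`.

    A line counts as a definition when it starts with `<iri> a owl:Class`.
    Any other line that contains `<iri>` counts as a reference.
    """
    target = f"<{iri}>"  # the closing '>' prevents prefix matches
    is_defined = False
    reference_count = 0
    for line in ttl_text.splitlines():
        stripped = line.strip()
        if target not in stripped:
            continue
        if stripped.startswith(target + " a owl:Class"):
            is_defined = True
        else:
            reference_count += 1
    return is_defined, reference_count
```

Run on the excerpt pasted above with the IRI http://purl.bioontology.org/ontology/MTH/C0178572, this returns (False, 1): one reference, no definition.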

kvarforl commented 3 years ago

hmm okay, the tail of umls-mth.ttl looks like this:

ubuntu@ip-172-31-59-26:~$ tail kg2-build/umls-mth.ttl 
<http://purl.bioontology.org/ontology/STY/T025> rdfs:subClassOf <http://purl.bioontology.org/ontology/STY/T021> .
<http://purl.bioontology.org/ontology/STY/T091> rdfs:subClassOf <http://purl.bioontology.org/ontology/STY/T090> .
<http://purl.bioontology.org/ontology/STY/T203> rdfs:subClassOf <http://purl.bioontology.org/ontology/STY/T074> .
<http://purl.bioontology.org/ontology/STY/T042> rdfs:subClassOf <http://purl.bioontology.org/ontology/STY/T039> .
<http://purl.bioontology.org/ontology/STY/T020> rdfs:subClassOf <http://purl.bioontology.org/ontology/STY/T190> .
<http://purl.bioontology.org/ontology/STY/T102> rdfs:subClassOf <http://purl.bioontology.org/ontology/STY/T077> .
<http://purl.bioontology.org/ontology/STY/T129> rdfs:subClassOf <http://purl.bioontology.org/ontology/STY/T123> .
<http://purl.bioontology.org/ontology/STY/T049> rdfs:subClassOf <http://purl.bioontology.org/ontology/STY/T046> .
<http://purl.bioontology.org/ontology/STY/T046> rdfs:subClassOf <http://purl.bioontology.org/ontology/STY/T038> .
<http://purl.bioontology.org/ontology/STY/T204> rdfs:subClassOf <http://purl.bioontology.org/ontology/STY/T001> .

which, to my mostly untrained eye, looks like a bunch of turtle statements about semantic types.
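The same eyeball check can be written down as a small heuristic. The sketch below reads the last few lines of the file and tests whether each one mentions a semantic-type IRI and ends with a period. The function name, the default of 10 lines, and the "/ontology/STY/" marker are assumptions taken from the tail output above; the check can only catch a file that was cut off at the end, not one that is missing statements in the middle.

```python
from collections import deque


def tail_looks_complete(path, n=10, marker="/ontology/STY/"):
    """Heuristic: True if the last `n` non-empty lines all look like
    terminated semantic-type statements."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        # deque with maxlen keeps only the final n lines in memory.
        last_lines = [line.strip() for line in deque(handle, maxlen=n)]
    last_lines = [line for line in last_lines if line]
    if not last_lines:
        return False
    return all(marker in line and line.endswith(".") for line in last_lines)
```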

saramsey commented 3 years ago

OK, in that case I think the next step is to search the table MRCONSO to see if CUI C0178572 is in there.

https://www.ncbi.nlm.nih.gov/books/NBK9685/table/ch03.T.concept_names_and_sources_file_mr/

saramsey commented 3 years ago

In the MRCONSO table in the umls MySQL database on kg2lindsey.rtx.ai, we have:

mysql> select * from MRCONSO where CUI='C0178572';
+----------+-----+----+----------+-----+-----------+--------+-----------+------------+------------+-----------+-----+-----+------------+---------+-----+----------+------+
| CUI      | LAT | TS | LUI      | STT | SUI       | ISPREF | AUI       | SAUI       | SCUI       | SDUI      | SAB | TTY | CODE       | STR     | SRL | SUPPRESS | CVF  |
+----------+-----+----+----------+-----+-----------+--------+-----------+------------+------------+-----------+-----+-----+------------+---------+-----+----------+------+
| C0178572 | ENG | P  | L0215094 | PF  | S0288834  | N      | A0318718  | NULL       | NULL       | 2724-8820 | CSP | PT  | 2724-8820  | court   |   0 | N        |  256 |
| C0178572 | ENG | P  | L0215094 | PF  | S0288834  | Y      | A18577800 | 0000060808 | 0000018320 | NULL      | CHV | PT  | 0000018320 | court   |   0 | N        |  256 |
| C0178572 | ENG | P  | L0215094 | VO  | S11872390 | Y      | A18596387 | 0000060809 | 0000018320 | NULL      | CHV | SY  | 0000018320 | courted |   0 | N        |  256 |
| C0178572 | ENG | P  | L0215094 | VO  | S11872392 | Y      | A18596388 | 0000060810 | 0000018320 | NULL      | CHV | SY  | 0000018320 | courts  |   0 | N        |  256 |
| C0178572 | ENG | P  | L0215094 | VO  | S1220140  | Y      | A7564539  | 12220      | NULL       | NULL      | PSY | ET  | 12220      | Courts  |   3 | N        | NULL |
+----------+-----+----+----------+-----+-----------+--------+-----------+------------+------------+-----------+-----+-----+------------+---------+-----+----------+------+

saramsey commented 3 years ago

The columns of the MRCONSO table are explained here: https://www.ncbi.nlm.nih.gov/books/NBK9685/table/ch03.T.concept_names_and_sources_file_mr/

Looks like the CUI C0178572 came from UMLS sources CHV, CSP, and PSY. What are those sources?

saramsey commented 3 years ago

So, of the three UMLS sources that have terms that map to CUI C0178572, none are in KG2. That would seem to explain why C0178572 is not in KG2. Of these, the most reasonable one to add to KG2 would seem to be PSY.
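The reasoning above boils down to two set operations: collect the SAB values for the CUI, then subtract the sources that the build includes. Here is a minimal sketch using the CUI, SAB, and STR values from the MRCONSO query result. The function names are hypothetical, and the included_sources set is only a placeholder (MTH is known to be built because umls-mth.ttl exists; the real list lives in umls.conf, which is not shown in this thread).

```python
def sources_for_cui(rows, cui):
    """Return the set of source abbreviations (SAB) that name `cui`."""
    return {row["SAB"] for row in rows if row["CUI"] == cui}


def sources_missing_from_build(sources, included_sources):
    """Return, sorted, the sources that are not part of the build."""
    return sorted(set(sources) - set(included_sources))


# CUI, SAB, and STR columns copied from the MRCONSO rows above.
rows = [
    {"CUI": "C0178572", "SAB": "CSP", "STR": "court"},
    {"CUI": "C0178572", "SAB": "CHV", "STR": "court"},
    {"CUI": "C0178572", "SAB": "CHV", "STR": "courted"},
    {"CUI": "C0178572", "SAB": "CHV", "STR": "courts"},
    {"CUI": "C0178572", "SAB": "PSY", "STR": "Courts"},
]

# Assumption: placeholder for the sources configured in umls.conf.
included_sources = {"MTH"}

sources = sources_for_cui(rows, "C0178572")
missing = sources_missing_from_build(sources, included_sources)
```

If every source for a CUI ends up in missing, no generated TTL file will contain a class definition carrying that CUI, which matches what was seen here.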

saramsey commented 3 years ago

After adding PSY to umls.conf and rerunning umls2rdf.py on kg2lindsey.rtx.ai, the following TTL block shows up in the newly generated file umls-psy.ttl:

<http://purl.bioontology.org/ontology/PSY/12220> a owl:Class ;
        skos:prefLabel """Courts"""@en ;
        skos:notation """12220"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/PSY/use> <http://purl.bioontology.org/ontology/PSY/00840> ;
        <http://purl.bioontology.org/ontology/PSY/PYR> """1973"""^^xsd:string ;
        UMLS:has_cui """C0178572"""^^xsd:string ;
        UMLS:has_tui """T092"""^^xsd:string ;
        UMLS:has_sty <http://purl.bioontology.org/ontology/STY/T092> ;

which would seem to define CUI C0178572 as expected.
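To confirm this kind of thing without reading the block by eye, the subject IRI, preferred label, and CUI values can be pulled out with a few regular expressions. This is a sketch only: the function name is hypothetical, and it assumes the triple-quoted literal style and single-line statements shown in the block above rather than handling Turtle in general.

```python
import re

# Matches a triple-quoted literal that sits on a single line.
_LITERAL = r'"""(.*?)"""'


def summarize_class_block(block):
    """Extract the subject IRI, preferred label, and CUIs from one
    `owl:Class` block of umls2rdf-style Turtle text."""
    subject = re.match(r"\s*<([^>]+)>\s+a\s+owl:Class", block)
    label = re.search(r"skos:prefLabel\s+" + _LITERAL, block)
    cuis = re.findall(r"UMLS:has_cui\s+" + _LITERAL, block)
    return {
        "iri": subject.group(1) if subject else None,
        "label": label.group(1) if label else None,
        "cuis": cuis,
    }
```

For the PSY block above, the result has the label "Courts" and a cuis list containing only "C0178572".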

saramsey commented 3 years ago

OK, I think this should be fixed now. Testing needed.

ecwood commented 3 years ago

This appears to be fixed in KG2.5.2. From Neo4j:

{
  "iri": "https://identifiers.org/umls:C0178572",
  "category_label": "agent",
  "deprecated": "False",
  "name": "Courts",
  "provided_by": "identifiers_org_registry:umls",
  "id": "UMLS:C0178572",
  "category": "biolink:Agent",
  "update_date": "2004"
}
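For testing, it can help to diff the node properties from KG2.3.5 against the ones from KG2.5.2 rather than comparing the two JSON dumps by eye. Below is a small sketch with a hypothetical helper name; the two dictionaries are copied from the Neo4j output quoted in this thread.

```python
def diff_node_properties(before, after):
    """Return {key: (old, new)} for every property whose value differs.

    A property missing on one side is reported with None on that side.
    """
    changes = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


kg2_3_5_node = {
    "iri": "https://identifiers.org/umls:C0178572",
    "category_label": "named_thing",
    "deprecated": "False",
    "provided_by": "umls_source:MTH",
    "id": "UMLS:C0178572",
    "category": "biolink:NamedThing",
    "update_date": "2020",
}

kg2_5_2_node = {
    "iri": "https://identifiers.org/umls:C0178572",
    "category_label": "agent",
    "deprecated": "False",
    "name": "Courts",
    "provided_by": "identifiers_org_registry:umls",
    "id": "UMLS:C0178572",
    "category": "biolink:Agent",
    "update_date": "2004",
}

changes = diff_node_properties(kg2_3_5_node, kg2_5_2_node)
```

Here changes reports five differing properties: category, category_label, name (previously absent), provided_by, and update_date.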