RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License

subclass reasoning/inference for UBERON #1849

Closed dkoslicki closed 1 month ago

dkoslicki commented 2 years ago

From the Relay, it appears that RTX-KG2 is not doing subclass inference for UBERON.

saramsey commented 2 years ago

I opened a Cypher session on kg2canonicalized.rtx.ai, which contains KG2.7.5c, and ran the following Cypher query:

match (n)-[r:`biolink:subclass_of`]->(m) where n.id =~ 'UBERON:.*' and m.id =~ 'UBERON:.*' return count(*);

and it returned 24,639. So it appears that there are 24,639 UBERON-[subclass_of]->UBERON edges in KG2.7.5c. Here is a query that lists ten of them:

match (n)-[r:`biolink:subclass_of`]->(m) where n.id =~ 'UBERON:.*' and m.id =~ 'UBERON:.*' return n.id, r.predicate, r.knowledge_source, m.id limit 10;

returning:


n.id | r.predicate | r.knowledge_source | m.id
-- | -- | -- | --
"UBERON:0018355" | "biolink:subclass_of" | ["infores:uberon"] | "UBERON:0000022"
"UBERON:0008293" | "biolink:subclass_of" | ["infores:uberon"] | "UBERON:0000022"
"UBERON:0008292" | "biolink:subclass_of" | ["infores:uberon"] | "UBERON:0000022"
"UBERON:0008291" | "biolink:subclass_of" | ["infores:genepio", "infores:uberon"] | "UBERON:0000022"
"UBERON:0034930" | "biolink:subclass_of" | ["infores:uberon"] | "UBERON:0000022"
"UBERON:0014480" | "biolink:subclass_of" | ["infores:uberon"] | "UBERON:0000022"
"UBERON:0018688" | "biolink:subclass_of" | ["infores:uberon"] | "UBERON:0000022"
"UBERON:0018538" | "biolink:subclass_of" | ["infores:uberon"] | "UBERON:0000022"
"UBERON:0018539" | "biolink:subclass_of" | ["infores:uberon"] | "UBERON:0000022"
"UBERON:0018537" | "biolink:subclass_of" | ["infores:uberon"] | "UBERON:0000022"

So this does not seem to be an issue with KG2c; rather, it may be an issue with the RTX-KG2 API or PloverDB. I am tagging @amykglen in the hope that she can weigh in. If it is the RTX-KG2 API or PloverDB, I would vote to transfer this issue to the RTX repo issue tracker.

amykglen commented 2 years ago

yes, this is something we need to do with Plover. when we implemented subclass_of reasoning we only did it for the more common kinds of pinned query nodes (drugs, diseases), which seemed sufficient early on, but now we need to expand that. so I agree this issue can be transferred to the RTX repo.
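For readers unfamiliar with the idea, subclass_of reasoning over a pinned query node can be sketched roughly as follows. This is only an illustration and not Plover's actual implementation: the function names `build_child_index` and `expand_with_descendants` are hypothetical, and the third toy edge uses a made-up ID. The sketch expands a pinned ID into itself plus all of its transitive subclasses, so that a query pinned on a parent term can also match edges attached to its descendants.

```python
from collections import defaultdict, deque


def build_child_index(subclass_edges):
    """Map each parent ID to the set of its direct subclass IDs.

    `subclass_edges` is an iterable of (child_id, parent_id) pairs,
    i.e. one pair per child-[subclass_of]->parent edge.
    """
    children = defaultdict(set)
    for child, parent in subclass_edges:
        children[parent].add(child)
    return children


def expand_with_descendants(pinned_ids, children):
    """Return the pinned IDs plus every transitive subclass (breadth-first)."""
    expanded = set(pinned_ids)
    queue = deque(pinned_ids)
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in expanded:  # also guards against cycles
                expanded.add(child)
                queue.append(child)
    return expanded


# Toy edges: the first two come from the table earlier in this thread;
# the third uses a hypothetical ID purely to show transitive expansion.
edges = [
    ("UBERON:0018355", "UBERON:0000022"),
    ("UBERON:0008293", "UBERON:0000022"),
    ("EXAMPLE:0000001", "UBERON:0018355"),
]
children = build_child_index(edges)
expanded = expand_with_descendants(["UBERON:0000022"], children)
```

With the toy edges above, `expanded` contains the pinned ID, its two direct subclasses, and the hypothetical grandchild. The query would then be answered against all of those IDs rather than the pinned ID alone.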

amykglen commented 1 year ago

I'll be addressing this soon at the same time as #1812

edeutsch commented 1 month ago

try to fix for end of Sprint 6? @amykglen

amykglen commented 1 month ago

this is live on KG2 Plover CI! for instance, submitting this query for molecular activities related to 'exocrine gland' to kg2cploverdb.ci.transltr.io returns results involving 'exocrine gland' but also 'liver':

{
  "edges": {
    "e00": {
      "object": "n01",
      "predicates": [
        "biolink:related_to"
      ],
      "subject": "n00"
    }
  },
  "nodes": {
    "n00": {
      "ids": [
        "UBERON:0002365"
      ]
    },
    "n01": {
      "categories": [
        "biolink:MolecularActivity"
      ]
    }
  }
}
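A query like the one above could be built and submitted from Python along these lines. This is a minimal sketch under stated assumptions: the helper names `build_query_graph` and `submit_query` are hypothetical, and both the endpoint path (`/query`) and the choice to POST the query graph itself as the request body are assumptions that are not confirmed anywhere in this thread. Only the host name and the query contents come from the comment above.

```python
import json
import urllib.request


def build_query_graph(pinned_id, target_category, predicate="biolink:related_to"):
    """Build a one-hop query graph from a pinned node to a category."""
    return {
        "edges": {
            "e00": {
                "subject": "n00",
                "object": "n01",
                "predicates": [predicate],
            }
        },
        "nodes": {
            "n00": {"ids": [pinned_id]},
            "n01": {"categories": [target_category]},
        },
    }


def submit_query(url, query_graph, timeout=60):
    """POST the query graph as JSON and return the decoded JSON response."""
    body = json.dumps(query_graph).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# 'exocrine gland' pinned node and MolecularActivity target, as in the query above
query_graph = build_query_graph("UBERON:0002365", "biolink:MolecularActivity")

# ASSUMPTION: the "/query" path is a guess; check the service before using it.
# response = submit_query("https://kg2cploverdb.ci.transltr.io/query", query_graph)
```

The network call is left commented out so the sketch runs without contacting the service.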

closing