RTXteam / RTX-KG2

Build system for the RTX-KG2 biomedical knowledge graph, part of the ARAX reasoning system (https://github.com/RTXTeam/RTX)
MIT License

drug_regulatory_status_world_wide is not a valid predicate #402

Open amykglen opened 2 months ago

amykglen commented 2 months ago

so the Plover build for KG2.10.0c is spitting out the warning WARNING: Provided predicate(s) {'biolink:drug_regulatory_status_world_wide'} do not exist in Biolink 4.2.0

looking into that, I can see in KG2.10.0pre that some edges have the predicate drug_regulatory_status_world_wide, like this one: (screenshot, 2024-07-18)

but that doesn't quite appear to be a predicate in Biolink... I can see that term listed in the Biolink model 4.2.0 yaml file (here) but it doesn't have an is_a slot, which means it doesn't fall within the predicate tree: https://tree-viz-biolink.herokuapp.com/predicates/4.2.0 (nor does it appear to be a mixin)
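A minimal sketch of the check described above: a slot counts as a predicate only if following its is_a links eventually reaches related_to. The `SLOTS` dict and the function name `descends_from` are hypothetical. The dict is a simplified, hand-written stand-in for the `slots` section of the Biolink model yaml, not a verbatim excerpt, and mixins are ignored.

```python
# Toy stand-in for the "slots" section of the Biolink model yaml.
# Assumption: entries are simplified and hand-written for illustration.
SLOTS = {
    "related to": {},
    "related to at instance level": {"is_a": "related to"},
    "affects": {"is_a": "related to at instance level"},
    # Present in the model, but with no is_a slot.
    "drug_regulatory_status_world_wide": {},
}


def descends_from(slot, root, slots):
    """Return True if `slot` is `root` or reaches it by following is_a links."""
    seen = set()
    current = slot
    while current is not None and current not in seen:
        if current == root:
            return True
        seen.add(current)  # guards against a cyclic is_a chain
        current = (slots.get(current) or {}).get("is_a")
    return False


for name in ("affects", "drug_regulatory_status_world_wide"):
    print(name, descends_from(name, "related to", SLOTS))
```

Against the real model file, the same function could be applied to the `slots` mapping after loading the yaml, assuming the root is keyed as `related to` there.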

do you know if this is really meant to be used as a predicate? or is it supposed to be the name of a node property?

or maybe these edges are all from RepoDB anyway, which we're going to be removing from KG2?

it looks like they're mostly from RepoDB, but a few hundred from NCIT as well:

match p=(n)-[e:`biolink:drug_regulatory_status_world_wide`]->(m) return distinct e.primary_knowledge_source, count(distinct e) order by count(distinct e)

e.primary_knowledge_source | count(distinct e)
"infores:ncit" | 402
"infores:repodb" | 9312

(FYI this shouldn't be a blocker or anything for rolling out KG2.10.0c)

ecwood commented 2 months ago

It is in Biolink, and it is mapped to our RepoDB mappings in there: (image)

I suspect that there was confusion when creating the mappings (which was done earlier this year). The predicate doesn't really make sense either (it seems more like a property than a predicate, and it doesn't describe the relationship depicted in those edges). Since RepoDB is getting taken out, I'm not overly concerned, but I will check out NCIT to see why this is happening there.

amykglen commented 2 months ago

just to clarify - I agree it's in Biolink, but it's not in Biolink as a _descendant of the related_to predicate_, which all other predicates I know of are. so that's why I'm not sure if it's considered a valid predicate..

amykglen commented 1 week ago

at today's AHM, we noticed that this is causing TRAPI validation errors for ARAX:

{
  "critical": {},
  "error": {
    "error.knowledge_graph.edge.predicate.invalid": {
      "infores:repodb -> infores:rtx-kg2 -> infores:arax": {
        "biolink:drug_regulatory_status_world_wide": [
          {
            "edge_id": "CHEBI:86463[biolink:Drug|biolink:MolecularMixture]--biolink:drug_regulatory_status_world_wide->MONDO:0007186[biolink:DiseaseOrPhenotypicFeature|biolink:PhenotypicFeature|biolink:Disease]"
          }
        ]
      }
    }
  },

https://arax.ncats.io/?r=e4274554-09e0-4188-a325-376b2ae295ee
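For triage, a report shaped like the excerpt above can be reduced to a map from each flagged predicate to its edge IDs. This is only a sketch: `collect_invalid_predicates` and `SAMPLE_REPORT` are hypothetical names, and the sample is the excerpt above with its braces closed so that it parses. The real report is longer and may contain other sections.

```python
# Error code taken from the excerpt in this thread.
INVALID_PREDICATE = "error.knowledge_graph.edge.predicate.invalid"


def collect_invalid_predicates(report):
    """Map each flagged predicate to the edge IDs reported for it."""
    by_source = report.get("error", {}).get(INVALID_PREDICATE, {})
    flagged = {}
    # The source chain key (e.g. "infores:repodb -> ...") is not needed here.
    for predicates in by_source.values():
        for predicate, entries in predicates.items():
            flagged.setdefault(predicate, []).extend(
                entry["edge_id"] for entry in entries
            )
    return flagged


# Assumption: the excerpt above, closed off so it is a complete object.
SAMPLE_REPORT = {
    "critical": {},
    "error": {
        INVALID_PREDICATE: {
            "infores:repodb -> infores:rtx-kg2 -> infores:arax": {
                "biolink:drug_regulatory_status_world_wide": [
                    {
                        "edge_id": "CHEBI:86463[biolink:Drug|biolink:MolecularMixture]--biolink:drug_regulatory_status_world_wide->MONDO:0007186[biolink:DiseaseOrPhenotypicFeature|biolink:PhenotypicFeature|biolink:Disease]"
                    }
                ]
            }
        }
    },
}

for predicate, edge_ids in collect_invalid_predicates(SAMPLE_REPORT).items():
    print(predicate, len(edge_ids))
```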