RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License

ARAX returning results with predicates that are not supported by Biolink Model 3.1.2 #1967

Closed edeutsch closed 4 months ago

edeutsch commented 1 year ago

I looked into this very briefly so I thought I'd start an issue locally.

Related to this: https://github.com/NCATSTranslator/Feedback/issues/118

It seems like there are two possibilities:

1) KG2.8.0 contains "gene_associated_with_condition", which apparently it should not.
2) They are querying production KG2.7.6 instead of CI KG2.8.0.

The query in question appears to be:

827077 2023-02-15 19:42:47 40 ars.ci.transltr.io 52.4.10.150, 10.11.0.190 arax.ci.transltr.io arax-0 ARAX 5094 129227 ✓ Completed OK Normal completion with 24 results.

And the result is:
https://arax.ncats.io/?r=129227
https://arax.ncats.io/api/arax/v1.3/response/129227

As far as I can tell they are querying CI with KG2.8.0, so I think that rules out option 2. Although I am not really certain.

There is also potentially an option 3:

3) As far as we were aware, the ask was for gene-chemical associations ([image]), not for gene-disease associations. So this is not yet done.

That's about all I know. Tossing this out for others more knowledgeable.

chunyuma commented 1 year ago

I will wait until KG2.8.2c is ready and then retrain the creative model to resolve this issue.

edeutsch commented 1 year ago

This issue is still generating Translator-wide issues for us: https://github.com/NCATSTranslator/Feedback/issues/330#issuecomment-1588344312

Apparently any edge that does not have a valid Biolink predicate does not get displayed in the UI.
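To see which edges would be hidden, one could scan a response for predicates outside the allowed set. The following is only a minimal sketch: it assumes the TRAPI layout in which `message.knowledge_graph.edges` maps edge IDs to edge objects, and `VALID_PREDICATES` is a hypothetical hand-picked subset, not the real Biolink 3.1.2 predicate list. The function name `find_invalid_edges` is also made up for illustration.

```python
from typing import Any, Dict, Optional, Set

# Hypothetical subset of allowed predicates, for illustration only.
# A real check would load the full predicate list for the Biolink release in use.
VALID_PREDICATES: Set[str] = {
    "biolink:treats",
    "biolink:related_to",
    "biolink:interacts_with",
}


def find_invalid_edges(
    response: Dict[str, Any], valid: Set[str]
) -> Dict[str, Optional[str]]:
    """Return {edge_id: predicate} for every edge whose predicate is not in `valid`."""
    knowledge_graph = response.get("message", {}).get("knowledge_graph") or {}
    edges = knowledge_graph.get("edges") or {}
    return {
        edge_id: edge.get("predicate")
        for edge_id, edge in edges.items()
        if edge.get("predicate") not in valid
    }
```

Running this over a saved response (for example the JSON behind response 129227) would list the edge IDs a UI with this behavior would drop.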

@chunyuma What's the prognosis for getting this fixed? Do we have a date?

edeutsch commented 1 year ago

> @chunyuma What's the prognosis for getting this fixed? Do we have a date?

We have been given a deadline of July 7 to get this fixed and deployed.

chunyuma commented 1 year ago

Hi @edeutsch, sorry for the late response. The model training is done and the pre-computation is now running, but this process is slow. I'm investigating how to speed it up because, at the current speed, we can't finish by July 7.

edeutsch commented 1 year ago

Then I think we need to put serious effort into "query time patching of incorrect predicates". I think @saramsey is looking into this?
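For discussion, query-time patching could look roughly like the sketch below. Everything here is an assumption rather than existing ARAX code: the function name `patch_predicates`, the `PREDICATE_PATCHES` mapping, and the choice of `biolink:related_to` as a fallback are all hypothetical, and the valid set is a small illustrative subset.

```python
import copy
from typing import Any, Dict, List, Set, Tuple

# Hypothetical values, for illustration only.
VALID_PREDICATES: Set[str] = {"biolink:treats", "biolink:related_to"}
PREDICATE_PATCHES: Dict[str, str] = {
    # Assumed replacement; the correct target predicate would need to be decided.
    "biolink:gene_associated_with_condition": "biolink:related_to",
}
FALLBACK_PREDICATE = "biolink:related_to"


def patch_predicates(
    knowledge_graph: Dict[str, Any],
    valid: Set[str],
    patches: Dict[str, str],
    fallback: str = FALLBACK_PREDICATE,
) -> Tuple[Dict[str, Any], List[str]]:
    """Return a patched copy of the knowledge graph and the IDs of changed edges."""
    patched = copy.deepcopy(knowledge_graph)  # leave the caller's graph untouched
    changed: List[str] = []
    for edge_id, edge in patched.get("edges", {}).items():
        predicate = edge.get("predicate")
        if predicate in valid:
            continue
        # Use the explicit mapping when one exists, otherwise the generic fallback.
        edge["predicate"] = patches.get(predicate, fallback)
        changed.append(edge_id)
    return patched, changed
```

Returning the list of changed edge IDs would let us log a warning per patched edge, so the underlying data problem stays visible until the model is rebuilt.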

chunyuma commented 1 year ago

Hi @edeutsch, I have an update on the model rebuild. I can now switch to CPU training and use many more processes (i.e., 50 cores instead of 16), but I still need around 200 hours. So hopefully we can get it done before July 7. Sorry for the late deployment.

chunyuma commented 1 year ago

Hi @edeutsch, the new model has been rebuilt, and its new database, at 80% completeness, has been uploaded to the server. This issue should be fixed now.

edeutsch commented 1 year ago

Hi @chunyuma, I have deployed the latest master to all the usual places. Should this have deployed your fix? Did the new database get distributed to all the right places? Do you suppose that it automatically gets copied to ITRB CI? I am hazy on how this new database gets deployed.

edeutsch commented 1 year ago

@chunyuma I just ran this query and I get 0 results and lots of warnings. Is this because this disease is in the missing 20%? Or some other problem?

{
  "edges": {
    "t_edge": {
      "attribute_constraints": [],
      "knowledge_type": "inferred",
      "object": "on",
      "predicates": [
        "biolink:treats"
      ],
      "qualifier_constraints": [],
      "subject": "sn"
    }
  },
  "nodes": {
    "on": {
      "categories": [
        "biolink:Disease"
      ],
      "constraints": [],
      "ids": [
        "MONDO:0020066"
      ],
      "is_set": false
    },
    "sn": {
      "categories": [
        "biolink:ChemicalEntity"
      ],
      "constraints": [],
      "is_set": false
    }
  }
}

chunyuma commented 1 year ago

@edeutsch, there are some diseases missing in the new xDTD database. I will try to recover them.

saramsey commented 1 year ago

Hi @chunyuma what's the status of this issue?

chunyuma commented 1 year ago

Hi @saramsey, ExplainableDTD_v1.0_KG2.8.3_refreshedTo_KG2.8.4.db, the xDTD model refreshed for KG2.8.4 based on KG2.8.3 (using Biolink version 3.1.2), has been integrated into the KG2.8.4 branch. Once we deploy KG2.8.4c and merge the KG2.8.4 branch to master, this issue can be resolved.

saramsey commented 1 year ago

OK, since we have merged KG2.8.4 to master and since we have deployed KG2.8.4c, I think we should close this out.

saramsey commented 1 year ago

I ran this query

{
  "edges": {
    "N1": {
      "attribute_constraints": [],
      "object": "sn",
      "predicates": [
        "biolink:has_normalized_google_distance_with"
      ],
      "qualifier_constraints": [],
      "subject": "on"
    },
    "creative_DTD_qedge_0": {
      "attribute_constraints": [],
      "exclude": false,
      "object": "creative_DTD_qnode_0",
      "option_group_id": "creative_DTD_option_group_0",
      "qualifier_constraints": [],
      "subject": "sn"
    },
    "creative_DTD_qedge_1": {
      "attribute_constraints": [],
      "exclude": false,
      "object": "creative_DTD_qnode_1",
      "option_group_id": "creative_DTD_option_group_0",
      "qualifier_constraints": [],
      "subject": "creative_DTD_qnode_0"
    },
    "creative_DTD_qedge_2": {
      "attribute_constraints": [],
      "exclude": false,
      "object": "on",
      "option_group_id": "creative_DTD_option_group_0",
      "qualifier_constraints": [],
      "subject": "creative_DTD_qnode_1"
    },
    "t_edge": {
      "attribute_constraints": [],
      "knowledge_type": "inferred",
      "object": "on",
      "predicates": [
        "biolink:treats"
      ],
      "qualifier_constraints": [],
      "subject": "sn"
    }
  },
  "nodes": {
    "creative_DTD_qnode_0": {
      "constraints": [],
      "is_set": true,
      "option_group_id": "creative_DTD_option_group_0"
    },
    "creative_DTD_qnode_1": {
      "constraints": [],
      "is_set": true,
      "option_group_id": "creative_DTD_option_group_0"
    },
    "on": {
      "categories": [
        "biolink:Disease"
      ],
      "constraints": [],
      "ids": [
        "MONDO:0007972"
      ],
      "is_set": false
    },
    "sn": {
      "categories": [
        "biolink:NamedThing"
      ],
      "constraints": [],
      "is_set": false
    }
  }
}

on arax.test.transltr.io tonight, and I got not no results. Is that good?

saramsey commented 1 year ago

Hoping that @chunyuma can help me understand how to interpret that result. Does that mean this issue is fixed?

chunyuma commented 4 months ago

Sorry for the late response. I think this issue is fixed because it has been updated to KG2.8.4.