NCATSTranslator / Relay

Autonomous relay system for NCATS Biomedical Data Translator
MIT License

Run All Rare Disease Identifiers for Repurposing Use Case #82

Closed sstemann closed 3 years ago

sstemann commented 3 years ago

Use Case: Can we repurpose a drug for any indication?

Rosinaweber commented 3 years ago

Hi Sarah: Is there a consensus to focus on rare diseases? As a suggestion, wouldn't it be easier if we first assessed which diseases the existing KPs cover and then focused on those? This way, we would not have to allocate any resources to adding new content but could simply show the capabilities of the Translator based on data and knowledge that have already been captured. Rosina-

southalln commented 3 years ago

Just seeing the comment now, @Rosinaweber. There are two things going on here: 1) we want to run a lot of queries simultaneously through the ARS, and this is a simple enough way to test that out. 2) Each team should have a firm sense of what aspects of translational research they can best address. Several teams have already invested in work around drug repurposing (e.g. https://www.biorxiv.org/content/10.1101/765305v1) and understanding target development landscapes for diseases, especially genetic rare diseases. Both are excellent translational research use-cases that should benefit from the different resources made available by the current teams. We can't afford to narrowly focus the entire project on a single disease, so the fact that rare diseases are so diverse is a good test for Translator. We do expect to identify strengths and weaknesses while doing so, and can adjust as needed. Hope that helps.

cartmanbeck commented 3 years ago

I agree with Noel's assessment completely.

southalln commented 3 years ago

Here's one of the queries we'll try out:


{
  "message": {
    "query_graph": {
      "nodes": {
        "n00": {
          "id": "MONDO:0018150",
          "category": "biolink:Disease"
        },
        "n01": {
          "category": "biolink:ChemicalSubstance"
        }
      },
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01"
        }
      }
    }
  }
}

caodac commented 3 years ago

Here's a test script to run this query for all monogenic diseases: https://gist.github.com/caodac/8c9ba3bcced4050df52bcc8762e2a09f
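The linked gist is not reproduced here. As a rough sketch of the idea (not caodac's actual script), the query posted above can be generated once per disease identifier and submitted concurrently. The names `build_query`, `run_all`, and the `submit` callable are hypothetical; `submit` stands in for whatever function posts a query to the ARS and returns its response.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def build_query(disease_curie: str) -> dict:
    """Return the one-hop Disease -> ChemicalSubstance query for one disease."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n00": {"id": disease_curie, "category": "biolink:Disease"},
                    "n01": {"category": "biolink:ChemicalSubstance"},
                },
                "edges": {"e00": {"subject": "n00", "object": "n01"}},
            }
        }
    }


def run_all(
    disease_curies: Iterable[str],
    submit: Callable[[dict], object],  # hypothetical: caller supplies the ARS client
    max_workers: int = 8,
) -> Dict[str, object]:
    """Submit one query per disease in parallel; map each CURIE to its response."""
    curies = list(disease_curies)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each CURIE with its response.
        responses = pool.map(lambda curie: submit(build_query(curie)), curies)
        return dict(zip(curies, responses))
```

Passing `submit` in as a parameter keeps the sketch independent of any particular ARS endpoint and makes it easy to swap in a stub when testing.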

Rosinaweber commented 3 years ago

Dear Noel @southalln: Thanks for your reply. I may not have been clear. I did not mean to narrow the focus of the tests. What I am suggesting is an assessment of what associations are available right now, that is, what the actual scope of the KPs is. It seems that your perception is that the majority is around rare diseases, so we are thinking along the same lines. I just wanted to ask how certain we are about the current scope of the KPs. My suggestion is that we focus this year on improving what we have rather than increasing scope. This was the underlying idea of my comment.

mellybelly commented 3 years ago

This is actually more challenging than it might seem; please see our article on enumerating the rare diseases (https://www.nature.com/articles/d41573-019-00180-y). You may wish to use the Monarch/SRI KG KP to help define the corpus of diseases.

mellybelly commented 3 years ago

Here is the TRAPI for the SRI KG: https://smart-api.info/ui/b0c489ea3a4d5aacfd833616d07a037a

caodac commented 3 years ago

Hi @mellybelly, would it be possible to expose the Neo4j endpoint underlying the TRAPI SRI KG? It'd also be great to have a small tutorial, in the form of a Neo4j browser guide (https://neo4j.com/developer/guide-create-neo4j-browser-guide/), to help understand the structure of the data.

mellybelly commented 3 years ago

@deepakunni3 or @kshefchek, can you help with this?

deepakunni3 commented 3 years ago

@caodac The underlying Neo4j instance is available at http://scigraph.ncats.io/browser/ and the KG is basically the Monarch Integrated Knowledge Graph, built using the Biolink Model.

Regarding the Neo4j guide/tutorial: CC'ing @kshefchek

diatomsRcool commented 3 years ago

@balhoff Any comments?

caodac commented 3 years ago

Great, thanks @deepakunni3!

andrewsu commented 3 years ago

Were there any results or feedback to report here? I think automating this analysis would be really informative. And if we restrict to the subset of rare diseases for which some treatments are known/annotated, I think this would be a great component of a regression testing pipeline.
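One way such a regression check might look, as a minimal sketch: given a response and a set of treatments already annotated for a disease, report which of them were not returned. This assumes the TRAPI 1.x result layout, in which each entry of `message.results` carries `node_bindings` keyed by query-node id. The helper names and the example CURIEs are made up for illustration.

```python
from typing import Iterable, Set


def bound_curies(response: dict, qnode_id: str = "n01") -> Set[str]:
    """Collect every CURIE bound to the given query node across all results."""
    found: Set[str] = set()
    # "results" may be missing or null when an actor returns nothing.
    for result in response.get("message", {}).get("results") or []:
        for binding in result.get("node_bindings", {}).get(qnode_id, []):
            found.add(binding["id"])
    return found


def missing_treatments(
    response: dict, expected: Iterable[str], qnode_id: str = "n01"
) -> Set[str]:
    """Return the expected treatment CURIEs that the response failed to return."""
    return set(expected) - bound_curies(response, qnode_id)


# Usage with a hand-written response; the CURIEs are placeholders, not real drugs.
example_response = {
    "message": {
        "results": [
            {"node_bindings": {"n00": [{"id": "MONDO:0018150"}],
                               "n01": [{"id": "EXAMPLE:drug1"}]}},
        ]
    }
}
still_missing = missing_treatments(example_response, ["EXAMPLE:drug1", "EXAMPLE:drug2"])
```

A regression run would then fail whenever `missing_treatments` returns a non-empty set for a disease whose treatments were found in an earlier run.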

southalln commented 3 years ago

@andrewsu Initial results were bugs and problems with ARS handling of actor timeouts and with Celery performance supporting the ARS message queue. The query above is by no means a proper implementation of the intended goal: repurposing for all rare diseases. So, we want to close this specific issue testing message queue performance, but the topic requires/deserves further follow-up somewhere else. I'm not sure where, but we could open a new issue in ARS?
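To illustrate the kind of timeout handling at issue, here is a generic sketch that fans one query out to several actors and gives each a fixed time budget, so that a slow actor yields `None` instead of stalling the rest. This is not the ARS implementation, which according to the comment above uses a Celery-backed message queue. The `fan_out` function and the idea of actors as async callables are assumptions made for the example.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional, Tuple

Actor = Callable[[dict], Awaitable[dict]]  # hypothetical: an async client per actor


async def _call_with_timeout(
    name: str, actor: Actor, query: dict, timeout_s: float
) -> Tuple[str, Optional[dict]]:
    """Await one actor; return (name, None) if it exceeds its time budget."""
    try:
        return name, await asyncio.wait_for(actor(query), timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancels the pending call before raising.
        return name, None


async def fan_out(
    actors: Dict[str, Actor], query: dict, timeout_s: float
) -> Dict[str, Optional[dict]]:
    """Send the same query to every actor concurrently, each with its own timeout."""
    pairs = await asyncio.gather(
        *(_call_with_timeout(name, actor, query, timeout_s)
          for name, actor in actors.items())
    )
    return dict(pairs)
```

Because every call is wrapped individually, the total wait is bounded by roughly `timeout_s` regardless of how many actors are slow.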

southalln commented 3 years ago

Moved to https://github.com/NCATSTranslator/Relay/discussions/123