biothings / biothings_explorer_archived

BioThings Explorer: a schema-based client for API interoperability
Apache License 2.0

Breast cancer repurposing use case from clinical WG #170

Open andrewsu opened 3 years ago

andrewsu commented 3 years ago

This issue is to track how BTE can be used to answer the breast cancer drug repurposing use case developed by the clinical WG. An overview presentation of the use case (including results from Winter 2021 relay meeting) is here: https://docs.google.com/presentation/d/10Z-qC4We63WUfalfYfJqbCkLYbJDjc3t2rPG4AUG5k8/edit. The TRAPI query is this:

{
  "message": {
    "query_graph": {
      "edges": {
        "e0": {
          "predicate": "biolink:gene_associated_with_condition",
          "subject": "n0",
          "object": "n1"
        },
        "e1": {
          "predicate": "biolink:gene_has_variant_that_contributes_to_disease_association",
          "subject": "n0",
          "object": "n2"
        },
        "e2": {
          "subject": "n2",
          "object": "n3",
          "predicate": "biolink:correlated_with"
        }
      },
      "nodes": {
        "n0": {
          "category": "biolink:Gene"
        },
        "n1": {
          "category": "biolink:Disease",
          "id": "MONDO:0007254"
        },
        "n2": {
          "category": "biolink:Disease"
        },
        "n3": {
          "category": "biolink:ChemicalSubstance"
        }
      }
    }
  }
}

The envisioned query path (from the slide deck above) looks like this:

[image: envisioned query path]

I created a first pass notebook here: https://github.com/biothings/biothings_explorer/blob/master/jupyter%20notebooks/Disease%20-%20Gene%20-%20Disease%20-%20Chem%202021-02-03%20relay%20(Breast%20cancer).ipynb. This notebook translates the TRAPI query into FindConnection syntax (minus predicates). Ultimately we'll want to run the TRAPI query directly either through BioThings Explorer TRAPI or after implementing #165.
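As a rough illustration of what "translating the TRAPI query" involves, the sketch below walks the query graph from the one node that carries an `id` and reads off the node categories in order. It is not the notebook's code: the function names (`linear_path`, `plan`) and the output keys are hypothetical, edges are treated as undirected, predicates are dropped (as in the notebook), and the graph is assumed to be a simple chain. How the result maps onto FindConnection arguments is left as an assumption.

```python
from typing import Dict, List

# Query graph copied from the TRAPI message above, with predicates omitted.
QUERY_GRAPH = {
    "edges": {
        "e0": {"subject": "n0", "object": "n1"},
        "e1": {"subject": "n0", "object": "n2"},
        "e2": {"subject": "n2", "object": "n3"},
    },
    "nodes": {
        "n0": {"category": "biolink:Gene"},
        "n1": {"category": "biolink:Disease", "id": "MONDO:0007254"},
        "n2": {"category": "biolink:Disease"},
        "n3": {"category": "biolink:ChemicalSubstance"},
    },
}


def linear_path(query_graph: dict) -> List[str]:
    """Return node keys in traversal order, starting at the single pinned node.

    Assumes the graph is a simple chain; edge direction is ignored.
    """
    nodes = query_graph["nodes"]
    pinned = [key for key, node in nodes.items() if "id" in node]
    if len(pinned) != 1:
        raise ValueError("expected exactly one node with an 'id'")

    # Build an undirected adjacency list from the edges.
    neighbors: Dict[str, List[str]] = {key: [] for key in nodes}
    for edge in query_graph["edges"].values():
        neighbors[edge["subject"]].append(edge["object"])
        neighbors[edge["object"]].append(edge["subject"])

    path = [pinned[0]]
    previous = None
    while True:
        current = path[-1]
        candidates = [n for n in neighbors[current] if n != previous]
        if not candidates:
            return path
        if len(candidates) > 1 or candidates[0] in path:
            raise ValueError("query graph is not a simple chain")
        previous = current
        path.append(candidates[0])


def plan(query_graph: dict) -> dict:
    """Summarize the chain as input ID, intermediate categories, and output category."""
    nodes = query_graph["nodes"]
    path = linear_path(query_graph)
    # Strip the "biolink:" prefix from each category.
    categories = [nodes[key]["category"].split(":", 1)[1] for key in path]
    return {
        "input_id": nodes[path[0]]["id"],
        "input_category": categories[0],
        "intermediate_categories": categories[1:-1],
        "output_category": categories[-1],
    }


if __name__ == "__main__":
    print(plan(QUERY_GRAPH))
```

For the query above, the walk starts at `n1` (the disease with an `id`), so edge `e0` is traversed from object to subject. This is why the sketch ignores edge direction.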

This issue will track how BTE does in executing the envisioned query plan. More details to come...

kevinxin90 commented 3 years ago

A couple of reasons I really hesitate to implement this:

  1. The question makes sense, but the resources used to answer it don't. The question asks for drug repurposing, but the results returned from Exposure Provider are drug exposures, e.g., ozone. This doesn't make sense.
  2. The bigger problem is that the Connections Hypothesis Provider and Exposure Provider can only answer very specific questions: the first only works with breast cancer, the second only with asthma. BTE, however, assumes that every integrated API is general purpose. So if we integrate Connections Hypothesis Provider, every Disease -> Gene query we receive will be sent to it, which is a waste of time since it can only handle breast cancer. Besides, their API takes >10s to answer a single query, which will also hurt BTE performance a lot. To integrate APIs like Connections Hypothesis Provider, we first need a way to express within SmartAPI that an API only works in specific scenarios, so that BTE can exclude the resource in all other cases.
  3. BTE has already proven it can handle multi-hop queries very well, and there are a lot of important things on the BTE to-do list. To me, these two resources are not mature at this point, and integrating them just to get this workflow working doesn't feel worth it (at least for now).
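
The scoping idea in point 2 could look roughly like the sketch below: each registered API optionally declares the only input IDs it can answer for, and the dispatcher skips it for anything else. This is purely illustrative. `ApiEntry`, `supported_input_ids`, `select_apis`, the API names, and the `EXAMPLE:0000001` ID are all hypothetical, and none of this is an existing SmartAPI field or BTE function.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ApiEntry:
    name: str
    subject_category: str
    object_category: str
    # Hypothetical annotation: None means general purpose; otherwise the
    # API only answers queries whose input ID is in this set.
    supported_input_ids: Optional[FrozenSet[str]] = None


def select_apis(
    registry: List[ApiEntry],
    subject_category: str,
    object_category: str,
    input_id: str,
) -> List[ApiEntry]:
    """Return the APIs worth querying for this hop and this input ID."""
    selected = []
    for api in registry:
        if (api.subject_category, api.object_category) != (
            subject_category,
            object_category,
        ):
            continue
        # Skip scoped APIs when the input ID falls outside their scope.
        if (
            api.supported_input_ids is not None
            and input_id not in api.supported_input_ids
        ):
            continue
        selected.append(api)
    return selected


# Hypothetical registry: one general-purpose API and one scoped API.
REGISTRY = [
    ApiEntry("general-purpose-api", "Disease", "Gene"),
    ApiEntry(
        "breast-cancer-only-api",
        "Disease",
        "Gene",
        supported_input_ids=frozenset({"MONDO:0007254"}),
    ),
]

if __name__ == "__main__":
    for disease_id in ("MONDO:0007254", "EXAMPLE:0000001"):
        names = [a.name for a in select_apis(REGISTRY, "Disease", "Gene", disease_id)]
        print(disease_id, names)
```

With a filter like this, the slow scoped API would only be called for the disease ID from the query above, and every other Disease -> Gene query would go to general-purpose APIs alone.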

andrewsu commented 3 years ago

Great analysis, thanks @kevinxin90. A few follow-up thoughts: