A couple of reasons I'm hesitant to implement this:
1. The question makes sense, but the resources used to answer it don't make sense at all. The question asks for drug repurposing, yet the results returned from the Exposure Provider are exposures (e.g., ozone), not drugs.
2. The bigger problem is that the Connections Hypothesis Provider and the Exposure Provider can only answer very specific questions: the first only works for breast cancer, the second only for asthma. BTE, however, assumes that every integrated API is general-purpose. So if we integrate the Connections Hypothesis Provider, every Disease -> Gene query will be sent to it, which is a waste of time since it can only handle breast cancer. In addition, their API takes more than 10 s to answer a single query, which would also hurt BTE performance considerably. If we want to integrate APIs like the Connections Hypothesis Provider, we first need a way to express in SmartAPI that an API only works in specific scenarios, so that BTE can skip it for queries outside that scope.
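To make the scoping idea concrete, here is a minimal sketch of how a query planner could honor a per-API scope declaration. It assumes a hypothetical `supported_inputs` field (not an existing SmartAPI or BTE property): `None` means the API is general-purpose, while a set restricts it to the listed input entities. `ApiOperation`, `select_apis`, and `general_disease_gene_api` are illustrative names only, and the disease names stand in for real identifiers.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ApiOperation:
    """A hypothetical API operation as a query planner might see it."""
    name: str
    input_type: str
    output_type: str
    # Hypothetical scope field: None means general-purpose (any input entity);
    # a set restricts the API to the listed input entities only.
    supported_inputs: Optional[FrozenSet[str]] = None


def select_apis(ops: List[ApiOperation], input_type: str,
                output_type: str, input_id: str) -> List[str]:
    """Return the names of operations whose edge types match and whose
    declared scope (if any) covers input_id."""
    selected = []
    for op in ops:
        if op.input_type != input_type or op.output_type != output_type:
            continue
        if op.supported_inputs is not None and input_id not in op.supported_inputs:
            # Scoped API that cannot answer for this input: skip the slow call.
            continue
        selected.append(op.name)
    return selected


# Illustrative registry: one general-purpose API and one scoped API.
OPS = [
    ApiOperation("general_disease_gene_api", "Disease", "Gene"),
    ApiOperation("Connections Hypothesis Provider", "Disease", "Gene",
                 frozenset({"breast cancer"})),
]
```

With this registry, a Disease -> Gene query for asthma only reaches `general_disease_gene_api`, while the same query for breast cancer also reaches the Connections Hypothesis Provider. In this sketch the check is a cheap set lookup performed before any request goes out, so a scoped API costs nothing on queries it cannot answer.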
BTE has already proven that it handles multi-hop queries very well, and there are many important items on the BTE to-do list. In my view, these two resources are not mature at all at this point, and it doesn't seem worth it (at least right now) to integrate them just to get this workflow working.
Great analysis, thanks @kevinxin90. A few follow-up thoughts:
- I agree with your analysis that we should not add the APIs from those two providers yet, for the performance reasons you mentioned.
- This is low priority, but we could create a class of APIs that are not queried by default unless explicitly included (following the syntax in #85).
- I think the notebook here, which preserves the metapath but does not specify particular APIs, is still useful for the clinical WG to look at. Once I rerun it, one of us should post it to that Slack channel.
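The opt-in class of APIs suggested above could work roughly as follows. This is a minimal sketch assuming a hypothetical boolean "opt-in" flag per API and a hypothetical `include` argument; it does not reproduce the actual syntax proposed in #85. `REGISTRY`, `apis_to_query`, and `general_disease_gene_api` are illustrative names.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical registry: API name -> True if the API is opt-in,
# i.e. excluded from queries unless the caller names it explicitly.
REGISTRY: Dict[str, bool] = {
    "general_disease_gene_api": False,
    "Connections Hypothesis Provider": True,
    "Exposure Provider": True,
}


def apis_to_query(registry: Dict[str, bool],
                  include: Optional[Iterable[str]] = None) -> List[str]:
    """Return all default APIs plus any opt-in APIs listed in `include`."""
    requested = set(include or ())
    # Fail loudly on typos instead of silently ignoring an unknown API name.
    unknown = requested.difference(registry)
    if unknown:
        raise ValueError(f"unknown APIs requested: {sorted(unknown)}")
    return [name for name, opt_in in registry.items()
            if not opt_in or name in requested]
```

By default only `general_disease_gene_api` would be queried. Passing `include=["Exposure Provider"]` adds that one provider for a single query. An explicit opt-in flag is simpler than the scope metadata discussed earlier, but it moves the burden to the user, who has to know when a specialized API applies.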
This issue tracks how BTE can be used to answer the breast cancer drug repurposing use case developed by the clinical WG. An overview presentation of the use case (including results from the Winter 2021 relay meeting) is here: https://docs.google.com/presentation/d/10Z-qC4We63WUfalfYfJqbCkLYbJDjc3t2rPG4AUG5k8/edit. The TRAPI query is:
The envisioned query path (from the slide deck above) looks like this:
I created a first-pass notebook here: https://github.com/biothings/biothings_explorer/blob/master/jupyter%20notebooks/Disease%20-%20Gene%20-%20Disease%20-%20Chem%202021-02-03%20relay%20(Breast%20cancer).ipynb. This notebook translates the TRAPI query into `FindConnection` syntax (minus predicates). Ultimately we'll want to run the TRAPI query directly, either through BioThings Explorer TRAPI or after implementing #165. This issue will track how well BTE executes the envisioned query plan. More details to come...