RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License
33 stars 21 forks source link

Incorporate BioThingsExplorer into expand as an alternate KP #722

Closed dkoslicki closed 4 years ago

dkoslicki commented 4 years ago

Implement the BioThingsExplorer as a KP in expand as per the ARAX/BTE meeting on 4/22/2020 and the resulting notebook. In these slack threads.

amykglen commented 4 years ago

Getting started here - when I add BioThings Explorer to requirements.txt and do pip install -r requirements.txt, this error appears:

ERROR: biothings-explorer 0.0.1 has requirement networkx==2.3, but you'll have networkx 2.4 which is incompatible.

If I adjust requirements.txt to specify networkx==2.3 (rather than just networkx), BTE installs just fine.

Does anyone know if using networkx 2.3 might cause a problem elsewhere in our system? Or should this be fine?

edeutsch commented 4 years ago

arax.rtx.ai also has 2.4, so it seems we would all be in a position to downgrade. not ideal, but maybe okay. I vaguely recall that we had a few minor issues with this upgrade. I think they deleted a few methods in 2.4 that had long been listed as deprecated. So upgrading to 2.4 can break things if one is still using long-deprecated methods.

Alternatively, it may be a good idea to ask the developers about this requirement since we now know them well. Maybe they can make bioThings compatible with 2.4?

amykglen commented 4 years ago

Ah, got it, thanks. Yeah, does seem like something worth asking about. I'll do so in slack!

amykglen commented 4 years ago

Alright, Kevin Xin says making BTE compatible with networkx 2.4 is definitely something they would like to do, but probably not before the start of our relay meeting on Monday. He says:

I think for the short term, the easiest way is for you to set up a simple local API on your server that runs BTE in its own environment. And your reasoner tool will be querying that API directly.

edeutsch commented 4 years ago

hmm, I would think that the easiest way for us would be to downgrade to 2.3. I don't think it would break anything. I would try that first. Our code base should be compatible with either 2.3 or 2.4.

dkoslicki commented 4 years ago

Note: Just FYI, networkx was (iirc) only used for the old (now depreciated) system. I don’t think there is any problem with downgrading to 2.3 so BTE can be integrated. I’m pretty sure I was the only one using networkx and just installed whatever was the latest version at the time (and no longer use it)

amykglen commented 4 years ago

Ok cool, thanks - I'll plan on going the downgrading route. :)

amykglen commented 4 years ago

Ok, so this is working at a basic level - some notes:

dkoslicki commented 4 years ago

Just for the record as this was on slack from Amy:

Hmm - so when I go to https://arax.rtx.ai/beta/ and run this dsl testing BTE:

add_qnode(id=n00, curie=NCBIGene:1017)
add_qnode(id=n01, type=chemical_substance, is_set=True)
add_qedge(id=e00, source_id=n01, target_id=n00, type=targetedBy)
expand(edge_id=e00, kp=BTE)
2020-04-27 14:50:50.312990 ERROR:    Encountered a problem while using BioThings Explorer.
Traceback (most recent call last): File "/mnt/data/orangeboard/beta/RTX/code/UI/OpenAPI/python-flask-server/swagger_server/../../../../ARAX/ARAXQuery/Expand/bte_querier.py", line 38, in answer_one_hop_query values=self.__get_curie_local_id(input_node_curie)) File "/mnt/data/python/Python-3.7.3/lib/python3.7/site-packages/biothings_explorer/user_query_dispatcher.py", line 107, in __init__ self.input_cls)]) 
File "/mnt/data/python/Python-3.7.3/lib/python3.7/site-packages/biothings_explorer/id_resolver.py", line 88, in resolve_ids dotfield=True) 
File "/mnt/data/python/Python-3.7.3/lib/python3.7/site-packages/biothings_explorer/apicall.py", line 179, in call_apis loop = asyncio.get_event_loop() 
File "/mnt/data/python/Python-3.7.3/lib/python3.7/asyncio/events.py", line 644, in get_event_loop % threading.current_thread().name) 
RuntimeError: There is no current event loop in thread 'Thread-53'.
amykglen commented 4 years ago

Thanks for posting that, David - also for the record:

I asked the BTE team about this error (which seems to only occur in a multi-threaded environment), and Kevin Xin said he doesn't immediately have an answer, but that he'll investigate and get back to us within a day or two.

dkoslicki commented 4 years ago

More "for the record" information from Kevin Xin:

Hi David,, we’re working on a major release for BTE. It has some major changes: 1) it integrates almost all endpoints provided by RENCI automat which you see in Chunlei’s presentation (e.g. scigraph, hmdb, pharos, etc). BTW, we keep a local copy of SmartAPI registry within BTE package 2) another major change is that it will be fully comply with the biolink model (e.g. the ID Prefix, semantic type, predicate will all come from the biolink.yaml file). I hope this will be easier for your team to integrate BTE. 3) There will be features allowing you to include/exclude certain APIs during the query.

These changes are not yet reflected in BTE master branch yet, but it’s fully tested and functionally, and I’m just working on a little more documentation regarding the changes. If you would like to give a try, you could install by pip install git+https://github.com/biothings/biothings_explorer@smartapi-comply#egg=biothings_explorer . And you could glance at the test cases (https://github.com/biothings/biothings_explorer/tree/smartapi-comply/tests/test_apis) to see how to play with the service

amykglen commented 4 years ago

BTE is now compatible with networkx 2.4 thanks to Kevin Xin - I tested it out locally and updated requirements.txt as appropriate.

edeutsch commented 4 years ago

great, thanks! Any word on the multi-threading issue?

amykglen commented 4 years ago

I relayed your link to Kevin this morning (he says thanks!) and he's going to try it out.

amykglen commented 4 years ago

I recently learned a bit more about BTE's output and how it handles synonyms: it turns out that BTE identifies entities by name in its output (as long as the entity has a name), so most of the nodes in BTE's reasoner std output for a given query will look something like this:

{
   "id":"name:DAUNORUBICIN",
   "name":"DAUNORUBICIN",
   "type":"ChemicalSubstance",
   "equivalent_identifiers":"defaultdict(<class"   "set"   ">",
{
      "CHEMBL.COMPOUND":[
         "CHEMBL178",
         "CHEMBL1200475",
         "CHEMBL1563"
      ],
      "DRUGBANK":[
         "DB00694"
      ],
      "PUBCHEM":[
         "30323"
      ],
      "CHEBI":[
         "CHEBI:41977"
      ],
      "UMLS":[
         "C0011015"
      ],
      "MESH":[
         "D003630"
      ],
      "UNII":[
         "ZS7284E0ZP",
         "5L84T2Z6NP"
      ],
      "name":[
         "DAUNORUBICIN"
      ]
   }   ")"
}

note that the node's id is name:DAUNORUBICIN... which isn't really a valid curie (like, that other KPs/components would recognize). While I see how this handles synonyms nicely in one sense, this becomes an issue for multi-hop expands - no KPs (including BTE itself) can accept name: nodes as input.

So what I'm trying to figure out is what makes the most sense as to how to incorporate these results into our Message.KnowledgeGraph. What should loading the above returned node into our system look like?

This problem is very similar to the synonym handling we already do for KG1/KG2, although it's a bit different in that this concerns how to handle synonyms for 'intermediate'/'end' query nodes, rather than the 'start' query node... Options analogous to our synonym handling methods in #707:

  1. Add all of the equivalent identifiers (excluding the name: one) as separate nodes in our answer (like add_all)
  2. Pick one of those equivalent identifiers to use (like, maybe we prefer DRUGBANK, so we create a node with that curie, ignore the rest) (like map_back.. except in this case, no input curie was provided to map back to... we would have to choose one)
  3. Create some sort of 'uber' node (I suppose BTE's name:DAUNORUBICIN node is their version of an uber node)

Welcome any thoughts...

edeutsch commented 4 years ago

tricky problem. I suppose we just need to have a little translator method for BTE (and probably others as well). As you suggest, all three options are plausible and depends on the option that expand() was called with. Option 2 is the default, so I would suggest starting off with option 2. When creating the Node object, find the best curie by having a static ordered list of curie prefixes we like, something like ordered_preferred_curie_prefix_list = [ "CHEMBL.COMPOUND", "CHEBI", "DRUGBANK", etc. ] and find the first one in the list here in the equivalent_identifiers map and assign that to the id. Could normalize the type using KGNodeIndex I suppose or a manual mapping. All the other stuff can be dropped for now, although could also fit in node_attributes.

We might eventually want to support all three options. I think option 3 potentially very attractive if we can figure out a scheme to support that in our reasoning. But not ready for that today.

amykglen commented 4 years ago

@edeutsch - so Kevin Xin says he addressed the multithread-related issue with BTE - he says:

The method I use is very similar to the article with Eric recommended.  You can pass your event loop object to the SingleEdgeQueryDispatcher as an optional parameter.

And he provided this example for how I can implement it: https://github.com/biothings/biothings_explorer/blob/master/tests/test_multithreading.py#L16

But I'm not totally sure how implementing this in the Expand code would work - wondering if you have insight? Is this suggesting I would need to create an asyncio.new_event_loop() from Expand and pass it into SingleEdgeQueryDispatcher? (A bit confused since I didn't think we use asyncio anywhere.)

(The code in Expand that runs SingleEdgeQueryDispatcher is here, fyi.)

edeutsch commented 4 years ago

I'm in over my head here so I don't know the answer. But I think your guess may be right, so maybe you can experiment to see if you find something that works? I think I might start with get_event_loop(), but if that doesn't work then try new_event_loop(), yes.

The first thing I would try is:

import asyncio
                    loop = asyncio.get_event_loop()
                    seqd = SingleEdgeQueryDispatcher(input_cls=input_qnode.type,
                                                     output_cls=output_qnode.type,
                                                     pred=qedge.type,
                                                     input_id=self.__get_curie_prefix(curie),
                                                     loop=loop,
                                                     values=self.__get_curie_local_id(curie))

but I'm just guessing..

amykglen commented 4 years ago

Ok, thanks! I'll start there and try to figure out something that works..

amykglen commented 4 years ago

Documenting for the record: Using asyncio.get_event_loop() didn't work - throws this error on /test:

Encountered a problem while using BioThings Explorer. Traceback (most recent call last): File "/mnt/data/orangeboard/test/RTX/code/UI/OpenAPI/python-flask-server/swagger_server/../../../../ARAX/ARAXQuery/Expand/bte_querier.py", line 38, in answer_one_hop_query loop = asyncio.get_event_loop() File "/mnt/data/python/Python-3.7.3/lib/python3.7/asyncio/events.py", line 644, in get_event_loop % threading.current_thread().name) RuntimeError: There is no current event loop in thread 'Thread-8'.

So I'm trying out asyncio.new_event_loop(). (Both of these work when running ARAX from the command line, so it's a bit difficult to test without trying out the code on /test or /beta.)

amykglen commented 4 years ago

Alright, I think this issue can finally be considered complete. :) The multithread-related problem has been resolved, so BTE Expands can now work via the UI/API.

And I've also made some notable improvements to processing of what BTE returns (to do a form of 'map_back' synonym handling like described here), error catching, etc. Though these changes are currently in the expander branch and can't be merged to master until #720 is done and merged. (I needed changes made in issue720, so pulled those changes into the expander branch.)

I'll plan to create separate issues for BTE enhancements going forward (like the ability to do curie--curie queries #759).

But will wait to close this issue until it can be tested on production/test/beta.

amykglen commented 4 years ago

Tested on /beta - all seems to be working well!

Example test query: https://arax.rtx.ai/beta/?m=2282