Closed dkoslicki closed 4 years ago
Getting started here - when I add BioThings Explorer to requirements.txt
and do pip install -r requirements.txt
, this error appears:
ERROR: biothings-explorer 0.0.1 has requirement networkx==2.3, but you'll have networkx 2.4 which is incompatible.
If I adjust requirements.txt
to specify networkx==2.3
(rather than just networkx
), BTE installs just fine.
Does anyone know if using networkx 2.3 might cause a problem elsewhere in our system? Or should this be fine?
arax.rtx.ai also has 2.4, so it seems we would all be in a position to downgrade. not ideal, but maybe okay. I vaguely recall that we had a few minor issues with this upgrade. I think they deleted a few methods in 2.4 that had long been listed as deprecated. So upgrading to 2.4 can break things if one is still using long-deprecated methods.
Alternatively, it may be a good idea to ask the developers about this requirement since we now know them well. Maybe they can make bioThings compatible with 2.4?
Ah, got it, thanks. Yeah, does seem like something worth asking about. I'll do so in slack!
Alright, Kevin Xin says making BTE compatible with networkx 2.4 is definitely something they would like to do, but probably not before the start of our relay meeting on Monday. He says:
I think for the short term, the easiest way is for you to set up a simple local API on your server that runs BTE in its own environment. And your reasoner tool will be querying that API directly.
hmm, I would think that the easiest way for us would be to downgrade to 2.3. I don't think it would break anything. I would try that first. Our code base should be compatible with either 2.3 or 2.4.
Note: Just FYI, networkx was (iirc) only used for the old (now depreciated) system. I don’t think there is any problem with downgrading to 2.3 so BTE can be integrated. I’m pretty sure I was the only one using networkx and just installed whatever was the latest version at the time (and no longer use it)
Ok cool, thanks - I'll plan on going the downgrading route. :)
Ok, so this is working at a basic level - some notes:
add_qnode
works to allow us to add curies that are in BTE but not KG1/2? Not sure how this will work eventually, but for now it's not too limiting as it is, since KG2 is quite largeuse_synonyms
parameter doesn't do anything for BTE queries yet (might need to use SRI's node normalizer instead of or in addition to KGNodeIndex.get_equivalent_curies() to get synonyms, since not all valid BTE curies are in KG2? or perhaps another tool/method...)enforce_directionality
parameter doesn't do anything for BTE queries yet (I think BTE may always enforce edge direction? Need to look into that)Just for the record as this was on slack from Amy:
Hmm - so when I go to https://arax.rtx.ai/beta/ and run this dsl testing BTE:
add_qnode(id=n00, curie=NCBIGene:1017) add_qnode(id=n01, type=chemical_substance, is_set=True) add_qedge(id=e00, source_id=n01, target_id=n00, type=targetedBy) expand(edge_id=e00, kp=BTE)
2020-04-27 14:50:50.312990 ERROR: Encountered a problem while using BioThings Explorer. Traceback (most recent call last): File "/mnt/data/orangeboard/beta/RTX/code/UI/OpenAPI/python-flask-server/swagger_server/../../../../ARAX/ARAXQuery/Expand/bte_querier.py", line 38, in answer_one_hop_query values=self.__get_curie_local_id(input_node_curie)) File "/mnt/data/python/Python-3.7.3/lib/python3.7/site-packages/biothings_explorer/user_query_dispatcher.py", line 107, in __init__ self.input_cls)]) File "/mnt/data/python/Python-3.7.3/lib/python3.7/site-packages/biothings_explorer/id_resolver.py", line 88, in resolve_ids dotfield=True) File "/mnt/data/python/Python-3.7.3/lib/python3.7/site-packages/biothings_explorer/apicall.py", line 179, in call_apis loop = asyncio.get_event_loop() File "/mnt/data/python/Python-3.7.3/lib/python3.7/asyncio/events.py", line 644, in get_event_loop % threading.current_thread().name) RuntimeError: There is no current event loop in thread 'Thread-53'.
Thanks for posting that, David - also for the record:
I asked the BTE team about this error (which seems to only occur in a multi-threaded environment), and Kevin Xin said he doesn't immediately have an answer, but that he'll investigate and get back to us within a day or two.
More "for the record" information from Kevin Xin:
Hi David,, we’re working on a major release for BTE. It has some major changes: 1) it integrates almost all endpoints provided by RENCI automat which you see in Chunlei’s presentation (e.g. scigraph, hmdb, pharos, etc). BTW, we keep a local copy of SmartAPI registry within BTE package 2) another major change is that it will be fully comply with the biolink model (e.g. the ID Prefix, semantic type, predicate will all come from the biolink.yaml file). I hope this will be easier for your team to integrate BTE. 3) There will be features allowing you to include/exclude certain APIs during the query.
These changes are not yet reflected in BTE master branch yet, but it’s fully tested and functionally, and I’m just working on a little more documentation regarding the changes. If you would like to give a try, you could install by
pip install git+https://github.com/biothings/biothings_explorer@smartapi-comply#egg=biothings_explorer
. And you could glance at the test cases (https://github.com/biothings/biothings_explorer/tree/smartapi-comply/tests/test_apis) to see how to play with the service
BTE is now compatible with networkx 2.4 thanks to Kevin Xin - I tested it out locally and updated requirements.txt
as appropriate.
great, thanks! Any word on the multi-threading issue?
I relayed your link to Kevin this morning (he says thanks!) and he's going to try it out.
I recently learned a bit more about BTE's output and how it handles synonyms: it turns out that BTE identifies entities by name in its output (as long as the entity has a name), so most of the nodes in BTE's reasoner std output for a given query will look something like this:
{
"id":"name:DAUNORUBICIN",
"name":"DAUNORUBICIN",
"type":"ChemicalSubstance",
"equivalent_identifiers":"defaultdict(<class" "set" ">",
{
"CHEMBL.COMPOUND":[
"CHEMBL178",
"CHEMBL1200475",
"CHEMBL1563"
],
"DRUGBANK":[
"DB00694"
],
"PUBCHEM":[
"30323"
],
"CHEBI":[
"CHEBI:41977"
],
"UMLS":[
"C0011015"
],
"MESH":[
"D003630"
],
"UNII":[
"ZS7284E0ZP",
"5L84T2Z6NP"
],
"name":[
"DAUNORUBICIN"
]
} ")"
}
note that the node's id
is name:DAUNORUBICIN
... which isn't really a valid curie (like, that other KPs/components would recognize). While I see how this handles synonyms nicely in one sense, this becomes an issue for multi-hop expands - no KPs (including BTE itself) can accept name:
nodes as input.
So what I'm trying to figure out is what makes the most sense as to how to incorporate these results into our Message.KnowledgeGraph. What should loading the above returned node into our system look like?
This problem is very similar to the synonym handling we already do for KG1/KG2, although it's a bit different in that this concerns how to handle synonyms for 'intermediate'/'end' query nodes, rather than the 'start' query node... Options analogous to our synonym handling methods in #707:
name:
one) as separate nodes in our answer (like add_all
)map_back
.. except in this case, no input curie was provided to map back to... we would have to choose one)name:DAUNORUBICIN
node is their version of an uber node)Welcome any thoughts...
tricky problem. I suppose we just need to have a little translator method for BTE (and probably others as well). As you suggest, all three options are plausible and depends on the option that expand() was called with. Option 2 is the default, so I would suggest starting off with option 2. When creating the Node object, find the best curie by having a static ordered list of curie prefixes we like, something like ordered_preferred_curie_prefix_list = [ "CHEMBL.COMPOUND", "CHEBI", "DRUGBANK", etc. ] and find the first one in the list here in the equivalent_identifiers map and assign that to the id
. Could normalize the type
using KGNodeIndex I suppose or a manual mapping. All the other stuff can be dropped for now, although could also fit in node_attributes.
We might eventually want to support all three options. I think option 3 potentially very attractive if we can figure out a scheme to support that in our reasoning. But not ready for that today.
@edeutsch - so Kevin Xin says he addressed the multithread-related issue with BTE - he says:
The method I use is very similar to the article with Eric recommended. You can pass your event loop object to the SingleEdgeQueryDispatcher as an optional parameter.
And he provided this example for how I can implement it: https://github.com/biothings/biothings_explorer/blob/master/tests/test_multithreading.py#L16
But I'm not totally sure how implementing this in the Expand code would work - wondering if you have insight? Is this suggesting I would need to create an asyncio.new_event_loop()
from Expand and pass it into SingleEdgeQueryDispatcher
? (A bit confused since I didn't think we use asyncio anywhere.)
(The code in Expand that runs SingleEdgeQueryDispatcher
is here, fyi.)
I'm in over my head here so I don't know the answer. But I think your guess may be right, so maybe you can experiment to see if you find something that works? I think I might start with get_event_loop(), but if that doesn't work then try new_event_loop(), yes.
The first thing I would try is:
import asyncio
loop = asyncio.get_event_loop()
seqd = SingleEdgeQueryDispatcher(input_cls=input_qnode.type,
output_cls=output_qnode.type,
pred=qedge.type,
input_id=self.__get_curie_prefix(curie),
loop=loop,
values=self.__get_curie_local_id(curie))
but I'm just guessing..
Ok, thanks! I'll start there and try to figure out something that works..
Documenting for the record: Using asyncio.get_event_loop()
didn't work - throws this error on /test:
Encountered a problem while using BioThings Explorer. Traceback (most recent call last): File "/mnt/data/orangeboard/test/RTX/code/UI/OpenAPI/python-flask-server/swagger_server/../../../../ARAX/ARAXQuery/Expand/bte_querier.py", line 38, in answer_one_hop_query loop = asyncio.get_event_loop() File "/mnt/data/python/Python-3.7.3/lib/python3.7/asyncio/events.py", line 644, in get_event_loop % threading.current_thread().name) RuntimeError: There is no current event loop in thread 'Thread-8'.
So I'm trying out asyncio.new_event_loop()
. (Both of these work when running ARAX from the command line, so it's a bit difficult to test without trying out the code on /test or /beta.)
Alright, I think this issue can finally be considered complete. :) The multithread-related problem has been resolved, so BTE Expands can now work via the UI/API.
And I've also made some notable improvements to processing of what BTE returns (to do a form of 'map_back' synonym handling like described here), error catching, etc. Though these changes are currently in the expander
branch and can't be merged to master
until #720 is done and merged. (I needed changes made in issue720
, so pulled those changes into the expander
branch.)
I'll plan to create separate issues for BTE enhancements going forward (like the ability to do curie--curie queries #759).
But will wait to close this issue until it can be tested on production/test/beta.
Tested on /beta - all seems to be working well!
Example test query: https://arax.rtx.ai/beta/?m=2282
Implement the BioThingsExplorer as a KP in
expand
as per the ARAX/BTE meeting on 4/22/2020 and the resulting notebook. In these slack threads.