Closed amykglen closed 2 years ago
While additions and alterations to smartapi.py may continue to be made as we identify a more clear use case, at the moment I have made some changes which return more information about trapi endpoints for more flexibility and convenience. Specifically, smartapi.py can be used to return all endpoints, all KPs (a subset of all endpoints), all dev KPs or all prod KPs (which are subsets of all KPs). These could be used for distinguishing endpoints for various purposes elsewhere in ARAX. Additionally, the functions which return these endpoints return more information, including the list of servers of that endpoint and the infores name. I will push these changes to a branch, Issue1466 shortly.
After receiving a formal standard set by Translator for maturity levels in the smartAPI registry I continued to implement new functionality in smartapi.py on branch issue1466.
Because not every TRAPI endpoint is standard in their smartapi registry entries, this code is only imperfectly able to return a list of endpoints and its relevant data. For instance, some KPs do not have infores names listed, some KPs have multiple registry entries, and some KPs have maturity levels not compliant with the four maturity levels specified by Translator.
However, this module does well enough at the moment, especially for manual usage. It has been implemented with a CLI for this purpose. Users may get a list of all trapi endpoints, just endpoints which support workflows, or just endpoints which are labeled as KPs. Additionally, these lists can be filtered via the use of a whitelist and blacklist. KPs can be further filtered by specifying a required maturity level with strict or non-strict settings.
We may implement automatic usage of this in some capacity in expand, such as a updating a cached list of KPs with certain parameters daily, but as of yet we haven't decided.
Some example CLI commands:
python3 smartapi.py get_trapi_endpoints
python3 smartapi.py get_trapi_endpoints -p
python3 smartapi.py get_operations_endpoints -p
python3 smartapi.py get_kps -p
python3 smartapi.py get_kps -p -m testing
python3 smartapi.py get_kps -p -m development
python3 smartapi.py get_kps -p -m development -f
python3 smartapi.py get_kps -p -m testing -f -i production testing staging development
python3 smartapi.py get_kps -p --whitelist infores:icees-dili infores:icees-asthma infores:icees-pcd
python3 smartapi.py get_kps -p -m production --blacklist infores:rtx-kg2
There should be sufficient usage documentation in the smartapi.py module, but as an extra measure I can describe the flags here. I believe @edeutsch and @isbluis intend to take a look at the module and provide feedback.
awesome, thanks @rcpeene! this view is great:
(arax3.9) amys-mbp:Expand amyglen$ python smartapi.py get_kps -p
infores name component maturities n_entries
infores:automat-biolink KP production 1
infores:automat-chem-norm KP production 1
infores:automat-cord19 KP production 1
infores:automat-covidkop KP production 1
infores:automat-ctd KP production 1
infores:automat-drug-central KP production 1
infores:automat-foodb KP production 1
infores:automat-gtex KP production 1
infores:automat-gtopdb KP production 1
infores:automat-gwas-catalog KP production 1
infores:automat-hetio KP production 1
infores:automat-hgnc KP production 1
infores:automat-hmdb KP production 1
infores:automat-human-goa KP production 1
infores:automat-intact KP production 1
infores:automat-mole-pro-fda KP production 1
infores:automat-ontology-hierarchy KP production 1
infores:automat-panther KP production 1
infores:automat-pharos KP production 1
infores:automat-robokop KP production 1
infores:automat-text-mining-provider KP production 1
infores:automat-uberongraph KP production 1
infores:automat-viral-proteome KP production 1
infores:biothings-explorer KP staging, production, development 1
infores:cam-kp KP production 2
infores:cohd KP production, development 1
infores:connections-hypothesis KP dev, production, staging 1
infores:genetics-data-provider KP staging, development, production 3
infores:icees-asthma KP production, development 2
infores:icees-dili KP production, development 2
infores:icees-pcd KP production, development 2
infores:knowledge-collaboratory KP production 1
infores:molepro KP production, development, test 1
infores:monarchinitiative KP production 1
infores:openpredict KP production 1
infores:rtx-kg2 KP production, development 2
infores:spoke KP production 1
infores:sri-ontology KP production 1
infores:text-mining-provider-cooccurrence KP development 1
so n_entries
captures the number of separate SmartAPI registrations that that KP has for their various instances, right? (since technically KPs can either list multiple maturities under one registration or have a separate registration for each maturity)
Yes, n_entries is the number of separate smartapi registry entries that exist with that infores name. This is only used when the -p argument is passed. Otherwise, the output just contains every individual smartapi registry entry without any such collating.
I showed off this system (with Luis's front end) to everyone at the Architecture call and it was well received (some socks came clean off!) One suggestion was made:
suggestion -- can links to the SmartAPI registry entry be added? e.g., https://smart-api.info/registry?q=acca268be3645a24556b81cc15ce0e9a
@rcpeene are those URLs available in the query result from SmartAPI? Can you add those to your JSON output? (don't need it in the -p output)
thanks!
It took a bit of investigation, but I was able to figure out how to fetch the smartapi registry urls for endpoints. Now a new field is returned with the endpoint dict named "smartapi_url".
brill! Looks great: https://arax.ncats.io/devED/api/arax/v1.3/status?authorization=smartapi
Forgot to tag the commit that includes this latest info in the UI: 9ad6f41 Thanks!
dynamically selecting KPs from SmartAPI is high priority per 8/3 AHM - and TRAPI 1.3 ARAX should only use TRAPI 1.3 KPs
we need to add to KPSelector
in Expand to do this dynamic selection
so we don't forget: when we do dynamic KP selection we will also need to address:
RTX/code/ARAX/Documentation/arax_kps.json
- should move to an auto-updating system, probably where we no longer store this file in our repo, but somewhere else)note to @rcpeene / myself: remember to regenerate DSL_Documentation.md
once work on this issue is done (before it's merged into master
)
I believe all the necessary implementation changes have been made in ARAX_expander.py
, expand_utilities.py
, kp_selector.py
, and trapi_querier.py
, and changes have been pushed. Mostly minor refactors and attribute changes to ARAXExpander. We have yet to address the two items that @amykglen enumerated above
DSL_documentation has been regenerated and a few bug fixes have been made on this implementation.
I have merged my changes with Amy's changes to RTXConfiguration.py
. Now the implementation in expand_utilities.py
which uses smartapi.py
also intelligently uses RTXConfiguration
to choose other kps based on our instance's maturity level. Additionally, it chooses the rtx-kg2 endpoint url to use based on the is_itrb_instance
attribute of the config.
great! This is new in master
?
Sorry I should have specified. I pushed these changes to issue1466
According to @amykglen, we still need to make changes to RTXConfiguration.py
such that RTXConfig.rtx_kg2_url
is None unless explicitly set, so it may be used as a sort of override. Currently, my implementation in expand_utilities.py
will use this attribute unless it is None, in which case it will try to choose a kg2 url based on maturity
and is_itrb_instance
.
yes - I'll go into issue1466
and make that tweak. thanks!
ok, I pushed that change to the issue1466
branch
Looks like the tests run smoothly in issue1466
and it is ready to be merged
awesome! let's just do one final merge of master
into issue1466
after today's mini-hackathon (since we'll be merging NewFmt
into master
) and make sure everything still runs ok one final time. then we should be clear to merge it into master
!
hey @edeutsch - could you update the SmartAPI registration for kg2.ci.transltr.io and arax.ci.transltr.io so that they are for TRAPI 1.3 instead of TRAPI 1.2? since they are running master, which is now TRAPI 1.3.
arax.ci.transltr.io
is currently not really working because changes for this issue were merged to master
and there is apparently only one 'staging' TRAPI 1.3 KP registered (COHD).
oh, good catch, there are several SmartAPI deficiencies. I will update that all now.
okay, I think I have updated everything, how does this look:
What do you think?
nice, thanks! that looks good to me!
so arax.ci.transltr.io is now able to select KG2 as a KP successfully, but strangely, this error happens always at this same point in processing, without any message in the log:
what's very odd is that no such error happens when running the same query locally, even when forcing a maturity of 'staging', like CI has. no idea what this could be... wondering if @edeutsch knows anything?
I'm guessing that the live stream of JSON objects (that provides the logging and traffic light data is corrupted somehow. If you can provide an exact repro, maybe @isbluis can easily tell us what the offending JSON object is and we can trace the source?
ah, ok. I see one query plan update message that it might possibly have been unhappy about - just pushed a change to that. I'll give it a few minutes to see if that happens to fix it, otherwise post back here with a repro. thanks!
huh, ok, it's still broken. the way I'm seeing the error is by going to https://arax.ci.transltr.io/ (which is running the current code in master
), then running the standard JSON example acetaminophen query.
So the stream and processing just appears to end there... that's why the UI is confused. Here's a repro.
cat > query_v1.3.json
{
"stream_progress": true,
"message": {
"query_graph": {
"edges": {
"e00": {
"subject": "n00",
"object": "n01",
"predicates": ["biolink:physically_interacts_with"]
}
},
"nodes": {
"n00": {
"ids": ["CHEMBL.COMPOUND:CHEMBL112"]
},
"n01": {
"categories": ["biolink:Protein"]
}
}
}
},
"max_results": 100
}
^D
CONTENT=`cat query_v1.3.json`
ENDPOINT="https://arax.ci.transltr.io/api/arax/v1.3"
curl -s -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d "$CONTENT" $ENDPOINT/query
If you run that, maybe need to fix dialect, you should see this:
...
{"timestamp": "2022-08-19T04:45:44.102565", "level": "INFO", "code": "", "message": "The KPs Expand decided to answer e00 with are: {'infores:rtx-kg2'}"}
{"timestamp": "2022-08-19T04:45:44.102606", "level": "DEBUG", "code": "", "message": "Will use asyncio to run KP queries concurrently"}
{"timestamp": "2022-08-19T04:45:44.103025", "level": "INFO", "code": "", "message": "Expanding qedge e00 using infores:rtx-kg2"}
$
i.e. right after is says it wants to use rtx-kg2, then connection suddenly drops without another word. weird.
The UI is expected a bunch of status objects like that ultimately followed by the final TRAPI message. But there's not TRAPI message and so it act weird. Maybe @isbluis can take the opportunity to make it a little more robust in the face of this, but ultimately it is unexpected output.
ah, that's very useful! I think I might've spotted the problem. looks like something is being added to the query plan log with a key that is None, instead of the KP's infores curie
yep, that seems to have fixed it! arax.ci.transltr.io has finished rebuilding after my commit and now appears to be working.
quite the list of KPs in the query summary table now! https://arax.ci.transltr.io/?r=57135
I see infores:aragorn
and infores:unsecret-agent
are listed in the table, which I thought are only ARAs (not KPs). does the KP selector code currently search for KPs and ARAs, @rcpeene?
this also seems odd - ARAX apparently tries to query itself? though it says there isn't a TRAPI 1.3 endpoint for ARAX, which isn't true. wonder why? (@rcpeene)
wonder if maybe the logic that creates the 'excluded_by_version' list is off? (and is somehow including ARAs in that list)
Hi @amykglen . Regarding infores
being None
, I also had to code around that in the UI. You can see that there is one entry that looks incomplete:
I ignore it, but maybe that's what is causing the issue?
Maybe we should inform the owners of that server registration, and add more checks on our end?
yeah, good idea, @isbluis!
so I think we're in a working state except cicd.rtx.ai is still unhappy; some tests are failing, apparently because it's having trouble saving responses?
not sure if @edeutsch knows anything about this. you can see output for its latest run of pytests here, by expanding the 'Run tests with pytest' section
actually, I'm thinking that might resolve itself on cicd.rtx.ai's run for my latest commit (that fixed the None
key issue in the response), which hasn't finished yet. will find out tomorrow I guess.
yep, looks like it resolved itself. awesome. so the main issue remaining I'm aware of is ARAs appearing as KPs (for when @rcpeene has a chance).
This bug with considering ARA's in smartapi.kps_excluded_by_version
was me forgetting to implement a check in smartapi.py's get_trapi_endpoints
method. It has been resolved and changes have been pushed.
looks great, thanks! (on https://arax.ci.transltr.io/)
are we good to close this issue, @rcpeene? everything seems to be working great as far as I can tell!
Yup!
start using @edeutsch's new ARAXQuery/Expand/smartapi.py!