RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License
33 stars 21 forks source link

Programmatically use SmartAPI registry in Expand to find KP APIs #1466

Closed amykglen closed 2 years ago

amykglen commented 3 years ago

start using @edeutsch's new ARAXQuery/Expand/smartapi.py!

rcpeene commented 2 years ago

While additions and alterations to smartapi.py may continue to be made as we identify a more clear use case, at the moment I have made some changes which return more information about trapi endpoints for more flexibility and convenience. Specifically, smartapi.py can be used to return all endpoints, all KPs (a subset of all endpoints), all dev KPs or all prod KPs (which are subsets of all KPs). These could be used for distinguishing endpoints for various purposes elsewhere in ARAX. Additionally, the functions which return these endpoints return more information, including the list of servers of that endpoint and the infores name. I will push these changes to a branch, Issue1466 shortly.

rcpeene commented 2 years ago

After receiving a formal standard set by Translator for maturity levels in the smartAPI registry I continued to implement new functionality in smartapi.py on branch issue1466.

Because not every TRAPI endpoint is standard in their smartapi registry entries, this code is only imperfectly able to return a list of endpoints and its relevant data. For instance, some KPs do not have infores names listed, some KPs have multiple registry entries, and some KPs have maturity levels not compliant with the four maturity levels specified by Translator.

However, this module does well enough at the moment, especially for manual usage. It has been implemented with a CLI for this purpose. Users may get a list of all trapi endpoints, just endpoints which support workflows, or just endpoints which are labeled as KPs. Additionally, these lists can be filtered via the use of a whitelist and blacklist. KPs can be further filtered by specifying a required maturity level with strict or non-strict settings.

We may implement automatic usage of this in some capacity in expand, such as a updating a cached list of KPs with certain parameters daily, but as of yet we haven't decided.

rcpeene commented 2 years ago

Some example CLI commands:

python3 smartapi.py get_trapi_endpoints
python3 smartapi.py get_trapi_endpoints -p
python3 smartapi.py get_operations_endpoints -p

python3 smartapi.py get_kps -p
python3 smartapi.py get_kps -p -m testing
python3 smartapi.py get_kps -p -m development
python3 smartapi.py get_kps -p -m development -f
python3 smartapi.py get_kps -p -m testing -f -i production testing staging development
python3 smartapi.py get_kps -p --whitelist infores:icees-dili infores:icees-asthma infores:icees-pcd
python3 smartapi.py get_kps -p -m production --blacklist infores:rtx-kg2
rcpeene commented 2 years ago

There should be sufficient usage documentation in the smartapi.py module, but as an extra measure I can describe the flags here. I believe @edeutsch and @isbluis intend to take a look at the module and provide feedback.

amykglen commented 2 years ago

awesome, thanks @rcpeene! this view is great:

(arax3.9) amys-mbp:Expand amyglen$ python smartapi.py get_kps -p
infores name                              component maturities                              n_entries
infores:automat-biolink                   KP        production                              1   
infores:automat-chem-norm                 KP        production                              1   
infores:automat-cord19                    KP        production                              1   
infores:automat-covidkop                  KP        production                              1   
infores:automat-ctd                       KP        production                              1   
infores:automat-drug-central              KP        production                              1   
infores:automat-foodb                     KP        production                              1   
infores:automat-gtex                      KP        production                              1   
infores:automat-gtopdb                    KP        production                              1   
infores:automat-gwas-catalog              KP        production                              1   
infores:automat-hetio                     KP        production                              1   
infores:automat-hgnc                      KP        production                              1   
infores:automat-hmdb                      KP        production                              1   
infores:automat-human-goa                 KP        production                              1   
infores:automat-intact                    KP        production                              1   
infores:automat-mole-pro-fda              KP        production                              1   
infores:automat-ontology-hierarchy        KP        production                              1   
infores:automat-panther                   KP        production                              1   
infores:automat-pharos                    KP        production                              1   
infores:automat-robokop                   KP        production                              1   
infores:automat-text-mining-provider      KP        production                              1   
infores:automat-uberongraph               KP        production                              1   
infores:automat-viral-proteome            KP        production                              1   
infores:biothings-explorer                KP        staging, production, development        1   
infores:cam-kp                            KP        production                              2   
infores:cohd                              KP        production, development                 1   
infores:connections-hypothesis            KP        dev, production, staging                1   
infores:genetics-data-provider            KP        staging, development, production        3   
infores:icees-asthma                      KP        production, development                 2   
infores:icees-dili                        KP        production, development                 2   
infores:icees-pcd                         KP        production, development                 2   
infores:knowledge-collaboratory           KP        production                              1   
infores:molepro                           KP        production, development, test           1   
infores:monarchinitiative                 KP        production                              1   
infores:openpredict                       KP        production                              1   
infores:rtx-kg2                           KP        production, development                 2   
infores:spoke                             KP        production                              1   
infores:sri-ontology                      KP        production                              1   
infores:text-mining-provider-cooccurrence KP        development                             1   

so n_entries captures the number of separate SmartAPI registrations that that KP has for their various instances, right? (since technically KPs can either list multiple maturities under one registration or have a separate registration for each maturity)

rcpeene commented 2 years ago

Yes, n_entries is the number of separate smartapi registry entries that exist with that infores name. This is only used when the -p argument is passed. Otherwise, the output just contains every individual smartapi registry entry without any such collating.

edeutsch commented 2 years ago

I showed off this system (with Luis's front end) to everyone at the Architecture call and it was well received (some socks came clean off!) One suggestion was made:

suggestion -- can links to the SmartAPI registry entry be added? e.g., https://smart-api.info/registry?q=acca268be3645a24556b81cc15ce0e9a

@rcpeene are those URLs available in the query result from SmartAPI? Can you add those to your JSON output? (don't need it in the -p output)

thanks!

rcpeene commented 2 years ago

It took a bit of investigation, but I was able to figure out how to fetch the smartapi registry urls for endpoints. Now a new field is returned with the endpoint dict named "smartapi_url".

edeutsch commented 2 years ago

brill! Looks great: https://arax.ncats.io/devED/api/arax/v1.3/status?authorization=smartapi

isbluis commented 2 years ago

Forgot to tag the commit that includes this latest info in the UI: 9ad6f41 Thanks!

amykglen commented 2 years ago

dynamically selecting KPs from SmartAPI is high priority per 8/3 AHM - and TRAPI 1.3 ARAX should only use TRAPI 1.3 KPs

we need to add to KPSelector in Expand to do this dynamic selection

amykglen commented 2 years ago

so we don't forget: when we do dynamic KP selection we will also need to address:

amykglen commented 2 years ago

note to @rcpeene / myself: remember to regenerate DSL_Documentation.md once work on this issue is done (before it's merged into master)

rcpeene commented 2 years ago

I believe all the necessary implementation changes have been made in ARAX_expander.py, expand_utilities.py, kp_selector.py, and trapi_querier.py, and changes have been pushed. Mostly minor refactors and attribute changes to ARAXExpander. We have yet to address the two items that @amykglen enumerated above

rcpeene commented 2 years ago

DSL_documentation has been regenerated and a few bug fixes have been made on this implementation.

rcpeene commented 2 years ago

I have merged my changes with Amy's changes to RTXConfiguration.py. Now the implementation in expand_utilities.py which uses smartapi.py also intelligently uses RTXConfiguration to choose other kps based on our instance's maturity level. Additionally, it chooses the rtx-kg2 endpoint url to use based on the is_itrb_instance attribute of the config.

edeutsch commented 2 years ago

great! This is new in master?

rcpeene commented 2 years ago

Sorry I should have specified. I pushed these changes to issue1466

rcpeene commented 2 years ago

According to @amykglen, we still need to make changes to RTXConfiguration.py such that RTXConfig.rtx_kg2_url is None unless explicitly set, so it may be used as a sort of override. Currently, my implementation in expand_utilities.py will use this attribute unless it is None, in which case it will try to choose a kg2 url based on maturity and is_itrb_instance.

amykglen commented 2 years ago

yes - I'll go into issue1466 and make that tweak. thanks!

amykglen commented 2 years ago

ok, I pushed that change to the issue1466 branch

rcpeene commented 2 years ago

Looks like the tests run smoothly in issue1466 and it is ready to be merged

amykglen commented 2 years ago

awesome! let's just do one final merge of master into issue1466 after today's mini-hackathon (since we'll be merging NewFmt into master) and make sure everything still runs ok one final time. then we should be clear to merge it into master!

amykglen commented 2 years ago

hey @edeutsch - could you update the SmartAPI registration for kg2.ci.transltr.io and arax.ci.transltr.io so that they are for TRAPI 1.3 instead of TRAPI 1.2? since they are running master, which is now TRAPI 1.3.

arax.ci.transltr.io is currently not really working because changes for this issue were merged to master and there is apparently only one 'staging' TRAPI 1.3 KP registered (COHD).

edeutsch commented 2 years ago

oh, good catch, there are several SmartAPI deficiencies. I will update that all now.

edeutsch commented 2 years ago

okay, I think I have updated everything, how does this look:

image

image

What do you think?

amykglen commented 2 years ago

nice, thanks! that looks good to me!

amykglen commented 2 years ago

so arax.ci.transltr.io is now able to select KG2 as a KP successfully, but strangely, this error happens always at this same point in processing, without any message in the log:

Screen Shot 2022-08-18 at 8 58 14 PM

what's very odd is that no such error happens when running the same query locally, even when forcing a maturity of 'staging', like CI has. no idea what this could be... wondering if @edeutsch knows anything?

edeutsch commented 2 years ago

I'm guessing that the live stream of JSON objects (that provides the logging and traffic light data is corrupted somehow. If you can provide an exact repro, maybe @isbluis can easily tell us what the offending JSON object is and we can trace the source?

amykglen commented 2 years ago

ah, ok. I see one query plan update message that it might possibly have been unhappy about - just pushed a change to that. I'll give it a few minutes to see if that happens to fix it, otherwise post back here with a repro. thanks!

amykglen commented 2 years ago

huh, ok, it's still broken. the way I'm seeing the error is by going to https://arax.ci.transltr.io/ (which is running the current code in master), then running the standard JSON example acetaminophen query.

edeutsch commented 2 years ago

So the stream and processing just appears to end there... that's why the UI is confused. Here's a repro.

edeutsch commented 2 years ago
cat > query_v1.3.json
{
  "stream_progress": true,
  "message": {
    "query_graph": {
   "edges": {
      "e00": {
         "subject":   "n00",
         "object":    "n01",
         "predicates": ["biolink:physically_interacts_with"]
      }
   },
   "nodes": {
      "n00": {
         "ids":        ["CHEMBL.COMPOUND:CHEMBL112"]
      },
      "n01": {
         "categories":  ["biolink:Protein"]
      }
   }
   }
  },
  "max_results": 100
}
^D

CONTENT=`cat query_v1.3.json`
ENDPOINT="https://arax.ci.transltr.io/api/arax/v1.3"

curl -s -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d "$CONTENT" $ENDPOINT/query
edeutsch commented 2 years ago

If you run that, maybe need to fix dialect, you should see this:

...
{"timestamp": "2022-08-19T04:45:44.102565", "level": "INFO", "code": "", "message": "The KPs Expand decided to answer e00 with are: {'infores:rtx-kg2'}"}
{"timestamp": "2022-08-19T04:45:44.102606", "level": "DEBUG", "code": "", "message": "Will use asyncio to run KP queries concurrently"}
{"timestamp": "2022-08-19T04:45:44.103025", "level": "INFO", "code": "", "message": "Expanding qedge e00 using infores:rtx-kg2"}
$

i.e. right after is says it wants to use rtx-kg2, then connection suddenly drops without another word. weird.

edeutsch commented 2 years ago

The UI is expected a bunch of status objects like that ultimately followed by the final TRAPI message. But there's not TRAPI message and so it act weird. Maybe @isbluis can take the opportunity to make it a little more robust in the face of this, but ultimately it is unexpected output.

edeutsch commented 2 years ago

https://arax.ci.transltr.io/api/arax/v1.3/status/logs is useful

amykglen commented 2 years ago

ah, that's very useful! I think I might've spotted the problem. looks like something is being added to the query plan log with a key that is None, instead of the KP's infores curie

amykglen commented 2 years ago

yep, that seems to have fixed it! arax.ci.transltr.io has finished rebuilding after my commit and now appears to be working.

amykglen commented 2 years ago

quite the list of KPs in the query summary table now! https://arax.ci.transltr.io/?r=57135

amykglen commented 2 years ago

I see infores:aragorn and infores:unsecret-agent are listed in the table, which I thought are only ARAs (not KPs). does the KP selector code currently search for KPs and ARAs, @rcpeene?

amykglen commented 2 years ago

this also seems odd - ARAX apparently tries to query itself? though it says there isn't a TRAPI 1.3 endpoint for ARAX, which isn't true. wonder why? (@rcpeene)

Screen Shot 2022-08-18 at 10 56 18 PM

wonder if maybe the logic that creates the 'excluded_by_version' list is off? (and is somehow including ARAs in that list)

isbluis commented 2 years ago

Hi @amykglen . Regarding infores being None, I also had to code around that in the UI. You can see that there is one entry that looks incomplete: image

I ignore it, but maybe that's what is causing the issue?

Maybe we should inform the owners of that server registration, and add more checks on our end?

amykglen commented 2 years ago

yeah, good idea, @isbluis!

so I think we're in a working state except cicd.rtx.ai is still unhappy; some tests are failing, apparently because it's having trouble saving responses?

Screen Shot 2022-08-18 at 11 03 00 PM

not sure if @edeutsch knows anything about this. you can see output for its latest run of pytests here, by expanding the 'Run tests with pytest' section

amykglen commented 2 years ago

actually, I'm thinking that might resolve itself on cicd.rtx.ai's run for my latest commit (that fixed the None key issue in the response), which hasn't finished yet. will find out tomorrow I guess.

amykglen commented 2 years ago

yep, looks like it resolved itself. awesome. so the main issue remaining I'm aware of is ARAs appearing as KPs (for when @rcpeene has a chance).

rcpeene commented 2 years ago

This bug with considering ARA's in smartapi.kps_excluded_by_version was me forgetting to implement a check in smartapi.py's get_trapi_endpoints method. It has been resolved and changes have been pushed.

amykglen commented 2 years ago

looks great, thanks! (on https://arax.ci.transltr.io/)

amykglen commented 2 years ago

are we good to close this issue, @rcpeene? everything seems to be working great as far as I can tell!

rcpeene commented 2 years ago

Yup!