OHDSI / ROhdsiWebApi

An R package for interfacing with a WebAPI instance
https://ohdsi.github.io/ROhdsiWebApi

Use caching for some operations #121

Open schuemie opened 4 years ago

schuemie commented 4 years ago

Some calls to WebAPI are made repeatedly with the same arguments, for example to check whether the baseURL is correct or to retrieve the list of valid IDs. By caching the results, we can avoid these repeated calls, which would make the functions faster and spare WebAPI unnecessary requests.

One way to implement this would be to create a global environment like here, and use that to store a cache.

For example, we only need to test once whether a baseURL is valid. We can store the list of valid baseURLs in that environment and check the list on the next call.

gowthamrao commented 4 years ago

Since WebApi supports multiple concurrent users, the cache may become stale. How do we make sure it doesn't?

gowthamrao commented 4 years ago

After discussing with @schuemie: we should only use it for the baseUrl, which should not be affected by concurrent changes.

ablack3 commented 4 years ago

We may also want to cache something related to user authentication. At the very least, we should cache the bearer token. We might also want to cache a refresh token, which would allow ROhdsiWebApi to automatically get a new bearer token when the current one expires. Does WebApi grant refresh tokens? Also, the cached information would need to be specific to each WebApi instance the user is interacting with, since there might be several (e.g. when bulk copying cohort definitions between two WebApi instances).
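Here is a minimal Python sketch of a per-instance token cache keyed by base URL, so that two WebApi instances never share credentials. Because the thread leaves open whether WebApi issues refresh tokens, the sketch simply re-authenticates when the cached token expires. `get_bearer_token` and the `authenticate(base_url) -> (token, lifetime_seconds)` callable are hypothetical and make no claim about WebApi's actual authentication API.

```python
import time

# Hypothetical credential store: base URL -> {"token": ..., "expires_at": ...}.
# Keying by URL keeps tokens for different WebApi instances separate.
_token_cache = {}


def get_bearer_token(base_url, authenticate, now=time.time):
    """Return a cached bearer token for base_url, re-authenticating if expired.

    `authenticate` is an assumed callable returning (token, lifetime_seconds).
    A refresh-token flow could replace it if WebApi turns out to support one.
    """
    key = base_url.rstrip("/")
    t = now()
    entry = _token_cache.get(key)
    if entry is not None and entry["expires_at"] > t:
        return entry["token"]  # still valid: reuse it
    token, lifetime = authenticate(key)
    _token_cache[key] = {"token": token, "expires_at": t + lifetime}
    return token
```

Injecting the clock (`now`) makes the expiry behavior testable without waiting.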

gowthamrao commented 3 years ago

@ablack3 @schuemie is this still an issue that we need to work on?

schuemie commented 3 years ago

I would definitely cache whether the baseUrl is valid. That would save a full round-trip for each call, and if it becomes outdated you have another problem ;-)

If I recall correctly, most calls also retrieve the full list of valid IDs to check whether the one the user requested is among them. That is a very expensive operation, especially as the number of valid IDs grows. I don't actually see the point of that behavior: without it, the user would simply get an 'invalid ID' error from WebAPI itself a few milliseconds later, which would be just as informative. But if you want to keep that behavior, I recommend caching the list of valid IDs for at least a minute or so. That way, if I loop over IDs (for example, to get multiple cohort definitions), I don't have to download the full list of valid IDs every time.
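A minimal Python sketch of the suggested time-limited cache follows. The ID list for each (base URL, category) pair is downloaded at most once per `ttl` seconds (60 here, matching "a minute or so"), which also bounds the staleness concern raised earlier. `TtlCache`, `check_id`, and the `fetch_ids(base_url, category)` callable are hypothetical names for illustration only.

```python
import time


class TtlCache:
    """Tiny time-to-live cache: an entry is reused for `ttl` seconds after it is stored."""

    def __init__(self, ttl=60.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh enough: skip the download
        value = fetch()
        self._entries[key] = (now, value)
        return value


_valid_ids = TtlCache(ttl=60.0)


def check_id(base_url, category, requested_id, fetch_ids, cache=_valid_ids):
    """Raise ValueError if requested_id is not among the valid IDs for category.

    `fetch_ids` is an assumed callable that downloads the full ID list from
    WebAPI; it is only invoked on a cache miss or after the TTL has elapsed.
    """
    key = (base_url.rstrip("/"), category)
    ids = cache.get_or_fetch(key, lambda: set(fetch_ids(base_url, category)))
    if requested_id not in ids:
        raise ValueError(f"{requested_id} is not a valid {category} ID")
```

When looping over several cohort IDs, only the first iteration pays for the download. Dropping the check entirely, as suggested above, would remove the need for this cache altogether.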