NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

Results are not being cached #110

Closed dkoslicki closed 4 months ago

dkoslicki commented 1 year ago

With an eye towards improving performance, I propose the following: when a particular query has finished executing, cache its results. This cache can expire at some regular cadence (say, every month), but in the interim the cached results can be served directly, greatly improving the responsiveness of the UI.
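A minimal sketch of the proposal above, assuming an in-memory store keyed by some query string and a fixed time-to-live of roughly one month. `TTLResultCache`, `answer`, and `run_query` are hypothetical names for illustration, not part of any Translator component.

```python
import time
from typing import Any, Callable, Dict, Tuple

# Assumption: "every month" is approximated as 30 days.
DEFAULT_TTL_SECONDS = 30 * 24 * 60 * 60


class TTLResultCache:
    """In-memory cache that serves stored results until they expire."""

    def __init__(self, ttl_seconds: float = DEFAULT_TTL_SECONDS,
                 clock: Callable[[], float] = time.time) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        # Maps a query key to (time the result was stored, result).
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        """Return the cached result, or None if missing or expired.

        Assumption: real results are never None, so None means "miss".
        """
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, result = entry
        if self.clock() - stored_at > self.ttl_seconds:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return result

    def put(self, key: str, result: Any) -> None:
        self._entries[key] = (self.clock(), result)


def answer(query_key: str, cache: TTLResultCache,
           run_query: Callable[[str], Any]) -> Any:
    """Serve from the cache when possible; otherwise run the slow query and cache it."""
    cached = cache.get(query_key)
    if cached is not None:
        return cached
    result = run_query(query_key)
    cache.put(query_key, result)
    return result
```

Passing a fake `clock` (e.g. `lambda: now[0]`) makes it easy to check that a second request within the TTL is served from the cache while a request after expiry re-runs the query.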

Because the number of "creative queries" is relatively small, one could imagine refreshing cached results instead of deleting them, even before any user has requested them (proactive caching).
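The proactive variant can be sketched as a "refresh-ahead" job, assuming the set of known creative queries can be enumerated and that something like a nightly scheduler calls it. `refresh_ahead`, `known_queries`, and `refresh_margin` are hypothetical names; entries are overwritten in place rather than deleted, so users keep getting a result between refreshes.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


def refresh_ahead(cache: Dict[str, Tuple[float, Any]],
                  known_queries: Iterable[str],
                  run_query: Callable[[str], Any],
                  ttl_seconds: float,
                  refresh_margin: float,
                  now: float) -> List[str]:
    """Recompute known queries whose cached entry is missing or close to expiry.

    `cache` maps query -> (time stored, result). An entry is refreshed once
    its age reaches ttl_seconds - refresh_margin, i.e. before it would expire.
    Returns the queries that were recomputed.
    """
    refreshed = []
    for query in known_queries:
        entry = cache.get(query)
        age = None if entry is None else now - entry[0]
        if age is None or age >= ttl_seconds - refresh_margin:
            # Replace the old result instead of deleting it first.
            cache[query] = (now, run_query(query))
            refreshed.append(query)
    return refreshed
```

The trade-off is compute spent on queries nobody may ask for in return for never making a user wait on a cold cache; with only a small set of creative queries that cost stays bounded.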

edeutsch commented 1 year ago

We used to cache results years ago, but eventually stopped during a period of rapid development. We had multiple instances of stale results being served after bugs were fixed, etc. A caching system requires careful attention to the version of the code that generated the results. Crucially coupled with that, we would need to diligently update our internal version number of the code (not currently in practice), so that cache hits are determined not just from the incoming query but also from the version of the code running (e.g. right now production and beta give very different results even for our example query). I think we stopped at a time when there were nearly daily check-ins, which rendered the cache obsolete very quickly, and testing new code was hindered by unintended cache hits. I think we decided that the situation was sufficiently messy that it wasn't worth it.

Development is much slower these days, so maybe it is worth trying to revive this.
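One way to get the version-aware cache hits described above is to fold the code version into the cache key itself. This is a sketch under the assumption that queries are JSON-serializable dicts and that each deployment (production, beta, etc.) exposes a version string; `cache_key` and `code_version` are hypothetical names.

```python
import hashlib
import json
from typing import Any, Dict


def cache_key(query: Dict[str, Any], code_version: str) -> str:
    """Build a cache key from the query *and* the version of the code answering it.

    json.dumps(sort_keys=True) sorts keys at every nesting level, so logically
    identical queries that differ only in key order map to the same key.
    """
    canonical = json.dumps(query, sort_keys=True, separators=(",", ":"))
    payload = f"{code_version}\n{canonical}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

With this scheme, production and beta pass different `code_version` strings and never share entries, and a bug fix invalidates old results automatically, but only if the version string really is bumped on every deployment (deriving it from the deployed commit hash is one option, assumed here rather than current practice).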

dkoslicki commented 1 year ago

Thanks @edeutsch, I didn't recall that we hadn't turned it back on. I assume that the UI/ARS team would face these same challenges, but the very slow collection of all the results from each ARA seems to warrant it (otherwise we are closer to an approach seen in some other domains: "give me your email and I'll let you know when results are ready"), which might not be what NCATS is aiming for.

sierra-moxon commented 11 months ago

Should we get this on the architecture meeting agenda?

edeutsch commented 11 months ago

I think we should, yes. A generally agreed-upon caching policy would be beneficial here.

cbizon commented 9 months ago

Summary of Architecture call today:

gglusman commented 9 months ago

We were just talking about this with Jenn @jh111, and we realized there are two possibly useful meanings of "don't use cache", exemplified by these use cases: 1) "I'm a user who cares about the latest and current content. Please don't use any existing cached content when computing an answer to this question." 2) "I'm a software tool that is about to flood you with one-off queries, so don't bother caching what comes out of these queries."

In both scenarios, the 'no cache' should presumably propagate down the querying pipeline to other components.
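The two meanings can be modelled as two independent flags, loosely analogous to the HTTP `Cache-Control` request directives `no-cache` (don't answer from a stored response without revalidating) and `no-store` (don't store the response). This is a sketch only: `CachePolicy`, `bypass_read`, `bypass_write`, and `run_with_cache` are hypothetical names, and no such Translator parameter is implied to exist.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class CachePolicy:
    # Use case 1: "don't use existing cached content" -> skip cache reads.
    bypass_read: bool = False
    # Use case 2: "don't bother caching these one-off results" -> skip cache writes.
    bypass_write: bool = False


def run_with_cache(key: str, policy: CachePolicy, cache: Dict[str, Any],
                   compute: Callable[[CachePolicy], Any]) -> Any:
    """Answer a query, honouring the policy locally and passing it downstream."""
    if not policy.bypass_read and key in cache:
        return cache[key]
    # The same policy object is handed to `compute`, standing in for calls to
    # downstream components, so the directive propagates through the pipeline.
    result = compute(policy)
    if not policy.bypass_write:
        cache[key] = result
    return result
```

Keeping the flags separate lets use case 1 still refresh the cache with the fresh answer, while use case 2 can reuse existing entries without polluting the cache with one-off results.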

jh111 commented 9 months ago

As an additional note, if we hit resource limits for caching, we may want to consider some best practices for representing users. For example, if you cache based on the most common queries, you may skew results toward the one person who uses Translator the most. Do we want this, or do we want to limit the weight a single user has, to democratize results?
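One possible way to limit a single user's weight is to cap how many of each user's requests count toward a query's popularity before choosing what to keep in a limited cache. The sketch assumes a request log of `(user, query)` pairs is available; `capped_popularity`, `queries_to_cache`, and `per_user_cap` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def capped_popularity(requests: Iterable[Tuple[str, str]],
                      per_user_cap: int) -> Dict[str, int]:
    """Score each query by request count, counting at most `per_user_cap`
    requests from any single user, so one heavy user cannot dominate."""
    per_query_user: Dict[str, Counter] = defaultdict(Counter)
    for user, query in requests:
        per_query_user[query][user] += 1
    return {
        query: sum(min(n, per_user_cap) for n in users.values())
        for query, users in per_query_user.items()
    }


def queries_to_cache(requests: Iterable[Tuple[str, str]],
                     per_user_cap: int, capacity: int) -> List[str]:
    """Pick the `capacity` highest-scoring queries (ties broken by query text)."""
    scores = capped_popularity(requests, per_user_cap)
    ranked = sorted(scores, key=lambda q: (-scores[q], q))
    return ranked[:capacity]
```

With `per_user_cap=1` the score is simply the number of distinct users asking a query, so a query asked 50 times by one person ranks below one asked once each by three people; a large cap recovers plain request counts.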

sierra-moxon commented 5 months ago

Closing as completed; please reopen if I am mistaken that this has been discussed, agreed upon, and implemented where possible.