I'm thinking it would be really beneficial to our overall performance to build a KP caching system. I think Aragorn does this and is likely one reason it is so fast. Here are my musings about this. It would be good to discuss and develop a bit of a spec doc and see if we can implement this.
General ideas:
Our slowest component is waiting around for responses from our KPs; other processing is pretty fast
We like KP freshness and the federated model, but some caching could really help a lot and, if done well, would have a negligible impact on freshness of information
Instead of trying to build a centralized-across-all-instances system like ResponseCache, it may be best just to have a local SQLite+file-based cache unique to each endpoint (e.g. even arax.ncats.io/test and /devED would have separate caches)
This would be similar to the TRAPI attribute caching system, which is local and file-based and the cache gets cleared when an endpoint service is launched
Although additional performance could be extracted from a system that persists across restarts, maybe the initial implementation could just clear the cache upon start of the service (attribute cache already does this)
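A rough sketch of what the per-endpoint, clear-on-start setup might look like (the directory, file naming, and instance names below are placeholders, not the attribute cache's actual conventions):

    import os

    def get_cache_path(instance_name: str, cache_dir: str = "/tmp/arax_kp_cache") -> str:
        """One cache file per endpoint instance (e.g. 'test' vs 'devED')."""
        os.makedirs(cache_dir, exist_ok=True)
        return os.path.join(cache_dir, f"kp_cache_{instance_name}.sqlite")

    def clear_cache_on_startup(instance_name: str) -> None:
        """Start each service launch with an empty cache, as the attribute cache already does."""
        path = get_cache_path(instance_name)
        if os.path.exists(path):
            os.remove(path)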
Set it up as a separate class that could also be called from outside Expand
Input to get_cached_response() would be the URL and the query_graph
The query_graph should perhaps be "cleaned" by the method (e.g. removal of inconsequential things, then regularizing to a string via json.dumps(query_graph, indent=0, sort_keys=True) or similar)
Perhaps then the URL+query_graph_str could be hashed to compute a unique key
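Something like this, perhaps (the clean_query_graph() cleaner is just illustrative; exactly what counts as "inconsequential" is still to be decided):

    import hashlib
    import json

    def clean_query_graph(obj):
        """Illustrative cleaner: drop None-valued fields so equivalent query graphs compare equal."""
        if isinstance(obj, dict):
            return {k: clean_query_graph(v) for k, v in obj.items() if v is not None}
        if isinstance(obj, list):
            return [clean_query_graph(item) for item in obj]
        return obj

    def compute_cache_key(url: str, query_graph: dict) -> str:
        """Regularize the query graph to a canonical string and hash it together with the KP URL."""
        query_graph_str = json.dumps(clean_query_graph(query_graph), indent=0, sort_keys=True)
        return hashlib.sha256(f"{url}\n{query_graph_str}".encode("utf-8")).hexdigest()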
Some additional metadata associated with the key would be nice (see the schema sketch after this list):
-- datetime of the last fetch attempt
-- datetime of the last successful fetch
-- URL
-- query_graph_str?
-- n_results in response
-- total fetch time
-- timeout or success or failure of fetch?
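A possible SQLite table for that per-key metadata (a sketch only; names, types, and the status values are all up for discussion):

    import sqlite3

    def initialize_cache_db(db_path: str) -> sqlite3.Connection:
        """Create the per-key metadata table for the KP cache; one row per cache key."""
        con = sqlite3.connect(db_path)
        con.execute("""
            CREATE TABLE IF NOT EXISTS kp_cache (
                cache_key             TEXT PRIMARY KEY,  -- hash of URL + canonical query_graph_str
                url                   TEXT,
                query_graph_str       TEXT,
                last_fetch_attempt    TEXT,              -- ISO 8601 datetime of the last fetch attempt
                last_successful_fetch TEXT,              -- ISO 8601 datetime of the last successful fetch
                n_results             INTEGER,           -- number of results in the cached response
                total_fetch_time      REAL,              -- seconds spent on the last fetch
                fetch_status          TEXT               -- 'success', 'timeout', or 'failure'
            )
        """)
        con.commit()
        return con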
Other musings:
Should the class also be the HTTP fetcher itself? Or is it just a storage system?
One potential benefit to it being the HTTP fetcher itself is that one could imagine the BackgroundTasker making calls to the KP caching system to "keep it fresh". There could be a method freshen_cache() which would find the oldest item in the cache and try to "freshen" it, i.e. redo the query, get the latest results, and replace the older cache contents with the new results and stats.
One could imagine that if the last attempt to fetch results from the Knowledge Collaboratory timed out after 30 seconds, why not allow 5 minutes during the freshening process? If we eventually do get a result, great: we cache it, and we're no longer bound by the 30 seconds. We can wait longer because we're doing it in the background. The next time a query comes in, we'll have the result in hand instantly instead of waiting 30 seconds and still not having it.
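Roughly what freshen_cache() might look like under those assumptions (the TRAPI envelope, the timeout value, and the table/column names are carried over from the sketches above; storing the response body itself on the file side of the cache is left as a comment):

    import json
    import time
    from datetime import datetime, timezone

    import requests

    BACKGROUND_TIMEOUT = 300  # 5 minutes: far more generous than a live query could tolerate

    def freshen_cache(con):
        """Re-fetch the stalest cached KP query with a generous timeout and update its stats."""
        row = con.execute(
            "SELECT cache_key, url, query_graph_str FROM kp_cache "
            "ORDER BY last_fetch_attempt ASC LIMIT 1").fetchone()
        if row is None:
            return
        cache_key, url, query_graph_str = row
        now = datetime.now(timezone.utc).isoformat()
        start = time.time()
        try:
            response = requests.post(
                url,
                json={"message": {"query_graph": json.loads(query_graph_str)}},
                timeout=BACKGROUND_TIMEOUT)
            response.raise_for_status()
            n_results = len(response.json().get("message", {}).get("results") or [])
            status = "success"
            # (write response.json() to the file side of the cache here, keyed by cache_key)
        except requests.Timeout:
            n_results, status = None, "timeout"
        except requests.RequestException:
            n_results, status = None, "failure"
        con.execute(
            "UPDATE kp_cache SET last_fetch_attempt = ?, "
            "last_successful_fetch = CASE WHEN ? = 'success' THEN ? ELSE last_successful_fetch END, "
            "n_results = ?, total_fetch_time = ?, fetch_status = ? "
            "WHERE cache_key = ?",
            (now, status, now, n_results, time.time() - start, status, cache_key))
        con.commit()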
If the last attempt to fetch the resource in the cache resulted in a timeout or failure, and this is a "live" query, then just return a failure immediately. There's no point wasting precious time on something that is unlikely to work; let the background process keep trying.
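e.g. something like this for the live path (KNOWN_FAILURE and load_cached_response() are hypothetical placeholders; compute_cache_key() is from the earlier sketch):

    KNOWN_FAILURE = "known_failure"  # placeholder sentinel: "this KP failed recently; don't retry live"

    def get_cached_response(con, url, query_graph):
        """Live-query path: answer instantly from cache, or fail fast if the KP just failed."""
        cache_key = compute_cache_key(url, query_graph)
        row = con.execute("SELECT fetch_status FROM kp_cache WHERE cache_key = ?",
                          (cache_key,)).fetchone()
        if row is None:
            return None            # cache miss: caller queries the KP live (and stores the result)
        if row[0] in ("timeout", "failure"):
            return KNOWN_FAILURE   # don't burn live time; the background freshener keeps retrying
        return load_cached_response(cache_key)  # hypothetical reader for the stored response file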
We also want a caching system for ARAX as a whole. That seems secondary for various reasons, but maybe it could make use of the same overall system in a slightly different mode, where ARAX caches itself rather than caching child queries. Here, too, the whole "freshening" concept would be fun and useful, but since ARAX would normally be freshening itself, maybe it is better for this to be a completely separate class. Maybe BackgroundTasker could run kp_cache.freshen_cache() and arax_cache.freshen_cache(), with the latter operating a bit differently: it would just trigger a new query and have the remote cache itself rather than actively doing the caching. In other words, maybe the main difference is that kp_cacher is the controller that does the queries and stores the results, whereas arax_cacher is merely a storage system for what ARAX is already doing; its freshener just decides which queries should be freshened and triggers them, without storing the results, because the remote would be caching itself.
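A rough sketch of how the background process might drive both (purely hypothetical; not the real BackgroundTasker interface, and the interval is arbitrary):

    import threading
    import time

    class CacheFreshener:
        """Periodically freshen both caches from a background thread."""
        def __init__(self, kp_cache, arax_cache, interval_seconds: int = 300):
            self.kp_cache = kp_cache
            self.arax_cache = arax_cache
            self.interval_seconds = interval_seconds

        def start(self) -> None:
            threading.Thread(target=self._loop, daemon=True).start()

        def _loop(self) -> None:
            while True:
                # kp_cacher is the controller: it re-runs the oldest KP query and stores the result itself
                self.kp_cache.freshen_cache()
                # arax_cacher only decides which query to refresh and re-triggers it;
                # the ARAX endpoint caches its own response
                self.arax_cache.freshen_cache()
                time.sleep(self.interval_seconds)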