Determine if api4 should support caching, and if so, how

ErichBSchulz commented 7 years ago

just creating a case to dump ideas around how API4 should support caching (or not)

it keeps coming up in unrelated issues but I think caching needs a base issue to nut it out.

it is related to #20

ErichBSchulz commented 7 years ago

after a long chat in Mattermost with the ever patient and long-suffering @eileenmcnaughton and @xurizaemon I think what I am proposing is simply a change in the contract of API4. "Unless a call explicitly sets max_cache_age to 0 then API4 will use the most recent data up to the age set by the global "default_API4_cache_duration" configuration variable."

nganivet commented 7 years ago

A few questions to better understand:

Is the cache used for paging (ie. the cache accelerates fetching the second/subsequent pages), or is the cache done per page (ie. only useful after viewing page 1, then 2, then going back to 1 again)?
How is the "default_API4_cache_duration" configuration variable related/working with to the smart group cache parameters?
If the UI uses the API, won't the cache create undesirable side effects (ie. do a search, it gets cached, add a new contact, go back to the search screen, the same results as before show up, the user is then confused because the contact he/she just added does not show up in the results)?

xurizaemon commented 7 years ago

I have spoken (repeatedly) against caching in core, but I think we need more voices on this.

I have some concerns about caching in core. Some a gut feeling based on experience of previous CiviCRM cache decisions, others are specific (see eg comments below).

Links to previous comments from me (when I have time I'll reword this comment to precis)

My suggestion is that we implement the signature, but that we agree to not implement ANY actual cache code until the community reaches a firm decision on whether any cache implementation is positive. Some use cases here would be useful.

So here's what I propose (though I don't care at all about the function name ...):

function setMaxAge() {
  // Caching is not implemented yet. Whether it's a good idea to implement
  // caching in core is not yet determined. Discussion issue is at
  // https://github.com/civicrm/api4/issues/48
}

Then we leave the details of caching until this issue is agreed on.

eileenmcnaughton commented 7 years ago

I support @xurizaemon's suggestion. This stops the caching issue from being a blocker & gives us space to do the api v4 scaffolding without precluding adding caching to the api in an acceptable way later

nganivet commented 7 years ago

Yep, + 1 as well (I also have this strange gut feeling ...)

tttp commented 7 years ago

@xurizaemon

I'm working on a fairly big campaign (we had 100k signatures per day), and your assumption that the client shouldn't have access to outdated data is not true in our case: we have an api that does return aggregated data (eg how many signatures, segmented by location) or that returns the latest signatories. I cache the hell out of that. I don't care that it returns the same stalled "435'823 signatures" for 1 min instead of having the updated count of 436'996 if it prevents me to have to query the db for every visitor that displays the counter.

As for the writes, we already use rabbitMQ, but we did it outside of civi api: we have a "dumb" nodejs restAPI endpoint that processes the queries and push them to a rabbitMQ queue, then a php consumer daemon that fetches the messages and uses the normal "not at scale" api create from civi.

but I agree with you, they are cases where you need to have the latest data, so giving to the caller the option to choose if you have a cached version or not makes sense.

I'd argue that having a cache that scales can't be properly done in LAMP only anyway, and if you start introducing rabbitMQ, redis or whatever cache system, it is probably not that more expensive to add a proxy in front of our "not a scale" rest api from our good old civi

ie:

if read operations, a nginx that caches the query for x seconds
if write operations, a nodejs or go server that pushes the messages to a rabbitMQ queue

tha proxy could be made fairly generic and expose the same api as the civi one behind.

So far I have only mentioned the REST API, but the same reasoning of creating a "proxy" applies as well from php. Say you implement in an extension a new \Civi\FastApi4\ that has the same interface as \Civi\Api4\ but adds an extra layer of caching and voila, cache there ;)

tl;dr; no need to add in the core in either case, the cache can be added as a layer on the top

JoeMurray commented 7 years ago

Thanks for several interesting insights, Xavier. Could you do a case study / blog from campaigner perspective, and a seperate piece from a technical perspective? Both are exciting. Do you think you could provide links to repos in the second case? Very nice to hear what you're up to. Cheers!

ErichBSchulz commented 7 years ago

Ok - so I think we can all agree that caching worries us. I want to (re)assure @eileenmcnaughton particularly that I am not being blocked by this. So I really support of Chris's statement getting more voices on this. I like the function setMaxAge() {} approach but we have unhappy gut feelings that I think stem from doubt. ( @xurizaemon I hear your concern about caching but it seems to me that things have gone wrong because the of a committment to using a MySQL as the cache engine

It just seems to me a good plan to use the month or three we have up our sleeve to maximise input and, for those with an inclination to read and think about best patterns. (I think this applies to some of the other scalability aspects).

@nganivet:

re paging - hadn't thought about it. I think yes the initial get would cache as any get would... but there maybe the need to specifically cache a fixed sorted snapshot (but I'd like to explore that on #20)

re smart group caching - I see this as providing an alternative solution that isn't tied to mysql implementation. how/if to migrate smart groups is a big can of worms that has me a bit anxious (I think API 4 can provide caching without breaking smart groups as they exist and may possibly have a role in helping but I'm hoping to leave that conversation for much later)

re will the "cache create undesirable side effects" - yes!!! horrible annoying side-effects that will upset people very much!! which is why I think it will be hard to introduce it as an afterthought. Making the shift the to API4 presents an opportunity to get the idea of slightly stale data into the minds of extension and core coders. It seems to me that 99% of the time "slightly stale" will be fine (providing each installation is able to define its own definition of slightly), but in that 1% it will be up to coders to mark those calls.

I think our primary goal could be stated as "create an API signature for caching that scales well and that we won't regret later"

ErichBSchulz commented 7 years ago

hey @tttp thanks for that I'd like to explore this a bit more:

no need to add in the core in either case, the cache can be added as a layer on the top

I'm arguing that API4 should "have a signature that supports scaling" early on. The reason is that, as @eileenmcnaughton pointed out most sites don't have an performance issues (or don't discover them until a few years of scaling up). Therefore most of the coders submitting to core or writing extensions wont think about how the fantastic thing they are doing for CiviCRM will scale up. This mean that:

code won't anticipate caching and therefore no decisions will have been made about when to insist on fresh data, or conversely which requests (like you petition signature counts example) can have a more aggressive approach to caching. And in sites where caching gets turned things will just start to break or behave badly
code wont take advantage of queuing/asynchronous operations and people wont think about how to handle failures and rollbacks when operating off a queue stream.

Sure (@eileen) we can get a beta testing MVP out the door quite quickly - but my fear is that if CiviCRM moves from V3 to V4 API without having had a very solid think about to operate at massive scale then the opportunity will be missed - and I really don't think adding it as an add-on will get the uptake it needs to move CiviCRM into the realm of big services - ie because bits of it will remain fundamentally unable to scale.

Again all I'm saying here is "lets get the API contract right now(ish) to allow achieving this later" not "lets solve all the problems now".

Dont know if that makes any sense?

tttp commented 7 years ago

and I'd argue that if you develop something and you don't have a scale problem, you are likely not able to make an informed choice...

Does this specific call need fresh data? maybe, don't know: it depends I will know where to climb when I hit that specific wall.

I would assume that when you have scalability issues, you will identify the hotspots. Even if we were to use the same extensions, you might want to cache different calls than what I would, because our usage is different, and there is no way the developer of the extension knows what needs to be cached and what can't, because that is likely to be very specific.

...And we've been confirmed the hard way that async calls ar an order of magnitude harder to debug than the normal ones. We have dedicated queues to handle errors that comes when processing the messages, ie long after the caller has called the API. And again, probably quite specific, and no way anyone is going to implement it if they don't need it.

I do understand your aim and share it, but I don't see how we can have a unified API that handles both the "normal uncached" calls or one that scales and caches and queues.

What about an extension that does implement the cache mechanism into the api wrapper hook? wouldn't it be enough?

X+

ErichBSchulz commented 7 years ago

I guess I'm hoping that if we get the cache signature right then even if the initial code isn't tuned properly that configuration settings can be added?? mmm... scratching my head.

I do like your comment:

Say you implement in an extension a new \Civi\FastApi4\ that has the same interface as \Civi\Api4\ but adds an extra layer of caching and voila, cache there ;)

sorry I didn't see that when I posted above - its about getting the API4 signature right then being free to hotswap the caching engine!

yes I can only imagine the world of pain in queues and async - it seems unavoidable for horizontal scalability tho - so I'm not sure what the alternative is

xurizaemon commented 7 years ago

🙇 thanks all for your input, great! Sorry this is long, I didn't have time to make it shorter, etc.

Is anyone doing this?

As asked over in https://github.com/civicrm/api4/issues/18#issuecomment-279829301, I'm interested to hear of successful, scalable APIs which expose cache implementation to clients. If we can't find any, why might that be?

Past experiences

I wish I was an architect, but my duties are mostly putting out fires. Either way you end up seeing a fair bit of the underlying framework ... I think Civi has maybe had some questionable cache / "temporary results store" implementations in the past, and I think it's hard to have conversations about adding more caching without recognising that. There are good and bad reasons to consider caching; a good reason to me is "this resultset is fine for a while, let's not bother regenerating it", and a bad reason is "this query is painfully slow, so let's cache it". The latter for example just masks a pain point, dropping the cost on every 1000th user.

Exposing implementation details

I have some general feels about how we've been bitten in the past by exposing implementation details (oversharing the construction of SQL queries etc). Adding flags to expose cache details feels a lot like this to me. I raised this over in #42 but need to flag it here as one reason that exposing caching in API feels like a bad path.

Internal / external?

@ErichBSchulz I think that in some contexts it could actually make sense to have an API default for freshness. Examples I see of this might be Twitter's search API (which may be a few seconds out of date), or Stack Exchange API queries for users/posts. These are read operations but I think more importantly are delivered to "external clients".

I asked about this in https://github.com/civicrm/api4/issues/18#issuecomment-279829301 but #41 #42 also touch on it. Maybe the "internal/external" interface is the wrong term? I mean by "external" interface an API where we're OK with it being out of date, which I'm not sure is the same interface Civi's internal code should be using.

Maybe ->checkPermissions() and ->setMaxAge() feel so wrong to me because they're offering a means to switch mode, where I think clients should either have access or not.

Clients should take responsibility for their own caching

Mostly applicable to "external" clients of the API here.

assumption that the client shouldn't have access to outdated data is not true in our case

I think this was more "in cases where cached output is OK, clients are free to implement that", but the default behaviour should be accurate data. (We don't expect to have to force-refresh to get a current bank balance, for example.)

I don't care that it returns the same stalled "435'823 signatures" for 1 min

Absolutely. Most common example of this might be a Drupal block with Civi view and cached output. That's fine, and is a client caching results returned from API - 👍

In cases where we have multiple clients, eg several browsers currently hitting rest.php to get "485K signatures", there are also cache mechanisms already available - eg caching GET results in a reverse proxy - but in my view that's more about making our API endpoint RESTful, another topic. Implementing that form of caching suggests being able to effectively deconstruct queries, which to me suggests simplifying query parameters, but is #41. SE API docs on vectors hints at this.

Hotswapping FastApi4

I'd agree with @tttp sense that caching needs to suit the particular place where it's being applied. I could support extensions being able to intercept API calls / queries and return cached results to deliver that.

Unsure if Api4 needs to implement the full interface(s) of FastApi4, RedisApi4, UnicornApi4 though ... If we have an ability to swap in new API backends, does Api4 need to prepare for all potential functionality / usage? If not, can we keep Api4 clean and put the magic in the replacements?

Maybe expose setOption() method for clients to setOption('rainbow.magic', '⭐⭐⭐');, and 🦄 UnicornApi4 can react only if it implements a response to that option. (Agreeing on a signature for this means being explicit about which options/flags core Api4 reacts to, and possibly means asserting that core reacts to none of them, and that it "owns" the civicrm flagspace.)

Intercepting queries?

There's a lower level which some results caching could be implemented, and which might offer more to CiviCRM sites today. If sites could intercept queries based on SQL matches or query tags, they could return results from a cache backend without hitting SQL at all. I don't know how we'd make that work to return PEAR_DB resultsets, but it seems like it could be done, and that would be applicable without rewriting our existing ACL code (which otherwise won't benefit from this until it uses API4 - presumably not a trivial rewrite). This can only work for queries that can be cached, and which can be reliably reduced to a unique cache key (not cross-user ACL queries, not WHERE foo <= NOW()).

Wow, you made it. Thanks! :bowing_woman:

ErichBSchulz commented 7 years ago

heya @xurizaemon - here's some thinks back at you!

is anyone doing this?

good question and I cant find an example either. So agree we should pause before proceeding (but not long enough to miss the window presented by publishing API4). I can summarise why I think CiviCRM needs this combination of "default global" and "per request" cache settings built into the signature (and perhaps why other APIs dont):

CiviCRM can be very slow at scale and caching is a common way of dealing with slowness (and really importantly it is a method we can achieve readily)
Every CiviCRM instance is going to be different - extensions, usage pattern, CPUs, RAM etc, users, contacts - can you find another application with this kind of diversity of application?? (hence need for global setting which conveniently provides an opt-out)
CiviCRM keeps hitting this problem+solution and trying to solve it on a case by case basis. API4 is an opportunity to solve it once in a standard pattern (makers coders life much easier)
Not every call is the same. As I've stated 99% of gets will be very happy with slightly stale data (1-30 seconds ??) - but if the core doesn't provide a "per call way" of managing the 1% then sysadmins wont be able to wind up the global cache expiry to something they may otherwise need without upsetting users more. Conversely for some heavy analysis and reporting then using a very old ( ?? > 24 hours ) cache age could be very acceptable (eg end of period financial analysis and reporting)
MySQL (and thus presumably CiviCRM) horizontally scales badly adding caching into the API contract opens the door to lifting a lot of work off the shoulders of poor old MySQL and flicking it to a pack of young and keen redis servers.

Added to that, it seems to me the solution on the table is pretty lightweight in terms of complexity (ie one global config plus one per call option) with very simple less than/greater than logic to implement... and ticks most the boxes for ally the use-cases I can think of (but that is only a gut feeling so really appreciate it being hammered out and thought-experimented all over).

Past experiences

Yes! Lets learn from the past and not repeat past mistakes! If a Get is bad it is going to make the 1000th user unhappy on a regular basis then fix it!! or another enhancment could be to find those Gets and have them on a suitable cron to keep them alive in the background - but that's a complication for another day??

Hotswapping FastApi4 without standardising signature

I think that misses the opportunity to bikeshed the signature till we get it right... and not sure if ignorable parameters are an awesome plan (maybe they are but smells weird)

Exposing implementation details

I dont' really see providing a "cache age" limit as implementation. I think it is metadata about your Get - what and how the API engine is doing to keep that contract is implementation and that we hide!

Internal / external?

Well for me its about the 99% vs the 1% and giving coders (and potentially users) the opportunity to say "no thanks" to caching on an occassional basis makes the bitter pill more palatable. Giving client autonomy to make their own decisions is a good thing??

Clients taking responsibility for caching

The problem I have with this is it will miss many opportunities for caching as multiple hits on the same ACL query (for example) in a busy phone-banking session will just not effectively be cached...

davialexandre commented 7 years ago

Disclaimer: I'm still a new kid on the block when it comes to CiviCRM, so most of my opinions come from my experience working with different PHP frameworks and how I compare them to civi.

As I mentioned in our chat in Mattermost:

I'm totally against having the API interface/contract changed because of caching
I'm not against having caching in core (as long as it can be easily disabled and configured)

I think that the main problem here is that we're mixing domain and infrastructure details, which leads to exposing implementation details, which, unfortunatelly, is something that the API already does.

Let me try to explain my view of this, starting from caching. The funny thing is that caching itself is actually a very simple thing. You store something somewhere for some time and then you should be able to check if this something is stored, fetch it or delete it if you don't need it anymore. This is basically what is described by the CacheItemPoolInterface of PSR-6 or the simpler CacheInterface of PSR-16. Then you can have different drivers, which would implement this interface for many different things like Memcache, Redis, MySQL, text files, etc.

In the caching domain, how you interact with it is the domain logic. How you store the data is the insfrastructure detail. Here, setting the ttl or age of an item is part of the domain logic and it's ok to have this exposed in the interface.

Now let's think about the API domain. Here, I'll think about it as nothing but just a database abstraction layer which is basically what 99% of what civi's API does. It's not even an ORM, simply because it doesn't deal with objects. Given that, the domain logic/responsibilities are basically:

storing records somewhere
fetching records from somewhere
deleting records from somewhere

Where somewhere can be: a relational database (actually, this is the only possible place in civi currently and in MySQL only), a nosql database, a graph database, a bunch of text files or anything that can store data!

We have the interface for this (create, get, delete) and the infrastructure would be the DB_DataObject, that works like a driver for a relational database but it's soo poor that lots of implementation details leaked everywhere and now civi is thight coupled with it and with MySQL. The API talks directly to the database basically (I would even be surprised to find out that there's an API executing raw SQL queries).

So, how would caching work with this? Well, it would be just another infrastructure detail of the API domain. That is, the API can either get the data from cache or from db.

This is how I see it working:

Instead of having a new param on the API interface, we would have settings in the application level. It can be something simple like enable_cache=true|false or something more complex where we could enable/disable things per Entity and action; set different ttl for them; set different drivers
Have a cache component instantiated based on the application configuration
Have the API to use this component. It first searchs for things in cache. If it's not there, then it fetches from db, stores in cache and return results

That's it! No one using the API should know from where the data came from or if it's the most up to date. Whomever is responsible to maintain the application/site settings should be the only person responsible to find out what are the best options for it. Not the API users.

What if we do as proposed on the issue?

Now, assuming we get things implemented as proposed on the issue (a new param on API calls):

How will it work for chained API calls? (BTW, do we have something for chained API calls in api 4 already?)

(I'm using the API3 syntax because I don't know how it will look like in v4)

If I have a API call like this:

    $result = civicrm_api3('LeaveRequest', 'get', array(
      'sequential' => 1,
      'api.Contact.get' => array('id' => "\$value.contact_id"),
    ));

Can I say I want the contact results to come from cache, but not the LeaveRequest results? What happens if I have a LeaveRequest that is not linked to one of the cached contacts?
Can I say I want the contact results always coming from the db, but LeaveRequests coming from the cache? What happens if I have a cached LeaveRequest for a contact that was deleted (maybe The Leave Request was also deleted, but the cache was not updated)

How will it work for an API that calls another API internally?

Let's say that, internally, the LeaveRequest API calls the LeaveRequestDate API to get it's dates. Something like this (warning: very bad code example ahead):

function civicrm_api3_leave_request_get($params) {
  $leaveRequests = amazing_function_to_get_leave_requests($params);
  foreach($leaveRequests as $leaveRequest) {
    $leaveRequest['dates'] => civicrm_api3('LeaveRequestDate', 'get', ['leave_request_id' => $leaveRequest['id']]);
  }
}

Can I get only cached LeaveRequests but not cached LeaveRequestDates?
Can I get only cached LeaveRequestDates but not cached LeaveRequests?
What if LeaveRequestDate also make some other API calls internally (Currently, in v3, most API calls generate .getFields calls internally)?

You can get old data even with cache!!!

If you're using the API through HTTP, the API response can get cached by the browser or by a rever proxy even when you pass max_cache_age: 0.

If you're using a solution with multiple mysql instances (like different instances for writes and reads) then maybe not all the nodes might the most up to date data (I don't know exactly how this works for MySQL or if it's even possible, but it's something that might happen when you deal with data stored in multiple nodes)

ErichBSchulz commented 7 years ago

hey @davialexandre - thanks for this!

you raise some interesting points about "if I am happy with A being cached but want B fresh then what do I do?" - I suspect the simplest thing to do multiple API calls to get the different information (or worst case demand the freshest data on the entire request?)

I really dont think I'm suggesting exposing implementation here - at least I cant see it - could be blindness on my part - the idea is to specify the true interface between the API client and the API kernel. max_cache_age attached to an API request allows the client to describe the data they want "just gimme something quick, doesn't matter if it is a bit/a lot old" or "I need the freshest data and I'm happy to wait".

I completely agree that how the kernel implements its cache (how it hashes the cache keys, storage service etc) is all implementation and shouldn't be exposed.

I haven't looked but I bet that API calls would be made with a 'no cache' setting through the HTTP layer (could be wrong) - but I think this is really a requirement which is independent of HTTP as extensions will be a major API client.

does that make sense? I'm really trying to keep it configurable and simple and definitely not exposing implementation!

mickadoo commented 7 years ago

I think there are two issues here, which should be addressed separately:

Web Caching: The caching layer between the client and the server hosting CiviCRM
Memory Cache: The caching layer for function calls internally

I think web caching should not be part of APIv4. As mentioned here, each organization has different needs. If one organization wants to cache the Contacts.get result for 5 minutes then we can provide instructions for doing so using their webserver of choice.

Regarding memory caching, I think it's a great way to improve performance but has some caveats. If I'm a developer doing an interal API call, like Contact.get, then that data must reflect the current state of the database. Web caching allows for some stale data to be served, but internal calls must not return stale data.

Say we want to cache OptionValues, one of the frequently requested entities. We need to ensure that any internal creation/update/deletion of an OptionValue updates that memory cache. If all these update requests were coming through the API then we could clear/update the cache here. It is more difficult because we allow direct access to the database from anywhere, but this could be possible with some careful thought.

Another concern is memory usage, if I have 500,000 contacts, do I want to allow caching of all of them in memory? We would need to heavily test memory consumption with heavy loads before implementing memory caching.

civicrm / org.civicrm.api4