CartoDB / Windshaft

A Node.js map tile library for PostGIS and torque.js, with CartoCSS styling
BSD 3-Clause "New" or "Revised" License
310 stars 81 forks source link

remove cache_buster parameter #58

Closed javisantana closed 9 years ago

javisantana commented 11 years ago

this is a hack and it should be removed

strk commented 11 years ago

I don't think it should be removed. Should not be relyed upon, maybe, but why remove it ? At least in emergency cases you have a way to control the renderer cache. There could be legit cases for that. Even changes in the renderer which are not under control of the tiler, like a Mapnik version upgrade, or a "manual" modification of the XML styles stored in redis (think batch-updates).

I'd tag as "invalid" and close. Of course we still want to fix the partial cache cleanup on style change...

javisantana commented 11 years ago

ok, if you want to provide a way to invalidate caches, do it with an endpoint and document it but it should never be a user task to remove the render cache. The user should never know about the cache internals

strk commented 11 years ago

True, the user should not need to know. Actually, I guess a cache flusher should also be protected by authentication...

strk commented 11 years ago

So back to this issue. The problem with cache-buster just showed up recently when an high load application suffered by the misues of cache_buster, resulting in caches never being in effect.

Dropping the cache_buster paremeter would surely help this case, BUT...

Do we still have a need for invalidating a cache OR requesting a different cache ? The difference between "invalidating" and "requesting new" is that the former also affects other users while the latter only affects YOU (and the server that needs to hold all those caches in memory)

strk commented 11 years ago

Remember: the former case for use of cache_buster was that the renderer kept returning tiles from before the data changed. We do have a database trigger that currently invalidates Varnish so we could as well have one that invalidates the Windshaft cache... Still, varnish is smarter with the concept of "cache channel" in that (I think) it lists the set of tables from which data arrives. This is not encoded in the tiler-internal caches instead.

Since Wdinsfhat-cartodb ist he one that determines the cache channel (the set of tables) I guess we could as well encode it into the tiler-internal cache key and expose a function to clear caches by channel (ie: these tables changed...)

strk commented 11 years ago

Still, let's remember that it's not always possible to determine which tables are affected by a query until the query is run (and the same query may or may not access data from a table based on some condition within some function being invoked). This is to say that sometimes you want to let users provide their own set of tables which should invalidate a cache when changed (as we specified for the new mutilayer API).

javisantana commented 11 years ago

dont mix things here, cache_buster is being used to invalidate the CDN and it is used to invalidate the internal cache for the tiler too: I think we should change the cache_buster param name to invalidate the CDN. This is a different issue and it is a cartodb.js problem (i will report later). Luckily this is pretty easy to fix.

so lets talk about the cache_buster related to windshaft itself, i have a question: Is the rendering cache affected by the data itself ? I though it was only affected by cartocss changes

strk commented 11 years ago

On Wed, Jan 23, 2013 at 02:59:40AM -0800, javi santana wrote:

dont mix things here, cache_buster is being used to invalidate the CDN and it is used to invalidate the internal cache for the tiler too: I think we should change the cache_buster param name to invalidate the CDN. This is a different issue and it is a cartodb.js problem (i will report later). Luckily this is pretty easy to fix.

so lets talk about the cache_buster related to windshaft itself, i have a question: Is the rendering cache affected by the data itself ? I though it was only affected by cartocss changes

The rendering cache key contains SQL and CartoCSS and is additionally associated with a single table.

If CartoCSS or SQL changes, you'd be accessing a different cache item.

Each cache item has a time to live of 1 minute since last access. The tiler checks every minute for expired items.

The event that DROP cache items, other than expiration, are:

  1. Changing the style of a table drop all caches related to that table (POST STYLE / DELETE STYLE)
  2. Requesting a tile or grid with a cache_buster parameter not matching the last cache_buster parameter used by other such requests drop all caches related to the table associated with the request.

Note that the cache_buster value is not associated with a table but with the whole cache manager.

There is no event triggered by data change that affects tiler cache.

javisantana commented 11 years ago

my concrete question is: if the table data changes (sql and cartocss are the same), that cache should be cleared?

strk commented 11 years ago

On Wed, Jan 23, 2013 at 03:14:58AM -0800, javi santana wrote:

my concrete question is: if the table data changes (sql and cartocss are the same), that cache should be cleared?

No, it will NOT be cleared.

javisantana commented 11 years ago

but should be?

strk commented 11 years ago

well, given the current effort to clean the varnish one I think it could be, as it has the same scope. Unless we also drop the attempt to clean the varnish cache.

javisantana commented 11 years ago

forget varnish or other system parts.

that cache needs to be cleared when the data is changed? (yes/no)

strk commented 11 years ago

I don't think Windshaft should drive cache invalidation autonomously. It's still up to some external part of the architecture to decide when to clean what.

strk commented 11 years ago

Delegating cache invalidation to an external agent allows to centralize "data change" event detection

strk commented 11 years ago

An extreme example: data doesn't change, cartocss doesn't change, but CartoCSS references an external resource that did change. The client knows about that and wants to clear the relevant cache.

strk commented 11 years ago

"that cache needs to be cleared when the data is changed?" YES

javisantana commented 11 years ago

"that cache needs to be cleared when the data is changed?" YES

ok, so we have a bug here, right?

  1. Changing the style of a table drop all caches related to that table (POST STYLE / DELETE STYLE)

this has been added recently right? this is why cache_buster exists to remove these caches when style changes

For me cache_buster should be removed and the cache clear should be managed internally using postgres triggers or other technique. We could additionally add an endpoint to clear cache to invalidate external resources as you say

strk commented 11 years ago

On Wed, Jan 23, 2013 at 03:50:26AM -0800, javi santana wrote:

"that cache needs to be cleared when the data is changed?" YES

ok, so we have a bug here, right?

We do have a bug somewhere. Not necessarely "here" because it is not clear by whom should the cache be managed.

As you don't have a bug in varnish if varnish doesn't clear it caches (or cloudfront..)

  1. Changing the style of a table drop all caches related to that table (POST STYLE / DELETE STYLE)

this has been added recently right? this is why cache_buster exists to remove these caches when style changes

I think that's been there since I started looking at the code. Actually before I changed it to store the last cache_buster value, the sole presence of that parameter was enough to trigger clearing all table caches.

The clearance isn't performed through cache_buster but internally (whenever the style of a table is successfully changed/dropped).

For me cache_buster should be removed and the cache clear should be managed internally using postgres triggers or other technique. We could additionally add an endpoint to clear cache to invalidate external resources as you say

For sure a good first step could be to restrict the effects of a cache_buster because the current model also shares the value across multiple databases on the same machine. The "cache_buster" value could be stored within the key for the cache item, which could be made invalid (dropped) by requesting a different value.

Handling data invalidation "internally" (within Windshaft) would introduce a change in architecture as currently theres NO direct interaction between the database and Windshaft (only intermediated by Mapnik), and could be a significant performance hit (the tiler should ensure existance of triggers placed on all tables affected by a query).

Also, if we agree on reducing cache lifetime to few seconds, is big change for handling invalidation worth the trouble ? We could have a few seconds cache which is never invalidated...

Ref: #35

--strk;

strk commented 11 years ago

We need to implement cache clearing on /flush_cache before we can drop cache_buster here. See https://github.com/Vizzuality/Windshaft-cartodb/issues/73

The idea would then be that caches are cleared by the server on "data change" event, rather than by the client.

How to associate a cache to a list of tables is yet to be defined. We know how to do that from Windshaft-cartodb but not from Windshaft itself. Accepting an explicit list of "affected tables" could be an idea. Then Windshat-CartoDB would provide that list by querying the database, overriding any user-provided value.

rochoa commented 9 years ago

As per version 1.0.0 cache buster management will be a MapConfigProvider responsibility, so this is no longer valid.