Reading-eScience-Centre / ncwms

ncWMS - A Web Map Service for displaying environmental data over the web

Enable Cache-Control headers for client-side caching #22

Open jameshiebert opened 6 years ago

jameshiebert commented 6 years ago

We run a few applications that utilize ncWMS (thanks for the great project!). We've been digging into performance and have realized that our NetCDF base files rarely change: maybe once a year, or once every five years with the IPCC cycles.

Our application spends a lot of time loading map tiles, even ones it has seen many times before. It seems to us that it would be worth enabling the Cache-Control header on WMS responses, and optionally setting the Last-Modified header based on the timestamp of the source file.

Pretty sure that this would save a ton of bandwidth and processing time on the server side. Is this something that the dev team would consider?
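To make the idea concrete, here's a rough sketch of a servlet filter that could set these headers. This isn't ncWMS code, and the dataset-lookup helper is invented purely for illustration:

```java
// Rough sketch only, not ncWMS code: a servlet filter that stamps WMS
// responses with caching headers derived from the source file's timestamp.
import java.io.File;
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CachingHeadersFilter implements Filter {
    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        // Hypothetical helper: map the LAYERS parameter to its NetCDF source file.
        File source = lookupSourceFile(request.getParameter("LAYERS"));
        if (source != null && source.exists()) {
            // Mark the tile as cacheable by browsers and shared caches...
            response.setHeader("Cache-Control", "public");
            // ...and advertise when the underlying data last changed.
            response.setDateHeader("Last-Modified", source.lastModified());
        }
        chain.doFilter(req, res);
    }

    // Placeholder: a real implementation would consult the dataset catalogue.
    private File lookupSourceFile(String layers) {
        return null;
    }

    @Override public void init(FilterConfig cfg) {}
    @Override public void destroy() {}
}
```

With Last-Modified in place, a browser that already holds a tile can revalidate it with a cheap conditional request instead of re-downloading it.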

guygriffiths commented 6 years ago

It's a tricky one. As I understand it, Cache-Control is only useful when an expiry time is supplied. Whilst we could add an expiry based on when a dataset is due to be refreshed, there is no guarantee that the dataset won't have been manually refreshed in the meantime.

We could potentially add Cache-Control headers to datasets with an automatic refresh period, and document the fact that this may lead to users not seeing manually updated tiles.
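As a rough illustration of that option (the names below are invented, not existing ncWMS API), the expiry could be derived from the refresh schedule, so cached tiles go stale exactly when new data is due:

```java
import java.time.Instant;
import javax.servlet.http.HttpServletResponse;

final class RefreshExpiry {
    /**
     * Illustrative only: advertise an expiry that lines up with the dataset's
     * next automatic refresh. Both long parameters are hypothetical inputs,
     * not part of ncWMS.
     */
    static void setExpiryHeader(HttpServletResponse response,
                                long lastRefreshEpochSeconds,
                                long refreshPeriodSeconds) {
        long now = Instant.now().getEpochSecond();
        long secondsUntilRefresh =
                Math.max(0, lastRefreshEpochSeconds + refreshPeriodSeconds - now);
        response.setHeader("Cache-Control", "public, max-age=" + secondsUntilRefresh);
    }
}
```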

This will save bandwidth, but server-side processing should be fairly minimal for commonly loaded tiles, provided you have an ncWMS cache enabled (and it is large enough). The tiles are re-rendered, but the data used to plot them is cached, and data extraction is by far the largest processing overhead.

jameshiebert commented 6 years ago

Hi @guygriffiths, thanks for the discussion. A few comments in response:

  1. I don't think it's entirely true that Cache-Control is useless without an expiry time. Even setting Cache-Control to "public" allows a client to send a conditional request asking whether the resource has been modified since it was last retrieved. The server can then respond with a 304 Not Modified, saving both bandwidth and processing (see the sketch after the image below). So I'd say there's value in setting this header even without an expiry, and no real downside.

  2. A possible way to introduce this feature would be an optional global administrator setting (off by default) for the max-age component of the Cache-Control header. This would let server administrators choose a value compatible with their workflow. For example, our organization rarely updates base data, and when we do, we're OK with users waiting a day for fresh tiles, so we would set max-age to one day (86400 seconds). Others with a lower tolerance for staleness could set it appropriately short.

  3. You say that data extraction is the largest processing overhead, which I'm sure is true from the server's perspective. But from the client's perspective, the browser often has to load many tiles per map, and not all of those requests can be in flight at once. Firefox, for example, limits the number of concurrent requests per host to 6 by default. We have a map that loads about 16 tiles, so the browser needs three successive batches of requests to fill in the whole map. If you look at the timings for the later requests (see image below), they are blocked for roughly 700 ms before they can even be sent. With caching, the browser can decide immediately that no request is needed, and the map loads almost instantly.

[Image: browser network timings showing later tile requests blocked for ~700 ms before dispatch]
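Putting points 1 and 2 together, here's a hypothetical sketch of the server side. None of this is ncWMS code: maxAgeSeconds stands in for the proposed admin setting, and the file path is a placeholder for the real dataset lookup.

```java
import java.io.File;
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class TileCachingSketch extends HttpServlet {
    // Stand-in for the proposed administrator setting (0 = no expiry).
    private final long maxAgeSeconds = 86400;

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        // Placeholder for looking up the dataset behind the requested layer.
        File source = new File("/data/example.nc");
        long lastModified = source.lastModified();

        // Point 1: honour conditional requests. getDateHeader returns -1 if absent.
        long ifModifiedSince = request.getDateHeader("If-Modified-Since");
        // HTTP dates have one-second resolution, so compare whole seconds.
        if (ifModifiedSince >= 0 && lastModified / 1000 <= ifModifiedSince / 1000) {
            // The client's copy is current: no rendering, no body, just 304.
            response.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
            return;
        }

        response.setDateHeader("Last-Modified", lastModified);
        if (maxAgeSeconds > 0) {
            // Point 2: expiry chosen by the administrator to match their workflow.
            response.setHeader("Cache-Control", "public, max-age=" + maxAgeSeconds);
        } else {
            // No expiry configured: cacheable, but revalidated on every use.
            response.setHeader("Cache-Control", "public, no-cache");
        }
        // ... render the tile and write it to the response as usual ...
    }
}
```

As an aside, the servlet API can handle the conditional-request part automatically: if a servlet overrides HttpServlet.getLastModified(), the default service() implementation compares it against If-Modified-Since and sends the 304 itself.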