gma / nesta

File Based CMS and Static Site Generator
http://nestacms.com
MIT License

Configure max-age cache header for each page (1) #29

Closed gma closed 12 years ago

gma commented 13 years ago

You can currently only cache HTML by writing a file to disk and getting your web server to serve it for you. In order to take advantage of reverse proxy caches, allow authors to set the max-age cache header in a page's metadata.
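As a rough illustration of the idea, here is a minimal sketch that reads a max-age value from the "Key: value" block at the top of a page and turns it into a Cache-Control value. The "Max-Age" key and the `cache_control_for` helper are assumptions made up for this example; neither exists in Nesta.

```ruby
# Hypothetical helper: neither the "Max-Age" key nor this method exists in Nesta.
# Returns a Cache-Control value, or nil when no max-age applies to the page.
def cache_control_for(source, default_max_age: nil)
  # Metadata is assumed to be the leading block of "Key: value" lines,
  # ending at the first blank line.
  metadata = {}
  source.each_line do |line|
    break if line.strip.empty?
    key, value = line.split(':', 2)
    break if value.nil? # not a "Key: value" line, so the body has started
    metadata[key.strip.downcase] = value.strip
  end

  raw = metadata.fetch('max-age', default_max_age)
  return nil if raw.nil?

  seconds = Integer(raw.to_s, 10) # raises ArgumentError on junk such as "soon"
  raise ArgumentError, 'max-age must not be negative' if seconds.negative?

  "public, max-age=#{seconds}"
end

page = "Date: 1 Jan 2012\nMax-Age: 600\n\n# Heading\n"
puts cache_control_for(page)
```

A route could then pass the returned string to the response's Cache-Control header, and skip the header when the helper returns nil.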

pengwynn commented 13 years ago

We do this on thesassway.com in our config.ru:

# cache control headers for Heroku
require 'rack/contrib'
use Rack::ResponseHeaders do |headers|
  headers['Cache-Control'] = 'public, max-age=1501'
end
use Rack::ETag

gma commented 13 years ago

Very cool. I didn't know that was in Rack; I think I'll shelve the story and write it up as a recipe. :-)

ms commented 12 years ago

It would be best to use a combination of max-age, etags/last-modified and local caching.

Max-age is a relatively imprecise caching mechanism: until the cached copy expires, the client and/or reverse proxy won't make a request to Nesta at all.

Etags would still require Nesta to do some work, but they would save bandwidth (we only send headers) and possibly rendering work (no need to render anything). The etag could be computed as a checksum of the raw haml/mdown files. Last-modified could use the OS's info (?) or even git repository info (the most accurate but least likely to exist). Otherwise, if an interface is ever used to edit the files, it could store when each file was last modified. Finally, a small script could mark some files as recently modified (invalidate.rb [file]) and the programmer could be left to call it when necessary.
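To make that concrete, here is a minimal sketch of both validators computed from a raw source file, plus a freshness check against the request headers. `validators_for` and `fresh?` are hypothetical names, not Nesta or Sinatra methods, and the comparison is deliberately simplified (no weak etags, no `*`, no comma-separated lists).

```ruby
require 'digest/sha1'
require 'time'

# Hypothetical helpers for illustration; they are not part of Nesta.
def validators_for(path)
  {
    # Checksum of the raw source file, quoted as HTTP expects.
    'ETag' => %("#{Digest::SHA1.file(path).hexdigest}"),
    # The OS's modification time, formatted as an HTTP date.
    'Last-Modified' => File.mtime(path).httpdate
  }
end

# True when the client's copy is still current, i.e. a 304 could be sent.
def fresh?(request_headers, validators)
  if (client_etag = request_headers['If-None-Match'])
    return client_etag == validators['ETag']
  end
  if (since = request_headers['If-Modified-Since'])
    return Time.httpdate(validators['Last-Modified']) <= Time.httpdate(since)
  end
  false
end
```

Note that both validators only look at one source file, so a change to a template would go unnoticed.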

Finally, Nesta could, through one framework or another, cache some parts of pages. It seems to me that for now layout.haml and page.haml are re-rendered every time, and so are the snippets on category pages. Maybe those could be cached and manually invalidated. This would only save work, since the whole page would still be sent to the client.

I have a quick question about the current caching: if one points his or her reverse proxy to these pages and only calls nesta if the html page is not there, then one needs to manually refresh the cached html if the source haml/mdown ever changes. Is that correct?

gma commented 12 years ago

I'd be quite happy with simple HTTP caching (using any of the options you outlined) provided by Rack middleware, and file caching (as currently implemented). I suspect that if anybody needs anything more complicated than you can achieve with Rack middleware, they'd be better off using something with fragment caching built-in.

Your question about deleting .html files is correct. I used to do it with a post-commit hook on my web server. That approach is described here:

http://nestacms.com/docs/creating-content/publishing-articles-with-git
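For anyone who would rather script the clean-up than delete everything, here is a minimal sketch that removes cached .html files that are older than their source. The directory layout (a cached file sharing the source's relative path, with an .html extension) and the `purge_stale_cache` name are assumptions for the example, not a description of how Nesta lays out its cache.

```ruby
require 'fileutils'
require 'pathname'

# Hypothetical helper; the content_dir/cache_dir layout is an assumption.
# Deletes each cached .html file whose source was modified more recently,
# and returns the list of files it removed.
def purge_stale_cache(content_dir, cache_dir)
  root = Pathname.new(content_dir)
  removed = []

  Dir.glob(File.join(content_dir, '**', '*.{haml,mdown,textile}')).sort.each do |source|
    # content/docs/intro.mdown -> cache/docs/intro.html
    relative = Pathname.new(source).relative_path_from(root).sub_ext('.html')
    cached = File.join(cache_dir, relative.to_s)

    next unless File.exist?(cached)
    next unless File.mtime(source) > File.mtime(cached)

    FileUtils.rm(cached)
    removed << cached
  end

  removed
end
```

Comparing modification times won't notice a change to a template, so deleting the whole cache from the hook remains the safer option.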

ms commented 12 years ago

I'm not entirely familiar/comfortable with Rack/Sinatra just yet, so I'm not sure about the Rack middleware, but Sinatra provides the methods 'etag' and 'expires', which they advise end-users to put in the before block. My plan was to do something like that (at least for expires; etag might need more info from the file being served). Does that look good?

ms commented 12 years ago

A very simple use of expires and etag:

https://github.com/iKs279/nesta/commit/929f01971c8f26738a9f70f2752eb52d7efed776

I set the etag to the sha1 of the rendered page. That's great because it ensures the response is actually identical, but it means we only save bandwidth, and only noticeably if the page is somehow huge (which is unlikely; for example, nestacms.com/ is 3.55KB of gzipped content, and jquery.js is 10 times bigger).

The other option is, like I said before, to sha1 the content of the raw source file (haml or otherwise). This saves us the rendering time, but a change to layout.haml or page.haml wouldn't change the etag, so outdated output might keep being used.

Finally, we could sha1 all the files that are going to be used, but I don't know at which point there is no performance gain, or whether the whole thing is worth it to begin with. At the very least, it seems to me that using etag with sha1 on stylesheets and on attachments would make sense.

[checks code for attachments]
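A minimal sketch of the "sha1 all the files" option, assuming the caller already knows which files go into a page. `combined_etag` is a hypothetical helper, and working out the full file list (templates, partials, included Sass files) is the hard part that this sketch leaves out.

```ruby
require 'digest/sha1'

# Hypothetical helper: one etag covering every file that feeds into a page.
# The tag changes when any listed file's content or name changes.
def combined_etag(paths)
  digest = Digest::SHA1.new
  paths.sort.each do |path| # sort so that argument order doesn't matter
    digest << path << "\0"  # include the name so that renames are detected
    digest << File.binread(path) << "\0"
  end
  %("#{digest.hexdigest}")
end
```

Every request still reads and hashes each file, so whether this beats simply rendering the page would need measuring.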

OK so Sinatra already uses some caching by using the OS's last modified time to set the appropriate last modified header when we call send_file(). I think that using etag would make more sense (or at least should be an option) but it seems that it should be done in send_file itself. For the CSS/sass/scss stylesheets the same problem as haml comes in: we could use the checksum of the result or we could use the checksum of the original source (but if we use/include other files, we might get an incorrect result).

gma commented 12 years ago

Thanks for linking to your patch.

I don't think a solution that might miss changes to the templates – or included Sass files – should be considered. I also think that the best approach to HTTP caching varies from site to site (not all Nesta sites are "static" content), which is the main reason that I haven't plumped for a specific implementation so far.

You mentioned that you're not entirely comfortable with Rack yet. I'd recommend having a read up on how it works; it's a really simple protocol. Then check Wynn's snippet that he posted from thesassway.com. It uses Rack::ETag, which means that etags are automatically added to everything that Sinatra serves. Rack is cunning, but actually rather simple.

Here's the Rack::ETag code:

https://github.com/rack/rack/blob/master/lib/rack/etag.rb
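To show how little is involved, here is a simplified stand-in written against the Rack protocol in plain Ruby: an app is anything that responds to `call(env)` and returns a status, a headers hash and a body that responds to `each`. `TinyETag` is a made-up name for illustration and is not the Rack::ETag code. It also answers matching requests with a 304 itself, which in a real stack is the job of Rack::ConditionalGet rather than Rack::ETag.

```ruby
require 'digest/sha1'

# Simplified stand-in for illustration only; this is not Rack::ETag.
class TinyETag
  def initialize(app, max_age: 300)
    @app = app
    @max_age = max_age
  end

  def call(env)
    status, headers, body = @app.call(env)
    return [status, headers, body] unless status == 200

    # Read the whole body so that it can be hashed.
    content = +''
    body.each { |chunk| content << chunk }
    body.close if body.respond_to?(:close)

    etag = %("#{Digest::SHA1.hexdigest(content)}")
    # Mixed-case keys match the config.ru snippet in this thread;
    # Rack 3 expects lowercase header names.
    headers = headers.merge(
      'ETag' => etag,
      'Cache-Control' => "public, max-age=#{@max_age}"
    )

    if env['HTTP_IF_NONE_MATCH'] == etag
      # The client already has this exact response: send headers only.
      [304, headers.reject { |key, _| key == 'Content-Length' }, []]
    else
      [200, headers, [content]]
    end
  end
end

# Any object with #call is an app, so a lambda is enough to try it out.
app = ->(_env) { [200, { 'Content-Type' => 'text/html' }, ['<h1>Hi</h1>']] }
status, headers, = TinyETag.new(app, max_age: 60).call({})
puts status, headers['ETag']
```

Because the etag is a hash of the rendered body, it can't miss changes to templates or included files, at the cost of rendering the page on every request.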

I'm not saying I'm not interested in other approaches, but it would take me a few hours (at least) to come up with a decent plan, and I don't think the result would be a massive improvement on Wynn's config.ru.

gma commented 12 years ago

Closed (see #91).