documentcloud / jammit

Industrial Strength Asset Packaging for Rails
http://documentcloud.github.com/jammit/
MIT License

Jammit + browser caching + CDNs #45

Closed: agibralter closed this issue 7 years ago

agibralter commented 14 years ago

I was just wondering... would it make sense to build some sort of system into Jammit to allow the browser caching of JS and CSS assets? What I imagine is something like this:

When the jammit command gets run, it calculates an MD5 checksum of each package and writes a file, assets.lock.yml, in config/. That file could look something like:

package1: 50bda6102c8dd57a5ead84e99d493776
package2: 1dd5a5eabda693707672c8d84e99d450

It also symlinks (or copies) each package to have its MD5 checksum as a name:

50bda6102c8dd57a5ead84e99d493776.js@ -> package1.js
50bda6102c8dd57a5ead84e99d493776.js.gz@ -> package1.js.gz

Then, when the Rails app is running in production, include_javascripts(:package1) could instead generate:

<script src="/assets/50bda6102c8dd57a5ead84e99d493776.js" type="text/javascript"></script>

Also, a deployment script could copy these MD5-named assets to CDNs after running jammit.

Is this a crazy idea? Is there a way to do something similar already?
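The checksum-lockfile-and-symlink step proposed above could be sketched like this. This is a hypothetical illustration, not real Jammit API; the directory layout and the `lock_packages` name are assumptions:

```ruby
require "digest/md5"
require "fileutils"
require "yaml"

# Checksum each built .js package in asset_dir, record package => digest
# in a hash (the proposed assets.lock.yml contents), and symlink a
# digest-named copy next to each original.
def lock_packages(asset_dir)
  lock = {}
  Dir.glob(File.join(asset_dir, "*.js")).sort.each do |path|
    package = File.basename(path, ".js")
    digest  = Digest::MD5.file(path).hexdigest
    lock[package] = digest
    FileUtils.ln_sf(File.basename(path), File.join(asset_dir, "#{digest}.js"))
  end
  lock
end

# A deploy task could then write the proposed lockfile:
# File.write("config/assets.lock.yml", YAML.dump(lock_packages("public/assets")))
```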

documentcloud commented 14 years ago

Because Rails already includes the timestamp in the generated URL, it's safe to set far-future expires headers on your assets. The link tag looks like this:

<link href="/assets/ui-datauri.css?1276127577" media="all" rel="stylesheet" type="text/css" />

And, in Nginx, we have:

location ~ ^/assets/ {
  passenger_enabled on;
  expires max;
}

When the asset package is updated, the timestamp will change. I don't think we need to reinvent the built-in Rails fingerprinting...

agibralter commented 14 years ago

Ah, true... I was just reading that Google recommends against the "?" query-string style because many caching proxies don't handle it well... but I guess that's Rails's problem to fix.

Also, the other thing that's causing me trouble is that all my app servers have different mtimes for the assets, since I'm using Jammit to generate the assets at deploy time, which happens at a different moment on every server. I may just try to use something like http://github.com/eliotsykes/asset_fingerprint in conjunction with Jammit.

jashkenas commented 14 years ago

Nah, it can be simpler than that, if you don't mind the Rails-style fingerprint. Either prebuild your assets in a central location, and rsync them out as part of your deploy, or, even easier, just add a step to your deploy task that sets the mtime of public/assets/**/* after they're built. Then you don't have to install anything.
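The mtime-pinning step described above could be a one-liner in the deploy task. A sketch, assuming the default public/assets output directory; the fixed timestamp is an arbitrary placeholder:

```shell
# Pin every packaged asset to one fixed mtime so each app server
# generates the same ?timestamp query string after its own build.
mkdir -p public/assets   # stand-in here for the directory jammit built
find public/assets -type f -exec touch -t 202001010000 {} +
```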

agibralter commented 14 years ago

I have an autoscaling setup where I boot up and shut down app servers on the fly... any thoughts on how I could sync mtimes in that case?

jashkenas commented 14 years ago

In that case, if they're not all getting deployed as part of a single task, but piecemeal, the "asset_fingerprint" plugin, or another that changes Rails' behavior, might be a simpler option. But Rsync also has a --times option that you can use to preserve the modification time of centrally-built asset packages.

http://www.samba.org/ftp/rsync/rsync.html

tashian commented 14 years ago

Speaking of CDNs, we are using jammit + S3 on opencongress.org but it's a little hacky because you can't upload .gz files to S3 and have them served with the right content type (so our files end up on S3 as .cssgz and .jsgz). Our S3 sync happens during cap deploy. If anyone is interested in the details/scripts, let me know.

jashkenas commented 14 years ago

tashian: I certainly would be interested. If you'd like to gist it here, perhaps we can link to it from the Jammit documentation as an example. Are you setting far-future expires headers on the S3 files?

tashian commented 14 years ago

Yeah, I'm trying to set far-future expires headers, but it doesn't always work. Everything is already on GitHub, but let me give you a brief map of it:

Here's the basic idea. We use the synch_s3_asset_host plugin (haven't found anything better yet): http://github.com/opencongress/opencongress/tree/master/vendor/plugins/synch_s3_asset_host

The version here is one I modified so that s3sync.rb will set the S3 headers correctly when encountering a .cssgz or .jsgz file. And I try to set the Expires header properly here too. See lines 515-550 of s3sync/s3sync.rb

And we have a capistrano jammit recipe that simply renames the appropriate files after running jammit: http://github.com/opencongress/opencongress/blob/master/config/deploy.rb lines 49-57.

I was wrong before when I said S3 can't handle .gz files -- it's safari that has a problem with them, even if content-encoding and content-type are set right.

And then capistrano also hooks into the recipe provided with the synch_s3_asset_host plugin on production deploy.

I think that's all there is to it. On the rails side we're using config.action_controller.asset_host. I wrote a lambda block to always serve stylesheets and other assets from the same CDN host: http://github.com/opencongress/opencongress/blob/master/config/environments/production.rb lines 16-18
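The asset_host lambda described above could look something like this sketch; the CDN hostname is a placeholder:

```ruby
# A proc-valued asset_host lets every asset, stylesheets included,
# resolve to a single CDN host regardless of the source path.
ASSET_HOST = Proc.new { |source| "http://cdn.example.org" }

# In config/environments/production.rb this would be wired up as:
#   config.action_controller.asset_host = ASSET_HOST
```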

tashian commented 14 years ago

I should note two more things -- one, I don't love this implementation, I think there could be a much smoother way to do this. I especially don't like the way the s3sync script works. And two, we have had an occasional issue where a file or two doesn't show up on the CDN for some reason, but a redeploy has always fixed it. I'm not sure why, we haven't seen the problem recently and I think it may have been permissions-related.

jashkenas commented 14 years ago

Is there anything specific you think that Jammit could provide that would make your deploys smoother? Or is it more of an S3-tooling thing?

tashian commented 14 years ago

Well, yeah, I wish I didn't have to rename files in assets/ from .css.gz to .cssgz so that they serve properly from S3. Most web servers will try to serve a Content-Encoding: gzip file when you access a URL like http://localhost:3000/style.css (assuming there is a file called style.css.gz). But S3 doesn't do that. You have to be explicit, and it will only deliver the file at the exact URL you provide.

So if there were an "S3" switch in Jammit, it would do two things: rename .css.gz => .cssgz and .js.gz => .jsgz for all assets (which we do in our cap recipe), and send the right URLs to the right browsers (which you normally don't have to do, but with S3 you have to be explicit).

We do this second piece -- which I forgot to mention above -- here in a monkey patch: http://github.com/opencongress/opencongress/blob/master/vendor/plugins/jammit_hacks/init.rb
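The rename step described above could be sketched like this; the directory argument and `rename_for_s3` name are assumptions for illustration, not the actual cap recipe:

```ruby
# S3 serves exactly the key you upload, so gzipped packages get
# single-extension names (style.css.gz => style.cssgz) that can then
# carry their own Content-Type/Content-Encoding metadata.
def rename_for_s3(dir)
  Dir.glob(File.join(dir, "*.{css,js}.gz")).each do |path|
    File.rename(path, path.sub(/\.(css|js)\.gz\z/, '.\1gz'))
  end
end

# rename_for_s3("public/assets")  # package1.css.gz becomes package1.cssgz
```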

tashian commented 14 years ago

As you can see it's kind of hard to decouple the S3 synching from the asset packaging...

steveh commented 13 years ago

I love agibralters' idea, and I've implemented it as a proof-of-concept:

https://gist.github.com/675902

It's the simplest code I could write that did what I want. Refactorings, ideas, or better integration with Jammit gratefully accepted.

The way I've done it is that you're expected to run a Rake task, rake jammit:lock, after running jammit -f from the command line. This creates assets.yml.lock with an md5sum for each file. It also symlinks each package with the md5sum in the path, to e.g. public/cache/deadbeef/common.js.gz.

Then, in config/initializers/jammit.rb, I monkey patch Jammit to refer to the symlinked hashed URL instead of a timestamp appended URL.

Possible ideas:

agibralter commented 13 years ago

I like the idea of having both the checksums and the assets. The checksum filenames are nice because it means that if you have a workflow that copies assets to S3 or a CDN after jammit, you'll never overwrite old assets: you would be able to support two running versions of a site at once. Could this be useful if you wanted to release new features to some users but not others...? I'm not sure how that use case would work in practice, though.

Jeremy, what do you think of the .lock idea? Perhaps jammit --lock? Or a config option lock_assets: on | off. Perhaps the helper for include_javascripts could take a :lock => true option that uses md5 filenames instead of rails' timestamp filenames.

opengovernment commented 13 years ago

Wanted to let you guys know that we've since cleaned up our process on this and are now using cloudfront custom origin instead of S3. It's working quite well!

http://blog.opengovernment.org/2011/02/10/cloudfront-s3-rails-and-jammit-on-apache/

There is still a big question about checksums. CloudFront ignores query strings, so Rails-style ?timestamp fingerprinting doesn't bust its cache; my solution uses the git revision of the repo instead, but I'd love to see a solution that doesn't change every asset URL on every deploy.

steveh commented 13 years ago

I've got something like this working:

https://github.com/steveh/fingerjam

It's little more than a barely-documented monkeypatch.

agibralter commented 13 years ago

Just found out about this on the Ruby5 podcast: https://github.com/donaldpiret/asset_hash

Might be a nice alternative to building asset hashing into Jammit proper.

jashkenas commented 13 years ago

I'm not a big fan of the lockfile idea in general ... I think that if we can garner the equivalent information from the files themselves, that would be preferable. That said, Rails 3.1 uses MD5 hashes of the file contents by default, apparently ... so perhaps this becomes a non-issue?

agibralter commented 13 years ago

Yeah, the lockfile is not necessary. But I do think that Jammit shouldn't assume "?#{timestamp}"-style cache busting... I'm not sure what the best solution is. Right now I'm post-processing my Jammit-ed stylesheets with a rake task on deploy:

https://gist.github.com/991732

It actually works pretty nicely so far. Wish it were a bit more... automatic though.

I would really like to see Jammit be a strong alternative to sprockets in Rails 3.1+... I much prefer the asset.yml file to requires in my JS files.

There is a whole lot going on here though. Here are some thoughts for moving forward:

Jammit should probably load the Rails app even when running from the command line. This way Jammit can respect Rails's asset-related procs: ActionController::Base.asset_host and ActionController::Base.asset_path. Alternatively, Jammit can split itself into a pure command line version that doesn't load Rails (but doesn't mess with asset paths at all), and a version that loads Rails (perhaps a Rake task).

bokor commented 12 years ago

I'm having an issue where the cache-busting timestamp isn't even appended to my JS and CSS URLs. I'm using jammit-s3; the timestamps appear locally but not in production, and I can't figure out for the life of me why. Any help or thoughts? Using Rails 3.0.9.

adamhooper commented 8 years ago

Whoa, no comments in five years? Time to complain ;).

The thing with cache-busting is: it isn't what anybody wants. People actually want asset versioning. Here's the difference: cache-busting changes an asset's URL so browsers are forced to fetch the single current copy, while asset versioning keeps every deployed copy available at its own URL, so old pages can keep loading the assets they were built against.

This is important during deploy. Assume you're deploying and it takes a while. You upload new JavaScript, and then 2min later you finish uploading all the HTML that points to it. Here's what happens to a user who comes in 1min into the deploy process with a clean browser cache: the old HTML they receive requests the JavaScript by its old ?timestamp URL, the server ignores the query string and hands back the brand-new file, and the page runs scripts its markup was never written for. With versioned filenames, that old HTML would keep loading the old JavaScript untouched.

Furthermore, ?TIMESTAMP query strings behave differently on different caching proxies.

Cache-busting isn't what anybody wants. People want asset versioning.