apache / incubator-pagespeed-ngx

Automatic PageSpeed optimization module for Nginx
http://ngxpagespeed.com/
Apache License 2.0
4.37k stars 362 forks source link

Add SRI hash support #961

Open nikolay opened 9 years ago

nikolay commented 9 years ago

SRI (Subresource Integrity) links: https://blog.cloudflare.com/an-introduction-to-javascript-based-ddos/ http://www.w3.org/TR/SRI/ https://srihash.org/

jeffkaufman commented 9 years ago

What worries me about adding SRI support is what happens when our cache is out of date? Here's the scenario that worries me:

1) PageSpeed sees a page referencing external.example.com/foo.js. 2) PageSpeed fetches external.example.com/foo.js and stores in cache. 3) PageSpeed computes the hash of the resource, and modifies the html to include that. 4) The browser fetches the resource, it has changed in the meantime, hash doesn't match, browser blocks the load.

We can mitigate this by only inserting SRI hashes for resources served with very long cache lifetimes, because those are probably not intended to change, but even then serving with a long cache lifetime isn't a promise that you won't make changes. For example, imagine JQuery notices they left a small amount of non-significant whitespace in http://code.jquery.com/ui/1.9.2/jquery-ui.js and updates it to fix that, changing its hash. If we stored jquery-ui.js in our cache, valid for max-age=315360000, then we could be serving out SRI-failures from when they made the change until it fell out of our cache, potentially a very long time.

The only place where I'm sure this is ok is canonicalizing javascript libraries.

We could add a feature where people could explicitly list urls for us to insert hashes for? But that's not much less work than generating and inserting the hashes yourself.

nikolay commented 9 years ago

@jeffkaufman Sorry for not clarifying. Yes, for canonical libraries it's okay, but it should also be okay for serving minified/combined assets, right? As if content changes, the hash in their URL should change as well, no?

My concern is that if we decide to implement this, yes, we can manually do for external libraries and so on, but we can't do this ever for the merged assets as that's out of our control as users.

jeffkaufman commented 9 years ago

This is for the case where the pagespeed-generated asset is getting mapped onto a CDN? Otherwise SRI doesn't gain you anything.

nikolay commented 9 years ago

@jeffkaufman Yes, you're right. This is for CDN domains - with or without sharding.

jeffkaufman commented 9 years ago

Yes, that does sound useful. It dramatically reduces the security risk of using a CDN for your resources.

If PageSpeed is only mapping .pagespeed. resources onto the CDN then its guaranteed to have the relevant metadata entry available, so this seems practical. It probably means putting another field in the metadata cache entry.

jeffkaufman commented 9 years ago

I wonder how well SRI interacts with the various optimizing proxies carriers run? If you're serving resources over HTTP, then carrier optimizers might find ways to minify what your serving. In our case they ideally would find nothing to minify, but they might find something. In which case we'd get a verification error.

So I guess if you turn on SRI with PageSpeed we should also turn on Cache-Control: no-transform for all our resources?

jeffkaufman commented 9 years ago

We'd also need to turn on blocking rewrites for .pagespeed. resources for the case where a request comes in and hits a different server or the optimized resource has fallen out of cache.

jeffkaufman commented 9 years ago

There's also some risk that blocking rewrites for resources wouldn't optimize 100% of the time, leading to occasional broken pages. We haven't previously used it in a context where pages would break if a resource were not fully optimized, so I'm not completely confident in this path.

jeffkaufman commented 9 years ago

It looks like the request gets a no-transform header when coming from an SRI context. I'm not currently sure what PageSpeed does with no-transform on a request for an optimized resource. I think we ignore it?

jeffkaufman commented 9 years ago

There's also a performance issue, where adding SRI means the browser can't work with partial content. This would be most visible for progressive jpegs, which currently can load in stages. I have a vague impression that browsers currently don't actually do much here, but they could in the future and we'd block that.

jeffkaufman commented 9 years ago

Added to Accepted Feature Requests.

nikolay commented 9 years ago

@jeffkaufman Thanks!