backdrop / backdrop-issues

Issue tracker for Backdrop core.
144 stars 39 forks source link

Disable Brotli for pre-compressed CSS and JS #4748

Closed philsward closed 3 years ago

philsward commented 3 years ago

Description of the need

When Brotli is in use on a web server that serves pages through https and "aggregate css and(or) aggregate js" are enabled, the CSS and JS files do not "decode" properly, resulting in console errors for those files.

net::ERR_CONTENT_DECODING_FAILED 200

For additional context and patch see: https://www.drupal.org/project/drupal/issues/2960808

Proposed solution

Adding E=no-brotli:1 to the following lines (included) removes these errors and allows the pages to load properly.

    # Serve correct content types, and prevent mod_deflate double gzip.
    RewriteRule \.css\.gz$ - [T=text/css,E=no-gzip:1,E=no-brotli:1]
    RewriteRule \.js\.gz$ - [T=text/javascript,E=no-gzip:1,E=no-brotli:1]

Additional information Example: image

klonos commented 3 years ago

👍 ...we should backport the fix that went into D8 (simple addition to .htaccess): https://www.drupal.org/files/issues/2019-12-04/2960808-mod_brotli-19.patch

philsward commented 3 years ago

The one change I would make to the patch is to the comment. It should probably reference Brotli:

# Serve correct content types, and prevent mod_deflate double gzip and mod_brotli double Brotli compression.
jlfranklin commented 3 years ago

That sounds more like a Brotli bug. We shouldn't have to specify E-no-foobar for specific products.

HTTP returns both the data's format in the Content-Type: response header and the fact it's compressed in the Content-Encoding: response header. If you can confirm the headers are correct coming from Backdrop, it's up to Brotli to recognize the gzip header.

This curl command shows the headers returned by the server for a compressed JS file. You can also find this in the "Network" tab of just about any desktop browser's web developer tools.

$ curl -H "Accept-Encoding: gzip, deflate" -I http://example.org/files/js/js_8EZpP8z6kPJw99d0wpOiPZFRKFhMsCSn8Lb2rrqUNOY.js
HTTP/1.1 200 OK
Date: Fri, 04 Dec 2020 01:40:39 GMT
Server: Apache/2.4.38 (Debian)
Last-Modified: Fri, 04 Dec 2020 01:26:20 GMT
ETag: "dc04-5b5995d220f7e-gzip"
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 14123
Content-Type: application/javascript
philsward commented 3 years ago

@jlfranklin the server can dynamically serve compressed assets on the fly or serve statically pre-compressed assets. This is true with gzip and Brotli.

We have to stop double compression, otherwise we end up with:

Mod_Deflate = static .gzip -> dynamic gzip Mod_Brotli = static .gzip -> dynamic br

We're trying to tell the server "don't dynamically compress a static pre-compressed file" because the browser doesn't understand how to decompress them twice.

jlfranklin commented 3 years ago

We're trying to tell the server "don't dynamically compress a static pre-compressed file" because the browser doesn't understand how to decompress them twice.

Yes, and the Content-Encoding: gzip header is how we already tell Brotli (and everyone else) that it is compressed. Brotli needs to follow the standard and act accordingly.

If we are not generating that header, that would be a bug on our side.

philsward commented 3 years ago

Brotli is looking for br, not gzip. Brotli has nothing to do with gzip and doesn't know anything about gzip.

Let me ask: "why are we stopping mod_deflate from double compressing a .gzip file?" This has already been in the .htaccess for ages. We're telling it "stop compressing yourself".

If Brotli is compressing a .gzip file and sending it as "br", don't we also need to stop Brotli from double compressing a compressed file?

philsward commented 3 years ago

Yes, and the Content-Encoding: gzip header is how we already tell Brotli (and everyone else) that it is compressed.

Content-Encoding: gzip is how we tell the browser "It's gzip".

If the server is correctly serving the Brotli compression, the line will show:

Content-Encoding: br

jlfranklin commented 3 years ago

Let me ask: "why are we stopping mod_deflate from double compressing a .gzip file?"

Because some other software is broken. We shouldn't have to. Should we expand that to E=no-lzw:1;E=no-zip:1;E=no-deflate:1;E=no-bz2:1; E=no-7zip:1;... and add another one every time a new compression format becomes popular?

Brotli has nothing to do with gzip and doesn't know anything about gzip.

If they're in the web space, they need to be able to deal with a gzip'd stream. That doesn't mean they have to link in zlib. They can ignore and not compress anything with a Content-Encoding: header, or ignore certain encodings.

Brotli is looking for br, not gzip.

No, Brotli is generating Content-Encoding: br and ignoring any existing Content-Encoding: header. It would be within the standard for them to compress the gzip'd stream and append their encoding such that the browser gets Content-Encoding: gzip, br. It would then be on the browser to double-decompress as directed.

philsward commented 3 years ago

Because some other software is broken.

Right. All of the browsers are broken because they don't know how to de-compress an already compressed file.

Right. All of the servers are broken and have been for 20+ years that use mod_deflate because it isn't smart enough to realize it's compressing an already compressed file.

The workaround as suggested by the very people who maintain both mod_deflate and mod_brotli?

Tell the server not to compress an already compressed file.

i.e E=no-gzip:1 i.e E=no-brotli:1

See: https://documentation.help/httpd-2.4/mod_deflate.html#precompressed See: https://documentation.help/httpd-2.4/mod_brotli.html#precompressed

Should we expand that to E=no-lzw:1;E=no-zip:1;E=no-deflate:1;E=no-bz2:1; E=no-7zip:1;... and add another one every time a new compression format becomes popular?

Yes - Provided the server and browser both support it.

zstd is another excellent contender in this space and if it ever gains traction, we will need to include it as well.

philsward commented 3 years ago

On a side note, we're discussing a 28 character change (not including the proposed change to the comments).

This change won't break existing sites on hosting platform that use the legacy gzip compression, but rather prevent sites from breaking when the hosting platform uses and promotes the newer Brotli as the primary default.

Brotli is replacing gzip, not trying to coincide with it. We need to be forward compatible as much as we are backwards compatible.

philsward commented 3 years ago

Slight update:

It was determined in the weekly meeting 12/10/2020 that this is technically a bug with BackdropCMS as it will break new sites that are spun up on modern hosting platforms that use Brotli or existing sites that migrate to modern hosting that uses Brotli.

This fix is slated for a near future bugfix release.

Keep in mind this will not need to be applied to existing sites though it is moderately recommended in order to prevent those sites from breaking in the future "if" the hosting platform is changed/updated.

klonos commented 3 years ago

...this is technically a bug with BackdropCMS

Not exactly. It is a bug with the software being used in hosting providers, but we can do something on our end to prevent errors. In other words, it is not our direct responsibility, but we can account for it.

philsward commented 3 years ago

@klonos you think it's a bug with apache and not intentional by design?

klonos commented 3 years ago

Exactly @philsward ...I would need to do some research to be 100% sure + if other software like Drupal and Wordpress are having the same issue, it would help.

philsward commented 3 years ago

Ok

I'm pretty sure it's by design considering they have documentation on how to deal with with double compression for both gzip and Brotli. But I've been known to be wrong 😉

Ideally, a server admin would set this up globally for all sites, but if they don't know they need to do that then it's left to the web application to make sure their software doesn't break because it wasn't configured properly by a server admin.

Doing some research on Nginx, it can also suffer from double compression as well, but the difference is that it only compresses text/html out of the box and has to be configured to compress other mimetypes.

Apache on the other hand, compresses everything out of the box and you tell it what you don't want it to compress. I don't see that as a bug with apache but rather a design decision. Whether it is a good or bad decision is the real debate.

I don't think wordpress is dealing with this out of the box but I also don't know if they are pre-compressing files like Drupal/Backdrop. I can look at my sister's wordpress site to check.

quicksketch commented 3 years ago

I believe this is a bug in Backdrop because we're specifically telling Apache not to use its default behavior in our .htaccess file.

If a request came in from a browser that included Content-Encoding: br,gzip, indicating the browser supports both Brotli and Gzip, Backdrop's .htaccess file specifically discards the browser's preference for Brotli and serves back a Gzipped file instead.

The reason we're doing this is that pre-compressing CSS and JS is more efficient for the server, so that Apache doesn't have to recompress the file every time it's requested. Like @philsward dug up in the documentation (https://documentation.help/httpd-2.4/mod_deflate.html#precompressed and https://documentation.help/httpd-2.4/mod_brotli.html#precompressed), the Apache extensions mod_deflate and mod_brotli both have documentation on using this technique of pre-compressing, and they both note that they will double-compress content unless you set the environment variable no-gzip or no-brotli. Why there isn't an option to prevent double-compression by checking the Content-Encoding header... I don't know. But we're writing an override specifically for Apache, we should do our best to have this feature not break people's sites when the browser asks for a Brotli compressed CSS or JS file.

quicksketch commented 3 years ago

Marking this for the next bugfix release, since it's causing errors in some Apache configurations.

stpaultim commented 3 years ago

We just discussed (in weekly dev meeting) the fact that this issue has been adopted and tested in Drupal. @quicksketch suggested that it may not require any additional testing (since testing is relatively complicated and specialized). Not that additional testing would hurt.

Having said that, I'm not prepared to change the labels on this issue.

stpaultim commented 3 years ago

I decided that I am going to mark this "works for me" based upon the discussion two weeks ago in DEV meeting. If anyone things that this is wrong, please feel free to revert the labels. It's my understanding that this is a really simple change that is already in Drupal and probably could go into 1.17.5

stpaultim commented 3 years ago

This is a pretty simple change. @quicksketch has looked at it. I'm going to go ahead and mark RTBC.

quicksketch commented 3 years ago

Thanks @stpaultim, I agree. It's a small change, limited scope, and matches the approach taken by Drupal. The question of whether we would include similar changes in the future is still open, but in this situation the downside and maintenance is very limited.

I merged https://github.com/backdrop/backdrop/pull/3445 into 1.x and 1.19.x. Thanks @philsward and sorry for the long delay in moving this to completion.