No support for encodings such as gzip or brotli? #1481

Closed: mboelen closed this issue 2 months ago

mboelen commented 2 months ago

I see in our log files that your software is being blocked because it does not send an Accept-Encoding header. Our rationale for this is to limit outdated or badly behaving systems/crawlers while saving resources (on our end, but especially on the internet in general). In this case, I was surprised to see a modern tool being blocked as well.

I guess this is a feature request: Is it possible to add compression support to the project (and save a lot of bytes on the internet)?

desbest commented 2 months ago

Isn't gzip or brotli something that's supposed to be done by a sysadmin (server administrator) instead of a web developer?

Do you use apache, nginx or IIS?

Contact your web host for advice on turning it on, or use the SitePoint forum.

mboelen commented 2 months ago

We host ourselves and have compression enabled on the web server (nginx).

The HTTP client that performs the request (a web browser, wget/curl, or any other application) normally announces which types of data compression it supports, using the Accept-Encoding request header. Based on that, the web server returns either an uncompressed or a compressed response.
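To make the negotiation concrete, here is a minimal PHP sketch of the server side. It parses an Accept-Encoding value, honours `q` weights, and picks the first encoding the server supports that the client accepts, falling back to `identity` (uncompressed). The function name `negotiate_encoding` and the preference list are hypothetical. Web servers such as nginx implement this logic themselves, and this is not their code.

```php
<?php
/**
 * Hypothetical helper: choose a response encoding from a request's
 * Accept-Encoding header value (simplified HTTP content negotiation).
 *
 * @param string   $acceptEncoding raw header value, e.g. "gzip, deflate, br"
 * @param string[] $supported      encodings the server can produce, in server preference order
 * @return string  the chosen encoding, or "identity" for an uncompressed response
 */
function negotiate_encoding(string $acceptEncoding, array $supported): string
{
    // Map each listed coding to its quality value (q defaults to 1).
    $weights = [];
    foreach (explode(',', $acceptEncoding) as $part) {
        $params = array_map('trim', explode(';', $part));
        $coding = strtolower(array_shift($params));
        if ($coding === '') {
            continue; // empty header or stray comma
        }
        $q = 1.0;
        foreach ($params as $param) {
            if (preg_match('/^q=([0-9.]+)$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        $weights[$coding] = $q;
    }

    foreach ($supported as $coding) {
        // An explicit entry wins (even q=0); otherwise fall back to a "*" wildcard.
        $q = $weights[$coding] ?? ($weights['*'] ?? 0.0);
        if ($q > 0) {
            return $coding;
        }
    }

    return 'identity';
}
```

For example, `negotiate_encoding('gzip, deflate, br', ['br', 'gzip'])` returns `'br'`, while a request with no Accept-Encoding header (an empty string here) gets `'identity'`, i.e. the full uncompressed body.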

What I see in our logs is that Selfoss makes requests without an Accept-Encoding header naming any compression method (gzip, br, deflate), so they get blocked. Example line below; 426 is the status we return when the client does not announce any type of data compression:

2024-04-13T12:02:24+00:00 426 "GET /feed/ HTTP/1.1" 16 "" "Selfoss/2.19 (+" TLSv1.2/ECDHE-ECDSA-AES256-GCM-SHA384 0.000 .
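The blocking rule described above fits in a few lines. The sketch below is hypothetical PHP (the reporter enforces the rule in nginx, and that configuration is not shown in the thread). It accepts a request only if its Accept-Encoding header names at least one of gzip, br, deflate, or zstd, and otherwise returns 426 as in the log line. It ignores `q=0` entries for brevity.

```php
<?php
/**
 * Hypothetical sketch of the reporter's policy: answer 426 when the client
 * announces no compression method. Header names are matched case-insensitively.
 *
 * @param array<string,string> $headers request headers, e.g. from getallheaders()
 * @return int HTTP status to send (200 to serve normally, 426 to refuse)
 */
function compression_policy_status(array $headers): int
{
    $headers = array_change_key_case($headers, CASE_LOWER);
    $value = strtolower($headers['accept-encoding'] ?? '');

    // Reduce "gzip, deflate;q=0.5, br" to bare coding names.
    $codings = array_map(
        fn (string $part): string => trim(explode(';', $part)[0]),
        explode(',', $value)
    );

    $compressed = array_intersect($codings, ['gzip', 'br', 'deflate', 'zstd']);

    return $compressed === [] ? 426 : 200;
}
```

With the header set the selfoss dump below shows (Host, User-Agent, Referer, Accept), this returns 426. With Firefox's `gzip, deflate, br`, it returns 200.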

I looked through the code base but couldn't find any reference to compression methods; I only saw 'accept-encoding' in a .htaccess file. In other words, it looks like Selfoss (or the client that performs the HTTP requests) does not support any form of data compression. This means every request the software makes "wastes" additional bytes that have to be sent over the internet.
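To get a feel for the bytes at stake, the sketch below builds a synthetic, repetitive RSS document (invented sample data, not a real feed). It then compares the document's raw size with its gzip-compressed size using PHP's zlib functions. Real savings depend on the feed, but text-heavy XML generally compresses well.

```php
<?php
// Build a synthetic RSS feed with 50 similar items (illustrative data only).
$items = '';
for ($i = 1; $i <= 50; $i++) {
    $items .= "<item><title>Post $i</title>"
        . "<link>https://example.org/posts/$i</link>"
        . "<description>Summary of post number $i in this example feed.</description></item>\n";
}
$feed = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
    . "<rss version=\"2.0\"><channel><title>Example</title>\n$items</channel></rss>\n";

// gzencode() produces the gzip format used with "Content-Encoding: gzip".
$gzipped = gzencode($feed, 6);

printf(
    "raw: %d bytes, gzip: %d bytes (%.0f%% of original)\n",
    strlen($feed),
    strlen($gzipped),
    100 * strlen($gzipped) / strlen($feed)
);
```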

It may also be worth adding that I don't use Selfoss myself, so I can't test it from the "client" side. The reason for reaching out is to improve clients and save a lot of internet traffic in the long run. I hope this clarifies the story behind the request a bit better.

desbest commented 2 months ago


Brotli is a technology made by Google. As it's relatively new, I think it has to be installed onto the server as a module, given that other open source compression technology has already been available as a server extension module for over 20 years.

jtojnar commented 2 months ago

Thanks for reporting.

Looks like you are right. Running php -S localhost:8000 dump.php with the following dump.php script

<?php error_log(var_export(getallheaders(), true), 0);

reveals that selfoss sends only the following headers:

array (
  'Host' => '',
  'User-Agent' => 'Selfoss/2.20-SNAPSHOT (+',
  'Referer' => '',
  'Accept' => 'application/atom+xml, application/rss+xml, application/rdf+xml;q=0.9, application/xml;q=0.8, text/xml;q=0.8, text/html;q=0.7, unknown/unknown;q=0.1, application/unknown;q=0.1, */*;q=0.1',

Compared to e.g. Firefox:

array (
  'Host' => '',
  'User-Agent' => 'Mozilla/5.0 (X11; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0',
  'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
  'Accept-Language' => 'en-GB,en;q=0.8,cs;q=0.5,en-US;q=0.3',
  'Accept-Encoding' => 'gzip, deflate, br',
  'DNT' => '1',
  'Connection' => 'keep-alive',
  'Upgrade-Insecure-Requests' => '1',
  'Sec-Fetch-Dest' => 'document',
  'Sec-Fetch-Mode' => 'navigate',
  'Sec-Fetch-Site' => 'none',
  'Sec-Fetch-User' => '?1',

We use the Guzzle HTTP client library, which uses curl internally, so I had assumed it sends the correct headers automatically, especially since decoding of encoded responses is enabled by default.
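Decoding is the client's half of the exchange: once a server sends a compressed body, the client has to undo the compression. The hypothetical helper below sketches that step with PHP's zlib functions. Guzzle and curl do this internally, and curl only handles the encodings it was built with. Brotli and zstd would need extensions that are not assumed here.

```php
<?php
/**
 * Hypothetical helper: decode an HTTP response body according to its
 * Content-Encoding header. Only the zlib-based codings are handled.
 */
function decode_body(string $body, string $contentEncoding): string
{
    switch (strtolower(trim($contentEncoding))) {
        case '':
        case 'identity':
            return $body; // nothing to undo
        case 'gzip':
        case 'x-gzip':
            $out = gzdecode($body);
            break;
        case 'deflate':
            // HTTP "deflate" is the zlib format (RFC 9110), which gzuncompress() reads.
            $out = gzuncompress($body);
            break;
        default:
            throw new InvalidArgumentException("Unsupported Content-Encoding: $contentEncoding");
    }
    if ($out === false) {
        throw new RuntimeException("Corrupt $contentEncoding body");
    }
    return $out;
}
```

A client that never sends Accept-Encoding never receives such bodies in the first place. That is why a client can have decoding enabled and still get every response uncompressed.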

But curl itself only sends Accept-Encoding when given the --compressed flag:

array (
  'Host' => '',
  'User-Agent' => 'curl/8.6.0',
  'Accept' => '*/*',
  'Accept-Encoding' => 'deflate, gzip, br, zstd',

Will look into it.

jtojnar commented 2 months ago

It turns out Guzzle overrides curl's headers so that no Accept-Encoding header is sent by default. I have pushed a fix to selfoss that restores it, and opened a documentation PR in Guzzle:

Thanks again for bringing it to our attention.

mboelen commented 2 months ago

Thanks for your quick response and action. I noticed a few more issues with other RSS feed readers, which gave me the idea to blog about it, keeping track of the actions taken and sharing them in return. Hopefully it also inspires developers, publishers, and users of RSS to improve things together.