fossar / selfoss

multipurpose rss reader, live stream, mashup, aggregation web application
https://selfoss.aditu.de
GNU General Public License v3.0

No support for encoding such as gzip or brotli? #1481

Closed mboelen closed 2 months ago

mboelen commented 2 months ago

I see in our log files that your software is being blocked as it does not provide any accept-encoding headers. Our rationale for doing this is to limit outdated or bad-behaving systems/crawlers while saving on resources (on our end, but especially on the internet in general). In this case, I was surprised to see a modern tool being blocked as well.

I guess this is a feature request: Is it possible to add compression support to the project (and save a lot of bytes on the internet)?

desbest commented 2 months ago

Isn't gzip or brotli something that's supposed to be done by a sysadmin (server administrator) instead of a web developer?

Do you use apache, nginx or IIS?

Contact your web host for advice on turning it on or use the Sitepoint forum.

mboelen commented 2 months ago

We host ourselves and have compression enabled on the web server (nginx).

The HTTP client (a web browser, wget/curl, or any other application) that performs the HTTP request normally announces which types of data compression it supports via the Accept-Encoding request header. Based on that header, the web server then returns a compressed or an uncompressed response.
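To make the negotiation described above concrete, here is a minimal Python sketch of how a server might pick a compression method from the Accept-Encoding header. The function names and the server's preference order are assumptions for illustration only; real servers such as nginx implement this internally and differently.

```python
def parse_accept_encoding(header):
    """Parse an Accept-Encoding value into a {coding: q-value} dict."""
    codings = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        params = params.strip()
        q = 1.0  # a coding without an explicit q-value has weight 1
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        codings[name.strip().lower()] = q
    return codings


def choose_encoding(header, server_supported=("br", "gzip", "deflate")):
    """Return the coding to use, or None to send the response uncompressed.

    server_supported is a hypothetical preference order; ties in the
    client's q-values are broken in favour of the earlier entry.
    """
    if not header:
        return None  # client announced nothing: send uncompressed
    accepted = parse_accept_encoding(header)
    best, best_q = None, 0.0
    for coding in server_supported:
        q = accepted.get(coding, accepted.get("*", 0.0))
        if q > best_q:
            best, best_q = coding, q
    return best
```

With this sketch, a browser-style header such as "gzip, deflate, br" selects br, while a request without the header falls back to an uncompressed response.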

So what I see in our logs is that Selfoss makes a request without any Accept-Encoding header announcing a compression method (gzip, br, deflate), and therefore it got blocked. See the example line below; the 426 is the response we return when the client does not announce any type of data compression:

2024-04-13T12:02:24+00:00 426 2.3.4.5 "GET /feed/ HTTP/1.1" 16 "https://linux-audit.com/feed/" "Selfoss/2.19 (+https://selfoss.aditu.de)" TLSv1.2/ECDHE-ECDSA-AES256-GCM-SHA384 0.000 .
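The blocking rule behind that log line can be sketched in a few lines of Python. This is only an illustration of the policy as described here: the function name is hypothetical, and the actual rule presumably lives in the nginx configuration, which is not shown in this thread.

```python
def status_for_request(headers):
    """Return 426 when the client announces no compression support, else 200.

    `headers` is a plain dict of request headers. Header names are
    compared case-insensitively, as HTTP requires.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    if not normalized.get("accept-encoding", "").strip():
        return 426  # no Accept-Encoding header, or an empty one
    return 200
```

A request carrying only Host, User-Agent, Referer and Accept headers would be answered with 426 under this rule, while any request that sends a non-empty Accept-Encoding passes.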

So I looked in the code base but can't find a reference to compression methods; I only saw 'accept-encoding' in a .htaccess file. In other words, it looks like Selfoss (or the client that does the HTTP requests) does not support any form of data compression. This indirectly means every single request the software makes is "wasting" additional bytes that have to be sent over the internet.

It may also be good to add that I don't use Selfoss myself, so I can't test it from the "client" side. The reason for reaching out is to improve clients and save a lot of internet traffic in the long run. I hope this clarifies the story behind the request a bit better.

desbest commented 2 months ago

I can add gzip within 5 seconds, just like I did in 2010 when I added some lines to .htaccess.

nginx should have something similar in nginx.conf

mod_deflate for gzip

<ifmodule mod_deflate.c>
# The trailing backslash continues the directive onto the next line
AddOutputFilterByType DEFLATE text/text text/html text/plain text/xml text/css \
  application/x-javascript application/javascript
</ifmodule>

[source]

# AddEncoding allows you to have certain browsers uncompress information on the fly. Note: Not all browsers support this.
AddEncoding x-compress .Z
AddEncoding x-gzip .gz .tgz

[source]

<ifModule mod_gzip.c>
  mod_gzip_on Yes
  mod_gzip_dechunk Yes
  mod_gzip_item_include file \.(html?|txt|css|js|php|pl)$
  mod_gzip_item_include handler ^cgi-script$
  mod_gzip_item_include mime ^text/.*
  mod_gzip_item_include mime ^application/x-javascript.*
  mod_gzip_item_exclude mime ^image/.*
  mod_gzip_item_exclude rspheader ^Content-Encoding:.*gzip.*
</ifModule>

[source]

[source] [two]

zlib compression

IMPROVING PERFORMANCE BY PRESERVING BANDWIDTH
To increase performance on PHP enabled servers, add the following directive:

# preserve bandwidth for PHP enabled servers
<ifmodule mod_php4.c>
 php_value zlib.output_compression 16386
</ifmodule>

[source]

brotli

Brotli is a compression format developed by Google. As it's relatively new, I think it has to be installed on the server as a module, much like the other open source compression modules that have already been around for over 20 years.

jtojnar commented 2 months ago

Thanks for reporting.

Looks like you are right. Running php -S 127.0.0.1:8000 dump.php with the following script

<?php error_log(var_export(getallheaders(), true), 0);

reveals selfoss is only sending the following headers:

array (
  'Host' => '127.0.0.1:8000',
  'User-Agent' => 'Selfoss/2.20-SNAPSHOT (+https://selfoss.aditu.de)',
  'Referer' => 'http://127.0.0.1:8000/',
  'Accept' => 'application/atom+xml, application/rss+xml, application/rdf+xml;q=0.9, application/xml;q=0.8, text/xml;q=0.8, text/html;q=0.7, unknown/unknown;q=0.1, application/unknown;q=0.1, */*;q=0.1',
)

Compared to e.g. Firefox:

array (
  'Host' => '127.0.0.1:8000',
  'User-Agent' => 'Mozilla/5.0 (X11; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0',
  'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
  'Accept-Language' => 'en-GB,en;q=0.8,cs;q=0.5,en-US;q=0.3',
  'Accept-Encoding' => 'gzip, deflate, br',
  'DNT' => '1',
  'Connection' => 'keep-alive',
  'Upgrade-Insecure-Requests' => '1',
  'Sec-Fetch-Dest' => 'document',
  'Sec-Fetch-Mode' => 'navigate',
  'Sec-Fetch-Site' => 'none',
  'Sec-Fetch-User' => '?1',
)

We use the Guzzle HTTP client library, which uses curl internally, so I had assumed it sends the correct headers automatically, especially since decoding of encoded responses is enabled by default.

But curl itself only sends Accept-Encoding when run with the --compressed flag:

array (
  'Host' => '127.0.0.1:8000',
  'User-Agent' => 'curl/8.6.0',
  'Accept' => '*/*',
  'Accept-Encoding' => 'deflate, gzip, br, zstd',
)

Will look into it.

jtojnar commented 2 months ago

Turns out Guzzle overrides curl headers to not send Accept-Encoding by default. I have pushed a fix that overrides it back in selfoss and opened a documentation PR in guzzle: https://github.com/guzzle/guzzle/pull/3215
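For readers who want to see the client side of this in isolation, the following is a minimal Python sketch using only the standard library. It is not the selfoss fix (that lives in selfoss's PHP code and Guzzle); the function names and the ExampleReader user agent are made up for illustration. The point is that the client has to do two things: announce the codings it supports, and decode the response according to Content-Encoding.

```python
import gzip
import urllib.request
import zlib


def build_feed_request(url, user_agent="ExampleReader/0.1"):
    """Build a request that announces gzip and deflate support."""
    return urllib.request.Request(
        url,
        headers={
            "User-Agent": user_agent,  # hypothetical client name
            "Accept-Encoding": "gzip, deflate",
        },
    )


def decode_body(body, content_encoding):
    """Decode a response body according to its Content-Encoding header."""
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "deflate":
        # Assumes the zlib-wrapped form; some servers send raw deflate,
        # which this sketch does not handle.
        return zlib.decompress(body)
    return body  # identity or missing header: body is already plain


def fetch_feed(url):
    """Fetch a feed and return its decoded bytes (performs network I/O)."""
    with urllib.request.urlopen(build_feed_request(url)) as response:
        return decode_body(response.read(), response.headers.get("Content-Encoding"))
```

urllib does not decompress responses on its own, so announcing Accept-Encoding without the matching decode_body step would hand compressed bytes to the feed parser.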

Thanks again for bringing it to our attention.

mboelen commented 2 months ago

Thanks for your quick response and actions. I noticed a few more issues with other RSS feed readers, which gave me the idea to blog about it, while also keeping track of the actions taken and sharing them in return. Hopefully it also inspires developers, publishers, and users of RSS to improve things together.