caddyserver / caddy

Fast and extensible multi-platform HTTP/1-2-3 web server with automatic HTTPS
https://caddyserver.com
Apache License 2.0
55.8k stars 3.93k forks source link

Support "precompressed" files without "base" files #5116

Open TobiX opened 1 year ago

TobiX commented 1 year ago

As discussed in https://caddy.community/t/precompressed-files-without-base-files/17330, I was thinking about an extension to the precompressed feature: It would be nice to only have compressed files on disk and not have to keep the uncompressed files around...

Pro: Most clients support compression nowadays, so nothing changes for most clients Contra: If some client not supporting compression comes around, Caddy need to decompress the file before delivering it to the client

@francislavoie suggested in the community thread that supporting this feature might impact performance due to the additional stat calls, so we probably need to keep that aspect in mind, too...

PS: While thinking about it a bit more, I could probably implement it as a custom file system which "synthesizes" the uncompressed files :laughing:

francislavoie commented 1 year ago

I'm thinking it could be a new boolean option called precompressed_allow_no_base for a lack of a better name.

mholt commented 1 year ago

How about prefer_precompressed?

shade34321 commented 1 year ago

I'm not really sure what all is needed for this and might need help. But I wouldn't mind taking a stab at this.

francislavoie commented 1 year ago

@shade34321 take a look at the fileserver code and at the docs. Look for precompressed in the codebase for the key places to look at.

shade34321 commented 1 year ago

I've taken a look and I think this is going to take me a bit longer to understand before making a fix. So if someone else wants to work on it they can but I'll keep chugging along until I see a fix. Just wanted to update everybody.

titanventura commented 1 year ago

Would like to take this up !

mholt commented 1 year ago

Sounds good! Let us know if you have any questions.

titanventura commented 1 year ago

I have a question. What approach does caddy currently take, for clients that do not support compression ?

titanventura commented 1 year ago

would be better if i can connect on call with a person who can explain about this. I'm not sure if this is common in open-source communities. But i feel it is essential.

francislavoie commented 1 year ago

We follow the Accept-Encoding header. If a client doesn't declare it can handle compressed content, then Caddy will not compress it.

For file_server's precompressed files, that's why it's useful to have an uncompressed copy on file (most of the time).

This feature request would require decompressing the content before sending it if the client doesn't support compression, if the "no base" option is used.

ueffel commented 1 year ago

I have a question. What approach does caddy currently take, for clients that do not support compression ?

I suggest you just read the code and maybe run through it with a debugger, starting here: https://github.com/caddyserver/caddy/blob/b166b9008344786e7cfe5715b7e8a4cd3ec1bd76/modules/caddyhttp/fileserver/staticfiles.go#L233

More specifically look at this part: https://github.com/caddyserver/caddy/blob/b166b9008344786e7cfe5715b7e8a4cd3ec1bd76/modules/caddyhttp/fileserver/staticfiles.go#L357-L393

The call to encode.AcceptedEncodings normally returns an ordered list of accepted encodings in the request (Accept-Encoding) and then checks if they are present in the file system. Since you ask for the case of no compression, this loop will be skipped entirely. That means the variable file will still be nil, which makes you enter this case and open the actual file to serve: https://github.com/caddyserver/caddy/blob/b166b9008344786e7cfe5715b7e8a4cd3ec1bd76/modules/caddyhttp/fileserver/staticfiles.go#L395-L411

titanventura commented 1 year ago

Sure. Thank you all for adding more information ! Will look at it.

titanventura commented 1 year ago

But i don't understand why we need a new boolean option called prefer_precompressed for this. If caddy finds an sidecar/precompressed file that does not have it's corresponding uncompressed file, we can decompress it on the fly (for clients that do not support compression) without the need for a boolean option right ?

francislavoie commented 1 year ago

@titanventura did you read the discussion in the community forum thread? The reasoning is covered there. And reading the code, it should be clear why we do it that way currently.

titanventura commented 1 year ago

I feel this feature can be implemented without prefer_precompressed option. Because, we can implement it in a way so that, if the uncompressed file is NOT present and the client does NOT accept the encoding, we can decide with decompressing the file on the fly since we have no other option to serve the client right ? Please correct me if i'm wrong, im a newbie to the open source realm. I would love to get on call to discuss things in more detail (if that works with you).

francislavoie commented 1 year ago

As I wrote before, currently the code is set up such that if the original file does not exist (checked first), a 404 is immediately returned and no additional system calls are made. This is optimal to "fail fast" and assumes that no compressed version of the file will exist.

If we change the code (without a new option) then additional system calls would be made on all requests to missing files which can be seen as a performance regression, especially if someone is making tons of requests to brute force trying to find some files (but admittedly in that case you should probably have some rate limiting in place to stop that from happening).

My argument is that most users of precompressed will probably have both versions on disk, as is commonly done when using JS build tooling to precompress assets. So I think that performance regression to allow for not having the uncompressed copy should be opt-in.

titanventura commented 1 year ago

Ahh I get it now. This has made things much more clear. Thank you so much @francislavoie . Will try and work on it now that i've understood the issue !

singhalkarun commented 11 months ago

Hi @francislavoie @mholt I would love to spend some time solving this. Sharing my understanding below of the above discussion before I start to work on it,

Issue: If there's only a compressed copy of file on server and and the client doesn't accept the compressed file, it returns 404.

Solution:

  1. We can allow decompressing of the compressed files on the fly if the precompressed file doesn't exist and the client doesn't accept the compressed files.

    Issue: It can lead to performance regression issues.

  2. We can make it as an opt-in feature so that the server administrator can specify that he wants to decompress and serve files maybe for specific routes (if the precompressed file doesn't exist) and keep returning 404 if not opted for assuming that the prescompressed file must exist.

Please correct me if I have some flaws in understanding.

francislavoie commented 11 months ago

Sounds correct @singhalkarun.

singhalkarun commented 11 months ago

Sounds correct @singhalkarun.

Thanks for confirming @francislavoie . I am starting up on this. I will keep you posted.

3v0k4 commented 2 months ago

I'd love to get this one done. I already started working on it.