colour-science / flask-compress

Compress responses of your Flask application.
MIT License
117 stars 27 forks source link

flask.Response.make_conditional fails to return 304 due to Etag mismatch #24

Open danielzen opened 3 years ago

danielzen commented 3 years ago

This is more a need for technique than a library failure. Due to the necessary changes to modify Etags when returning compressed content, the Etags don't match when calling make_conditional

So, how would one get flask to acknowledge a matching etag on compressed content? Should I intercept and strip the trailing :gzip off the normally immutable request headers?

I'm at a loss. Any recommendations? Thanks in advance.

ziddey commented 3 years ago

Just came across this myself.

I'm curious if altering the etag in flask-compress is even necessary, since it's adding accept-encoding to the vary header. Shouldn't this be enough to prevent any cache from matching if the requested encoding does change?

edit: just saw https://github.com/colour-science/flask-compress/issues/15 hmm

In the interim, I've added a before_app_request to alter the header in the environ:

    for h in ["HTTP_IF_NONE_MATCH", "HTTP_IF_MATCH"]:
        if h in request.environ:
            request.environ[f"ORIG_{h}"] = request.environ[h]
            request.environ[h] = re.sub(':[^"]+', "", request.environ[h])

This would cause issues if you're using : in your etags, but flask's implementation does not. However, a 304 will then prevent flask-compress from appending the encoding onto the resulting etag... I suppose an after_app_request could be used to restore it if needed.

    if (
        "ORIG_HTTP_IF_NONE_MATCH" in request.environ
        and "etag" in response.headers
        and response.status_code == 304
    ):
        etag = response.headers["etag"].strip('"')
        for orig in request.environ["ORIG_HTTP_IF_NONE_MATCH"].split(","):
            orig = orig.strip()
            if orig.strip('"').startswith(etag):
                response.headers["ETag"] = orig
                break
    return response

Of course, this is just to get around #15 and restore conditional requests. I'm not clear if etags do need to be unique for different encodings and their ramifications-- potential range request disasters if the accepted-encoding changes for some strange reason?

edit: https://datatracker.ietf.org/doc/html/rfc7232#section-2.3.3

So I suppose flask-compress may want to call make_conditional. This would resolve the etag 304 issue, but there are other scenarios that could still have issues:

Basically, all conditional request scenarios need careful consideration.

for reference: https://stackoverflow.com/questions/33947562/is-it-possible-to-send-http-response-using-gzip-and-byte-ranges-at-the-same-time

ziddey commented 3 years ago

@alexprengere I've done a little more research on this. I've seen etags with the encoding appended, like flask-compress is currently doing. As well, it's common to just weaken the etag (cloudflare does this). If we use this approach instead, flask/werkzeug's make_conditional should work just fine (for non-range requests...).

Something like:

etag = response.get_etag()
if etag and etag[0]:
    response.set_etag(etag[0], True)

As far as range requests, it looks like it's common to simply not support compression in this scenario. We could just check for a content-range response header:

        if (chosen_algorithm is None or
            response.mimetype not in app.config["COMPRESS_MIMETYPES"] or
            response.status_code < 200 or
            response.status_code >= 300 or
            "Content-Encoding" in response.headers or
            "Content-Range" in response.headers or
            (response.content_length is not None and
             response.content_length < app.config["COMPRESS_MIN_SIZE"])):
            return response

I did see some interesting behavior testing range requests with cloudflare. CF will accept gzip encoding for upstream requests, and will then happily serve a gzipped range response (with an unweakened etag). But for any other encodings (no accept-encoding, or br or whatever else), it will fall back to identity and weaken the etag.

This obviously can still lead to major issues, if say a HEAD request is made (with accept-encoding) to determine the content-length to subsequently make a range request.. But realistically, such responses should be excluded from compression by the user anyway (excluded mimetypes, etc), so it's mostly a non-issue. Either way, range requests are performed on the uncompressed data which can lead to all sorts of surprises.

alexprengere commented 2 years ago

Thanks a lot for this. Would you be able to work on a PR I can review?

andreykryazhev commented 2 months ago

I have the similar issue, I am using flask/werkzeug for serving static files. When browser requests some file (main.js for example) flask-compress adds gzip mark to response's ETag, for example after compressing Etag looks like "1724158821.877386-11899091-378803215:gzip".

But when browser pass this value in If-None-Match header with next request to main.js werkzeug desides that main.js on filesystem is differs from browser main.js (because for werkzeug the valid value is "1724158821.877386-11899091-378803215", not "1724158821.877386-11899091-378803215:gzip") and werkzeug returns status code 200. It leads to flask_compress compresses the result again and so on.

Actually, in my opinion it is not quite flask-compress issue (this is more werkzeug issue since I cannot override ETag validation when serving static files). But may be you have some thoughts how to resolve such issue on your side? By my opinion one of the option would be a possibility to modify response status code in flask-compress. In this case we would modify it forcefully to 304 or whatever when we needed (for example when used flask-compress cache).

P.S.: I also do not understand how weakeaning the validation can help with resolving this issue.

ziddey commented 2 months ago

Apologies for the late response. Since I'm not using If-Range or If-Match conditional requests, my prior solution has been working fine for me.

It works because If-None-Match only performs a weak comparison:

Weak comparison: two entity-tags are equivalent if their opaque-tags match character-by-character, regardless of either or both being tagged as "weak".

https://datatracker.ietf.org/doc/html/rfc7232#section-2.3.2

Weakening etags when compressing is fairly common, at least for downstream servers. E.g. https://developers.cloudflare.com/cache/reference/etag-headers/#behavior-with-respect-strong-etags-disabled

However, this approach does not work for If-Range or If-Match requests since those require strong comparisons. Again, simply not supporting compression for range requests would solve etag issues for most scenarios here. Besides, such requests are likely natively compressed-- videos etc.

As for the mid-air collision example, this would be implemented outside of flask's native make_conditional (which seems to only be used by send_file?), so although not by the book, one could just work with the weak etag.. Otherwise, we could try to only weaken for send_file, like detecting direct_passthrough or having a Content-Disposition header.

If strong etags for compressed content is desired, flask-compress would need to hook into one of flask/werkzeug's functions: send_file can take an etag as an argument, or generate one if actually sending a file. It then calls make_conditional, which ends up generating an etag if it didn't exist in is_resource_modified.

I think the simplest solution would be to:

Actually, in the last scenario, the etag can have the compression algorithm tacked on like it is currently. The user will just need to handle it accordingly when doing any custom conditional checks.