Open danielzen opened 3 years ago
Just came across this myself.
I'm curious if altering the etag in flask-compress is even necessary, since it's adding accept-encoding
to the vary
header. Shouldn't this be enough to prevent any cache from matching if the requested encoding does change?
edit: just saw https://github.com/colour-science/flask-compress/issues/15 hmm
In the interim, I've added a before_app_request
to alter the header in the environ:
for h in ["HTTP_IF_NONE_MATCH", "HTTP_IF_MATCH"]:
if h in request.environ:
request.environ[f"ORIG_{h}"] = request.environ[h]
request.environ[h] = re.sub(':[^"]+', "", request.environ[h])
This would cause issues if you're using :
in your etags, but flask's implementation does not. However, a 304 will then prevent flask-compress from appending the encoding onto the resulting etag... I suppose an after_app_request
could be used to restore it if needed.
if (
"ORIG_HTTP_IF_NONE_MATCH" in request.environ
and "etag" in response.headers
and response.status_code == 304
):
etag = response.headers["etag"].strip('"')
for orig in request.environ["ORIG_HTTP_IF_NONE_MATCH"].split(","):
orig = orig.strip()
if orig.strip('"').startswith(etag):
response.headers["ETag"] = orig
break
return response
Of course, this is just to get around #15 and restore conditional requests. I'm not clear if etags do need to be unique for different encodings and their ramifications-- potential range request disasters if the accepted-encoding
changes for some strange reason?
edit: https://datatracker.ietf.org/doc/html/rfc7232#section-2.3.3
So I suppose flask-compress may want to call make_conditional
. This would resolve the etag 304 issue, but there are other scenarios that could still have issues:
if-modified-since
requests, when flask itself calls make_conditional
(e.g. via send_static_file
), it will successfully convert to 304 before flask-compress gets to it and will be missing the content-encoding
. I'm not sure if this can cause any issues-- I do know that caches should use a 304's cache control headers to update the cache.if-match
will reach flask-compress with a 412 if the "matching" altered etag is used and fail the http 200-300 criteria, where it should be a 200..Basically, all conditional request scenarios need careful consideration.
for reference: https://stackoverflow.com/questions/33947562/is-it-possible-to-send-http-response-using-gzip-and-byte-ranges-at-the-same-time
@alexprengere I've done a little more research on this. I've seen etags with the encoding appended, like flask-compress is currently doing. As well, it's common to just weaken the etag (cloudflare does this). If we use this approach instead, flask/werkzeug's make_conditional
should work just fine (for non-range requests...).
Something like:
etag = response.get_etag()
if etag and etag[0]:
response.set_etag(etag[0], True)
As far as range requests, it looks like it's common to simply not support compression in this scenario. We could just check for a content-range
response header:
if (chosen_algorithm is None or
response.mimetype not in app.config["COMPRESS_MIMETYPES"] or
response.status_code < 200 or
response.status_code >= 300 or
"Content-Encoding" in response.headers or
"Content-Range" in response.headers or
(response.content_length is not None and
response.content_length < app.config["COMPRESS_MIN_SIZE"])):
return response
I did see some interesting behavior testing range requests with cloudflare. CF will accept gzip encoding for upstream requests, and will then happily serve a gzipped range response (with an unweakened etag). But for any other encodings (no accept-encoding, or br or whatever else), it will fall back to identity and weaken the etag.
This obviously can still lead to major issues, if say a HEAD request is made (with accept-encoding) to determine the content-length to subsequently make a range request.. But realistically, such responses should be excluded from compression by the user anyway (excluded mimetypes, etc), so it's mostly a non-issue. Either way, range requests are performed on the uncompressed data which can lead to all sorts of surprises.
Thanks a lot for this. Would you be able to work on a PR I can review?
I have the similar issue, I am using flask/werkzeug for serving static files. When browser requests some file (main.js for example) flask-compress
adds gzip
mark to response's ETag
, for example after compressing Etag
looks like "1724158821.877386-11899091-378803215:gzip"
.
But when browser pass this value in If-None-Match
header with next request to main.js werkzeug
desides that main.js on filesystem is differs from browser main.js (because for werkzeug
the valid value is "1724158821.877386-11899091-378803215"
, not "1724158821.877386-11899091-378803215:gzip"
) and werkzeug
returns status code 200. It leads to flask_compress
compresses the result again and so on.
Actually, in my opinion it is not quite flask-compress
issue (this is more werkzeug
issue since I cannot override ETag
validation when serving static files). But may be you have some thoughts how to resolve such issue on your side?
By my opinion one of the option would be a possibility to modify response status code in flask-compress
. In this case we would modify it forcefully to 304 or whatever when we needed (for example when used flask-compress
cache).
P.S.: I also do not understand how weakeaning the validation can help with resolving this issue.
Apologies for the late response. Since I'm not using If-Range
or If-Match
conditional requests, my prior solution has been working fine for me.
It works because If-None-Match
only performs a weak comparison:
Weak comparison: two entity-tags are equivalent if their opaque-tags match character-by-character, regardless of either or both being tagged as "weak".
https://datatracker.ietf.org/doc/html/rfc7232#section-2.3.2
Weakening etags when compressing is fairly common, at least for downstream servers. E.g. https://developers.cloudflare.com/cache/reference/etag-headers/#behavior-with-respect-strong-etags-disabled
However, this approach does not work for If-Range
or If-Match
requests since those require strong comparisons. Again, simply not supporting compression for range requests would solve etag issues for most scenarios here. Besides, such requests are likely natively compressed-- videos etc.
As for the mid-air collision example, this would be implemented outside of flask's native make_conditional
(which seems to only be used by send_file
?), so although not by the book, one could just work with the weak etag.. Otherwise, we could try to only weaken for send_file
, like detecting direct_passthrough
or having a Content-Disposition
header.
If strong etags for compressed content is desired, flask-compress would need to hook into one of flask/werkzeug's functions:
send_file
can take an etag as an argument, or generate one if actually sending a file.
It then calls make_conditional
, which ends up generating an etag if it didn't exist in is_resource_modified
.
I think the simplest solution would be to:
send_file
-- direct_passthrough / Content-Disposition or whatever (allows any custom conditional implementation to proceed. fixes mid-air collisions).Actually, in the last scenario, the etag can have the compression algorithm tacked on like it is currently. The user will just need to handle it accordingly when doing any custom conditional checks.
This is more a need for technique than a library failure. Due to the necessary changes to modify Etags when returning compressed content, the Etags don't match when calling make_conditional
So, how would one get flask to acknowledge a matching etag on compressed content? Should I intercept and strip the trailing :gzip off the normally immutable request headers?
I'm at a loss. Any recommendations? Thanks in advance.