colour-science / flask-compress

Compress responses of your Flask application.
MIT License
117 stars 27 forks source link

cache_key should be dependent on chosen_algorithm #53

Closed xmcp closed 3 weeks ago

xmcp commented 3 weeks ago

Currently the after_request function finds the cached response through key = self.cache_key(request).

However, this key does not consider chosen_algorithm, so if a request wants brotli, the brotli response body will be cached, and following gzip requests will be incorrectly served with brotli body.

Step to reproduce:

from flask import *
from flask_compress import Compress
from flask_caching import Cache

# --- below are copied from README ---

app = Flask(__name__)

cache = Cache(app, config={
    'CACHE_TYPE': 'simple',
    'CACHE_DEFAULT_TIMEOUT': 60*60  # 1 hour cache timeout
})

# Define a function to return cache key for incoming requests
def get_cache_key(request):
    return request.url

# Initialize Flask-Compress
compress = Compress()
compress.init_app(app)

# Set up cache for compressed responses
compress.cache = cache
compress.cache_key = get_cache_key

# --- above are copied from README ---

@app.route('/')
def index():
    return 'hello'*100000

app.run('127.0.0.1', 5000)
$ curl http://127.0.0.1:5000 -H "Accept-Encoding: br"
...
$ curl http://127.0.0.1:5000 --compressed -v
*   Trying 127.0.0.1:5000...
* Connected to 127.0.0.1 (127.0.0.1) port 5000
> GET / HTTP/1.1
> Host: 127.0.0.1:5000
> User-Agent: curl/8.7.1
> Accept: */*
> Accept-Encoding: deflate, gzip
>
< HTTP/1.1 200 OK
< Server: Werkzeug/3.0.1 Python/3.12.2
< Date: Sat, 05 Oct 2024 20:46:08 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 17
< Vary: Accept-Encoding
< Content-Encoding: gzip
< Connection: close
<
* Error while processing content unencoding: incorrect header check
* Closing connection
curl: (61) Error while processing content unencoding: incorrect header check
alexprengere commented 3 weeks ago

Thanks for digging into this. I think this can (should?) also be solved by more carefully setting the get_cache_key. Using this solves the issue:

def get_cache_key(request):
    # EDIT: the cache key mush always be a string
    return f"{request.url};{request.headers.get('Accept-Encoding')}"

The Accept-Encoding header will be directly translated to chosen_algorithm once _choose_compress_algorithm is called, so the end result is the same as your PR #54 .

I agree the current situation needs to be improved, either by merging your PR, or updating the docs with better examples and warn about this pitfall.

xmcp commented 3 weeks ago

Setting the cache key based on Accept-Encoding may cause redundancy in the cache, since different values of Accept-Encoding may lead to the same chosen_algorithm. This is not a huge problem, but I will argue that my PR is the more technically correct.

alexprengere commented 3 weeks ago

I am inclined to agree. Note that in practice, there are not many Accept-Encoding values out there, as this set "statically" set per-browser.

randomcascade commented 3 weeks ago

Hope I'm not dogpiling, but I 100% agree that the solution that @xmcp provided is more elegant. I've implemented the cache key trick mentioned above and while it works, the new fix is highly desirable for those of us using both flask-caching and flask-compress in production. Huge kudos.

alexprengere commented 3 weeks ago

After more testing of this, it turns out releasing this would have broken quite a lot of things (for starters, pretty much all my projects 😉).

Flask-Caching relies on cachelib. If you look at the documentation of cache.get, you will see that the key must be of str type.

It does not show in the example above, as the default simple cache relies on a dict, and using a tuple as key will "just work", even though this will not pass static type checking.

The trouble begins when you use another cache type, personally I rely often on FileSystemCache, as it allows flask applications served by multiple processes to share a common cache. If you try this, it will fail with the above patch, as there is code that builds a path from the key, and this fails if the key is not a str.

The error I got when testing this was not obvious and did not match the raise TypeError from cachelib:

UnboundLocalError: cannot access local variable 'bkey_hash' where it is not associated with a value

I was a bit unlucky here, as the clear error in cachelib when the key is not a str was only added in version 0.10, and the latest release of flask-caching still pins cachelib<0.10.

So the next steps:

In hindsight I should have required a test before merging, even though on simple cache it would have passed.

xmcp commented 3 weeks ago

Oops sorry for that. I only tested it with simple cache.

alexprengere commented 3 weeks ago

@xmcp are you planning to open another PR?

xmcp commented 3 weeks ago

I am busy with other things in the following days. I am okay with minor changes like casting the type to str, but I won't have time to contribute test cases anyway. You can fix this issue if you like to.

alexprengere commented 3 weeks ago

Fixed by e3aeec8de4a47cffbbee1d6c7ab2af5f048ea11c

alexprengere commented 2 weeks ago

For the record release 1.16 shipped with that fix.