Closed piskvorky closed 2 years ago
I came up with this temporary hot-fix. Does it make sense, or am I way off?
from waitress.buffers import ReadOnlyFileBasedBuffer
from werkzeug.wsgi import _RangeWrapper
@app.after_request
def hotfix_wrapper(response):
"""
Intercept the Waitress-Flask interaction so that Flask doesn't read the entire file
from the beginning on each range request (terrible for performance).
"""
f = response.response
if isinstance(f, _RangeWrapper) and isinstance(f.iterable, ReadOnlyFileBasedBuffer):
if hasattr(f.iterable, "seekable") and f.iterable.seekable():
f.seekable = True
f.iterable.seek = f.iterable.file.seek
f.iterable.tell = f.iterable.file.tell
return response
I don't understand what is going on here. The wsgi.file_wrapper
has to be handed to Waitress wholesale, otherwise it will just read from an iterator and send whatever data the application wants to send. Waitress does not natively do anything with ranges, nor does it support wrapping the wsgi.file_wrapper
and still getting the performance speedup.
Somewhere in your app its creating/using a wsgi.file_wrapper
and then wrapping that before returning an iterator to waitress? Is that what is going on?
Second it doesn't seem to be monkey patching waitress at all, it is updating some attributes on an instance of a class. My recommendation instead is to just update f.iterable
to be f.iterable.file
to get access to the underlying file that was originally handed to the wsgi.file_wrapper
... but once again I am not sure I understand what exactly is happening.
wsgi.file_wrapper
returns an iterator that's supposed to be used as the response body, but werkzeug is wrapping that and implementing a range request on top of that response body so it's trying to seek. I believe webob does something similar to what OP is describing if you opt into it with response.conditional_response = True
[1]. Webob is not assuming you can seek/tell [2].
There is no reference in PEP 3333 to wsgi.file_wrapper
being seekable, but it definitely makes sense as a potential optimization for this use case. I'm having trouble seeing why we couldn't support it. The main caveat I can see with calling something "seekable" is that generally we only want to support seeking forward and not backward - maintaining the app iter's stream semantics.
[1] https://github.com/Pylons/webob/blob/259230aa2b8b9cf675c996e157c5cf021c256059/src/webob/response.py#L1459 [2] https://github.com/Pylons/webob/blob/259230aa2b8b9cf675c996e157c5cf021c256059/src/webob/response.py#L1556-L1614
I wasn't sure which side is at fault here, if any. Sorry if "monkey-patch Waitress" was the wrong attribution. I changed the naming above.
Should I open this ticket with werkzeug then, are they out of line with their wrapper? I'm not familiar with the specifications / interfaces, deferring to you.
@piskvorky No, it's all good. I'm dealing with COVID brain and just couldn't envision how you got into this situation. Talked about it with @mmerickel and it finally clicked.
This is a change we can make in waitress to support this use case! I don't know when I'll have time for it, but thanks for opening the issue!
After some debugging of slow 206 responses from Flask's
send_file
, I narrowed the issue down to an interface mismatch between Waitress and werkzeug.Deep down, werkzeug expects
wsgi.file_wrapper
to implementseekable()
(andseek()
andtell()
) for efficient 206:https://github.com/pallets/werkzeug/blob/347291802fcf89bc89660cc9dc62eb1303337bc2/src/werkzeug/wsgi.py#L576-L577
But Waitress provides no such methods in its
ReadOnlyFileBasedBuffer
, even when the underlying file-like (a regularopen
file on disk) has them:https://github.com/Pylons/waitress/blob/f41e59826c81d7da5309e4b3694d5ac7a8bbd626/src/waitress/buffers.py#L141-L187
As a result, werkzeug uses a code path without
seek()
, where it keeps reading blocks from the beginning of the file until it reaches the range start offset… yuck:https://github.com/pallets/werkzeug/blob/347291802fcf89bc89660cc9dc62eb1303337bc2/src/werkzeug/wsgi.py#L595-L604
Motivation: We often request 206 ranges from the end of huge files, such as ZIP archives where the "master ZIP record" lives at the end of the file. But because of the problem above, the app reads the entire multi-gigabyte ZIP file just to return the last 2 KB.
I'm not sure if this is an issue with werkzeug or waitress, but the performance is so bad that this is a show-stopper for us. Thanks.