Closed · sethmlarson closed this issue 3 years ago
I seem to remember this playing into some primitives around streaming bytes vs. text that we never ended up digging into?
A good first pass at this would be to change the decoder interface slightly, so that instead of e.g. yielding a single byte chunk, the decoders yield a list of byte chunks.
On the first refactoring pass, we don't need to actually change the internal implementation much: the decoders can just always yield a list with a single item.
We'd then be able to add a `chunk_size` argument to the decoders, which would return 0, 1, or many properly-sized chunks on each yield.
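As a minimal sketch of that first pass, a decoder could look something like this (`IdentityDecoder` is a hypothetical name for illustration, not one of httpx's actual decoder classes):

```python
from typing import List

class IdentityDecoder:
    """Hypothetical pass-through decoder following the proposed
    interface: decode() returns a *list* of byte chunks."""

    def decode(self, data: bytes) -> List[bytes]:
        # First refactoring pass: always a single-item list.
        return [data]

    def flush(self) -> List[bytes]:
        # No buffered data to emit at end-of-stream.
        return []
```

With this shape, callers can treat every decoder call as producing zero or more chunks, which is what makes chunk sizing possible later.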
Updated the issue title to reflect the current `Response.aiter_*` API :-) (see #610).
How could I help with this issue?
Hi @b0g3r! I think this is still something we’d like to have, and given discussions in https://github.com/python-gitlab/python-gitlab/pull/1036 it seems like some folks would like to see it too. :)
Ways to move forward would be:
Do I understand correctly that we will need to forward chunk_size here? https://github.com/encode/httpx/blob/a82adcc933345c6b8cb1623b031eb85723e7665b/httpx/_dispatch/urllib3.py#L112-L115
@b0g3r Careful that we're in a sort of transition state w.r.t. urllib3 usage due to #804 (we'll soon use our own sync implementation, though keeping urllib3 as an option). Due to this I wouldn't advise relying on any existing urllib3 functionality — also because we'd want to provide chunk sizing on the async layer too, and it'd be odd to have a different implementation on both sides.
I think we want to look at controlling the chunk size directly from `response.iter_bytes()` / `response.aiter_bytes()`, instead…
@b0g3r So, as with comment https://github.com/encode/httpx/issues/394#issuecomment-567899958 - the right place to start with this would be a pull request to https://github.com/encode/httpx/blob/master/httpx/_decoders.py that changes the interface of the decoders, so that they return a list of bytes rather than bytes.
(And correspondingly, changing the places where the response calls the decoder such as https://github.com/encode/httpx/blob/a82adcc933345c6b8cb1623b031eb85723e7665b/httpx/_models.py#L915 to deal with a list of bytes as a return result.)
I'd start with that as a foundational pull request, which will then make the remaining work much easier. (Adding chunk sizes to the decoder interface, and through to the response methods.)
`chunk_size=1`, because `requests.Response.iter_content` has it:

```python
for part in self._raw_stream:
    yield part
```
Let's use a bytestring as the buffer:

```python
buffer = b""
for part in self._raw_stream:
    buffer += part
    while len(buffer) >= chunk_size:
        yield buffer[:chunk_size]
        buffer = buffer[chunk_size:]
if buffer:
    yield buffer
```
`chunk_size=ITER_CHUNK_SIZE` (512), because requests has it 🌚

- `(a)iter_raw`
- `(a)iter_bytes`
- `(a)iter_text`
@tomchristie As far as I can see, `(a)iter_raw` doesn't use any decoder 🤔
It would be good to have a `chunk_size=None` option so that httpx can return chunks at the HTTP chunk boundaries, as the requests library does. This is useful for apps that require timely delivery.
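A self-contained sketch of how `chunk_size=None` could coexist with fixed-size re-chunking (the function name and shape here are assumptions, not httpx's implementation):

```python
def iter_chunks(parts, chunk_size=None):
    # chunk_size=None: pass chunks through at the boundaries the
    # server delivered them, as requests does, for timely delivery.
    if chunk_size is None:
        yield from parts
        return
    # Otherwise buffer and re-slice into fixed-size chunks.
    buffer = b""
    for part in parts:
        buffer += part
        while len(buffer) >= chunk_size:
            yield buffer[:chunk_size]
            buffer = buffer[chunk_size:]
    if buffer:
        yield buffer
```

For example, `list(iter_chunks([b"abcd", b"ef"]))` preserves the original boundaries and returns `[b"abcd", b"ef"]`, while `list(iter_chunks([b"abcd", b"ef"], chunk_size=3))` returns `[b"abc", b"def"]`.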
Requests allowed setting `chunk_size` within `.iter_content()`, which is currently not an option for our alternatives `.stream()` and `.stream_text()`.
For `.stream_text()` we should go the extra step and fix an issue that users sometimes run into when using this feature: use `chunk_size` for measuring the decoded text, not the raw bytes.
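One way to sketch measuring `chunk_size` against decoded characters rather than raw bytes is with the standard library's incremental decoder, so a multi-byte sequence split across network chunks is held until it is complete (`iter_text` here is an illustrative helper, not httpx's method):

```python
import codecs

def iter_text(byte_parts, chunk_size):
    # Decode incrementally, then slice by *character* count rather
    # than byte count. The incremental decoder buffers any trailing
    # partial multi-byte sequence until the next part arrives.
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for part in byte_parts:
        buffer += decoder.decode(part)
        while len(buffer) >= chunk_size:
            yield buffer[:chunk_size]
            buffer = buffer[chunk_size:]
    buffer += decoder.decode(b"", final=True)
    if buffer:
        yield buffer
```

For example, `b"h\xc3" + b"\xa9y!"` is `"héy!"` in UTF-8, with the two-byte `é` split across the parts; `list(iter_text([b"h\xc3", b"\xa9y!"], 2))` returns `["hé", "y!"]`, chunked by characters even though the byte counts differ.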