Closed: ben-pearce closed this 3 years ago
Currently, this will result in the following TypeError being raised:
TypeError: Expected unicode, got bytes
Yes, that's intentional, and that's how the built-in csv
module works.
See issue #2.
I'm sorry, but the parser is just not the place to do the decoding.
In synchronous Python one can use io.TextIOWrapper
to wrap a binary stream into a text stream without any hassle.
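For instance, a minimal sketch of the synchronous case (the sample CSV content is made up for illustration):

```python
import io

# io.TextIOWrapper turns a binary stream into a text stream and
# handles decoding, including code points split across read boundaries.
binary_stream = io.BytesIO("名前,年齢\nAlice,30\n".encode("utf-8"))
text_stream = io.TextIOWrapper(binary_stream, encoding="utf-8")
print(text_stream.readline())  # -> 名前,年齢
```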
Writing such a wrapper for async is also a matter of a few lines. That's the reason I'm also closing the PR without merging.
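A sketch of what such an async wrapper could look like (the wrapped object's async `read(n)` interface and the class name are assumptions, not part of this library):

```python
import codecs

class AsyncTextWrapper:
    """Hypothetical wrapper around an object with an async read(n) method.

    Uses an incremental decoder so code points split across chunk
    boundaries are buffered rather than raising UnicodeDecodeError.
    """

    def __init__(self, stream, encoding="utf-8"):
        self._stream = stream
        self._decoder = codecs.getincrementaldecoder(encoding)()

    async def read(self, size):
        chunk = await self._stream.read(size)
        # final=True on EOF flushes any bytes still buffered in the decoder
        return self._decoder.decode(chunk, final=not chunk)
```

Note that with very small reads a non-empty chunk can decode to an empty string, so a caller should not treat an empty result alone as end-of-stream.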
Also, I have some feedback on the PR itself.
It's unsafe to call decode on arbitrarily sliced chunks. The parser in your PR reads in 1024-byte chunks, and there's no guarantee that it won't stop reading in the middle of a code point.
# Let's pretend this is our stream, which will be read in chunks of 7.
a = "いろはにほへとちりぬるをわかよたれそつねならむうゐのおくやまけふこえてあさきゆめみしゑひもせす"
# As a string, there are no issues with chunked reading
for i in range(0, len(a), 7):
    a[i:i+7]
# Let's now try to read this as a binary stream in 7-byte chunks
b = a.encode("utf-8")
for i in range(0, len(b), 7):
    b[i:i+7].decode("utf-8")
# A UnicodeDecodeError is raised
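For reference, the standard fix on the consumer side is codecs.getincrementaldecoder, which buffers a trailing partial code point across chunks instead of raising:

```python
import codecs

a = "いろはにほへとちりぬるを"
b = a.encode("utf-8")

decoder = codecs.getincrementaldecoder("utf-8")()
out = ""
for i in range(0, len(b), 7):
    # Incomplete code points at a chunk boundary are buffered, not raised
    out += decoder.decode(b[i:i+7])
out += decoder.decode(b"", final=True)  # flush any remaining state
assert out == a
```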
Alright, I've noted the issues with my PR. Thanks for your feedback.
This pull request adds support for decoding bytes sent to the asynchronous parser.
This is useful in cases where you have no control over the type of data provided by the async reader object.
For example, when reading a CSV over the network using
aiohttp
the content arrives as type bytes
and needs to be decoded to Unicode. Currently, this will result in the following
TypeError
being raised:
TypeError: Expected unicode, got bytes
This pull request instead decodes the bytes as they are read from the stream, which also improves performance: data is parsed while the stream is still being read.