Closed hynky1999 closed 1 year ago
Unless you're running something other than CPython, the hard-coding most likely happens here: https://github.com/MKuranowski/aiocsv/blob/ea93a38e559a061b6a54ee91fcccc438c2471a1a/aiocsv/_parser.pyx#L3
However, is there a real-life benefit to changing READ_SIZE (benchmarks, please) that can't be achieved by buffering the file? It's just a guess, but it seems to me that introducing a buffering layer (an async equivalent of io.BufferedReader) should be enough (also note that regular aiofiles objects should be buffered by default).
My intention is to use your awesome library for async reading of a CSV stream. I am currently processing this stream and then doing some transformations on the CSV rows. However, I need to process each row as soon as possible, so I would love to reduce READ_SIZE to a smaller number and get each row the moment it arrives. I am currently monkey-patching the behaviour by ignoring the size argument in my implementation of the WithAsyncRead protocol, but it would be better if I could just pass read_size to the AsyncReader directly.
I would make a PR if you don't mind the change.
> so that I can process row as soon as I get it.
This should already happen. To quote the documentation of `read`:
> Fewer than size bytes may be returned if the operating system call returns fewer than size bytes.
The issue is that the underlying stream does buffering, and you don't want it.
EDIT: In other words, if the reader only has one row (shorter than 2048 bytes) available, it should return it nevertheless; and not wait for the whole 2048 bytes.
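This partial-read behaviour can be demonstrated with asyncio's own `StreamReader` (a minimal illustration, not aiocsv code): even though 2048 bytes are requested, the read returns as soon as one short row is available.

```python
import asyncio


async def demo() -> bytes:
    reader = asyncio.StreamReader()
    reader.feed_data(b"a,b,c\n")      # only one short row is available
    # read(2048) returns whatever is buffered instead of waiting for 2048 bytes
    return await reader.read(2048)


available = asyncio.run(demo())
```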
Ahhhhh alright, so I misunderstood how it works... Thank you :)
The READ_SIZE is currently hard-coded in https://github.com/MKuranowski/aiocsv/blob/master/aiocsv/parser.py#L8. However, I think the parameter should rather be passed to the class constructor.