Allow READ_SIZE to be passed directly to AsyncReader

MKuranowski / aiocsv

Python: Asynchronous CSV reading/writing

https://pypi.org/project/aiocsv/

MIT License


Allow READ_SIZE to be passed directly to AsyncReader #14

Closed: hynky1999 closed this issue 1 year ago

hynky1999 commented 1 year ago

READ_SIZE is currently hard-coded in https://github.com/MKuranowski/aiocsv/blob/master/aiocsv/parser.py#L8. However, I think it should instead be passed as a parameter to the class constructor.

MKuranowski commented 1 year ago

Unless you're using an interpreter other than CPython, the hard-coding most likely happens here instead: https://github.com/MKuranowski/aiocsv/blob/ea93a38e559a061b6a54ee91fcccc438c2471a1a/aiocsv/_parser.pyx#L3

However, is there a real-life benefit to changing READ_SIZE (benchmarks, please) that can't be achieved by buffering the file? It's just a guess, but it seems to me that introducing a buffering layer (an async equivalent of io.BufferedReader) should be enough. Also note that regular aiofiles objects should be buffered by default.
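A minimal sketch of the buffering layer suggested above, assuming only that the wrapped object has an async read(size) method returning str (the shape aiocsv's WithAsyncRead protocol appears to expect). AsyncBufferedReader, CountingSource and buffer_size are hypothetical names, not part of aiocsv or aiofiles. The wrapper issues one large read to the source and serves smaller reads from memory, so the number of calls to the source depends on buffer_size rather than on the caller's read size.

```python
import asyncio
from typing import Protocol


class AsyncTextSource(Protocol):
    """Anything with an async read(size) -> str method (assumed shape)."""

    async def read(self, size: int) -> str: ...


class AsyncBufferedReader:
    """Hypothetical async analogue of io.BufferedReader for text streams."""

    def __init__(self, source: AsyncTextSource, buffer_size: int = 64 * 1024) -> None:
        self._source = source
        self._buffer_size = buffer_size
        self._buf = ""

    async def read(self, size: int = -1) -> str:
        if size < 0:
            # Drain the buffer plus everything left in the source.
            parts = [self._buf]
            self._buf = ""
            while chunk := await self._source.read(self._buffer_size):
                parts.append(chunk)
            return "".join(parts)
        if not self._buf:
            # Refill with one large read; an empty result means EOF.
            self._buf = await self._source.read(max(size, self._buffer_size))
        # Serve the request from the in-memory buffer.
        result, self._buf = self._buf[:size], self._buf[size:]
        return result


class CountingSource:
    """In-memory stand-in for a text stream that counts read() calls."""

    def __init__(self, data: str) -> None:
        self._data = data
        self.calls = 0

    async def read(self, size: int) -> str:
        self.calls += 1
        chunk, self._data = self._data[:size], self._data[size:]
        return chunk


async def main():
    src = CountingSource("a,b\n" * 1000)  # 4000 characters
    reader = AsyncBufferedReader(src, buffer_size=8192)
    pieces = []
    # Many small reads from the caller's side.
    while piece := await reader.read(16):
        pieces.append(piece)
    return len("".join(pieces)), src.calls


if __name__ == "__main__":
    print(asyncio.run(main()))  # (4000, 2)
```

With 4000 characters of input and 16-character reads, the source is called twice (one large read that returns everything, then one read that signals EOF) instead of roughly 250 times.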

hynky1999 commented 1 year ago

My intention is to use your awesome library for asynchronous reading of a CSV stream. I'm processing this stream and then applying some transformations to the CSV rows. However, I need to process each row as soon as possible, so I would love to reduce READ_SIZE to a smaller number and handle each row as soon as it arrives.

Currently I'm monkey-patching this behaviour by ignoring the size argument in my implementation of the WithAsyncRead protocol, but it would be better if I could just pass read_size to AsyncReader directly.

I'd be happy to make a PR if you don't mind the change.
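The workaround described above might look roughly like this sketch. CappedReadSize and RecordingSource are hypothetical names, and the wrapped source is assumed to expose async read(size) -> str. One deliberate deviation from fully ignoring size: the sketch caps it with min(), so the wrapper never returns more than the caller asked for.

```python
import asyncio


class CappedReadSize:
    """Hypothetical wrapper that limits how much each read() asks for.

    Whatever size the caller requests (for example a hard-coded
    READ_SIZE), at most read_size characters are requested from the
    wrapped source.
    """

    def __init__(self, source, read_size: int) -> None:
        self._source = source
        self._read_size = read_size

    async def read(self, size: int) -> str:
        # Cap the requested size instead of forwarding it unchanged.
        return await self._source.read(min(size, self._read_size))


class RecordingSource:
    """In-memory stand-in for a text stream that logs each requested size."""

    def __init__(self, data: str) -> None:
        self._data = data
        self.requested = []

    async def read(self, size: int) -> str:
        self.requested.append(size)
        chunk, self._data = self._data[:size], self._data[size:]
        return chunk


async def demo():
    src = RecordingSource("x" * 100)
    wrapped = CappedReadSize(src, read_size=32)
    # The caller asks for 2048 characters each time, as a fixed READ_SIZE would.
    while await wrapped.read(2048):
        pass
    return src.requested


if __name__ == "__main__":
    print(asyncio.run(demo()))  # [32, 32, 32, 32, 32]
```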

MKuranowski commented 1 year ago

"so that I can process [each] row as soon as I get it."

This should already happen. To quote the documentation of read (io.RawIOBase.read):

Fewer than size bytes may be returned if the operating system call returns fewer than size bytes.

The actual issue is that the underlying stream does its own buffering, which you don't want here.

EDIT: In other words, if the reader only has one row (shorter than 2048 bytes) available, it should return that row right away instead of waiting for the full 2048 bytes.
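To see this "up to size bytes" behaviour in practice, here is a small sketch using the standard library's asyncio.StreamReader rather than aiocsv (and bytes rather than text). Its read(n) returns as soon as at least one byte is buffered, without waiting for n bytes. The 2048 mirrors the READ_SIZE discussed above; demo is a hypothetical helper.

```python
import asyncio


async def demo():
    # Created inside the coroutine so it binds to the running event loop.
    reader = asyncio.StreamReader()

    # Only one short CSV row has arrived so far, and EOF has not been fed.
    reader.feed_data(b"id,name\n")

    # Ask for up to 2048 bytes. read() returns the 8 buffered bytes at once;
    # wait_for would raise a timeout error if it blocked waiting for more.
    first = await asyncio.wait_for(reader.read(2048), timeout=1.0)

    # A second row arrives, then the stream ends.
    reader.feed_data(b"1,alice\n")
    reader.feed_eof()
    second = await reader.read(2048)
    third = await reader.read(2048)  # b"" signals EOF
    return first, second, third


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (b'id,name\n', b'1,alice\n', b'')
```

Assuming the stream handed to AsyncReader behaves the same way, each row becomes available as soon as it arrives; a stream that instead holds data back until its own buffer fills would cause the delay described earlier in the thread.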

hynky1999 commented 1 year ago

Ahh, alright, now I understand it better. Thank you :)