datafusion-contrib / arrow-zarr

Implementation of Zarr file format in Rust
Apache License 2.0

Improve async reader stream implementation #6

Closed by maximedion2 6 months ago

maximedion2 commented 7 months ago

Currently, the async reader, when considered on its own, is not really async: the decompression and other Zarr-related operations block the thread. I'll try sending tasks to a thread pool and interleaving IO with decompression, along the lines of what's described here: https://ryhl.io/blog/async-what-is-blocking/#the-rayon-crate.
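To make the idea concrete, here is a minimal sketch of interleaving reads with decompression. It is not the code from this repository: it uses plain `std::thread` and `std::sync::mpsc` channels as a stand-in for a tokio runtime plus a rayon pool, and `read_chunk`, `decompress`, and `read_interleaved` are hypothetical names, with a toy run-length decoder in place of a real Zarr codec.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for chunk decompression: expands run-length
/// encoded (count, value) byte pairs. A real Zarr codec would go here.
fn decompress(chunk: &[u8]) -> Vec<u8> {
    chunk
        .chunks_exact(2)
        .flat_map(|pair| std::iter::repeat(pair[1]).take(pair[0] as usize))
        .collect()
}

/// Hypothetical stand-in for IO: returns the compressed bytes of chunk `idx`,
/// encoded as "idx + 1 copies of the byte idx".
fn read_chunk(idx: u8) -> Vec<u8> {
    vec![idx + 1, idx]
}

/// Reads `n_chunks` chunks on the calling thread while a worker thread
/// decompresses them, so reading the next chunk can overlap with
/// decompressing the previous one.
fn read_interleaved(n_chunks: u8) -> Vec<Vec<u8>> {
    // Bounded channel: the reader can stay at most 2 chunks ahead of the worker.
    let (raw_tx, raw_rx) = mpsc::sync_channel::<Vec<u8>>(2);
    // Unbounded channel for results, so the worker never blocks on output.
    let (out_tx, out_rx) = mpsc::channel::<Vec<u8>>();

    // The CPU-bound work runs off the reading thread.
    let worker = thread::spawn(move || {
        for raw in raw_rx {
            if out_tx.send(decompress(&raw)).is_err() {
                break; // receiver is gone, nothing left to do
            }
        }
    });

    // The "IO" loop: hand each chunk to the worker as soon as it is read.
    for idx in 0..n_chunks {
        raw_tx.send(read_chunk(idx)).expect("worker stopped early");
    }
    drop(raw_tx); // signal end of input so the worker loop terminates

    // A single worker preserves chunk order.
    let decoded: Vec<Vec<u8>> = out_rx.iter().collect();
    worker.join().expect("worker panicked");
    decoded
}
```

With this toy encoding, `read_interleaved(3)` yields the chunks `[0]`, `[1, 1]`, and `[2, 2, 2]` in order. In an async setting, the worker thread would presumably be replaced by something like a blocking-task pool, with results awaited rather than collected from a channel.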

maximedion2 commented 7 months ago

So, I've been looking into this for a while. I tried many different things, and I think I've managed to write something that does things "correctly", in that the decompression and chunk processing don't block the thread. Based on the documentation I looked up (mostly the tokio docs), the latest approach I took should do it. Also, based on timestamp printouts, the reading and decompressing of chunks run "interleaved".

That being said, some observations:

I don't really know what's under the hood of the LocalFileSystem ObjectStore implementation, so it's hard to say whether I should expect much of a speedup from optimizing the async reads. For now, I will wrap up the implementation and commit it as a new async reader, but I will not replace the current implementation, which as far as I know works perfectly fine. I might revisit this later; for now, I will just add the new reader without using it, and I will close this issue.