Closed maximedion2 closed 6 months ago
I've been looking into this for a while and tried many different things, and I think I've managed to write something that does things "correctly", in that the decompression and chunk processing don't block the thread. Based on the documentation I looked up (mostly the tokio docs), the latest approach I took should do it, and timestamp printouts show that the reading and decompressing of chunks run interleaved.
That being said, some observations:
I don't really know what's under the hood of the LocalFileSystem ObjectStore implementation, so I'm not sure if I should expect much of a speed up from optimizing the async reads; it's hard to say. For now, I will wrap up the implementation of a new async reader and commit it as a new reader, but I will not replace the current implementation, which as far as I know works perfectly fine. I might revisit this later, but for now I will just add my new reader without using it, and I will close this issue.
Currently, the async reader, when considered on its own, is not really async: the decompression and other Zarr related operations block the thread. I'll try sending tasks to a thread pool and interleaving IO with decompression, along the lines of what's described here: https://ryhl.io/blog/async-what-is-blocking/#the-rayon-crate.
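To make the idea of interleaving IO with decompression concrete, here is a minimal sketch. It is not the reader from this issue, and it deliberately uses only std threads and channels rather than tokio or rayon, so it illustrates the hand-off pattern and not the actual async implementation. The names `decompress` and `read_and_decompress` are hypothetical, and the run-length decoding is just a stand-in for real chunk decompression.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for chunk decompression: decodes (count, byte) pairs.
fn decompress(chunk: &[u8]) -> Vec<u8> {
    chunk
        .chunks_exact(2)
        .flat_map(|pair| std::iter::repeat(pair[1]).take(pair[0] as usize))
        .collect()
}

// Hands each chunk to a worker thread as soon as it is "read", so the reading
// loop never waits on decompression. Results come back tagged with their index.
fn read_and_decompress(chunks: Vec<Vec<u8>>) -> Vec<Vec<u8>> {
    let (raw_tx, raw_rx) = mpsc::channel::<(usize, Vec<u8>)>();
    let (out_tx, out_rx) = mpsc::channel::<(usize, Vec<u8>)>();

    // CPU-bound work runs here, off the reading thread.
    let worker = thread::spawn(move || {
        for (idx, raw) in raw_rx {
            out_tx
                .send((idx, decompress(&raw)))
                .expect("result receiver dropped");
        }
        // out_tx is dropped when the closure returns, which closes out_rx.
    });

    let n = chunks.len();
    for (idx, raw) in chunks.into_iter().enumerate() {
        // In a real reader, this is where the chunk would be fetched from storage.
        raw_tx.send((idx, raw)).expect("worker stopped early");
    }
    drop(raw_tx); // signal that no more chunks are coming

    let mut results = vec![Vec::new(); n];
    for (idx, data) in out_rx {
        results[idx] = data;
    }
    worker.join().expect("worker panicked");
    results
}
```

Calling `read_and_decompress(vec![vec![2, 7], vec![3, 1]])` returns the decoded chunks in their original order. In an async setting, the linked post describes the analogous pattern of spawning the work onto a thread pool and awaiting the result over a channel, so that the executor thread stays free for IO.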