LDeakin / zarrs

A Rust library for the Zarr storage format for multidimensional arrays and metadata
Apache License 2.0

Write efficiency: direct I/O? #53

Open sk1p opened 3 weeks ago

sk1p commented 3 weeks ago

I'm currently adding a Zarr writer to our project, which can be roughly described as a data acquisition and live-processing framework for electron microscopy. I'm trying to make the writing operation as low-overhead as possible, to leave room for actual data processing in the same pipeline. (I'm also interested in offering compressed writing, which of course has a different CPU vs. I/O profile, for users with a beefier system, but that's a different topic.)

One approach I've used in the past is direct I/O to bypass the page cache, which resulted in much better performance (much closer to what the hardware can actually deliver).
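On Linux this amounts to opening the file with the O_DIRECT flag. A minimal sketch (assuming the libc crate for the flag constant):

```rust
use std::fs::{File, OpenOptions};
use std::os::unix::fs::OpenOptionsExt;

fn open_direct(path: &str) -> std::io::Result<File> {
    OpenOptions::new()
        .write(true)
        .create(true)
        // Bypass the page cache. With O_DIRECT, file offsets, write
        // lengths, and buffer addresses must be aligned to the logical
        // block size (often 512 B or 4 KiB).
        .custom_flags(libc::O_DIRECT)
        .open(path)
}
```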

I've built a demo repository that compares zarrs uncompressed write speed with a small prototype that writes the chunks "manually" using direct I/O: https://github.com/LiberTEM/zarr-dio-proto/

On the system I have available (AMD EPYC 7F72, 2x KCM61VUL3T20 NVMe SSDs in a RAID0), the direct I/O approach is about 5x faster than the buffered I/O approach. There's also a branch that places the data directly into a page-size-aligned buffer, which is a bit less realistic but still interesting (it writes the 32 GiB in ~4 s, which is about the limit of the SSDs). This is all on a single core.
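For illustration, the aligned-buffer variant boils down to allocating chunk buffers with an explicit alignment via std::alloc. A minimal sketch, assuming 4096 bytes as the required alignment:

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

const ALIGN: usize = 4096; // assumed page / logical block size

fn main() {
    let len = 1 << 20; // 1 MiB chunk; a multiple of ALIGN
    let layout = Layout::from_size_align(len, ALIGN).unwrap();
    let ptr = unsafe { alloc_zeroed(layout) };
    assert!(!ptr.is_null(), "allocation failed");
    let buf = unsafe { std::slice::from_raw_parts_mut(ptr, len) };

    // Encode the chunk directly into `buf` and hand it to an O_DIRECT
    // write; no copy through an unaligned intermediate buffer is needed.
    buf[0] = 42;

    // Deallocate with the same layout that was used for allocation.
    unsafe { dealloc(ptr, layout) };
}
```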

Is there interest in integrating a FilesystemStore with direct I/O capabilities into zarrs? Getting it as fast as the prototype would also require some structural changes, which would probably have to be made incrementally.

I'm also interested in trying an io_uring implementation, which would be the modern way for high-performance I/O on Linux systems.
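For a rough idea of what that could look like with the io-uring crate, here is a minimal sketch of a single write submitted through the ring (buffer lifetime and error handling are glossed over):

```rust
use std::os::unix::io::AsRawFd;

use io_uring::{opcode, types, IoUring};

fn main() -> std::io::Result<()> {
    let file = std::fs::File::create("/tmp/chunk.bin")?;
    let buf = vec![0u8; 4096];

    let mut ring = IoUring::new(8)?;
    let write_e = opcode::Write::new(types::Fd(file.as_raw_fd()), buf.as_ptr(), buf.len() as _)
        .build()
        .user_data(0x42);

    // Safety: `buf` and `file` must stay alive until the completion
    // for this entry has been reaped.
    unsafe { ring.submission().push(&write_e).expect("submission queue full") };
    ring.submit_and_wait(1)?;

    let cqe = ring.completion().next().expect("completion queue empty");
    assert_eq!(cqe.user_data(), 0x42);
    assert!(cqe.result() >= 0, "write failed: {}", cqe.result());
    Ok(())
}
```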

LDeakin commented 3 weeks ago

Is there interest in integrating a FilesystemStore with direct I/O capabilities into zarrs?

Yes! This looks great and I would probably use it myself. Thanks for putting together such a nice little demo.

I'm also interested in trying an io_uring implementation, which would be the modern way for high-performance I/O on Linux systems.

I am also interested in this, but I have been lazily waiting for support from object_store or opendal (https://github.com/apache/opendal/issues/4520), though that could be a long time.

sk1p commented 2 weeks ago

Yes! This looks great and I would probably use it myself. Thanks for putting together such a nice little demo.

Great to hear! I'll work on a minimal PR as a basis for discussion.

I am also interested in this, but I have been lazily waiting for support from object_store or opendal (apache/opendal#4520), though that could be a long time.

Yeah, integrating such a different style of I/O generically into those libraries is no small task, I imagine.

LDeakin commented 2 weeks ago

Thanks for your work on this @sk1p!

Xref: direct I/O PRs and comments related to compressed writing and potential further improvements:

sk1p commented 1 week ago

Thank you for the discussion and reviews! To pull the discussion back here from the closed PR (#58):

Yep, that could probably work. The standard encode/decode methods of codecs do not provide any access to storage. But what you are after is not so different from the partial decoders and partial encoders (#45). They pass input/output handles through the codec chain and the last codec ends up calling storage methods via StoragePartialDecoder / StoragePartialEncoder. The same ideas could be applied to non-partial encoding.

Interesting. I'll need to have a closer look.
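If I understand the handle-passing idea correctly, it is roughly something like this sketch (names are hypothetical, not the actual zarrs traits):

```rust
use std::io;

// Each encoder stage wraps the output handle of the next stage; the
// innermost handle is the one that actually talks to storage.
trait BytesPartialEncoder {
    /// Encode `bytes` at `offset` within the output byte range.
    fn partial_encode(&mut self, offset: u64, bytes: &[u8]) -> io::Result<()>;
}

/// Terminal handle: forwards encoded bytes to the store, e.g. as an
/// O_DIRECT positioned write into the chunk file.
struct StorageSink;

impl BytesPartialEncoder for StorageSink {
    fn partial_encode(&mut self, _offset: u64, _bytes: &[u8]) -> io::Result<()> {
        Ok(()) // the storage write would happen here
    }
}

/// A codec stage that transforms bytes before forwarding them.
struct SomeCodecEncoder<E: BytesPartialEncoder> {
    next: E,
}

impl<E: BytesPartialEncoder> BytesPartialEncoder for SomeCodecEncoder<E> {
    fn partial_encode(&mut self, offset: u64, bytes: &[u8]) -> io::Result<()> {
        // Transform `bytes` here (identity in this sketch), then forward.
        self.next.partial_encode(offset, bytes)
    }
}
```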

Next on my TODO list is a full integration into our software; then I can see how well the approach works in practice. I'll write an update in this issue if I don't forget :)

If there is a need, I could also work on adding O_DIRECT read support; I don't need it myself yet, but it might be nice to have for symmetry.