eschnett opened 5 months ago
Thanks for opening this issue.
There has been some work on adding support for the zarr storage format within ASDF. This is implemented via an extension: https://github.com/asdf-format/asdf-zarr. It's a new package, so please let me know if it's something you plan to use "in production" so we can give it another review. Also feel free to give it a try and open issues if you find anything. The extension offers a few options:
DirectoryStore "flat files", S3 stores, or any of the many formats zarr supports.
The use of zarr also opens up a second place where compression can be controlled (which can get a bit confusing).
@braingram Nice! We are currently discussing storage formats, and both ASDF and Zarr are contenders that have various advantages and disadvantages. On the surface, using Zarr chunking with ASDF single-file storage seems like an excellent choice. I will have a look.
When a large ndarray is stored as a single binary block with compression, the whole block (or at least everything from its beginning up to the requested data) needs to be read and decompressed, even when only a small subarray is read. "Chunking" remedies this: instead of storing an ndarray as a single binary block, it is stored as a set of smaller blocks that are compressed and stored independently, so a reader only has to decompress the blocks that overlap the requested subarray.
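To make the saving concrete, here is a minimal sketch using only the Python standard library. It assumes a 1-D array of float64 values, zlib compression, and a hypothetical in-memory dict from each chunk's start offset to its compressed bytes; this is not the ASDF or zarr on-disk layout, and `write_chunks`, `read_range` and `CHUNK_LEN` are illustrative names.

```python
import array
import zlib

# Hypothetical chunk length; real chunked formats let the writer choose it per array.
CHUNK_LEN = 1000


def write_chunks(values, chunk_len=CHUNK_LEN):
    """Split a 1-D sequence of floats into chunks and compress each one on its own.

    Returns a dict mapping each chunk's start offset to its zlib-compressed bytes.
    """
    chunks = {}
    for start in range(0, len(values), chunk_len):
        raw = array.array("d", values[start:start + chunk_len]).tobytes()
        chunks[start] = zlib.compress(raw)
    return chunks


def read_range(chunks, lo, hi, chunk_len=CHUNK_LEN):
    """Return (values[lo:hi], number of chunks decompressed).

    Only chunks overlapping [lo, hi) are decompressed.
    Assumes 0 <= lo < hi <= len(values).
    """
    out = []
    decompressed = 0
    first = (lo // chunk_len) * chunk_len  # start offset of the chunk containing lo
    for start in range(first, hi, chunk_len):
        data = array.array("d")
        data.frombytes(zlib.decompress(chunks[start]))
        decompressed += 1
        # Keep only the part of this chunk that falls inside [lo, hi).
        a = max(lo, start) - start
        b = min(hi, start + len(data)) - start
        out.extend(data[a:b])
    return out, decompressed


values = [float(i) for i in range(10_000)]
chunks = write_chunks(values)
sub, n = read_range(chunks, 2500, 2510)
print(sub[:3], n)  # [2500.0, 2501.0, 2502.0] 1
```

Reading 10 values from the middle decompresses one 1000-element chunk instead of the whole 10,000-element array. The usual trade-off is that very small chunks tend to compress less well and add per-chunk bookkeeping.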
Are there plans to support this? Can this be implemented as an extension?
One simple approach would be to introduce a new YAML tag, core/chunked-ndarray, that consists of a YAML map from offsets to ndarrays, for example:
Has there been any work in this direction?
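The proposed tag could be prototyped along the following lines. This is a hypothetical sketch, not part of ASDF: it assumes the tag's value is a map from each chunk's (row, col) offset to that chunk's 2-D data, it uses nested Python lists in place of ndarrays so it needs only the standard library, and `split` and `assemble` are illustrative names.

```python
# Hypothetical in-memory form of the proposed core/chunked-ndarray tag:
# {(row_offset, col_offset): 2-D chunk}. Nested lists stand in for ndarrays.


def split(dense, chunk_shape):
    """Cut a 2-D list into chunks keyed by the offset of their top-left element."""
    chunk_rows, chunk_cols = chunk_shape
    rows, cols = len(dense), len(dense[0])
    chunk_map = {}
    for r0 in range(0, rows, chunk_rows):
        for c0 in range(0, cols, chunk_cols):
            # Chunks at the right/bottom edge may be smaller than chunk_shape.
            chunk_map[(r0, c0)] = [
                row[c0:c0 + chunk_cols] for row in dense[r0:r0 + chunk_rows]
            ]
    return chunk_map


def assemble(shape, chunk_map, fill=0.0):
    """Rebuild a dense 2-D list of the given shape from an offset -> chunk map.

    Elements not covered by any chunk keep the `fill` value.
    """
    rows, cols = shape
    out = [[fill] * cols for _ in range(rows)]
    for (r0, c0), chunk in chunk_map.items():
        for i, row in enumerate(chunk):
            for j, value in enumerate(row):
                out[r0 + i][c0 + j] = value
    return out


dense = [[10 * r + c for c in range(5)] for r in range(4)]
chunk_map = split(dense, (2, 2))
print(sorted(chunk_map))  # [(0, 0), (0, 2), (0, 4), (2, 0), (2, 2), (2, 4)]
print(chunk_map[(0, 4)])  # [[4], [14]]
print(assemble((4, 5), chunk_map) == dense)  # True
```

Keying chunks by offset rather than by chunk index keeps each entry self-describing and accommodates smaller edge chunks. A real tag would presumably also need to record the overall shape and dtype, with each chunk stored as its own compressed binary block.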