earth-mover / icechunk

Open-source, cloud-native transactional tensor storage engine
https://icechunk.io
Apache License 2.0

Should icechunk work with non-uniform chunked NetCDF files? #384

Closed: rsignell closed this issue 1 week ago

rsignell commented 1 week ago

We have some yearly files that were rechunked with {'time': -1}, which resulted in datasets with time chunk sizes of 365, 366, 365 because of the leap year. With Zarr v2 this could not be represented as a virtual dataset, but with Icechunk and Zarr v3, should it be expected to work?
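To see why these files resist a single virtual array, here is a minimal sketch. It assumes the Zarr regular chunk grid, in which every chunk along an axis has the same length and only the final chunk may be partially filled. The helper names (`fits_regular_grid`, `chunk_boundaries`, `regular_boundaries`) are hypothetical and do not belong to Zarr or Icechunk.

```python
from itertools import accumulate


def fits_regular_grid(chunk_lengths):
    """Return True if per-file lengths can be expressed as one regular chunk grid.

    Assumption: every chunk shares the first chunk's length, except that the
    final chunk may be shorter (partially filled at the array edge).
    """
    if not chunk_lengths:
        return True
    size = chunk_lengths[0]
    body, last = chunk_lengths[:-1], chunk_lengths[-1]
    return all(n == size for n in body) and last <= size


def chunk_boundaries(chunk_lengths):
    """Exclusive end offset of each file's chunk on the concatenated axis."""
    return list(accumulate(chunk_lengths))


def regular_boundaries(size, total):
    """Chunk end offsets a regular grid of `size` would produce over `total` items."""
    return list(range(size, total, size)) + [total]


yearly = [365, 366, 365]  # one time chunk per yearly file, middle year is a leap year
print(chunk_boundaries(yearly))       # file edges on the combined time axis
print(regular_boundaries(365, 1096))  # where a 365-step regular grid would cut
print(fits_regular_grid(yearly))
```

The file edges fall at 365, 731 and 1096, while a 365-step regular grid cuts at 365, 730, 1095 and 1096, so the existing chunks cannot be referenced one-to-one without a grid that allows per-chunk lengths.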

rabernat commented 1 week ago

Since there is still no in-spec way to have non-uniform chunks in Zarr V3, this is not possible today. But it should be trivial to support in Icechunk.

Edit: FWIW, an alternative way to do this would be to support lazy array concatenation, i.e. concatenating virtual arrays of different shapes into one big virtual array.
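The lazy-concatenation idea can be sketched in a few lines: the combined array stores only where each source ends and routes every index to the source that owns it, without copying data. This is a hypothetical, pure-Python illustration restricted to 1-D sources that support `len()` and integer indexing; `LazyConcat` is not an Icechunk, Zarr or xarray API.

```python
from bisect import bisect_right
from itertools import accumulate


class LazyConcat:
    """Hypothetical 1-D virtual concatenation of sources with different lengths.

    Nothing is copied: each lookup is forwarded to the source that owns the index.
    """

    def __init__(self, sources):
        self.sources = list(sources)
        # ends[k] is the exclusive end offset of source k on the combined axis
        self.ends = list(accumulate(len(s) for s in self.sources))

    def __len__(self):
        return self.ends[-1] if self.ends else 0

    def locate(self, i):
        """Map a global index to (source number, index within that source)."""
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError(i)
        k = bisect_right(self.ends, i)  # first source whose end lies beyond i
        start = self.ends[k - 1] if k else 0
        return k, i - start

    def __getitem__(self, i):
        k, j = self.locate(i)
        return self.sources[k][j]


# Three "yearly files" of 365, 366 and 365 steps; ranges stand in for lazy readers.
years = LazyConcat([range(0, 365), range(1000, 1366), range(2000, 2365)])
print(len(years), years[365], years[-1])
```

Because the boundaries are kept explicitly, the sources need not share a length, which is exactly what a regular chunk grid cannot express; the cost is one binary search per lookup.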

rsignell commented 1 week ago

Thanks, Ryan! Indeed, we were planning to concat, but we thought it would be cool to harness the power of Icechunk. Someday soon, it sounds like! 🤓