Open sharkinsspatial opened 11 months ago
If latitude is promoted to a concat_dim the output is correct (with all of the latitude values included).
I think in this case, the coordinates get inlined as a single chunk in the output references, so the chunking of the original doesn't matter. The same could be done with any variable. This is not chunk inlining but whole-array inlining into a single chunk, e.g., with kerchunk.utils.inline_array. However, MultiZarrToZarr doesn't even bother accumulating values of non-coordinates, since they are not used for figuring out the concatenation chunk placement.
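To make the "whole array as a single chunk" idea concrete, here is a minimal hand-rolled sketch using only the standard library. It is not how kerchunk.utils.inline_array is implemented; the helper names `inline_whole_array` and `read_inlined` are made up for illustration, and the array is assumed to be 1-D, uncompressed float64. It relies only on the reference-format convention that a string value prefixed with `base64:` holds the chunk bytes directly.

```python
import base64
import json
import struct


def inline_whole_array(name, values):
    """Build reference entries that store `values` as one inlined chunk.

    Hypothetical helper for illustration; assumes 1-D little-endian float64
    data with no compressor or filters.
    """
    n = len(values)
    raw = struct.pack(f"<{n}d", *values)
    zarray = {
        "zarr_format": 2,
        "shape": [n],
        "chunks": [n],  # a single chunk spans the whole array
        "dtype": "<f8",
        "compressor": None,
        "filters": None,
        "fill_value": None,
        "order": "C",
    }
    return {
        f"{name}/.zarray": json.dumps(zarray),
        # Inline data: the bytes live in the reference set itself.
        f"{name}/0": "base64:" + base64.b64encode(raw).decode("ascii"),
    }


def read_inlined(refs, name):
    """Decode the single inlined chunk back into a list of floats."""
    meta = json.loads(refs[f"{name}/.zarray"])
    payload = base64.b64decode(refs[f"{name}/0"][len("base64:"):])
    return list(struct.unpack(f"<{meta['shape'][0]}d", payload))


refs = inline_whole_array("latitude", [10.0, 10.5, 11.0, 11.5, 12.0])
print(read_inlined(refs, "latitude"))
```

Because the chunk shape equals the array shape, the chunking of the source files no longer matters for this variable, which is the point made above.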
More generally, non-equal chunks are just not possible because of the limitations of zarr, at least until ZEP003 is accepted and implemented (https://github.com/zarr-developers/zarr-python/pull/1483). If this were done, no magic inlining would be needed. Feel free to ping on the issue or the ZEP repo saying you need this.
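As a rough illustration of why partially filled chunks break concatenation by reference, the sketch below models a regular chunk grid in plain Python. The function names and the example file lengths are assumptions for illustration, not part of zarr or kerchunk. On a regular grid every chunk except the last must hold exactly `chunk` elements, so each input file except the last has to be a whole multiple of the chunk size along the concatenation axis.

```python
from math import ceil


def chunk_layout(lengths, chunk):
    """Return, per file, the number of valid elements in each stored chunk."""
    layout = []
    for n in lengths:
        n_chunks = ceil(n / chunk)
        layout.append([min(chunk, n - i * chunk) for i in range(n_chunks)])
    return layout


def can_concat_by_reference(lengths, chunk):
    """True if the stored chunks can be placed end to end on one regular grid.

    Every file except the last must fill its final chunk completely;
    otherwise the unused tail of a partial chunk would occupy real
    positions in the combined array.
    """
    return all(n % chunk == 0 for n in lengths[:-1])


lengths = [7, 5, 9]  # hypothetical per-file lengths along the concat axis
chunk = 4

print(chunk_layout(lengths, chunk))             # [[4, 3], [4, 1], [4, 4, 1]]
print(can_concat_by_reference(lengths, chunk))  # False
print(can_concat_by_reference([8, 4, 9], chunk))  # True
```

In the first case the second chunk of the first file holds only 3 valid elements, so stacking the next file's chunks directly after it would shift every later value out of position. Variable-sized chunks would remove that constraint.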
While experimenting with kerchunking some ICESat-2 ATL08 data, I noticed an issue where using MultiZarrToZarr with non-dimension coordinates that had partial chunks resulted in empty values for those variables in the output kerchunk index.

A minimal example
This seems potentially related to some of the discussion in https://github.com/fsspec/kerchunk/issues/305 (as it is also describing the case of data not aligned with chunk size).
If latitude is promoted to a concat_dim, the output is correct (with all of the latitude values included).

I may be misunderstanding the MultiZarrToZarr logic in this case where we have regularly sized, partially filled chunks. Is it possible to have a non-dimension variable concatenated in a linear fashion in this situation?