bcdev / nc2zarr

A Python tool that converts NetCDF files to Zarr format
MIT License

More efficient consolidation check in dataslice.update_slice #49

Open pont-us opened 3 years ago

pont-us commented 3 years ago

dataslice.update_slice checks whether the data store is consolidated by calling zarr.open_consolidated on it, and catching the resulting exception if it isn't consolidated. It then uses xarray.open_zarr to open the data store as an xarray.Dataset. This means that, if the store is consolidated, the consolidated metadata object is read twice, which is inefficient. We should try to improve on this.

pont-us commented 3 years ago

On further investigation, I think there's a simple solution: xarray.open_zarr (at least as of version 0.19.0) also accepts an explicit consolidated parameter and raises an exception when asked to open unconsolidated data with this parameter set to True. So we can attempt xarray.open_zarr with consolidated=True first. If that succeeds, we use the dataset directly; if it raises, we catch the exception and fall back to opening with consolidated=False. Either way, we record the consolidation state for later reference, and the consolidated metadata is read only once.
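
A minimal sketch of that fallback, for illustration only and not the actual nc2zarr code. The function name and the opener argument are hypothetical; opener exists only so the logic can be exercised with a stub instead of a real store. It also assumes that a missing consolidated-metadata key surfaces as KeyError, which is what zarr.open_consolidated raises in zarr 2.x; the exception type may differ with other xarray or zarr versions.

```python
from typing import Any, Callable, Optional, Tuple


def open_zarr_noting_consolidation(
    store: Any,
    opener: Optional[Callable[..., Any]] = None,
) -> Tuple[Any, bool]:
    """Open a Zarr store, returning (dataset, is_consolidated).

    Hypothetical helper: tries a consolidated open first and falls
    back to an unconsolidated open if that fails.
    """
    if opener is None:
        # Imported lazily so the sketch can be run with a stub opener.
        import xarray as xr

        opener = xr.open_zarr

    try:
        # If the store is consolidated, this is the only metadata read.
        return opener(store, consolidated=True), True
    except KeyError:
        # Assumption: an unconsolidated store surfaces as KeyError
        # (missing ".zmetadata" key). Adjust if your versions differ.
        return opener(store, consolidated=False), False
```

The returned flag can then be kept alongside the dataset, so that later steps (for example, deciding whether to re-consolidate after updating the slice) don't need to probe the store again.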