Closed ngkavin closed 1 year ago
If it's an xarray Dataset object, all you need is Dataset.to_zarr(consolidated=True)
Do you know if this something that should be done for every group or just the last one?
Should be all the groups I think. @emiliom @leewujung , Do you guys have any thoughts on this?
Only groups that are chunked. Is any group besides Beam
chunked?
Groups with a time or range_bin dimension are chunked. For EK80 this also includes the platform group.
From testing, the '.zmetadata' KeyError is raised when setting consolidated=True
in xarray to_zarr()
if the .zmetadata does not exist and we are appending to an existing zarr file. .zmetadata needs to be created by calling zarr.consolidate_metadata
after creating the zarr file in the top_level group because xarray is not being used to create zarr files, only appending to them.
So it might be best to simply call zarr.consolidate_metadata
at the very end of conversion. Otherwise we would need to call it at the beginning as well as set consolidated=True
each time we save with xarray.
In the latest conversation we discussed:
print
statement for EchoData to pull only the consolidated metadata (the "lazy-lazy" load) before the files are actually lazy-openedWe still need to:
@lsetiawan @b-reyes : this doesn’t necessarily need to be in 0.6.3, but since we’re dealing with zarr data in this release so much, does it make sense for us to address this here?
@lsetiawan @b-reyes : this doesn’t necessarily need to be in 0.6.3, but since we’re dealing with zarr data in this release so much, does it make sense for us to address this here?
For xr.Dataset.to_zarr, I believe the default value for consolidated=None
already consolidates the metadata (if it is possible). If this is the case, I think we do not need to worry about manually doing this. Maybe the functionality of to_zarr
has changed since this issue was created? I can look more into this and see if my thinking it correct. Either way, I think we don't need to worry about this for 0.6.3 since I already correctly consolidate the metadata in combine_echodata
.
@b-reyes consolidated needs to be True for zar to consolidate. I think all you need is to make sure that every Dataset.to_zarr has the consolidated=True flag. Shouldn't be much to address for this upcoming release.
From discussion: we will set it up that all .to_zarr/netcdf
will default to cnosolidated=True
, but users will be able to pass in kwargs to optionally overwrite the default operation.
Consolidating metadata will greatly increase the speed of reading array metadata, as described here: https://zarr.readthedocs.io/en/stable/tutorial.html#consolidating-metadata.
It looks like the metadata can be consolidated with
zarr.consolidate_metadata(store)
after conversion.