We are testing the Prefect flow that uses the cloud-optimised library to update Zarr datasets. Creating a Zarr dataset from a single NetCDF file and appending another single NetCDF file to an existing Zarr dataset have both succeeded.
We are now testing a scenario where we update a Zarr dataset that contains a time gap by filling that gap.
For example, the source NetCDF file for 2024-01-02 did not arrive in our bucket on 2024-01-02 but was instead uploaded later, after 2024-01-03, as a delayed upload. As a result, the Zarr dataset contains a gap for 2024-01-02 because the corresponding NetCDF file was not available on time.
To simulate a gap in the Zarr dataset, I generated a test Zarr dataset by:
1. Creating it using a single NetCDF file for 2024-01-01.
2. Appending a single NetCDF file for 2024-01-03 to it.
This setup simulates a gap for 2024-01-02. I then attempted to update the Zarr dataset by running the following command:
The expected result was for the NetCDF file of 2024-01-02 to be written into the correct time region, filling the gap in the Zarr dataset. The time order should have been: 2024-01-01, 2024-01-02, 2024-01-03.
However, the NetCDF file for 2024-01-02 was actually appended after 2024-01-03, resulting in the following time order: 2024-01-01, 2024-01-03, 2024-01-02.
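To make the difference between the expected and actual time order concrete, here is a minimal sketch in plain Python. It does not use the cloud-optimised library or Zarr at all: the lists are hypothetical stand-ins for the dataset's time coordinate, and the helper names `is_chronological` and `gap_index` are made up for illustration. It only shows the check I would expect to pass after a gap fill, and the position where the late time step belongs.

```python
from bisect import bisect_left
from datetime import date


def is_chronological(times):
    """Return True if every time step is strictly later than the previous one."""
    return all(earlier < later for earlier, later in zip(times, times[1:]))


def gap_index(times, new_time):
    """Return the position where new_time belongs in an ascending time axis."""
    return bisect_left(times, new_time)


# Hypothetical stand-in for the time coordinate of the test Zarr dataset.
existing = [date(2024, 1, 1), date(2024, 1, 3)]
late_file = date(2024, 1, 2)

# What was observed: the late file lands at the end of the time axis.
appended = existing + [late_file]

# What was expected: the late file lands between its neighbours.
idx = gap_index(existing, late_file)
filled = existing[:idx] + [late_file] + existing[idx:]

print(is_chronological(appended))  # False
print(is_chronological(filled))    # True
```

A check like `is_chronological` could be run against the time coordinate after each update to detect out-of-order writes, whichever mechanism ends up doing the write.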
Could you confirm whether the cloud-optimised library supports filling gaps in Zarr datasets by writing new NetCDF files to the appropriate region? This functionality would ensure that the Zarr dataset maintains a correct chronological order.