These changes let the current script initialize the zarr store from a specific date, and add a script for seeding data into the dataset later on.
Changes include
Added new arguments to the netcdf_to_zarr script: --init_date, --from_init_date, and --only_initialize_store.
Removed --temp_location from the arguments because the pipeline ignores it when running on DataflowRunner.
Added a script that seeds data directly into the zarr array, without involving the xarray layer and its chunking scheme.
Moved some functions from netcdf_to_zarr.py to source_data.py because they are reused in the data seeding script.
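As a rough illustration of what seeding the array directly means, the sketch below maps a date to its position on the store's time axis and overwrites only that region. It is not the code from this change: a plain Python list stands in for the zarr array, a daily time step is assumed, and the names date_to_index and seed are hypothetical.

```python
import math
from datetime import date


def date_to_index(init_date, day):
    # Position of `day` on a daily time axis whose first entry is init_date.
    # (Daily spacing is an assumption made for this sketch.)
    offset = (day - init_date).days
    if offset < 0:
        raise ValueError("day is before the store's init_date")
    return offset


def seed(array, init_date, start_date, values):
    # Overwrite only the slots from start_date onward; every other slot keeps
    # whatever it held before (NaN in a freshly initialized store).
    first = date_to_index(init_date, start_date)
    if first + len(values) > len(array):
        raise ValueError("values extend past the end of the store")
    array[first:first + len(values)] = values
    return array


# A 10-day store initialized to NaN, seeded for January 4 to January 6 only.
store = [math.nan] * 10
seed(store, date(1900, 1, 1), date(1900, 1, 4), [1.0, 2.0, 3.0])
```

With a real zarr array the same slice assignment would target the array's time dimension, so no xarray dataset needs to be built for the write.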
The netcdf_to_zarr script can now be used in three different ways:
Initialize the stores from start_date to end_date and write chunks. (Current flow.)
Initialize the stores from init_date to end_date and write chunks from start_date to end_date. Values outside the start_date to end_date range remain NaN. (Requires --from_init_date; --init_date is optional.)
Only initialize the store without seeding any data for now; seeding can be done later with a separate script, update_data.py. (Requires --only_initialize_store; --init_date is optional.)
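The three modes above can be summarized as a small decision function. This is only a sketch of the behavior as described here, not the script's actual code: plan_run and RunPlan are hypothetical names, and it assumes that --only_initialize_store also starts the store at init_date.

```python
from datetime import date
from typing import NamedTuple, Optional, Tuple


class RunPlan(NamedTuple):
    store_start: date                         # first date covered by the store
    store_end: date                           # last date covered by the store
    write_range: Optional[Tuple[date, date]]  # dates to write, or None


def plan_run(start_date, end_date, init_date=date(1900, 1, 1),
             from_init_date=False, only_initialize_store=False):
    # The store starts at init_date only when one of the new flags is set;
    # otherwise it starts at start_date, as in the current flow.
    use_init = from_init_date or only_initialize_store
    store_start = init_date if use_init else start_date
    # With only_initialize_store nothing is written; seeding happens later.
    write_range = None if only_initialize_store else (start_date, end_date)
    return RunPlan(store_start, end_date, write_range)


current = plan_run(date(2020, 1, 1), date(2020, 12, 31))
padded = plan_run(date(2020, 1, 1), date(2020, 12, 31), from_init_date=True)
empty = plan_run(date(2020, 1, 1), date(2020, 12, 31), only_initialize_store=True)
```

In the second mode the store covers 1900-01-01 to 2020-12-31 but only 2020 is written, so everything before 2020-01-01 stays NaN.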
Some defaults
By default the script will run the same as before.
For initialization, the default init_date is 1900-01-01. It can be changed via the --init_date argument.
By default the script initializes the stores and starts seeding data. Passing --only_initialize_store changes this so that the script only creates the stores and does not write any data.
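A minimal argparse sketch of these defaults follows. The flag names and the 1900-01-01 default come from the description above; treating the two switches as store_true flags, keeping init_date as a string, and the build_parser name are assumptions made for illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Sketch of the new netcdf_to_zarr flags")
    # Default init_date as stated above; overridable from the command line.
    parser.add_argument("--init_date", default="1900-01-01",
                        help="first date of the initialized store")
    # Both switches default to False, so the script behaves as before
    # unless one of them is passed (store_true is an assumption).
    parser.add_argument("--from_init_date", action="store_true",
                        help="initialize the store from init_date")
    parser.add_argument("--only_initialize_store", action="store_true",
                        help="create the stores without writing data")
    return parser


# Example: initialize from 1950-01-01 and still write the requested range.
args = build_parser().parse_args(["--from_init_date", "--init_date", "1950-01-01"])
```

Parsing an empty argument list yields init_date of 1900-01-01 with both switches off, which corresponds to the unchanged default flow.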