Open · YagoDel opened this issue 5 years ago
The main problem is that the whole final dataset would need to fit in memory for this to work. Maybe xarray's Dask support could help: open the existing file lazily (so it is never fully loaded), concatenate the new data, and save back to disk on every iteration?
```python
import numpy as np
import xarray

savepath, loadpath = 'scan_a.nc', 'scan_b.nc'  # two alternating buffer files

# Build an initial dataset and write it to disk.
data1 = xarray.DataArray(np.random.random((100, 1000, 1000)), dims=['a', 'x', 'y'],
                         coords=[np.linspace(-2, -1.1, 100), np.linspace(-1, 1, 1000), np.linspace(-1, 1, 1000)])
data2 = xarray.DataArray(np.random.random((100, 1000, 1000)), dims=['a', 'x', 'y'],
                         coords=[np.linspace(-1, -0.1, 100), np.linspace(-1, 1, 1000), np.linspace(-1, 1, 1000)])
data = xarray.concat([data1, data2], dim='a')
data.to_netcdf(savepath)
del data
savepath, loadpath = loadpath, savepath  # next iteration reads the file just written

for idx in range(10):
    data1 = xarray.DataArray(np.random.random((10, 1000, 1000)), dims=['a', 'x', 'y'],
                             coords=[np.linspace(idx, idx + 0.9, 10), np.linspace(-1, 1, 1000), np.linspace(-1, 1, 1000)])
    data1 = xarray.Dataset({'__xarray_dataarray_variable__': data1})
    # Open lazily with Dask chunks so the full file is never loaded into memory.
    bigdata = xarray.open_dataset(loadpath, chunks={'a': 10})
    bigdata = xarray.concat([bigdata, data1], dim='a')
    bigdata.to_netcdf(savepath)
    bigdata.close()
    savepath, loadpath = loadpath, savepath  # swap the two buffer files
```
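If rewriting the whole file on every iteration is too slow, an alternative sketch is to write each scan to its own file and only combine them lazily at read time with `xarray.open_mfdataset` (this assumes dask and a netCDF backend are installed; the variable name `scan` and the file layout are illustrative, not from the original post):

```python
import os
import tempfile

import numpy as np
import xarray

tmpdir = tempfile.mkdtemp()
paths = []
for idx in range(3):
    # One file per scan; small shapes here just for illustration.
    da = xarray.DataArray(np.random.random((5, 20, 20)), dims=['a', 'x', 'y'],
                          coords=[np.linspace(idx, idx + 0.9, 5),
                                  np.linspace(-1, 1, 20), np.linspace(-1, 1, 20)],
                          name='scan')
    path = os.path.join(tmpdir, f'scan_{idx}.nc')
    da.to_netcdf(path)
    paths.append(path)

# Lazily combine all files along 'a'; nothing is read into memory
# until values are actually requested.
combined = xarray.open_mfdataset(paths, combine='nested', concat_dim='a')
```

This avoids the copy-and-swap dance entirely, at the cost of keeping many small files on disk.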
Hierarchical scans save to xarray
Basic code to make it work:
```python
import numpy as np
import xarray

data1 = xarray.DataArray([[np.random.random((100, 100))]], dims=['a', 'b', 'x', 'y'],
                         coords=[[0], [0], np.linspace(-1, 1, 100), np.linspace(-1, 1, 100)])
data2 = xarray.DataArray([[np.random.random((100, 100))]], dims=['a', 'b', 'x', 'y'],
                         coords=[[0], [1], np.linspace(-1, 1, 100), np.linspace(-1, 1, 100)])
# Concatenate the two tiles along 'b' (the original snippet referenced
# data3 on the right-hand side before it was assigned).
data3 = xarray.concat([data1, data2], dim='b')
```
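Extending the same idea to a full hierarchical scan, a minimal sketch (grid sizes and loop variables are illustrative) concatenates the tiles of each row along `b`, then stacks the rows along `a`:

```python
import numpy as np
import xarray

x = np.linspace(-1, 1, 100)
y = np.linspace(-1, 1, 100)

rows = []
for a in range(2):
    # Each tile carries singleton 'a' and 'b' coords marking its grid position.
    tiles = [xarray.DataArray([[np.random.random((100, 100))]],
                              dims=['a', 'b', 'x', 'y'],
                              coords=[[a], [b], x, y])
             for b in range(3)]
    rows.append(xarray.concat(tiles, dim='b'))  # one row of the scan grid
grid = xarray.concat(rows, dim='a')             # stack the rows
```

The result is a single 4-D array indexed by the two scan coordinates `a` and `b` plus the image axes `x` and `y`.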