Closed TabeaW closed 1 year ago
It's possible. If you use a skeleton and then open the file to start adding values step by step, you should not run into any memory issues. I should add an example to the docs showing how to do it. I will try to upload the example later.
Or, if @danlooo or @felixcremer already have an MWE around that they can share, that would also be great! Then we could add it to the docs later.
Mmh, ok, thanks! Then I must have done something wrong with the skeleton. For me it said OutOfMemory while creating it.
To create an empty Zarr or NetCDF dataset, you can do something along these lines, using the FillArrays package to avoid allocating memory for the full array:
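using YAXArrays, FillArrays, Dates, Zarr
# Axes: 900 x 1100 grid, hourly time steps from 2000 through 2022
x = RangeAxis("x", 1:900)
y = RangeAxis("y", 1:1100)
t = RangeAxis("time", DateTime(2000):Hour(1):DateTime(2022,12,31,23,0,0))
ax = [x, y, t]
# Fill is lazy, so no full array is allocated; 1f32 is the Float32 value 1.0e32
c = YAXArray(ax, Fill(1f32, length.(ax)...), Dict("missing_value"=>1f32))
# One chunk holds 24 hours of the full grid
c = setchunks(c, (time=24, x=900, y=1100))
ds = Dataset(variable = c)
# skeleton=true writes only the metadata, not the data itself
ds_disk = savedataset(ds, path = "pathtooutput.zarr", backend=:zarr, skeleton=true)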
This will create the empty dataset with the required chunking, which you can then write into. The best writing strategy would then depend on which format the input data is actually stored in. Does this already help?
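A minimal sketch of one possible writing strategy, assuming the skeleton can be opened in write mode with Zarr.jl's `zopen` and that the variable is stored under the name `variable`, as in the snippet above. To keep the example small and runnable, a tiny array created with `zcreate` stands in for the real store, and `load_block` is a hypothetical placeholder for whatever reads the input data. Writing 24 time steps at a time matches the `time=24` chunking, so each chunk is written only once.

```julia
using Zarr

# Hypothetical stand-in for reading the input data of the given time steps
load_block(nx, ny, steps) = fill(Float32(first(steps)), nx, ny, length(steps))

# Write the array in blocks along the time (last) dimension
function fill_by_blocks!(z, blocksize)
    nx, ny, nt = size(z)
    for start in 1:blocksize:nt
        steps = start:min(start + blocksize - 1, nt)
        z[:, :, steps] = load_block(nx, ny, steps)
    end
    return z
end

# Tiny stand-in for the skeleton. For the real dataset this would be something like
#   z = zopen("pathtooutput.zarr", "w")["variable"]
# which is an assumption and is not tested here.
path = joinpath(mktempdir(), "demo.zarr")
z = zcreate(Float32, 9, 11, 48; path=path, chunks=(9, 11, 24), fill_value=1f32)

fill_by_blocks!(z, 24)
```

Writing one hour at a time should also work, but each chunk would then be read, modified and rewritten 24 times, so block sizes that are a multiple of the chunk length are usually the cheaper choice.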
Yes, thanks a lot, creating the dataset now works easily. Now I am facing the issue of changing the values of the dataset (#229). Or is there another way to write the right data to the dataset?
Changing the data by assigning to a subset works fine.
@TabeaW it is already written. Once you update the values, they are also written to disk. Well, it should be like that. Check by opening the file again and exploring that part of it.
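A small sketch of that check, assuming the store is a plain Zarr directory that Zarr.jl can reopen. The temporary path and the tiny array are stand-ins for the real dataset; only `zcreate`, `zopen` and ordinary indexing are used.

```julia
using Zarr

# Tiny stand-in for the dataset on disk
path = joinpath(mktempdir(), "check.zarr")
z = zcreate(Float32, 4, 5, 6; path=path, chunks=(4, 5, 3), fill_value=1f32)

# Update one time step through the writable handle
z[:, :, 1] = zeros(Float32, 4, 5)

# Open the store again through a fresh, read-only handle and inspect that part
z_check = zopen(path)
written = z_check[:, :, 1]
println(all(==(0f0), written))
```

If the values read through the second handle match what was assigned, the update has reached the disk and no separate save step is needed.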
Hi :) Thanks for the great work! I am currently trying to build a large cube (23 years of hourly data on a 900x1100 grid). With your expertise, do you think this is possible? If yes, how can I build the cube by adding 1 hour of data step by step? If I use concatenatecubes, I must always use time as a variable. If I use a skeleton, it is too big for memory to build in one step.
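For scale, a quick back-of-the-envelope calculation of the cube described here, assuming Float32 values and the hourly time axis from 2000-01-01 to 2022-12-31 used elsewhere in this thread. It only needs the Dates standard library.

```julia
using Dates

# Hourly time steps from 2000-01-01T00 to 2022-12-31T23
t = DateTime(2000):Hour(1):DateTime(2022, 12, 31, 23, 0, 0)

nx, ny, nt = 900, 1100, length(t)

# Size of the full in-memory array and of one 24-hour chunk, in bytes
bytes_total = nx * ny * nt * sizeof(Float32)
bytes_chunk = nx * ny * 24 * sizeof(Float32)

println("time steps: ", nt)
println("full array: ", round(bytes_total / 1e9, digits=1), " GB")
println("one chunk:  ", round(bytes_chunk / 1e6, digits=1), " MB")
```

The full array comes to roughly 0.8 TB, which explains the out-of-memory error when it is materialized in one go, while a single 24-hour chunk of the full grid is about 95 MB.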