Building a very large cube

JuliaDataCubes / YAXArrays.jl

Yet Another XArray-like Julia package

https://juliadatacubes.github.io/YAXArrays.jl/

Building a very large cube #248

Closed TabeaW closed 1 year ago

TabeaW commented 1 year ago

Hi :) Thanks for the great work! I am currently trying to build a large cube (23 years of hourly data on a 900x1100 grid). With your expertise, do you think this is possible? If yes, how can I build the cube by adding one hour of data at a time? If I use concatenatecubes, I must always use time as a variable. If I use a skeleton, it is too big for memory to build in one step.

lazarusA commented 1 year ago

It's possible. If you use a skeleton and then open the file to add values step by step, you should not run into any memory issues. I should add an example to the docs showing how to do it. I will try to upload the example later.

Or, if @danlooo or @felixcremer already have an MWE around that they can share, that would also be great! Then we could add it to the docs later.

TabeaW commented 1 year ago

Mmh, ok, thanks! Then I must have done something wrong with the skeleton. For me it said OutOfMemory while creating it.

meggart commented 1 year ago

In order to create an empty zarr or netcdf dataset you can do something along these lines using the FillArrays package to avoid allocating the memory for the full array:

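using YAXArrays, FillArrays, Dates, Zarr
# Axes of the target cube: 900 x 1100 grid, hourly time steps from 2000 through 2022
x = RangeAxis("x",1:900)
y = RangeAxis("y",1:1100)
t = RangeAxis("time",DateTime(2000):Hour(1):DateTime(2022,12,31,23,0,0))
ax = [x,y,t]

# Fill is lazy, so no memory is allocated for the full array.
# 1f32 is the Float32 literal 1.0e32, used here as the missing value.
c = YAXArray(ax,Fill(1f32,length.(ax)...),Dict("missing_value"=>1f32))
# One chunk holds the full grid for 24 hourly time steps
c = setchunks(c,(time=24,x=900,y=1100))
ds = Dataset(variable = c)
# skeleton=true writes only the metadata and layout, not the data
ds_disk = savedataset(ds,path = "pathtooutput.zarr",backend=:zarr,skeleton=true)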

This will create the empty dataset with the requested chunking, which you can then write into. The best writing strategy will depend on the format in which the input data is actually stored. Does this already help?
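For a sense of scale, here is a minimal sketch that uses only the Dates standard library to estimate the dense size of such a cube and of a single chunk. The axis sizes and the 24-step time chunking come from the snippet above; the variable names are illustrative only.

```julia
using Dates

# Axis sizes taken from the snippet above.
nx, ny = 900, 1100
t = DateTime(2000):Hour(1):DateTime(2022, 12, 31, 23, 0, 0)
nt = length(t)                                  # number of hourly time steps

bytes_per_value = sizeof(Float32)               # 4 bytes per value
total_bytes = nx * ny * nt * bytes_per_value    # whole cube, uncompressed
chunk_bytes = nx * ny * 24 * bytes_per_value    # one (x=900, y=1100, time=24) chunk
nchunks = cld(nt, 24)                           # chunks along the time axis

println("time steps:       ", nt)
println("dense cube size:  ", round(total_bytes / 1e9; digits = 1), " GB")
println("single chunk:     ", round(chunk_bytes / 1e6; digits = 1), " MB")
println("number of chunks: ", nchunks)
```

Under these assumptions the dense array comes to roughly 800 GB, which would explain an OutOfMemory error when it is materialized, while a single chunk is about 95 MB and fits comfortably in memory.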

TabeaW commented 1 year ago

Yes, thanks a lot, creating the dataset now works easily. Now I am facing the issue of changing the values of the dataset (#229). Or is there another way to write the right data to the dataset?

TabeaW commented 1 year ago

Changing the data by modifying a subset works fine.

lazarusA commented 1 year ago

@TabeaW it is already written. Once you update the values, they are also written to disk. At least, it should work that way. Check by opening the file again and inspecting that part of it.
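The write-then-reopen check can be tried on a toy-sized store first. The sketch below is an assumption-laden illustration, not the method used in this thread: it talks to Zarr.jl directly (zcreate and zopen) rather than going through YAXArrays, uses a small 9 x 11 x 48 array in a temporary directory, and fills it with placeholder values instead of real input data.

```julia
using Zarr

# Toy stand-in for the real cube: 9 x 11 grid, 48 hourly steps,
# one day (24 steps) per chunk, mirroring the chunk layout in the thread.
path = joinpath(mktempdir(), "toy.zarr")   # hypothetical temporary location
z = zcreate(Float32, 9, 11, 48; path = path, chunks = (9, 11, 24), fill_value = 1f32)

# Write one chunk-aligned block (a full day) at a time, so each chunk
# on disk is written exactly once.
for day in 1:2
    steps = (24 * (day - 1) + 1):(24 * day)
    block = fill(Float32(day), 9, 11, length(steps))  # placeholder for real input data
    z[:, :, steps] = block
end

# Reopen the store from disk and read the values back.
z2 = zopen(path)
day1 = z2[:, :, 1:24]
day2 = z2[:, :, 25:48]
println("day 1 written: ", all(==(1f0), day1))
println("day 2 written: ", all(==(2f0), day2))
```

Writing in blocks that line up with the chunk boundaries is a design choice here: with 24 time steps per chunk, writing hour by hour would presumably rewrite the same chunk 24 times, so collecting a full day before writing should be cheaper.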