JuliaDataCubes / YAXArrays.jl

Yet Another XArray-like Julia package
https://juliadatacubes.github.io/YAXArrays.jl/

setchunks doesn't work as expected #240

Closed dpabon closed 1 year ago

dpabon commented 1 year ago

Hi, I have a problem when using setchunks and then saving the cube. Below is an MWE:

using Pkg
Pkg.activate("/Net/Groups/BGI/people/dpabon/nfdi4earth_oemc")
using YAXArrays
using Zarr
using Random

axlist = [
    RangeAxis("time", range(1, 20, length=116)),
    RangeAxis("x", range(1, 10, length=300)),
    RangeAxis("y", range(1, 5, length=100)),
    CategoricalAxis("Variable", ["var1", "var2"]),
    CategoricalAxis("another_var",[randstring(4) for i in 1:200])]

data = rand(116, 300, 100, 2, 200)

ds = YAXArray(axlist, data)

YAXArrays.Cubes.eachchunk(ds)

reports that the size of the chunks is:

1×1×1×1×200 DiskArrays.GridChunks{5}

If I rechunk the cube and save the result:

ds_rechuncked = setchunks(ds, (time = 29, x = 100, y = 50, Variable = 1, another_var = 1))

savecube(ds_rechuncked, "/tmp/temp_cube.zarr")

ds_rechuncked_new = open_dataset("/tmp/temp_cube.zarr/")
ds_rechuncked_new = Cube(ds_rechuncked_new)

YAXArrays.Cubes.eachchunk(ds_rechuncked_new)

I get:

4×3×2×200×2 DiskArrays.GridChunks{5}

Thanks in advance for your help.

meggart commented 1 year ago

It works as expected. Just look at the first chunk:

4×3×2×200×2 DiskArrays.GridChunks{5}:
[:, :, 1, 1, 1] =
 (1:29, 1:100, 1:50, 1:1, 1:1)    (1:29, 101:200, 1:50, 1:1, 1:1)    (1:29, 201:300, 1:50, 1:1, 1:1)
 (30:58, 1:100, 1:50, 1:1, 1:1)   (30:58, 101:200, 1:50, 1:1, 1:1)   (30:58, 201:300, 1:50, 1:1, 1:1)
 (59:87, 1:100, 1:50, 1:1, 1:1)   (59:87, 101:200, 1:50, 1:1, 1:1)   (59:87, 201:300, 1:50, 1:1, 1:1)
 (88:116, 1:100, 1:50, 1:1, 1:1)  (88:116, 101:200, 1:50, 1:1, 1:1)  (88:116, 201:300, 1:50, 1:1, 1:1)

The first chunk is defined by the indices (1:29, 1:100, 1:50, 1:1, 1:1), which is what you set. I know these GridChunks objects can be a bit strange; if you want to extract a chunk size from them, you can use

DiskArrays.approx_chunksize(YAXArrays.Cubes.eachchunk(ds_rechuncked_new))
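
To make the distinction concrete: the dimensions printed in the GridChunks header appear to be the number of chunks along each axis, not the length of a single chunk. The following is a minimal sketch in plain Julia (no YAXArrays or DiskArrays calls) that reproduces those counts from the array size and the requested chunk sizes in the example. The helper `chunk_range` is hypothetical and only for illustration. Note that the saved cube in the thread lists the last two axes in the opposite order, so its grid prints as 4×3×2×200×2.

```julia
# Array size and requested chunk sizes from the example,
# in the order (time, x, y, Variable, another_var).
array_size = (116, 300, 100, 2, 200)
chunk_size = (29, 100, 50, 1, 1)

# Number of chunks along each dimension: ceiling division of the
# axis length by the chunk length.
chunks_per_dim = cld.(array_size, chunk_size)   # (4, 3, 2, 2, 200)

# Hypothetical helper: index range covered by the i-th chunk along one
# axis of length n when chunks have length len (last chunk may be shorter).
chunk_range(i, len, n) = ((i - 1) * len + 1):min(i * len, n)

# Index ranges of the very first chunk, one range per dimension.
first_chunk = chunk_range.(1, chunk_size, array_size)

# All chunk ranges along the time axis.
time_chunks = [chunk_range(i, chunk_size[1], array_size[1]) for i in 1:chunks_per_dim[1]]

println(chunks_per_dim)
println(first_chunk)
println(time_chunks)
```

The first chunk computed here, (1:29, 1:100, 1:50, 1:1, 1:1), matches the first entry of the grid shown above, and the four time ranges match its first column.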

dpabon commented 1 year ago

My bad, I thought 4×3×2×200×2 was the size of each chunk. Thanks for the clarification, Fabian.