JuliaDataCubes / YAXArrays.jl

Yet Another XArray-like Julia package
https://juliadatacubes.github.io/YAXArrays.jl/

Getting chunk details in one dimension of a YAXArray #401

Closed · Sonicious closed this issue 4 months ago

Sonicious commented 4 months ago

Is it possible with YAXArrays to find out the exact chunking of a dimension, in particular the time dimension?

In a specific setting I need to access the slices of a cube by time. I guess this is possible by analyzing the chunks directly. I also found DiskArrays.approx_chunksize(YAXArrays.Cubes.eachchunk(dataset))[3]. How approximate is this value? Can I trust it in the case of Zarr? I didn't find any documentation about it.

How could you do it with irregularly sampled data? Is mapCube doing this in the background for intelligent chunk access?
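
For context, a minimal sketch of what I am calling right now. The cube is just an in-memory stand-in for my Zarr-backed cube (assuming the YAXArray constructor that takes a tuple of DimensionalData dimensions), so the axis names and sizes are made up:

using YAXArrays, DimensionalData
using DiskArrays: eachchunk, approx_chunksize

# illustrative cube with the time axis in third position
axlist = (Dim{:lon}(1:20), Dim{:lat}(1:10), Dim{:time}(1:30))
c = YAXArray(axlist, rand(20, 10, 30))

gc = eachchunk(c)        # chunk grid covering all dimensions
approx_chunksize(gc)     # one (approximate) chunk size per dimension
approx_chunksize(gc)[3]  # the entry for the time axis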

gdkrmr commented 4 months ago

I would suggest something like

get_chunks(c, :Ti)

which should return an array

[1, 4, 7, 10]

with the starting indices of each chunk

meggart commented 4 months ago

The implementation of your get_chunks function would be

using DiskArrays: eachchunk
using YAXArrays: findAxis, YAXArray

function get_chunks(a::YAXArray, s)
    i = findAxis(s, a)                 # index of the dimension matching s
    isnothing(i) && error("Axis $s not found")
    first.(eachchunk(a).chunks[i])     # starting index of every chunk along that axis
end

In general, eachchunk will always return a GridChunks object, which is the outer product of the chunks along each dimension of the array. To access the chunk objects for each dimension you can use the .chunks property, which contains a tuple of RegularChunks or IrregularChunks objects.
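
To see what those per-dimension objects look like, here is a minimal hand-built sketch (the chunk sizes are made up, and I am assuming the RegularChunks(chunksize, offset, axislength) constructor from DiskArrays):

using DiskArrays: RegularChunks

ch = RegularChunks(3, 0, 10)   # a length-10 axis split into chunks of 3
collect(ch)                    # [1:3, 4:6, 7:9, 10:10], one range per chunk
first.(ch)                     # [1, 4, 7, 10], the starting indices from the example above

For a chunked YAXArray the entries of eachchunk(a).chunks are exactly such objects, one per dimension, so get_chunks just picks the entry for the requested axis and takes the first index of every range.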

How could you do it with irregularly sampled data? Is mapCube doing this in the background for intelligent chunk access?

Yes, mapCube and also the CubeTable iterator are made exactly for this purpose, namely to optimize the sliced chunk access in these cases.
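
A minimal sketch of that pattern (axis names, sizes and the reducer are placeholders, and I am assuming the current YAXArray constructor with DimensionalData dimensions; with a real chunked Zarr cube, mapCube groups the work by chunk automatically):

using YAXArrays, DimensionalData

axlist = (Dim{:time}(1:12), Dim{:lon}(1:4), Dim{:lat}(1:3))
c = YAXArray(axlist, rand(12, 4, 3))

# one complete time series goes in, one value comes out per lon/lat position
r = mapCube(c; indims = InDims("time"), outdims = OutDims()) do xout, xin
    xout .= maximum(xin)
end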

Sonicious commented 4 months ago

Yes, mapCube and also the CubeTable iterator are made exactly for this purpose, namely to optimize the sliced chunk access in these cases.

This is actually the important point. I am seeing some performance issues though, but those might have a different cause.