JuliaDataCubes / YAXArrays.jl

Yet Another XArray-like Julia package
https://juliadatacubes.github.io/YAXArrays.jl/

getchunksize API ? #312

Open bjarthur opened 11 months ago

bjarthur commented 11 months ago

i don't see anywhere in the code or docs an API to get the chunk size. is there one? for now i'm using [x[end] for x in A.chunks[1]], which seems fragile.

ideally i'd like to get the chunk size of the cube underlying a DimArray after a yaxconvert. but i don't even see a way to programmatically get that other than to compute it before the conversion.

thanks!

lazarusA commented 11 months ago

what do you mean? The output for GridChunks is a list of tuples with all chunks and sizes, as in here:

https://juliadatacubes.github.io/YAXArrays.jl/dev/examples/generated/UserGuide/setchuncks/?h=chunking#set-chunking-by-variable

what output do you expect, and from which call?

meggart commented 11 months ago

In general a DiskArray is not guaranteed to have a well-defined regular chunk size. Chunks are often irregular when unevenly-sized arrays are concatenated (think of annual NetCDF files with leap years). However, there is DiskArrays.approx_chunksize which would return the chunk size as a tuple.

So I think DiskArrays.approx_chunksize(DiskArrays.eachchunk(A.data)) should work for both DimArrays and YAXArrays. I agree it would be nice to export a nicely-named function that does this.
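
To make the leap-year case concrete, here is a minimal sketch in plain Julia (only the Dates standard library, no DiskArrays). It assumes a hypothetical dataset with one file per year, one time step per day, and each file forming one chunk along the time axis; the year range is made up for illustration. The rounded mean at the end is just one plausible way to summarize irregular chunks as a single number, not necessarily how approx_chunksize computes it.

```julia
using Dates

# Hypothetical setup: one file per year, daily time steps,
# each annual file is a single chunk along the time dimension.
years = 2019:2022

# Length of each chunk along time; 2020 is a leap year.
chunk_lengths = [daysinyear(y) for y in years]   # [365, 366, 365, 365]

# The chunking is regular only if every chunk has the same length.
is_regular = all(==(first(chunk_lengths)), chunk_lengths)   # false

# Index range covered by each chunk in the concatenated time axis.
stops  = cumsum(chunk_lengths)
starts = stops .- chunk_lengths .+ 1
ranges = [a:b for (a, b) in zip(starts, stops)]
# 1:365, 366:731, 732:1096, 1097:1461

# One possible single-number summary: the rounded mean chunk length.
approx = round(Int, sum(chunk_lengths) / length(chunk_lengths))   # 365
```

Because no single length describes all four chunks, any "chunk size" reported for such an array can only be an approximation.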

bjarthur commented 11 months ago

ahah, i didn't realize that the chunking could be irregular. in my case though they are all the same size, so DiskArrays's approx_chunksize works well. thanks!

if you decide to hoist this API into DimensionalData or YAXArrays and export a function, i'd suggest designing an API that took a Dim name. something like, chunksize(A, Dim{:Y}). as it stands now with approx_chunksize, i have to convert a Dim into an integer index manually, which is fragile.
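
As a rough illustration of the name-based lookup suggested above, here is a minimal sketch in plain Julia. The function chunksize_by_name is hypothetical and not part of YAXArrays, DimensionalData, or DiskArrays; the dimension names and chunk sizes are made-up stand-ins for what the array and a call like DiskArrays.approx_chunksize would provide. In a real implementation the name-to-position step could presumably be delegated to DimensionalData's dimnum.

```julia
# Hypothetical helper: look up a chunk size by dimension name rather than
# by integer position. `dimnames` and `chunks` are stand-ins for values
# that would come from the array itself.
function chunksize_by_name(dimnames::NTuple{N,Symbol},
                           chunks::NTuple{N,Int},
                           name::Symbol) where {N}
    i = findfirst(==(name), dimnames)
    # Fail loudly on an unknown name instead of returning a wrong axis.
    i === nothing && throw(ArgumentError("no dimension named $name in $dimnames"))
    return chunks[i]
end

dimnames = (:X, :Y, :Time)   # made-up dimension order
chunks   = (100, 50, 365)    # made-up chunk sizes, one per dimension

chunksize_by_name(dimnames, chunks, :Y)   # 50
```

Looking the dimension up by name keeps the caller's code correct if the dimension order changes, for example after a permutation, which is the fragility the manual integer index introduces.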