Open norlandrhagen opened 1 month ago
What you're doing is correct: you should be able to use all the Xarray syntax you normally would without calling anything from Cubed directly.
Using the .chunk method on Xarray is supposed to be viable but there is at least one bug in cubed-xarray that was found earlier today (see PR on cubed-xarray). There may be another bug here!
Cubed-xarray is currently under-tested relative to Cubed alone.
Okay so
> rds = ds.chunk({'time':1}, chunked_array_type="cubed")
You should not need to add chunked_array_type="cubed" here, it's supposed to automatically see that you're using cubed and assume you want to keep using cubed. I don't know why that's broken on xarray main but I was in the process of refactoring xarray's .chunk method anyway in https://github.com/pydata/xarray/pull/9286 and that seems to fix it 🤷‍♂️
I'm also able to reproduce your bug with writing out the wrong chunks. However, when I instead try writing out just one array using cubed.to_zarr, I see the expected chunks, i.e.
cubed.to_zarr(rds['air'].data, ts)
Which at least means the bug is in xarray / cubed-xarray rather than in cubed.
Ah good to know it's probably a cubed-xarray bug. Would it be helpful to repost/move the issue there or cross ref it?
Thanks for your patience! Cross-referencing it in a new issue on cubed-xarray could be helpful.
I tracked down the problem - this seems to fix it: https://github.com/pydata/xarray/pull/9326
The Xarray PR has been merged now, so it might be worth seeing if it fixes your original issue @norlandrhagen.
Just tried with the main Xarray branch and it worked! 🎈 Thanks for the fix. Is it worth pinning cubed-xarray to the next release of Xarray and above?
Great!
> Is it worth pinning cubed-xarray to the next release of Xarray and above?
That might be a good idea.
> Is it worth pinning cubed-xarray to the next release of Xarray and above?
Yes definitely. I'm also about to suggest we rename the ChunkManager to ComputeManager to better reflect its updated responsibilities in light of https://github.com/pydata/xarray/pull/9286, which would be a breaking change for cubed-xarray.
Working my way through understanding cubed / cubed-xarray.
I'm trying to get an example working of modifying the chunking of an Xarray dataset and writing it to Zarr. When I roundtrip the Zarr to and from Xarray, it seems like the chunking structure hasn't changed. Is using the .chunk method on an Xarray dataset with cubed viable, or should I be using the rechunk primitive?
Roundtrip example using Xarray + dask chunks
Roundtrip example using Xarray + cubed
chunked dataset (rds):
roundtripped dataset (rtds):
🤞 this is an end-of-day brain implementation issue on my end.