Closed tomwhite closed 1 month ago
Here's a jupyter notebook: https://github.com/tomwhite/cubed/blob/f5ece5b068db014f828bf6f3afcf6b05280af52a/examples/pangeo-tem.ipynb
There are a couple of pieces missing. The first is fairly minor: `mean` doesn't work yet with NaNs, so I've added `skipna=False` just to make progress. It's tracked in https://github.com/pydata/xarray/issues/7243.
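For context, here is a minimal NumPy-only sketch of the two NaN behaviours that `skipna` switches between. It uses plain NumPy rather than xarray or Cubed, so it only illustrates the semantics, not the code path in the notebook.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0])

# NaNs propagate into the result: comparable to xarray's mean(skipna=False)
propagating = np.mean(data)

# NaNs are ignored: comparable to xarray's default behaviour for float data
skipping = np.nanmean(data)

print(propagating, skipping)
```

With `skipna=False` the reduction only needs an ordinary `mean`, which is why it works as a stopgap.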
The main thing that's missing is a Cubed-compatible path for xarray `groupby`. I think it's probably worth using the Flox path in xarray here, but that will require a Cubed implementation for xarray `apply_ufunc`, which is being tracked in #67 and has been started by @TomNicholas in #119. Does this sound right @dcherian, or is there a reason to start by supporting the default internal xarray `groupby`?
Yes, you'll want `apply_ufunc` anyway.
We'll also want to support cubed in flox, which should be mostly easy since I'm using dask primitives. The main issue is how to deal with the intermediates. ATM Flox uses dictionaries like this:
{
    "groups": np.array([1, 2, 3, 4]),  # shape (ngroups,)
    "intermediates": (np.array([4, 5, 6, 7]), np.array([1, 1, 1, 1])),  # tuple (sum, counts), each with shape (..., ngroups)
}
for the `mean` reduction.
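To make the role of those intermediates concrete, here is a rough sketch of how per-chunk (sum, counts) pairs could be produced, merged, and finalized into a grouped mean. The function names `chunk_intermediates`, `combine`, and `finalize` are hypothetical and are not Flox's API; group labels are assumed to be 0-based integers so that `np.bincount` can be used.

```python
import numpy as np

def chunk_intermediates(values, labels, ngroups):
    """Per-chunk partial result: per-group sums and counts."""
    sums = np.bincount(labels, weights=values, minlength=ngroups)
    counts = np.bincount(labels, minlength=ngroups)
    return {"groups": np.arange(ngroups), "intermediates": (sums, counts)}

def combine(a, b):
    """Merge two partial results that cover the same groups."""
    sums = a["intermediates"][0] + b["intermediates"][0]
    counts = a["intermediates"][1] + b["intermediates"][1]
    return {"groups": a["groups"], "intermediates": (sums, counts)}

def finalize(result):
    """Turn the merged (sum, counts) pair into the mean per group."""
    sums, counts = result["intermediates"]
    return sums / counts

# Two chunks of the same 1-D array, each with its own group labels
chunk1 = chunk_intermediates(np.array([1.0, 2.0, 3.0]), np.array([0, 1, 0]), ngroups=2)
chunk2 = chunk_intermediates(np.array([4.0, 6.0]), np.array([1, 1]), ngroups=2)

group_means = finalize(combine(chunk1, chunk2))
print(group_means)
```

The point is that the intermediate is a pair of arrays rather than a single array, which is what any array library supporting this reduction has to be able to carry between stages.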
That said, for this particular problem I could write the groupby as an efficient indexing + reduction. That's probably better since we'll have a "pure array" version of the problem, which will be easy for experimentation and comparison across libraries.
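As a rough illustration of the "indexing + reduction" idea (not the actual TEM computation), the sketch below assumes every group has the same number of members, so the members can be gathered with one integer index array and reduced along a new axis. The array shapes and labels are made up for the example.

```python
import numpy as np

# 12 steps along the grouped axis, 3 points along the other axis
x = np.arange(12 * 3, dtype=float).reshape(12, 3)
labels = np.arange(12) % 4  # hypothetical labels: 4 groups, 3 members each

# Row i of `index` holds the positions along axis 0 that belong to group i.
# This reshape only works because all groups have the same size.
index = np.argsort(labels, kind="stable").reshape(4, 3)

# Integer array indexing gathers the members: shape (ngroups, members, 3)
gathered = x[index]

# A plain reduction over the members axis gives the grouped mean
group_means = gathered.mean(axis=1)
print(group_means.shape)
```

Written this way, the problem needs only integer array indexing and `mean`, with no groupby machinery.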
Thanks @dcherian.
For intermediates, Cubed uses structured arrays (e.g. for `mean`), which seems similar to Flox.
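A small NumPy sketch of the structured-array idea, for comparison with the dictionary form above. The field names `n` and `total` and the helper functions are assumptions for illustration, not a copy of Cubed's implementation.

```python
import numpy as np

# Assumed field names, for illustration only
intermediate_dtype = np.dtype([("n", np.int64), ("total", np.float64)])

def chunk_to_intermediate(chunk):
    """Summarise one chunk as a 0-d structured array holding count and sum."""
    out = np.zeros((), dtype=intermediate_dtype)
    out["n"] = chunk.size
    out["total"] = chunk.sum()
    return out

def combine(a, b):
    """Merge two intermediates field by field."""
    out = np.zeros((), dtype=intermediate_dtype)
    out["n"] = a["n"] + b["n"]
    out["total"] = a["total"] + b["total"]
    return out

first = chunk_to_intermediate(np.array([1.0, 2.0]))
second = chunk_to_intermediate(np.array([3.0, 4.0, 5.0]))
merged = combine(first, second)

mean = merged["total"] / merged["n"]
print(mean)
```

Both fields travel in a single array, so the intermediate can be passed around like any other array.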
> That said, for this particular problem I could write the groupby as an efficient indexing + reduction.
Agree that would be good. Although we'll probably need to add integer array indexing to Cubed - see https://github.com/data-apis/array-api/issues/177 for a bit of discussion about that from a standardisation point of view.
I've added integer array indexing (and `take`) to Cubed now (0b59b3241d06b2e57ba4011c538def5e245d7a5d), which should help for group by.
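For reference, a tiny NumPy sketch of the two equivalent ways to gather one group's members. It uses NumPy rather than Cubed, and the `members` positions are hypothetical.

```python
import numpy as np

x = np.array([[10, 11], [20, 21], [30, 31], [40, 41]])
members = np.array([0, 2])  # hypothetical positions of one group's members

by_index = x[members]                  # integer array indexing
by_take = np.take(x, members, axis=0)  # the same gather, written with take

group_sum = by_take.sum(axis=0)
print(group_sum)
```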
I've updated the example notebook at https://github.com/tomwhite/cubed/blob/5eb5f23c25c37ec8634eb35a71711f4eaaffd643/examples/pangeo-tem.ipynb (this needed a few xarray changes). It doesn't use Flox group by yet though.
This issue is to explore running the "Transformed Eulerian Mean diagnostic" example in https://github.com/dcherian/ncar-challenge-suite/blob/main/tem.ipynb using Cubed.
It uses Xarray, so needs https://github.com/pydata/xarray/pull/7019