Closed semvijverberg closed 2 years ago
Found some similar issues after a bit of googling: https://stackoverflow.com/questions/54848671/using-xarray-groupby-bins-results-in-coordinates-that-are-an-object-and-cant-be. Apparently we are not the only ones. There is already an issue in xarray about supporting interval indexes: https://github.com/pydata/xarray/issues/2847.
Currently in our implementation, the interval datatype is object, which is not supported. But from the discussions in those issues, it seems xarray does support certain interval types. Maybe we can try to play a bit with different interval types generated by pandas? (e.g. https://pandas.pydata.org/docs/reference/api/pandas.interval_range.html)
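A minimal sketch of why the interval coordinate ends up as `object`, assuming weekly intervals built with `pandas.interval_range` (the dates and frequency here are illustrative, not taken from our calendar):

```python
import numpy as np
import pandas as pd

# Hypothetical weekly intervals; the start date and frequency are made up.
intervals = pd.interval_range(
    start=pd.Timestamp("2020-01-01"), periods=4, freq="7D", closed="left"
)

# pandas keeps a dedicated interval dtype on the index itself.
print(intervals.dtype)

# NumPy has no interval dtype, so the conversion yields an object array
# of pandas.Interval instances.
as_numpy = np.asarray(intervals)
print(as_numpy.dtype)
```

The object array is what the netCDF writer then has to deal with, which matches the behaviour described above.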
Seeing as we do not really use the intervals ourselves after applying the calendar, we can just decide to not include them in the DataArray.
If we do want to keep them, we can split the intervals up into two coordinates: `interval_left` and `interval_right` (or `interval_start` and `interval_end`). These will just be timestamps and can therefore be stored in a netCDF file.
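A rough sketch of the two-coordinate idea, assuming a 1-D array of aggregated values. The dimension name `i_interval`, the variable name `tp` and the `interval_closed` attribute are hypothetical choices for illustration, not names from our code:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical intervals standing in for the calendar output.
intervals = pd.interval_range(
    start=pd.Timestamp("2020-01-01"), periods=4, freq="7D", closed="left"
)

# Store the edges as two plain datetime coordinates instead of one
# object-typed interval coordinate.
tp_aggr = xr.DataArray(
    np.arange(4.0),
    dims=("i_interval",),
    coords={
        "interval_left": ("i_interval", intervals.left),
        "interval_right": ("i_interval", intervals.right),
    },
    name="tp",
)
# Keep track of which side is closed so the intervals can be rebuilt.
tp_aggr.attrs["interval_closed"] = intervals.closed

# After loading the data again, the intervals can be reconstructed.
rebuilt = pd.IntervalIndex.from_arrays(
    tp_aggr["interval_left"].values,
    tp_aggr["interval_right"].values,
    closed=tp_aggr.attrs["interval_closed"],
)
```

Both coordinates have a datetime dtype, so they should not need any special encoding when written with `to_netcdf`.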
Perhaps we could convert them to normal times (either 'left', 'center', or 'right'), and include a `bounds` array of shape `2n`, where `n` is the length of the calendar. This is how it's done in the CF/CMOR standards.
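A sketch of the bounds approach, assuming the bounds are laid out as an `(n, 2)` array (one row per interval), which is how the CF conventions describe cell boundaries. The names `time_bnds`, `bnds` and `tp` are assumptions for illustration:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical intervals standing in for the calendar output.
intervals = pd.interval_range(
    start=pd.Timestamp("2020-01-01"), periods=4, freq="7D", closed="left"
)

# Stack left and right edges into an (n, 2) array, one row per interval.
bounds = np.stack([intervals.left.values, intervals.right.values], axis=1)

ds = xr.Dataset(
    data_vars={
        "tp": (("time",), np.arange(4.0)),
        "time_bnds": (("time", "bnds"), bounds),
    },
    coords={
        # Use the left edge as the "normal" time; intervals.mid or
        # intervals.right would work the same way. The "bounds" attribute
        # points at the variable holding the interval edges.
        "time": ("time", intervals.left, {"bounds": "time_bnds"}),
    },
)
```

This keeps a regular datetime `time` axis for selection and plotting, while the full interval information stays available in `time_bnds`.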
Just some interesting things to share after playing with it. When exporting to netCDF format, we can choose the encoding for a given coordinate, e.g. `tp_aggr.to_netcdf("./tp_aggr.nc", encoding={"interval":{"dtype":'<U8'}})`. Given that xarray automatically changes the saved pandas intervals to `object` type after resampling, I tried to manually set `tp_aggr.to_netcdf("./tp_aggr.nc", encoding={"time":{"dtype":pd.IntervalDtype(subtype='datetime64[ns]', closed='both')}})`, but xarray still complains; apparently it doesn't support pandas intervals.
I think what @Peter9192 suggests is the best solution, which is also similar to the solution given by the developer of xarray in their issue https://github.com/pydata/xarray/issues/2847#issuecomment-475918645.
When I try to store a resampled `xr.DataArray`:
I get the following error:
For completeness, this is the whole dataset: