aidanheerdegen closed this issue 2 years ago
@angus-g Seems this was added from a request by @aekiss (#111).

I can see why `time_bounds` was added as an attribute: `getvar` returns an `xarray.DataArray`, so it can't simply be added as another variable.
Some possible fixes:

- Return an `xarray.Dataset` from `getvar`, in which case a `time_bounds` variable can be added.
- [...] `time_bounds` attribute by creating a new `xarray.Dataset`, which it then returns and which can be serialised.

Related to the first option of returning an `xarray.Dataset`: I wrote some routines to pull a single variable out of a dataset while keeping all related variables, i.e. any that are referenced as `bounds` or `coordinates` in the variable's attributes or in the attributes of related variables:
https://github.com/coecms/splitvar/blob/master/splitvar/splitvar.py#L180-L257
If this were the preferred option these functions could be repurposed.
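
As a rough illustration of the idea (not the linked splitvar code), the sketch below follows `bounds` and `coordinates` attributes to collect everything related to one variable. The helper name `related_names` and the toy dataset are made up for this example.

```python
import numpy as np
import xarray as xr


def related_names(ds, name):
    """Hypothetical helper: names reachable from `name` via `bounds` /
    `coordinates` attributes and via dimension coordinates."""
    found, todo = [], [name]
    while todo:
        current = todo.pop()
        if current in found or current not in ds.variables:
            continue
        found.append(current)
        var = ds.variables[current]
        # `bounds` holds one name; `coordinates` is a space-separated list
        todo.extend(str(var.attrs.get("bounds", "")).split())
        todo.extend(str(var.attrs.get("coordinates", "")).split())
        # also follow the variable's own dimension coordinates
        todo.extend(d for d in var.dims if d in ds.variables)
    return found


# Toy dataset (assumed layout): `time` points at `time_bounds` via `bounds`
ds = xr.Dataset(
    {
        "temp": (("time",), np.array([10.0, 11.0, 12.0])),
        "salt": (("time",), np.array([35.0, 35.1, 35.2])),
        "time_bounds": (
            ("time", "nv"),
            np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]),
        ),
    },
    coords={"time": ("time", np.array([0.5, 1.5, 2.5]), {"bounds": "time_bounds"})},
)

names = related_names(ds, "temp")
# Select the data variables; xarray brings matching coordinates along
subset = ds[[n for n in names if n in ds.data_vars]]
```

Here `subset` keeps `temp`, `time` and `time_bounds` but drops the unrelated `salt`.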
There's merit to all the options, or maybe even some combination of them. It seems easiest to just return an `xarray.Dataset`, but I'm not sure how much that'll break things for people who are not expecting that. Otherwise, the direct serialisation of an `xarray.DataArray` has always struck me as a bit weirder than that of an `xarray.Dataset`, so in that case it seems possible to do some kind of preprocessing.
Another possibility could be to de-xarray the time bounds, as it seems like it would be happy to serialise a bare `ndarray` attribute? I'm not sure if/how people are actually using the time bounds; maybe it would be inconvenient to work with them without the extra metadata.
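
To make the trade-off concrete, here is a minimal sketch of what "de-xarraying" the bounds could look like. The variable names and values are invented, and whether a 2-D array attribute actually round-trips through netCDF is not checked here.

```python
import numpy as np
import xarray as xr

# Invented example data: bounds carrying their own metadata
bounds = xr.DataArray(
    np.array([[0.0, 1.0], [1.0, 2.0]]),
    dims=("time", "nv"),
    attrs={"units": "days since 1900-01-01"},
)
temp = xr.DataArray(np.array([10.0, 11.0]), dims="time", name="temp")

# Store only the raw numbers rather than the DataArray itself
temp.attrs["time_bounds"] = bounds.values

stored = temp.attrs["time_bounds"]
# `stored` is a plain numpy array: the dims and units of `bounds` are gone
```

The attribute is now a bare `numpy.ndarray`, so the dimension names and `units` that came with the original bounds are no longer attached to it.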
I agree that returning an `xarray.Dataset` seems the most technically appropriate solution. I also don't know how much that will affect users; thinking about it, it could be quite "breaky". Most workflows work directly with the returned variable, but under this scenario they would have to select the variable out of the dataset. Not onerous, but so ubiquitous as to be a real PITA.
I'm not a fan of turning the bounds into an `ndarray`: it "degrades" the utility of the data, stripping it of metadata and such for not much gain.
Ok, how about this for a proposal: remove the automatic insertion of bounds into attributes, but add an option that lets users request that `getvar` return an `xarray.Dataset` with all related variables included. This will round-trip fine through serialisation/de-serialisation and gives users the option of getting all the related data for a variable, which is arguably quite useful and adds utility. At some point in the future this could become the default if it proved popular enough.
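
A sketch of how such an option might behave is below. `getvar_sketch` and its `return_dataset` flag are hypothetical names, and the function works on an already-open `xarray.Dataset` rather than going through the real `getvar` machinery.

```python
import numpy as np
import xarray as xr


def getvar_sketch(ds, variable, return_dataset=False):
    """Hypothetical stand-in for getvar, operating on an open Dataset."""
    if not return_dataset:
        # Default: a bare DataArray, with no bounds inserted into attrs
        return ds[variable]
    keep = [variable]
    for dim in ds[variable].dims:
        if dim not in ds.variables:
            continue
        bounds_name = ds.variables[dim].attrs.get("bounds")
        if bounds_name is not None and bounds_name in ds.data_vars:
            keep.append(bounds_name)
    # Opt-in: a Dataset holding the variable plus its bounds variables
    return ds[keep]


# Invented example data
ds = xr.Dataset(
    {
        "temp": (("time",), np.array([10.0, 11.0])),
        "time_bounds": (("time", "nv"), np.array([[0.0, 1.0], [1.0, 2.0]])),
    },
    coords={"time": ("time", np.array([0.5, 1.5]), {"bounds": "time_bounds"})},
)

da = getvar_sketch(ds, "temp")                        # existing behaviour
out = getvar_sketch(ds, "temp", return_dataset=True)  # opt-in Dataset
```

Existing code that expects a `DataArray` keeps working with the default, while anyone who needs the bounds can ask for the `Dataset` explicitly.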
I like your last suggestion @aidanheerdegen. I use time bounds to fix the sea ice time axis but AFAIK it isn't widely used by others, so I don't think we should break everyone's code to support it. It seems much better to me to provide it as an option for anyone needing it.
The change in #111 adds the `time_bounds` variable as an attribute to the `xarray.DataArray` returned from `cc.querying.getvar`.

This breaks serialisation with a `TypeError` exception, as the attribute cannot be serialised.

Example:
Full stack trace:
```python
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [18], in
```

Ping @wghuneke