irm-codebase closed this 4 months ago.
Did some extra digging. The core of the issue is that the netCDF standard treats an attribute as nothing more than a single value per key: a string, or a flat array of one type (https://cfconventions.org/cf-conventions/cf-conventions.html#_attributes).
`xarray` does not support saving data in dataset attributes beyond single values either, which is why we have our own serialization.
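To illustrate what "our own serialization" means here, below is a minimal sketch that turns dict-valued attrs into JSON strings before saving and decodes them afterwards. It is not the code in calliope's `io.py`; `serialise_dict_attrs`, `deserialise_dict_attrs` and the `serialised_dicts` marker attr are hypothetical names, and the example values are made up.

```python
import json

# Hypothetical marker attr listing which keys were JSON-encoded.
JSON_KEYS_ATTR = "serialised_dicts"


def serialise_dict_attrs(attrs: dict) -> dict:
    """Replace dict-valued attrs with JSON strings so every value is flat."""
    out = {}
    encoded_keys = []
    for key, value in attrs.items():
        if isinstance(value, dict):
            out[key] = json.dumps(value)
            encoded_keys.append(key)
        else:
            out[key] = value
    # Store the key list as one JSON string so it is itself a single value.
    out[JSON_KEYS_ATTR] = json.dumps(encoded_keys)
    return out


def deserialise_dict_attrs(attrs: dict) -> dict:
    """Invert serialise_dict_attrs after loading."""
    out = dict(attrs)
    for key in json.loads(out.pop(JSON_KEYS_ATTR, "[]")):
        out[key] = json.loads(out[key])
    return out


attrs = {"name": "m", "config": {"solver": "cbc", "mode": {"x": 1}}}
flat = serialise_dict_attrs(attrs)
assert deserialise_dict_attrs(flat) == attrs
```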
I'm playing around with using predefined encoders for this, but so far the produced string is too big for `xarray`... The encoding works perfectly fine, but the saving step does not.
Here's an example:
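```python
import codecs
import pickle

import calliope

model = calliope.examples.national_scale()
attrs_original = model._model_data.attrs.copy()

# Pickle the attrs dict and encode the bytes as a base64 text string.
pickled_attrs = codecs.encode(pickle.dumps(attrs_original), "base64").decode()

# Replace all attrs with that single encoded string.
model._model_data.attrs.clear()
model._model_data.attrs["encoded"] = pickled_attrs

# Decoding round-trips cleanly; it is the subsequent save to file that fails.
unpickled_attrs = pickle.loads(codecs.decode(pickled_attrs.encode(), "base64"))
assert attrs_original == unpickled_attrs
```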
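Part of why the encoded string gets large: base64 emits 4 ASCII characters for every 3 input bytes, so the attribute is about a third larger than the pickled bytes (and `codecs.encode(..., "base64")` additionally inserts a newline every 76 characters). A quick check of that arithmetic, assuming a small made-up attrs dict in place of a real model's attrs; `encoded_attr_size` is a hypothetical helper and uses `base64.b64encode`, which adds no newlines.

```python
import base64
import math
import pickle


def encoded_attr_size(attrs: dict) -> tuple[int, int]:
    """Return (pickled size in bytes, base64 string length) for an attrs dict."""
    raw = pickle.dumps(attrs)
    encoded = base64.b64encode(raw).decode("ascii")
    return len(raw), len(encoded)


# Made-up attrs; a real model's attrs would be much larger.
attrs = {"name": "example", "subsets": {"a": [1, 2, 3]}, "notes": "x" * 1000}
raw_len, enc_len = encoded_attr_size(attrs)

# base64 maps every 3 input bytes (rounded up, with padding) to 4 characters.
assert enc_len == 4 * math.ceil(raw_len / 3)
print(raw_len, enc_len)
```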
`xarray` provides interfaces to other formats, and `zarr` seems to fit our use case best: it fully separates data from metadata and can represent arbitrary (JSON-serialisable) metadata. By comparison, netCDF is less flexible because of the limited typing of attrs (no dictionary support, and only limited list support).
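zarr keeps attributes as JSON metadata (in `.zattrs` for zarr v2), so a rough check of whether a set of attrs would survive a zarr round trip unchanged is to push it through JSON. This is a minimal sketch with a hypothetical helper `survives_json_roundtrip`; xarray and zarr may convert some types themselves (e.g. NumPy scalars), so treat it as an approximation only.

```python
import json


def survives_json_roundtrip(attrs: dict) -> bool:
    """Rough proxy for whether attrs can be stored as JSON metadata unchanged."""
    try:
        return json.loads(json.dumps(attrs)) == attrs
    except TypeError:  # e.g. sets are not JSON-serialisable
        return False


# Nested dicts and single-element lists survive JSON unchanged.
print(survives_json_roundtrip({"a": {"b": [1, 2]}, "c": ["x"]}))  # True
# Sets fail to encode; tuples come back as lists.
print(survives_json_roundtrip({"s": {1, 2}}))  # False
print(survives_json_roundtrip({"t": (1, 2)}))  # False
```

Sets still need a serialiser of their own, which matches the conversions we already do before saving.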
Saving a full calliope model is as easy as:
```python
import calliope
import xarray

model = calliope.examples.national_scale()
# Use to_zarr: to_netcdf would write a netCDF file regardless of the ".zarr" name.
model._model_data.to_zarr("outputs/model.zarr")
test = xarray.open_zarr("outputs/model.zarr")
```
In this case, `test` contains the full model definition, with no errors.
It seems like `zarr` is newer (it has been around for almost a decade, against 30+ years for netCDF). It also comes from the geospatial / climate science field:
Summary of summary table: zarr is better than netcdf in every way 😄
netCDF storing single-element lists as strings is a bit of a pain. We already handle all the other types that need converting before storage (dictionaries, sets, etc.) in `io.py`. We could easily add a serialiser for lists that catches single-element lists, and we probably should.
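A minimal sketch of such a serialiser, assuming it only needs to record which top-level attrs were single-element lists and re-wrap them on load. The function names and the `serialised_single_element_lists` attr are hypothetical (not existing calliope code), and the `"base"` / `"techs"` values are made up.

```python
import json

# Hypothetical attr name. The key list is stored as one JSON string so that
# it is itself a single value and cannot be collapsed on save.
SINGLE_ELEMENT_LISTS_ATTR = "serialised_single_element_lists"


def serialise_single_element_lists(attrs: dict) -> dict:
    """Record which attrs are single-element lists before saving."""
    out = dict(attrs)
    keys = [k for k, v in attrs.items() if isinstance(v, list) and len(v) == 1]
    out[SINGLE_ELEMENT_LISTS_ATTR] = json.dumps(keys)
    return out


def deserialise_single_element_lists(attrs: dict) -> dict:
    """Re-wrap recorded keys in a list if the file format returned a scalar."""
    out = dict(attrs)
    for key in json.loads(out.pop(SINGLE_ELEMENT_LISTS_ATTR, "[]")):
        if not isinstance(out[key], list):
            out[key] = [out[key]]
    return out


# Usage: hand-written stand-in for what a netCDF file gives back.
attrs = {"applied_math": ["base"], "techs": ["a", "b"]}
stored = serialise_single_element_lists(attrs)
reloaded = {
    "applied_math": "base",  # collapsed to a scalar on load
    "techs": ["a", "b"],
    SINGLE_ELEMENT_LISTS_ATTR: stored[SINGLE_ELEMENT_LISTS_ATTR],
}
assert deserialise_single_element_lists(reloaded) == attrs
```

Because the re-wrap only happens when the value is not already a list, the same load step would be harmless for formats such as zarr that keep the list intact.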
Happy for you to add functionality to convert to `.zarr` and to deprecate saving to `.nc`. Don't remove it entirely, as it should only be phased out at a later point; just emit a deprecation warning recommending `.zarr`. Then we add this serialiser for single-element lists so that they are returned as single-element lists.
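One way the deprecation could look, as a sketch only: a hypothetical `save_model` dispatcher that picks a backend from the file extension and warns on `.nc`. It returns the backend name instead of writing, to stay dependency-free; a real implementation would call xarray's `Dataset.to_netcdf` or `Dataset.to_zarr` at the marked points.

```python
import warnings
from pathlib import Path


def save_model(path: str) -> str:
    """Hypothetical dispatcher: choose a save backend from the file extension."""
    suffix = Path(path).suffix
    if suffix == ".nc":
        warnings.warn(
            "Saving to netCDF (.nc) is deprecated and will be removed in a "
            "future release; save to .zarr instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return "netcdf"  # a real version would call Dataset.to_netcdf(path)
    if suffix == ".zarr":
        return "zarr"  # a real version would call Dataset.to_zarr(path)
    raise ValueError(f"Unsupported file extension: {suffix!r}")
```

Using `DeprecationWarning` with `stacklevel=2` points the warning at the caller's line rather than at `save_model` itself.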
I agree on keeping netCDF for a while; it's too popular. I suspect our current serialisation will be enough once #619 is fixed, but it's a toss-up. I'll come back to this one afterwards.
What happened?
When saving models to netCDF files, our serialization algorithm converts single-element lists at the top level of the attributes into strings.
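A dependency-free sketch of the symptom, assuming (as the behaviour above suggests) that a length-1 attribute comes back from the file as its only element. `netcdf_like_roundtrip` is a hypothetical stand-in for a save/load cycle, not calliope or xarray code, and the attr values are made up.

```python
def netcdf_like_roundtrip(attrs: dict) -> dict:
    """Hypothetical stand-in for writing attrs to netCDF and reading them back.

    Multi-element lists survive, but a length-1 list comes back as its
    single element, mimicking the behaviour described in this issue.
    """
    return {
        key: value[0] if isinstance(value, list) and len(value) == 1 else value
        for key, value in attrs.items()
    }


attrs = {"applied_math": ["base"], "techs": ["a", "b"]}
print(netcdf_like_roundtrip(attrs))
# {'applied_math': 'base', 'techs': ['a', 'b']}
```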
Replication steps:
Here, "applied_math" will be a string, not a list.
Which operating systems have you used?
Version
v0.7
Relevant log output
No response