calliope-project / calliope

A multi-scale energy systems modelling framework
https://www.callio.pe
Apache License 2.0
287 stars 93 forks source link

Unable to save `model.results` directly #684

Open irm-codebase opened 1 week ago

irm-codebase commented 1 week ago

What happened?

Problem

This is related to other problems (#619), but it's closer to an actual user case, so I'll document it here.

Due to our approach of saving everything in the .attrs of the xarray, we cannot directly save model results. This is a reasonable user-case, as saving results is semantically intuitive.

Solution

Not saving dictionaries or arrays in the .attrs

Which operating systems have you used?

Version

v0.7.0.dev4

Relevant log output

model.results.to_netcdf(results_path / "results.nc")
Traceback (most recent call last):
  File "/home/ivanruizmanuel/Documents/git/euro-calliope-modular/build/model/run_save.py", line 14, in <module>
    model.results.to_netcdf(results_path / "results.nc")
  File "/home/ivanruizmanuel/miniforge3/envs/ec-model-v07/lib/python3.12/site-packages/xarray/core/dataset.py", line 2298, in to_netcdf
    return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ivanruizmanuel/miniforge3/envs/ec-model-v07/lib/python3.12/site-packages/xarray/backends/api.py", line 1292, in to_netcdf
    _validate_attrs(dataset, invalid_netcdf=invalid_netcdf and engine == "h5netcdf")
  File "/home/ivanruizmanuel/miniforge3/envs/ec-model-v07/lib/python3.12/site-packages/xarray/backends/api.py", line 203, in _validate_attrs
    check_attr(k, v, valid_types)
  File "/home/ivanruizmanuel/miniforge3/envs/ec-model-v07/lib/python3.12/site-packages/xarray/backends/api.py", line 195, in check_attr
    raise TypeError(
TypeError: Invalid value for attr 'scenario': None. For serialization to netCDF files, its value must be of one of the following types: str, Number, ndarray, number, list, tuple
sjpfenninger commented 1 week ago

What is the use case that makes this a bug? In other words, why can't you use model.to_netcdf()?

irm-codebase commented 1 week ago

In this case, I wanted to just save the results to have a smaller file size. I did save with model.to_netcdf().

If you have a stable model build and run many scenarios (e.g., SPORES), saving only the model results, and some metadata on the side, would reduce data use.

Generally, though, just saving the results or inputs makes parsing the data a bit easier. Depending on the mode, some things might be inputs or results (e.g., flow_cap in operate mode). Users might not have the math fully memorized, so just saving the results is a bit more 'human friendly'.

brynpickering commented 2 days ago

Maybe this is why we should simply pickle our data - then it would be clear that you need to load it back in using calliope 😅.

But in all seriousness, our preferred method is that data is always loaded in using calliope (read_netcdf). This way, you automatically get to see what is an input/result, as they are attached as respective attributes of the model object. If we made it easy to save results to file then we'd only come across the opposite problem in future: users trying to load a netcdf file using calliope and then finding it doesn't contain any of the necessary metadata to actually instatiate the model object.

Generally, input data has a much smaller memory footprint than results. Still, in the use-case you have given (individual SPORES runs), it's clearly preferable not to save the input data every single time a new set of results is generated. How about calliope.to_netcdf(include_inputs=False)?

irm-codebase commented 2 days ago

@brynpickering I would not pickle the data. Pickle files are not always translatable between python or library versions. It would decrease stability in the long run 😅😅...

I'd actually prefer to just keep calliope.to_netcdf() without anything. Adding extra things would just decrease maintainability instead of fixing the root of the issue, which is that we are not following netCDF4 specifications.

I suppose it would introduce the chance of someone trying to re-load results... but that is user misconception, not calliope code breaking compatibility with data formats. I do not see it as a big loss, but I get the point.

Should I close this as 'not planned'?

brynpickering commented 2 days ago

I'd actually prefer to just keep calliope.to_netcdf() without anything

Except you also want to add calliope.results.to_netcdf, right?

irm-codebase commented 2 days ago

I meant that I'm ok with just using calliope.to_netcdf() if it means we do not sideline the spec issue with calliope.to_netcdf(include_inputs=False) 🙈.

I can get around it in other ways on my side, and if it has not popped up yet it means most people do not do it.

irm-codebase commented 2 days ago

Just to clarify: we already have calliope.results.to_netcdf(), since results is an xarray. So I was not really requesting us to add any new functions.

This issue was just about making sure using xarray functionality with it works when saving to netCDF.