Open irm-codebase opened 1 week ago
There's a bit of a compromise solution required to handle when configuration is updated on-the-fly. I agree that we need to store the configuration specific to each unique build/solve run, but I prefer the initialised configuration to remain a known entity that you overwrite for a specific build/solve, but that remains as a static underlying config for each independent build/solve.
So, model.build(mode=operate)
and then model.build(force=True)
should not build the same model in operate
mode both times. The second time, it should revert to the base config (probably mode=plan
). Perhaps the best approach would be to store these kwarg config overrides in a separate attribute so that the metadata stored in file reflects the build/solve phases that took place to achieve the results. So if you do lots of re-building/solving of the model, some dictionary like kwarg_config
is kept up-to-date with the latest kwargs and then when you save the model to file, it superimposes the kwarg_config
onto the main config, cementing that config in history.
We had another issue open about results vs inputs (not sure where it is now), in that perhaps we do want results and inputs to be stored in two separate datasets, rather than having a model._model_data
that a user only ever gets filtered views onto. This would help create a consistent approach to not modifying model._model_data
(or, indeed, not having it at all)
init_override
and build_override
.I would prefer not having _override
used as we already use that term a lot. It will create confusion w.r.t. the overrides
key in the YAML.
After some discussion between @brynpickering and I, this is the setup I've come up with. It's meant to enable easier debugging.
The most important feature is the breakdown of .config
. When passing stuff to the backend, the priority order should be ._runtime
> ._config_kwargs
> ._config_overrides
> ._config_file
.
For now, I've added a _
to most things. In reality, I think we do not need it for math
, default
, data
and timestamps
. For config
, a @property
that combines all should be introduced.
Object | Old setup | New setup | Comment | Type | Backend access? | netCDF metadata? |
---|---|---|---|---|---|---|
Model() | .math \| ._model_data.attrs["math"] |
._math |
AttrDict | yes | yes | |
._model_data |
._data |
xarray | yes | yes | ||
.config \| ._model_data.attrs["config"] |
._config_file |
Not expected to be modified whatsoever. | AttrDict | yes | ||
.defaults \| ._model_data.attrs["defaults"] |
._defaults |
AttrDict | yes | yes | ||
._model_def_path |
._def_path |
Path | ||||
._timings \| ._model_data.attrs["timestamp xxxxx"] |
._timestamps |
Both do the same. | AttrDict | yes | ||
._model_def_dict |
Eliminated. Should be local variable. | |||||
._config_override |
Saves overrides given during init | AttrDict | yes | |||
._config_kwargs |
Saves extra **kwargs given during init, build() and solve() | AttrDict | yes | |||
._model_data.attrs["calliope_version_initialized"] \| ._model_data.attrs["scenario"] \| ._model_data.attrs["allow_operate_mode"] \| ._model_data.attrs["applied_additional_math"] |
._runtime |
Contains extra flags/variables we set during execution. | AttrDict | yes |
What happened?
What is happening?
Our current approach of handling data and configuration updates is inconsistent across the code:
model._model_data
(e.g., during initialization using a preexisting file).Similarly, the
model._model_data
object is handled inconsistently across modules. See this example:Lastly, there is another variable with similar behavior, but defined at a different level:
model._model_def_dict
. This means that we are handling similar parameters at different levels.Why is this a problem?
As it stands, you cannot be 100% sure that, for a saved file, the solved data and the stored metadata / configuration correspond. This is a severe issue for replicability!. For example: you can never be sure if your solved file is the original "planning" solution, or a test you ran on operate mode.
If you solve and save different models with different configurations, modes and math files, you should be able to reconstruct how that data was created via the configuration.
How to solve this?
I propose to introduce the following design requirements:
model._model_data
shall not be duplicated or shallow copied by underlying functionality.model.py
should strive to only realize operations withmodel._model_data
, and not modify it unless absolutely necessary (like saving results!).model
level (i.e., no more saving stuff atmodel._model_data.attrs
. This is to encourage better practices and avoid desync issues / losing sight of were things are. When saving files, we can serialize and only then add things tomodel._model_data.attrs
.model
. Is up to users to decide if they save the results or not, but the calliope model file should always tell them what it will do.model._model_data
.This helps us keep the code clean, ensures we better follow netCDF specifications (since it cannot handle lists or dictionaries), and should reduce bugs in the future.
Which operating systems have you used?
Version
v0.7
Relevant log output
No response