calliope-project / calliope

A multi-scale energy systems modelling framework
https://www.callio.pe
Apache License 2.0

[Meta] Cleaning up Calliope `backend.rerun()` method #293

Closed: brynpickering closed this issue 7 months ago

brynpickering commented 4 years ago

Problem description

A number of issues are cropping up with our current implementation of the backend.rerun() method: #262, #282, #291, #292

Current implementation

At the moment a user can do the following:

m = calliope.Model(...)
m.run()
m.backend.update_pyomo_param(...) # or `activate_pyomo_constraint`
m2 = m.backend.rerun()

m2 is a Calliope model with inputs and results (i.e. m2._model_data completely describes the model), but without a Pyomo backend. m._model_data remains untouched, but m._backend_model is modified by whatever the user did with the backend interface methods before rerunning.
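
Continuing that snippet, the resulting state is roughly as follows (attribute names as used in this thread; these are private attributes, not a stable interface):

m2._model_data     # inputs + rerun results: completely describes the rerun model
m._model_data      # unchanged original inputs and results
m._backend_model   # mutated by the update_pyomo_param() / activate_pyomo_constraint call above
# m2 itself has no Pyomo backend attached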

Possible solutions

A point to note here: the backend interface is there as a way to quickly rerun a model with updated parameters / constraints, without the need to completely rebuild the Pyomo ConcreteModel, saving time and memory. Anything which aims to attach a copy of m._backend_model to m2 will remove the streamlined nature of the backend interface.

  1. Return only the rerun results as an xarray dataset. Users would need to do new_results = m.backend.rerun() and it would then be up to them to save that as a NetCDF using xarray functionality. This is what we used to do, but it comes with the issue that the dataset isn't cleaned up to avoid NetCDF errors on saving/loading. To get those cleaning functions, Calliope would need to be involved (see the sketch after this list).

  2. Add the results as a new Calliope object. m.backend.rerun('results_2') would create the object m.results_2. On saving, the user would I guess need to specify which of the attached models they want to save.

  3. Add the results (and updated inputs) as a new scenario on the main model data object m._model_data. m.backend.rerun() would add a new 'scenario' dimension to model_data, allowing everything to be saved into one NetCDF, but this would be difficult to handle when rerunning a model from model_data (model.run()). It might require the user to specify the scenario when rerunning the model from model_data, e.g. model.run(scenario=2).

  4. Do as we do at the moment, but clean up the existing issues so that saving/loading is more robust; m2 still wouldn't have its own _backend_model object. This would require a number of our warning messages to catch instances where a user expects m2._backend_model to be there, such as in #282.
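
For reference, a rough sketch of the user-facing workflow under option 1 (assuming backend.rerun() returns a plain xarray Dataset, as in the old behaviour described above; saving is then standard xarray functionality, without Calliope's dataset-cleaning helpers):

new_results = m.backend.rerun()            # hypothetical: returns an xarray.Dataset rather than a Model
new_results.to_netcdf("rerun_results.nc")  # saving is left entirely to the user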

Thoughts?

brynpickering commented 4 years ago

@timtroendle based on your troubles to date with this functionality, do any of the above options strike you as a good idea, or do you have any of your own?

timtroendle commented 4 years ago

To me it's most intuitive to receive a normal Calliope Model; that's why I wouldn't do (1).

The term "scenario" is already used within Calliope and hence I wouldn't overload it with this approach here -- so I wouldn't do (2) or (3). Also, I'd keep it simple and give the responsibility for handling different runs to the user. It's not that difficult to handle runs on the file system layer (run1.nc, run2.nc, ...), and this will keep Calliope simple.
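
As a rough illustration of that file-system approach, using the current interface described above (the update_pyomo_param arguments are placeholders, not a real signature):

m = calliope.Model(...)
m.run()
m.to_netcdf("run1.nc")
m.backend.update_pyomo_param(...)        # placeholder arguments
m.backend.rerun().to_netcdf("run2.nc")   # each rerun saved to its own file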

(4) sounds good to me, but I would add the backend model. This need not create any memory overhead if you do not copy the Pyomo model but simply link it.
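
In other words, something like the following (a sketch only, reusing the private attribute name from this thread): the rerun model points at the existing Pyomo object rather than copying it:

m2 = m.backend.rerun()
m2._backend_model = m._backend_model   # link, don't copy: both models reference the same Pyomo object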

--

A few words about mutability and consistency of the several models that exist. Let's say m is the Calliope model generated from the yaml file, and pm is the Pyomo model that results from it (and I ignore that m contains pm in the following). Any direct updates of pm result in an inconsistent state: m and pm don't match anymore. Similarly, m2 (inputs) and pm don't match. From this I draw two conclusions: First, it's not absolutely necessary to create a new Calliope Model, at least not to retain immutability: the model has been changed already! Second, a cleaner approach would change m and pm.

m = calliope.Model(...)
m.run()
m.update_param(...) # changes `m` _and_ `pm`
m2 = m.run(build=False, force=True) # more simply: overwrite existing results in `m`

brynpickering commented 4 years ago

Would you be happy to completely overwrite the original results with a rerun, e.g.:

m = calliope.Model(...)
m.run()
m.update_param(...) # changes `m` _and_ `pm`
m.run(build=False, force=True, warmstart=True) # overwrites information in m.results

Where something named like warmstart is used to specify that you don't want the backend to be rebuilt?

timtroendle commented 4 years ago

Yes, to me that sounds like a pragmatic and clean approach. In most cases you would want to store results, so the full example would look like this:

m = calliope.Model(...)
m.run()
m.to_netcdf("run01.nc")
m.update_param(...) # changes `m` _and_ `pm`
m.run(build=False, force=True, warmstart=True) # overwrites information in m.results
m.to_netcdf("run02.nc")

brynpickering commented 3 years ago

This issue would probably be best solved by building the NetCDF to have another dimension for reruns. As mentioned by @timtroendle, this should not be "scenario", but rather something else, e.g. a user-defined dimension name. This functionality would be best implemented following the release of v0.7.
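
A rough sketch of that idea using plain xarray (the datasets, dimension name and labels here are illustrative assumptions, not an agreed design):

import pandas as pd
import xarray as xr

# run1_results and run2_results are assumed to be the xarray Datasets from two runs of the same model
combined = xr.concat(
    [run1_results, run2_results],
    dim=pd.Index(["base", "updated"], name="rerun"),  # user-defined dimension for reruns
)
combined.to_netcdf("all_runs.nc")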

brynpickering commented 7 months ago

The backend has had an overhaul in v0.7 which addresses these issues, so I will consider this issue closed.