Unwanted spillover and overwriting across cost dimensions

FLomb commented 1 week ago

What happened?

In the new version, the addition of a cost dimension requires extreme care. It is very easy for a new cost dimension to end up overriding existing cost dimensions, even if this was not the intention.

Let's assume we define a new spores_score for the technology ccgt in the national-scale example, like this:

techs:
  ccgt.cost_flow_cap: {'data': 0, 'index': 'spores_score', 'dims': 'costs'}

This overwrites the monetary cost class, unless I explicitly re-define it in the same place with the same cost as in the default techs.yaml file. This may seem like little extra effort, but for very large models (such as Euro-Calliope) it is daunting to ensure that this repetition always happens. It also makes the override files unnecessarily large.

Moreover, the model gives no warning about this, so I ended up with a model solution in which, without my noticing, the ccgt had no monetary cost.

Which operating systems have you used?

[ ] macOS
[ ] Windows
[X] Linux

Version

v0.7.0dev3

Relevant log output

No response

FLomb commented 1 week ago

Perhaps an even more problematic example (really, I've run into this in many different ways) is that we allow people to define costs with no dimensions specified. E.g., in the national-scale example, ccgt defines its capacity investment cost as cost_flow_cap.data: 750 # USD per kW, with no monetary dimension specified. This works because, by default, the example only has a monetary class.

However, if another cost dimension is introduced in the same model, e.g. via overrides or simply by a user modifying the techs.yaml or locations.yaml files, the definition above spills over into every other cost dimension that gets introduced! The same is true for any other cost item.

This, again, comes with no warning. So it is extremely easy, especially for casual users, to end up with messed-up model data without even noticing.

brynpickering commented 1 week ago

Agreed that a casual user might come across issues here. It isn't ideal.

I'll address your two points separately.

Point 1

The first is having to remember to specify the initial data when defining a new cost class as an override. So:

techs:
  my_tech:
    cost_flow_cap:
      data: 1
      index: monetary
      dims: costs
overrides:
  techs.my_tech.cost_flow_cap:
    data: 0
    index: spores_score
    dims: costs

This will lead to only the spores score being applied. This is because merging an override into the original data replaces the existing values. It will not (and should not) know to merge these two together to produce:

techs:
  my_tech:
    cost_flow_cap:
      data: [1, 0]
      index: [monetary, spores_score]
      dims: costs
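To make the replace-on-merge behaviour concrete, here is a minimal Python sketch. It assumes a plain recursive dictionary merge; `merge_override` is a hypothetical helper written for illustration, not Calliope's own merging code, though the outcome matches what is described above.

```python
from copy import deepcopy


def merge_override(base: dict, override: dict) -> dict:
    """Recursively merge `override` into a copy of `base`.

    Hypothetical helper, not Calliope's own code. Nested dicts are merged
    key by key; any other value (scalar or list) is replaced outright.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_override(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


base = {
    "techs": {
        "my_tech": {
            "cost_flow_cap": {"data": 1, "index": "monetary", "dims": "costs"}
        }
    }
}
override = {
    "techs": {
        "my_tech": {
            "cost_flow_cap": {"data": 0, "index": "spores_score", "dims": "costs"}
        }
    }
}

merged = merge_override(base, override)
print(merged["techs"]["my_tech"]["cost_flow_cap"])
# {'data': 0, 'index': 'spores_score', 'dims': 'costs'}
```

The merge only sees that `data` and `index` already exist and replaces them. Nothing in the structure says that the two `index` values should be combined into a list.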

How to overcome it? For a small model, it might be OK to repeat the monetary data in the override. I agree this isn't ideal for a larger model. This is where loading data from file is much better. You can specify two CSV files:

monetary_costs.csv

techs,parameters
my_tech,cost_flow_cap,1

spores_scores.csv

techs,parameters
my_tech,cost_flow_cap,0

YAML:

data_sources:
  monetary_costs:
    source: monetary_costs.csv
    rows: [techs, parameters]
    add_dimensions:
      costs: monetary
overrides:
  add_spores_score:
    data_sources:
      initial_spores_scores:
        source: spores_scores.csv
        rows: [techs, parameters]
        add_dimensions:
          costs: spores_score

Point 2

The second point is about accidentally broadcasting a cost across multiple index items. For costs, we have a tech_group, cost_dim_setter, that sets the data that would otherwise get repeated (I'm not a fan of this approach, but we wanted something to simplify our example models; CSV files would be much easier to handle). This cost_dim_setter defines the index (monetary) and the dimension (costs) for all cost data. Adding cost_flow_cap.data: 1 just fills in the data gap. If you update cost_dim_setter to have an index of [monetary, spores_score] then yes, it will broadcast the data. That's because you are now effectively setting:

techs:
  my_tech:
    cost_flow_cap:
      data: 1
      index: [monetary, spores_score]
      dims: costs
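The effect of that definition can be sketched in a few lines of Python. `expand` is a hypothetical function that mimics the broadcast for illustration; it is not how Calliope implements it.

```python
def expand(data, index):
    """Pair data with index labels, repeating a scalar for every label.

    Hypothetical illustration of the broadcast, not Calliope's implementation.
    """
    labels = index if isinstance(index, list) else [index]
    values = data if isinstance(data, list) else [data] * len(labels)
    return dict(zip(labels, values))


print(expand(1, "monetary"))
# {'monetary': 1}
print(expand(1, ["monetary", "spores_score"]))
# {'monetary': 1, 'spores_score': 1}
print(expand([1, 0], ["monetary", "spores_score"]))
# {'monetary': 1, 'spores_score': 0}
```

The second call is the accidental case: a single monetary value ends up assigned to the spores score as well.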

It would be better not to rely on editing cost_dim_setter when you are making these overrides. Instead, for every technology that has a spores_score, you would write it out in full:

overrides:
  add_spores_score:
    techs:
      my_tech:
        cost_flow_cap:
          data: [1, 0]
          index: [monetary, spores_score]
          dims: costs

How do we stop this being an issue in future?

The purpose of the "indexed parameter" definition is to provide users with the power to define multi-dimensional data as they like. It also gives us the exact same syntax wherever this multi-dimensional data is provided. So you can give different efficiencies for different input carriers or you can give different costs for different cost classes, the structure remains the same. It also allows us to open up the possibility for top-level parameters to be defined (those defined under parameters).

1. We should perhaps not shy away from writing these out in full: remove cost_dim_setter from the example models and just have the costs written in full. Then it's more obvious what you are overwriting in your override.
2. We should move more towards defining data in CSV where we think it is most appropriate for the user to do so. Costs are a big one for this. You could even define your spores score overrides from CSV and leave the base model as-is (defined in YAML) and you would not face the issues you're having.
3. We should have a warning system of some kind, although it's impossible for us to know whether the user is doing something they expect or not, and nobody wants hundreds of warnings on a large model for overrides they have intentionally (and correctly) applied.
4. We could remove the possibility to define a single data point for multiple index values (the second issue). If you have an index of length two, then you need a data list of length two. This seems reasonable enough, although I like the idea of being able to define the same value no matter how long your index gets (e.g., you want to set all the spores scores to zero across multiple techs using the top-level parameter key). Perhaps we can simply raise a debug message that the value is being broadcast across those dimensions.
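A rough Python sketch of the softer variant of the last option, purely as an assumption about how such a check could look: a data list must match the index length exactly, while a single value is still broadcast but reported at debug level. The function `expand_strict` and the logger name are hypothetical and not part of Calliope.

```python
import logging

logger = logging.getLogger("indexed_param_sketch")  # hypothetical logger name


def expand_strict(name, data, index):
    """Pair data with index labels, flagging anything implicit.

    Hypothetical check, not part of Calliope. A list of data must match the
    index length exactly; a scalar is still broadcast, but the broadcast is
    reported at debug level.
    """
    labels = index if isinstance(index, list) else [index]
    if isinstance(data, list):
        if len(data) != len(labels):
            raise ValueError(
                f"{name}: {len(data)} data values for {len(labels)} index items"
            )
        return dict(zip(labels, data))
    if len(labels) > 1:
        logger.debug("%s: broadcasting %r across %s", name, data, labels)
    return {label: data for label in labels}


logging.basicConfig(level=logging.DEBUG)
scores = expand_strict("cost_flow_cap", 0, ["monetary", "spores_score"])
```

Logging at debug level keeps the intentional case (setting every score to zero) quiet by default, while still leaving a trace for anyone trying to work out where an unexpected value came from.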
