calliope-project / calliope

A multi-scale energy systems modelling framework
https://www.callio.pe
Apache License 2.0

Easily define group constraints with custom math #604

Open jnnr opened 3 weeks ago

jnnr commented 3 weeks ago

What can be improved?

Here I summarize some feedback on using group constraints in calliope 0.7 with custom math and tabular data, after working with them in a use case and discussing with colleagues. Good news: it works. There is room for improvement regarding intuitive usability.

How do we define a constraint on the summed capacity for some defined subsets of nodes? The following works, as proposed by @brynpickering:

Define which region belongs to which group in node_groups.csv

region_1,group_1,true
region_2,group_1,true
region_3,group_2,true

Define the flow_cap of each group in node_groups_flow_cap.csv

group_1,120
group_2,80

Define new custom math in flow_cap_group_constraint.yaml

constraints:
  flow_cap_per_node:
    foreach: [cap_node_groups, techs]
    where: "base_tech=supply"
    equations:
      - expression: sum(flow_cap[carriers=electricity] * default_if_empty(lookup_node_group, 0), over=nodes) <= node_group_flow_cap

Put it together by loading the above files in model.yaml.

cap_node_groups:
  source: tabular-data/scalars/node_groups.csv
  rows: [nodes, cap_node_groups]
  add_dimensions:
    parameters: lookup_node_group

node_groups_flow_cap:
  source: tabular-data/scalars/node_groups_flow_cap.csv
  rows: [node_groups]
  columns: [parameters]
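
For intuition, here is a minimal xarray sketch (made-up numbers, plain xarray rather than the calliope backend) of what the custom-math expression above effectively computes: the lookup array masks out nodes that are not members of a group before the sum over nodes.

```python
import numpy as np
import xarray as xr

# Stand-in for flow_cap[carriers=electricity]: one capacity value per node.
flow_cap = xr.DataArray(
    [50.0, 60.0, 70.0],
    coords={"nodes": ["region_1", "region_2", "region_3"]},
    dims="nodes",
)

# Sparse membership lookup over (nodes, cap_node_groups): 1 where a node is in a group, NaN otherwise.
lookup_node_group = xr.DataArray(
    [[1.0, np.nan], [1.0, np.nan], [np.nan, 1.0]],
    coords={
        "nodes": ["region_1", "region_2", "region_3"],
        "cap_node_groups": ["group_1", "group_2"],
    },
    dims=["nodes", "cap_node_groups"],
)

# Per-group limits, as in the second CSV file above.
node_group_flow_cap = xr.DataArray(
    [120.0, 80.0],
    coords={"cap_node_groups": ["group_1", "group_2"]},
    dims="cap_node_groups",
)

# default_if_empty(lookup_node_group, 0) ~ fill missing memberships with 0, so the
# multiplication zeroes out non-member nodes before summing over the nodes dimension.
group_total = (flow_cap * lookup_node_group.fillna(0)).sum("nodes")
print(group_total.values)                           # [110.  70.]
print((group_total <= node_group_flow_cap).values)  # [ True  True]
```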

Feedback after some discussion.

  1. We can make a node a member of two groups by adding a line.
  2. The CSV version seems OK, but the column data=true is confusing. I know this is because of how calliope broadcasts the data to the full set for each dimension. But for usability, let's pretend we can drop it, so the file would look like this:

node_groups.csv

region_1,group_1
region_2,group_1
region_2,group_2
region_3,group_2
  3. Another, very intuitive way was implemented in the group constraints of calliope v0.6:

    node_group_1: [region_1, region_2]
    node_group_2: [region_3]

    When we discussed this, we agreed that something like this could offer an intuitive way to parametrize the model, too (see the conversion sketch after this list).

  4. The sum in the constraint uses default_if_empty, which is something of a hack that allows summing only over the nodes that are members of the node group. It works, but it is not very intuitive. Could we instead slice the variable flow_cap directly using a subset?
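
As a concrete illustration of the v0.6-style listing in point 3, here is a small hypothetical Python helper (not a calliope feature) that turns such a "group: [member nodes]" mapping into the long-format rows used in node_groups.csv above:

```python
import csv

# Hypothetical input: the intuitive v0.6-style "group: [member nodes]" mapping.
v06_style_groups = {
    "group_1": ["region_1", "region_2"],
    "group_2": ["region_3"],
}

# Write it out as the node,group,true rows that node_groups.csv expects.
with open("node_groups.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for group, members in v06_style_groups.items():
        for node in members:
            writer.writerow([node, group, "true"])
```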

Version

v0.7.0.dev3

irm-codebase commented 2 weeks ago

Pitching in here: I really like approach number 4. It's very intuitive.

As for default_if_empty: I think we should arrive at a style that makes it explicit that using subsets in a sum or a constraint will ignore cases outside the specified subsets, mostly to encourage explicit code that is easier to maintain and read.

brynpickering commented 2 weeks ago

The implementation in v0.6 was reasonably intuitive, but it was a nightmare to generalise in code! We could simplify things by adding a groupby helper function, although it needs some thought. E.g.:

parameters:
  node_grouping:
    data: [group_1, group_1, group_2, group_2, group_3]
    index: [region1, region1_1, region1_2, region1_3, region2]
    dims: [nodes]
  flow_cap_node_groups:
    data: [1, 2, 3]
    index: [group_1, group_2, group_3]
    dims: cap_node_groups

Equivalent in CSV:

```csv
region1,group_1
region1_1,group_1
region1_2,group_2
region1_3,group_2
region2,group_3
```

and

```csv
group_1,1
group_2,2
group_3,3
```

In math:

constraints:
  flow_cap_per_node:
    foreach: [cap_node_groups, techs]
    where: "base_tech=supply"
    equations:
      - expression: "groupby(flow_cap, node_grouping, how=sum) <= flow_cap_node_groups"

As nice as this is, flow_cap.groupby(node_grouping).sum() creates the new coordinate "node_grouping" when we need it to be "cap_node_groups". There needs to be some way to inform the function what its new coordinate name should be (in code, it would be flow_cap.groupby(node_grouping.rename("cap_node_groups")).sum()). Perhaps an attribute attached to node_grouping? E.g. "groups_on" or something.
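
To illustrate the naming behaviour with plain xarray (made-up data, nothing calliope-specific):

```python
import xarray as xr

flow_cap = xr.DataArray(
    [10.0, 20.0, 30.0, 40.0, 50.0],
    coords={"nodes": ["region1", "region1_1", "region1_2", "region1_3", "region2"]},
    dims="nodes",
)
node_grouping = xr.DataArray(
    ["group_1", "group_1", "group_2", "group_2", "group_3"],
    coords={"nodes": ["region1", "region1_1", "region1_2", "region1_3", "region2"]},
    dims="nodes",
    name="node_grouping",
)

# The grouped result gets a dimension named after the grouper array...
print(flow_cap.groupby(node_grouping).sum().dims)  # ('node_grouping',)

# ...so renaming the grouper first gives the dimension name the math expects.
print(flow_cap.groupby(node_grouping.rename("cap_node_groups")).sum().dims)  # ('cap_node_groups',)
```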

brynpickering commented 2 weeks ago

BTW, default_if_empty isn't needed if you add the lookup parameter to the schema with a default value assigned to it (although you don't even need to do that with some kind of "groupby" function).

brynpickering commented 2 weeks ago

Also, although this is neater, it does go against the way we're currently setting up dimensions, as referenced by @jnnr in the original post. Ideally we use boolean arrays to map between different dimensions, not the dimension names in the array. This is partly a data storage issue - very large string arrays will bloat the model size in a way that boolean arrays won't. But it's also to reduce the likelihood of errors (a misspelling of a value in node_grouping will be difficult to catch until you've already sent the model to be built, which is a time-consuming part of the process where you want to limit errors).

jnnr commented 1 week ago

> As nice as this is, flow_cap.groupby(node_grouping).sum() creates the new coordinate "node_grouping" when we need it to be "cap_node_groups". There needs to be some way to inform the function what its new coordinate name should be (in code, it would be flow_cap.groupby(node_grouping.rename("cap_node_groups")).sum()). Perhaps an attribute attached to node_grouping? E.g. "groups_on" or something.

Could be a way to go. Regarding the renaming of the grouped parameter: I think the name is correct already. The coordinate is node_grouping. In this example, node_grouping is used in a constraint on flow_cap, but it could be used to constrain other variables. So, I would say, rename the dim of the parameter flow_cap_node_groups to node_grouping instead.

jnnr commented 1 week ago

> Also, although this is neater, it does go against the way we're currently setting up dimensions, as referenced by @jnnr in the original post. Ideally we use boolean arrays to map between different dimensions, not the dimension names in the array. This is partly a data storage issue - very large string arrays will bloat the model size in a way that boolean arrays won't. But it's also to reduce the likelihood of errors (a misspelling of a value in node_grouping will be difficult to catch until you've already sent the model to be built, which is a time-consuming part of the process where you want to limit errors).

I could probably live with the boolean way of defining the groups. It may be less intuitive, but on the other hand it is more explicit. And if it helps avoid errors, that's good.

brynpickering commented 6 days ago

> Regarding the renaming of the grouped parameter: I think the name is correct already. The coordinate is node_grouping.

Sadly not. The coordinate needs to be cap_node_groups, as that is the dimension over which flow_cap_node_groups is indexed. When you use a grouper array to group values in another array, what you get is a new dimension on the grouped array that takes the name of the grouper array:


In [2]: import calliope

In [3]: m = calliope.examples.national_scale()  #  with grouping parameters applied as in an earlier comment on this thread.

In [4]: m.build() 

In [5]: m.backend.variables.flow_cap.groupby(m.inputs.node_grouping).sum(min_count=1)
Out[5]: 
<xarray.DataArray 'flow_cap' (node_grouping: 3, techs: 8, carriers: 1)> Size: 192B
array([[[nan],
        [<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e454350>],
        [<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e4557d0>],
        [<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e4548d0>],
        [<pyomo.core.expr.numeric_expr.LinearExpression object at 0x153fc3400>],
        [<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e4572d0>],
        [<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e456350>],
        [<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e4563d0>]],

       [[nan],
        [nan],
        [<pyomo.core.expr.numeric_expr.LinearExpression object at 0x153fc0f70>],
        [nan],
        [nan],
        [<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e454750>],
        [<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e455350>],
        [nan]],

       [[<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e4560d0>],
        [nan],
        [nan],
        [<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e4574d0>],
        [nan],
        [nan],
        [nan],
        [<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e456a50>]]],
      dtype=object)
Coordinates:
  * techs          (techs) object 64B 'battery' 'ccgt' ... 'region1_to_region2'
  * carriers       (carriers) object 8B 'power'
  * node_grouping  (node_grouping) object 24B 'group_1' 'group_2' 'group_3'

^ In this example, we need the node_grouping coordinate to actually be cap_node_groups.

brynpickering commented 6 days ago

OK, so here are two options that I would like some votes on: @jnnr @tud-mchen6 @FLomb @irm-codebase. I'll post them as individual comments below for you to add emoji reactions to.

Both options assume you have some node grouping and want to set a maximum on the sum of all technology capacities in each group. Constraining values per group:

parameters:
  flow_cap_node_groups:
    data: [1, 2, 3]
    index: [group_1, group_2, group_3]
    dims: cap_node_groups

brynpickering commented 6 days ago

Option 1

Have a groupby helper function and define your grouping parameter as string items, e.g.:

Model definition

grouper as top-level parameter

```yaml
parameters:
  node_grouping:
    data: [group_1, group_1, group_2, group_2, group_3]
    index: [region1, region1_1, region1_2, region1_3, region2]
    dims: [nodes]
```

grouper as node/tech-level parameter

```yaml
nodes:
  region1:
    node_grouping: group_1
  ...
```

Math definition

constraints:
  flow_cap_per_node:
    foreach: [cap_node_groups, techs]
    where: "base_tech=supply"
    equations:
      - expression: "groupby(flow_cap, node_grouping, how=sum, rename_coord=cap_node_groups) <= flow_cap_node_groups"

The coordinate renaming could be somehow embedded in the parameter attributes before building the optimisation problem, so that it doesn't have to be defined explicitly in the math.

The LHS of the equation would be effectively applying the following python function:

m.backend.variables.flow_cap.groupby(m.inputs.node_grouping.rename("cap_node_groups")).sum(min_count=1)

brynpickering commented 6 days ago

Option 2

Have a where helper function inside math expressions that allows you to mask arrays using a boolean array on-the-fly.

Model definition

parameters:
  node_grouping:
    data: True
    index: [[group_1, region1], [group_1, region1_1], [group_2, region1_2], [group_2, region1_3], [group_3, region2]]
    dims: [cap_node_groups, nodes]

Math definition

constraints:
  flow_cap_per_node:
    foreach: [cap_node_groups, techs]
    where: "base_tech=supply"
    equations:
      - expression: "sum(where(flow_cap, node_grouping), over=nodes) <= flow_cap_node_groups"

The LHS of the equation would be effectively applying the following python function:

m.backend.variables.flow_cap.where(m.inputs.node_grouping.fillna(False).astype(bool)).sum("nodes", min_count=1)
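
For comparison, a small sketch with made-up numbers (plain xarray; in the real backend flow_cap holds Pyomo objects rather than floats) showing that the two options yield the same per-group totals and differ only in how membership is expressed:

```python
import numpy as np
import xarray as xr

nodes = ["region1", "region1_1", "region1_2", "region1_3", "region2"]
flow_cap = xr.DataArray([10.0, 20.0, 30.0, 40.0, 50.0], coords={"nodes": nodes}, dims="nodes")

# Option 1: string grouper + groupby, renamed to the dimension the math expects.
node_grouping = xr.DataArray(
    ["group_1", "group_1", "group_2", "group_2", "group_3"],
    coords={"nodes": nodes},
    dims="nodes",
    name="node_grouping",
)
option_1 = flow_cap.groupby(node_grouping.rename("cap_node_groups")).sum(min_count=1)

# Option 2: boolean (cap_node_groups, nodes) membership array + where + sum over nodes.
membership = xr.DataArray(
    np.array(
        [
            [True, True, False, False, False],
            [False, False, True, True, False],
            [False, False, False, False, True],
        ]
    ),
    coords={"cap_node_groups": ["group_1", "group_2", "group_3"], "nodes": nodes},
    dims=["cap_node_groups", "nodes"],
)
option_2 = flow_cap.where(membership).sum("nodes", min_count=1)

print(option_1.values)  # [30. 70. 50.]
print(option_2.values)  # [30. 70. 50.]
```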