Open jnnr opened 3 weeks ago
Pitching in here: I really like approach number 4. It's very intuitive.
As for the default_if_empty
: I think we should arrive to a style that makes it explicit that using subsets in a sum or a constraint will ignore cases outside the subsets specified. Mostly to encourage explicit code which is easier to maintain and read.
The implementation in v0.6 was reasonably intuitive, but it was a nightmare to generalise in code!
We could simplify things by adding a groupby
helper function, although it needs some thought. E.g.,:
parameters:
node_grouping:
data: [group_1, group_1, group_2, group_2, group_3]
index: [region1, region1_1, region1_2, region1_3, region2]
dims: [nodes]
flow_cap_node_groups:
data: [1, 2, 3]
index: [group_1, group_2, group_3]
dims: cap_node_groups
```csv region1,group_1 region1_1,group_1 region1_2,group_2 region1_3,group_2 region2,group_3 ``` and ```csv group_1,1 group_2,2 group_3,3 ```
In math:
constraints:
flow_cap_per_node:
foreach: [cap_node_groups, techs]
where: "base_tech=supply"
equations:
- expression: "groupby(flow_cap, node_grouping, how=sum) <= flow_cap_node_groups"
As nice as this is, flow_cap.groupby(node_grouping).sum()
creates the new coordinate "node_grouping" when we need it to be "cap_node_groups". There needs to be some way to inform the function what its new coordinate name should be (in code, it would be flow_cap.groupby(node_grouping.rename("cap_node_groups")).sum()
. Perhaps an attribute attached to node_grouping
? E.g. "groups_on" or something.
BTW, default_if_empty
isn't needed if you add the lookup parameter to the schema with a default value assigned to it (although you don't even need to do that with some kind of "groupby" function).
Also, although this is neater, it does go against the way we're currently setting up dimensions, as referenced by @jnnr in the original post. Ideally we use boolean arrays to map between different dimensions, not the dimension names in the array. This is partly a data storage issue - very large string arrays will bloat the model size in a way that boolean arrays won't. But it's also to reduce likelihood of errors (mispelling of a value in node_grouping
will be difficult to catch until you've already send the model to be built, which is a time consuming part of the process where you want to limit errors occurring).
As nice as this is,
flow_cap.groupby(node_grouping).sum()
creates the new coordinate "node_grouping" when we need it to be "cap_node_groups". There needs to be some way to inform the function what its new coordinate name should be (in code, it would beflow_cap.groupby(node_grouping.rename("cap_node_groups")).sum()
. Perhaps an attribute attached tonode_grouping
? E.g. "groups_on" or something.
Could be a way to go. Regarding the renaming of the grouped parameter: I think the name is correct already. The coordinate is node_grouping
. In this example, node_grouping
is used in a constraint on flow_cap
, but it could be used to constrain other variables. So, I would say, rename the dim of the parameter flow_cap_node_groups
to node_grouping
instead.
Also, although this is neater, it does go against the way we're currently setting up dimensions, as referenced by @jnnr in the original post. Ideally we use boolean arrays to map between different dimensions, not the dimension names in the array. This is partly a data storage issue - very large string arrays will bloat the model size in a way that boolean arrays won't. But it's also to reduce likelihood of errors (mispelling of a value in
node_grouping
will be difficult to catch until you've already send the model to be built, which is a time consuming part of the process where you want to limit errors occurring).
I could probably live with the boolean way of defining the groups. May be less intuitive, but on the other hand more explicit. And if it helps avoid errors, it's good.
Regarding the renaming of the grouped parameter: I think the name is correct already. The coordinate is node_grouping.
Sadly not. The coordinate needs to be cap_node_groups
as that is the dimension over which flow_cap_node_groups
is indexed. When you use a grouper
array to group values in another array what you get is a new dimension on the grouped array that has the name of the grouper
array:
In [2]: import calliope
In [3]: m = calliope.examples.national_scale() # with grouping parameters applied as in an earlier comment on this thread.
In [4]: m.build()
In [5]: m.backend.variables.flow_cap.groupby(m.inputs.node_grouping).sum(min_count=1)
Out[5]:
<xarray.DataArray 'flow_cap' (node_grouping: 3, techs: 8, carriers: 1)> Size: 192B
array([[[nan],
[<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e454350>],
[<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e4557d0>],
[<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e4548d0>],
[<pyomo.core.expr.numeric_expr.LinearExpression object at 0x153fc3400>],
[<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e4572d0>],
[<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e456350>],
[<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e4563d0>]],
[[nan],
[nan],
[<pyomo.core.expr.numeric_expr.LinearExpression object at 0x153fc0f70>],
[nan],
[nan],
[<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e454750>],
[<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e455350>],
[nan]],
[[<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e4560d0>],
[nan],
[nan],
[<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e4574d0>],
[nan],
[nan],
[nan],
[<calliope.backend.pyomo_backend_model.ObjVariable object at 0x11e456a50>]]],
dtype=object)
Coordinates:
* techs (techs) object 64B 'battery' 'ccgt' ... 'region1_to_region2'
* carriers (carriers) object 8B 'power'
* node_grouping (node_grouping) object 24B 'group_1' 'group_2' 'group_3'
^ in this example, we need the node_grouping
coordinate to actually be cap_node_groups
OK, so two options that I would like some votes on @jnnr @tud-mchen6 @FLomb @irm-codebase. I'll post them as individual comments below this for you to add emojis to.
Both assume you have some node grouping that you want to apply a maximum to the sum of all technology capacities in those groups. Constraining values per group:
parameters:
flow_cap_node_groups:
data: [1, 2, 3]
index: [group_1, group_2, group_3]
dims: cap_node_groups
Have a groupby
helper function and define your grouping parameter as string items, e.g.:
```yaml parameters: node_grouping: data: [group_1, group_1, group_2, group_2, group_3] index: [region1, region1_1, region1_2, region1_3, region2] dims: [nodes] ```
```yaml nodes: region1: node_grouping: group_1 ... ```
constraints:
flow_cap_per_node:
foreach: [cap_node_groups, techs]
where: "base_tech=supply"
equations:
- expression: "groupby(flow_cap, node_grouping, how=sum, rename_coord=cap_node_groups) <= flow_cap_node_groups"
The coordinate renaming could be somehow embedded in the parameter attributes before building the optimisation problem, so that it doesn't have to be defined explicitly in the math.
The LHS of the equation would be effectively applying the following python function:
m.backend.variables.flow_cap.groupby(m.inputs.node_grouping.rename("cap_node_groups")).sum(min_count=1)
Have a where
helper function inside math expressions that allows you to mask arrays using a boolean array on-the-fly.
parameters:
node_grouping:
data: True
index: [[group_1, region1], [group_1, region1_1], [group_2, region1_2], [group_2, region1_3], [group_3, region2]]
dims: [cap_node_groups, nodes]
constraints:
flow_cap_per_node:
foreach: [cap_node_groups, techs]
where: "base_tech=supply"
equations:
- expression: "sum(where(flow_cap, node_grouping), over=nodes) <= flow_cap_node_groups"
The LHS of the equation would be effectively applying the following python function:
m.backend.variables.flow_cap.where(m.inputs.node_grouping_2.fillna(False).astype(bool)).sum("nodes", min_count=1)
What can be improved?
Here I summarize some feedback using group constraints in calliope 0.7 with custom math and tabular data, after working with it in a use case and discussing with our colleagues. Good news: It works. There is room for improvement regarding intuitive usability.
How do we define a constraints on the summed capacity for some defined subsets of nodes? The following works, as proposed by @brynpickering:
Define which region belongs to which group in
node_groups.csv
Define the
flow_cap
of each group inflow_cap_node_groups.csv
Define new custom math in
flow_cap_group_constraint.yaml
Put it together by loading the above files in
model.yaml
.Feedback after some discussion.
node_groups.csv
Another, very intuitive way was implemented in the group constraints of calliope v0.6
When we discussed, we said that something like that could offer an intuitive way to parametrize the model, too.
The sum in the constraint uses
default_if_empty
, which is a kind of a hack allowing to sum only over the nodes that are member in the node group. It works, but it is not very intuitive. Can't we allow to slice the variableflow_cap
directly using a subset?Version
v0.7.0.dev3