OasisLMF / OasisLMF

Loss modelling framework.
https://oasislmf.github.io/OasisLMF/
BSD 3-Clause "New" or "Revised" License
124 stars 49 forks source link

Correlation group fields are not generated correctly for disaggregated risks #1491

Closed johcarter closed 4 months ago

johcarter commented 5 months ago

Issue Description

In this documentation https://oasislmf.github.io/sections/disaggregation.html#available-1-28 the last paragraph 'Correlation of disaggregated risks' describes the default behaviour of assigning hazard and correlation groups when locations are disaggregated using the IsAggregate field.

However this is not what happens, the default behaviour is still to assign groups using locnumber

in utils/defaults.py line 261-262

DAMAGE_GROUP_ID_COLS = ["PortNumber", "AccNumber", "LocNumber"]
HAZARD_GROUP_ID_COLS = ["PortNumber", "AccNumber", "LocNumber"]

Also there doesn't seem to be support for the internal fields needed to generate the correct logical groupings for disaggregated risks using data_settings in model settings json.

These fields are 'building_id' from gul_summary_map, and 'risk_id' from fm_summary_map

risk_id is the field needed in addition to PortNumber, AccNumber, LocNumber to result in the expected behaviour as we have documented. i.e.

DAMAGE_GROUP_ID_COLS = ["PortNumber", "AccNumber", "LocNumber", "risk_id"]
HAZARD_GROUP_ID_COLS = ["PortNumber", "AccNumber", "LocNumber", "risk_id"]

Using building_id instead of risk_id would result in uncorrelated subrisks in both cases of IsAggregate (0 and 1)

I have added these fields to the internal field list in oasislmf/preparation/gul_inputs.py on this branch;

https://github.com/OasisLMF/OasisLMF/tree/fix/correlation_group_fields

building_id is working as a grouping field on this branch, but risk_id does not work as it is not an index in the relevant internal df KeyError: "['risk_id'] not in index"

I suggest we support both fields for correlation grouping purposes, but do not change the system defaults for correlation groups (PortNumber, AccNumber, LocNumber). The reasons are;

1) it might break backwards compatibility of losses after the default setting change if the hashing function starts to generate different group_ids for the same portfolio. 2) the default 'PortNumber, AccNumber, LocNumber' is a simple rule to understand and model developers can specify alternative behaviour explicitly in model settings and be more confident about what will happen behind the scenes. Documentation will need to be updated.

johcarter commented 5 months ago

Model settings file for piwind to reproduce KeyError: "['risk_id'] not in index" when using "risk_id" as a grouping field

model_settings_issue1491.json