GFDRR / rdl-standard

The Risk Data Library Standard (RDLS) is an open data standard to make it easier to work with disaster and climate risk data. It provides a common description of the data used and produced in risk assessments, including hazard, exposure, vulnerability, and modelled loss, or impact, data.
https://docs.riskdatalibrary.org/
Creative Commons Attribution Share Alike 4.0 International

[Proposal] Sense check: event.trigger #91

Closed. stufraser1 closed this issue 1 year ago.

stufraser1 commented 1 year ago

event.trigger is an object used to identify the hazard type (event.trigger.hazard_type) and process type (event.trigger.process_type) that triggered the main hazard included in the event.

We should sense-check whether this should sit at the event level as it does now, or at the event_set level instead.

There could be cases where we have an event_set of multiple (historical or synthetic) tsunami events, in which some are triggered by seismic activity, others by landslides, and others by volcanic activity. This supports the need to keep it at event level.

On the other hand, a storm surge event set will have the same trigger (ETC or TC) for all events and it would be convenient to identify the trigger once.

On balance, for the flexibility of assigning a trigger to each event (e.g. where the event is a hazard map or a historical event), I would keep event.trigger in the event object. For an event_set where we do not list all 10,000 events in the metadata, can we use event.trigger anyway to define the trigger that applies to all events?

duncandewhurst commented 1 year ago

For an event_set where we do not list every event of the 10,000 in metadata can we use event.trigger anyway to define the trigger applicable to all events?

If I understood correctly, you mean creating a single 'dummy' event in order to have somewhere to put the trigger, when in fact there are actually 10,000 events. To keep the semantics clear, I think that one event in RDLS should always represent one event in the underlying dataset.

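Given the two scenarios that you outline (an event set whose events have different triggers, and an event set whose events all share one trigger but are not listed individually in the metadata), I think that points us towards having two fields: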

| Field | Title | Description | Type |
|---|---|---|---|
| event.trigger | Trigger | The trigger for this event. | object (Trigger) |
| event_set.triggers | Triggers | The triggers for the events in this event set. You should only use this field when details of individual events are not included in the RDLS metadata. Otherwise, you should use event.trigger to provide the trigger for each event. | array (Trigger) |
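The usage rule in the event_set.triggers description can be expressed as a simple check. This is a minimal sketch, assuming an event set is a plain dict with an optional "events" list and an optional "triggers" list; the function name check_trigger_placement is hypothetical and not part of RDLS.

```python
from typing import Any


def check_trigger_placement(event_set: dict[str, Any]) -> list[str]:
    """Return warnings if triggers are recorded at the wrong level.

    Hypothetical check based on the proposed guidance: use
    event_set.triggers only when individual events are not listed;
    otherwise give each event its own event.trigger.
    """
    warnings = []
    events = event_set.get("events", [])
    if events and event_set.get("triggers"):
        warnings.append(
            "event_set.triggers is set but events are listed; "
            "use event.trigger on each event instead"
        )
    return warnings


# A storm surge event set whose 10,000 events are not listed individually.
surge_set = {"triggers": [{"hazard_type": "Strong Wind", "process_type": "Tropical Cyclone"}]}
print(check_trigger_placement(surge_set))  # -> []
```

Keeping one trigger per listed event, rather than a summary array, preserves the "one RDLS event per real event" semantics discussed above.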
matamadio commented 1 year ago

This seems the most straightforward and intuitive solution.

johcarter commented 1 year ago

Can we also have multiple process types at the event set level? For instance, all events could be strong wind (ETC/TCY), potentially triggering storm surge (FSS). It's useful to know, at a high level, all of the covered process types.

duncandewhurst commented 1 year ago

The Trigger object has both .hazard_type and .process_type properties so the proposal in my previous comment covers both hazard types and process types. @johcarter does that answer your question?

johcarter commented 1 year ago

Any chance of a quick example showing what event_set.triggers might look like when there are two process types (let's say Tropical Cyclone and Storm Surge) applying to the whole event_set, please?

stufraser1 commented 1 year ago

In the case of having one event set each for TC and SS (loss for TC, loss for SS), we might have the following, where the event set of TC losses doesn't have a trigger, but the event set of storm surge losses has the TC as its trigger. The trigger field has been included to identify a triggering event or process, not to imply that the triggering event information was combined with that of the triggered event (i.e. to describe a combined loss, or a hazard map with both wind speed and storm surge height). Including event sets changes the requirement a bit, because there is then potential to have a dataset which includes >1 hazard type in a single resource file.

This is also true of losses, in fact: we can have a loss due to cyclone wind, a loss due to storm surge, and a loss due to both combined. It is also increasingly relevant for vulnerability functions which consider more than one hazard (see the image below).

{
  "event_set": [
    {
      "hazard_type": "Strong Wind",
      "process_type": "Tropical Cyclone",
      "trigger": 
        {
          "hazard_type": "",
          "process_type": ""
        }
    },
    {
      "hazard_type": "Coastal Flood",
      "process_type": "Storm Surge",
      "trigger": 
        {
          "hazard_type": "Strong Wind",
          "process_type": "Tropical Cyclone"
        }
    }
  ]
}

What is missing (@johcarter's point) is a way to describe an event set where we've got combined losses for TC and SS. This is a feature that was included in an earlier version of RDL (in that case the focus was on ensuring this could be captured for vulnerability, but it applies equally to event sets):

[Image not reproduced: multi-hazard feature from an earlier version of RDL]

However, this seems to have been dropped in subsequent developments.

To align with this, we could include primary and secondary perils in their own fields where an event set includes >1 hazard and/or >1 process, e.g.:

{
  "event_set": [
    {
      "hazard_type_primary": "Strong Wind",
      "process_type_primary": "Tropical Cyclone",
      "hazard_type_secondary": "Coastal Flood",
      "process_type_secondary": "Storm Surge",
      "trigger": 
        {
          "hazard_type": "",
          "process_type": ""
        }
    }
  ]
}

Thinking about interoperability and mapping data between RDLS and, say, OED: OED accounts for multiple hazards in the perilcode field using a semicolon-separated list of codes, e.g. "Windstorm (ETC + TC) with Storm Surge" would be "WTC;WEC;WSS", so there could be an advantage to using a single field to capture multiple hazards.

To facilitate this interoperability, we would need hazard_type and process_type to accept an array:

{
  "event_set": [
    {
      "hazard_type": "Strong Wind; Coastal Flood",
      "process_type": "Tropical Cyclone; Storm Surge",
      "trigger": 
        {
          "hazard_type": "",
          "process_type": ""
        }
    }
  ]
}
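Moving between an OED-style delimited string and an array is a simple split and join. This is a minimal sketch, assuming values are separated by ";" (as in OED perilcode) with optional surrounding whitespace; split_multi and join_multi are hypothetical helpers, and no mapping between OED codes and RDLS hazard names is attempted.

```python
def split_multi(value: str) -> list[str]:
    """Split a semicolon-separated list (as in OED perilcode) into trimmed values."""
    return [part.strip() for part in value.split(";") if part.strip()]


def join_multi(values: list[str]) -> str:
    """Join values back into an OED-style semicolon-separated string."""
    return ";".join(values)


print(split_multi("WTC;WEC;WSS"))  # -> ['WTC', 'WEC', 'WSS']

# The delimited-string example from this comment, converted to arrays.
event_set = {
    "hazard_type": "Strong Wind; Coastal Flood",
    "process_type": "Tropical Cyclone; Storm Surge",
}
as_arrays = {key: split_multi(event_set[key]) for key in ("hazard_type", "process_type")}
print(as_arrays)
# -> {'hazard_type': ['Strong Wind', 'Coastal Flood'], 'process_type': ['Tropical Cyclone', 'Storm Surge']}
```

Empty strings, such as the blank trigger values in the example, become empty lists, so a genuine array field round-trips to OED without special cases.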
johcarter commented 1 year ago

Thank you Stu. I think I have a preference for the first example because you can support multiple process types (including more than two) as well as specify a trigger where appropriate. It includes all the information.

The last one is also flexible, but I think the first is cleaner.

In our footprint resource file, for multi-peril models we would always have multiple processes in the same file because they occur within the same event, as opposed to separate event sets for each process.

I don't think I would go for 'Primary' and 'Secondary' because they are limiting and maybe you don't want to assign these labels to independent hazard processes which occur together.

stufraser1 commented 1 year ago

I don't think I would go for 'Primary' and 'Secondary' because they are limiting and maybe you don't want to assign these labels to independent hazard processes which occur together.

I agree with the problems in terminology.

In our footprint resource file, for multi-peril models we would always have multiple processes in the same file because they occur within the same event, as opposed to separate event sets for each process.

This is the Oasis case, but the majority of data we deal with in the development sector are single-hazard maps, so we need to handle both, which the first example can also do. When thinking about losses, we have many examples where a loss is single-peril or combined-peril.

If we used the first example, we would need to be clear that trigger could describe either a trigger event that is also included in the same file as the main hazard/process, OR a trigger that does not occur in the same file.

What's the best way to do that, @odscjen / @odscrachel / @duncandewhurst?

odscjen commented 1 year ago

If we used the first example, we would need to be clear that trigger could be used to describe that the trigger event might also be included in the same file as the main hazard/process OR that it is the trigger but doesn't occur in the same file.

I read the example given in https://github.com/GFDRR/rdl-standard/issues/91#issuecomment-1592687884 as an example of the first option: one event_set describing the trigger event and another event_set describing the event caused by the trigger. If the dataset doesn't contain an event_set that describes the trigger event, then the user just wouldn't include that event_set information, e.g. they'd only include:

{
  "event_set": [
    {
      "hazard_type": "Coastal Flood",
      "process_type": "Storm Surge",
      "trigger": 
        {
          "hazard_type": "Strong Wind",
          "process_type": "Tropical Cyclone"
        }
    }
  ]
}

So I don't think there's any particular best way of doing this beyond ensuring the guidance states to only create an event_set for the events that you are providing data for, i.e. if the trigger event isn't being described, then only state it as the trigger and not as a hazard_type or process_type at the event_set level.

EDIT: realised I'd put in the wrong part of the example! Now the example shows what I meant.

stufraser1 commented 1 year ago

The question remains what to do if we have a combined event set containing the data for both the main event and the triggering events. I think we would use:

{
  "event_set": [
    {
      "hazard_type": "Coastal Flood",
      "process_type": "Storm Surge",
      "trigger": 
        {
          "hazard_type": "Strong Wind",
          "process_type": "Tropical Cyclone"
        }
    }
  ]
}

And include just one data file (having two data files would imply they are separate). I think this case would only occur if we have an event set file, not individual events, which would more likely separate out the event types in any footprints.

duncandewhurst commented 1 year ago

I think this discussion points to the need to clearly define what an event set is. My understanding from the following diagram from https://docs.riskdatalibrary.org/hazard.html and from the examples given in the issue description was that an event set is a collection of events of the same hazard type:

[Image not reproduced: hazard diagram from https://docs.riskdatalibrary.org/hazard.html]

Based on the recent discussion, it sounds like an event set is simply a collection of events, without the constraint of a shared hazard type, and that the purpose of modelling event sets in RDLS is to provide a place to put summary information about the hazard types, process types and triggers covered by the events in the event set when event-level metadata is not provided.

This issue was originally about how to model an event set that contains events with different triggers. That is why my proposal in https://github.com/GFDRR/rdl-standard/issues/91#issuecomment-1586656145 has event_set.triggers as an array, which seems to have been dropped from the JSON examples shared in later comments.

The recent discussion suggests that we also need to model event sets that contain events with different hazard and process types. To avoid the terminological issues flagged in https://github.com/GFDRR/rdl-standard/issues/91#issuecomment-1592753589 and to avoid limiting the number of different hazard and process types in an event set, I think that hazard_type and process_type should be arrays, as in the final JSON example in https://github.com/GFDRR/rdl-standard/issues/91#issuecomment-1592687884.

Taking into account the above, I think that the correct way to model a combined event set that contains the data for the main event and the triggering events is as follows. I've provided draft descriptions/definitions for each field to aid comprehension:

{
  "event_sets": [   // The collections of events described in the dataset.
    {
      "hazard_types": [ // The physical hazard phenomena covered by the event set
        "Coastal Flood",
        "Strong Wind"
      ],
      "process_types": [ // The hazard processes covered by the event set
        "Storm Surge",
        "Tropical Cyclone"
      ],
      "triggers": [ // The causes of the events in the event set
        {
          "hazard_type": "Strong Wind", // The physical hazard phenomena for the trigger
          "process_type": "Tropical Cyclone" // The hazard process for the trigger
        }
      ]
    }
  ]
}
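With hazard_types, process_types and triggers all as arrays, a consumer can infer whether a trigger's data is included in the same event set by checking whether the trigger's types also appear among the event set's types. This is a minimal sketch, assuming the field names in the example above; included_triggers is a hypothetical helper. Because the flat arrays do not record which process belongs to which hazard, a match is only an inference, not a guarantee.

```python
def included_triggers(event_set: dict) -> list[dict]:
    """Return the triggers whose hazard and process types are also listed
    in the event set's hazard_types and process_types, i.e. triggering
    events whose data appears to be included in the same event set."""
    hazard_types = set(event_set.get("hazard_types", []))
    process_types = set(event_set.get("process_types", []))
    return [
        trigger
        for trigger in event_set.get("triggers", [])
        if trigger.get("hazard_type") in hazard_types
        and trigger.get("process_type") in process_types
    ]


# Combined event set: storm surge plus the tropical cyclones that triggered it.
combined = {
    "hazard_types": ["Coastal Flood", "Strong Wind"],
    "process_types": ["Storm Surge", "Tropical Cyclone"],
    "triggers": [{"hazard_type": "Strong Wind", "process_type": "Tropical Cyclone"}],
}
# Storm surge only: the trigger is disclosed but its data is not included.
surge_only = {
    "hazard_types": ["Coastal Flood"],
    "process_types": ["Storm Surge"],
    "triggers": [{"hazard_type": "Strong Wind", "process_type": "Tropical Cyclone"}],
}

print(included_triggers(combined))    # -> [{'hazard_type': 'Strong Wind', 'process_type': 'Tropical Cyclone'}]
print(included_triggers(surge_only))  # -> []
```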

This is in line with @odscjen's recommendation in https://github.com/GFDRR/rdl-standard/issues/91#issuecomment-1596964674:

So I don't think there's any particular best way of doing this beyond ensuring the guidance states to only create an event_set for the events that you providing data for, i.e. if the trigger event isn't being described then only state it as the trigger and not as a hazard_type or process_type at the event_set level.

The problem with the modelling proposed in https://github.com/GFDRR/rdl-standard/issues/91#issuecomment-1597698204 is that it isn't possible to distinguish a combined event set that contains the data for the main event and the triggering events from an event set that contains only data for the main event, but for which the triggers are disclosed in event_sets.triggers.

Let me know if I'm barking up the wrong tree here!

johcarter commented 1 year ago

The structure given immediately above by @duncandewhurst looks fine too from my perspective.

Regarding the problem of distinguishing a combined event set in the data, does this problem then move to the metadata describing the resource file containing the combined footprint? Am I right in saying that the resource file property "process_type" is a single string, whereas we would need an array to represent more than one process type? And similarly, "imt" is a string, whereas we might need an array of imts?

odscjen commented 1 year ago

@stufraser1 does Duncan's suggestion in https://github.com/GFDRR/rdl-standard/issues/91#issuecomment-1598034676 cover the cases it needs to? If so can we move it to the Agreed column?

stufraser1 commented 1 year ago

Based on the recent discussion, it sounds like an event set is simply a collection of events, without the constraint of a shared hazard type, and that the purpose of modelling event sets in RDLS is to provide a place to put summary information about the hazard types, process types and triggers covered by the events in the event set when event-level metadata is not provided.

A shared hazard type constraint does exist, and an event set is needed to describe the events, acting as a summary of those events, even when event-level metadata is available.

In https://github.com/GFDRR/rdl-standard/issues/91#issuecomment-1598034676, the example does not define which hazard type the trigger relates to. Perhaps, for a combined event set where we have the source and trigger together in the same event set, it is enough to list them in the hazard type and process type without the trigger. I propose using trigger only where the event set contains the 'triggered' event.

stufraser1 commented 1 year ago

And similarly, "imt" is a string whereas we might need an array of imts?

For combined event sets, this may be the case.

odscjen commented 1 year ago

So to summarize where I think we're at:

stufraser1 commented 1 year ago

Please see these slides, in which I try to lay out the use cases for footprint data and event sets. https://disasterriskuk-my.sharepoint.com/:p:/g/personal/stuart_disaster-risk_uk/EQ9WMker5wBMgq4DPUpeILcB1Pk4q3R0cMHfjiUlemAJtQ?e=93N5Qz

Depending on the hazards and triggers, it is possible for an event_set to contain data for both the main hazard and its trigger.

The final slide gives possible variations of event sets, where the two hazards are contained in two separate event set resource files, or both are contained in one.

stufraser1 commented 1 year ago

"a shared hazard type constraint does exist"

By that I mean that an event set describing flood would only contain events relating to flood, not to earthquake, for example.

duncandewhurst commented 1 year ago

@stufraser1 how does that fit with the examples in https://github.com/GFDRR/rdl-standard/issues/91#issuecomment-1609802384, which have multiple hazard types per event set?

stufraser1 commented 1 year ago

@stufraser1 how does that fit with the examples in #91 (comment), which have multiple hazard types per event set?

The event set should always describe the hazard type(s) contained within the event(s); that is what I mean by constraint. Perhaps we don't have the same interpretation of 'constraint'?

duncandewhurst commented 1 year ago

Ah, I see. Sorry, I should've been clearer. I meant the constraint that all events in an event set must share the same hazard type, i.e. you can only have one hazard type per event set. It sounds like that is not the case.

stufraser1 commented 1 year ago

OK. Yes, that is not the case.


duncandewhurst commented 1 year ago

@stufraser1 thanks for preparing the slides. If it is important to model the link between triggers and hazards at the event set level, then a more structured model would be necessary with triggers nested within hazards. This modelling also means that we can have multiple intensity measures per event set with a clear link between intensity measures and hazards.

I've prepared two examples based on the final two examples on your final slides. The first example is annotated with draft descriptions for each field. The only part I am not sure about is whether process should be an array, i.e. whether one hazard type can be related to more than one hazard process. Please take a look and let me know what you think.

If we decide to go with this approach at the event set level, we should re-use the same modelling at event level, although Event.hazard can be an object rather than an array.

Example 1

The event set includes only coastal flooding events. The coastal flooding events were triggered by strong wind events that are not included in the event set.

{
  "hazard": {
    "event_sets": [
      {
        "hazards": [ // The hazards included in this event set.
          {
            "type": "Coastal Flood", // The hazard type for this hazard, from the closed hazard type codelist.
            "process": "Storm Surge", // The process type for this hazard, from the closed hazard process type codelist.
            "intensity_measure": "fl_wd:m", // The metric and unit in which the intensity of this hazard is measured.
            "trigger": { // The trigger for this process
              "type": "Strong Wind", // The hazard type for this trigger, from the closed hazard type codelist.
              "process": "Tropical Cyclone" // The process type for this trigger, from the closed hazard process type codelist.
            }
          }
        ]
      }
    ]
  }
}

Example 2

The event set includes both coastal flooding events and strong wind events. The coastal flooding events were triggered by the strong wind events.

{
  "hazard": {
    "event_sets": [
      {
        "hazards": [
          {
            "type": "Coastal Flood",
            "process": "Storm Surge",
            "intensity_measure": "fl_wd:m",
            "trigger": {
              "type": "Strong Wind",
              "process": "Tropical Cyclone"
            }
          },
          {
            "type": "Strong Wind",
            "process": "Tropical Cyclone",
            "intensity_measure": "PGWS_tcy:km/h"
          }
        ]
      }
    ]
  }
}
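Nesting triggers within hazards makes the link between each triggered hazard and its trigger explicit, so a consumer can tell Example 1 (trigger external to the event set) from Example 2 (trigger included) and read one intensity measure per hazard. This is a minimal sketch, assuming the field names in the two examples and a single string for process; the Trigger and Hazard classes and the parse_hazards and trigger_status helpers are hypothetical, not part of the RDLS schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Trigger:
    type: str     # hazard type of the trigger, e.g. "Strong Wind"
    process: str  # process type of the trigger, e.g. "Tropical Cyclone"


@dataclass(frozen=True)
class Hazard:
    type: str
    process: str
    intensity_measure: str
    trigger: Optional[Trigger] = None


def parse_hazards(event_set: dict) -> list[Hazard]:
    """Build Hazard objects from one event set's "hazards" array."""
    hazards = []
    for item in event_set.get("hazards", []):
        raw_trigger = item.get("trigger")
        trigger = Trigger(raw_trigger["type"], raw_trigger["process"]) if raw_trigger else None
        hazards.append(Hazard(item["type"], item["process"], item["intensity_measure"], trigger))
    return hazards


def trigger_status(hazards: list[Hazard]) -> dict[tuple[str, str], str]:
    """For each triggered hazard, report whether its trigger is also a
    hazard in the same event set ("included") or not ("external")."""
    present = {(h.type, h.process) for h in hazards}
    return {
        (h.type, h.process): (
            "included" if (h.trigger.type, h.trigger.process) in present else "external"
        )
        for h in hazards
        if h.trigger is not None
    }


# Example 2 from this comment, without the annotations.
example_2 = {
    "hazard": {
        "event_sets": [
            {
                "hazards": [
                    {
                        "type": "Coastal Flood",
                        "process": "Storm Surge",
                        "intensity_measure": "fl_wd:m",
                        "trigger": {"type": "Strong Wind", "process": "Tropical Cyclone"},
                    },
                    {
                        "type": "Strong Wind",
                        "process": "Tropical Cyclone",
                        "intensity_measure": "PGWS_tcy:km/h",
                    },
                ]
            }
        ]
    }
}

hazards = parse_hazards(example_2["hazard"]["event_sets"][0])
print(trigger_status(hazards))  # -> {('Coastal Flood', 'Storm Surge'): 'included'}
print({h.type: h.intensity_measure for h in hazards})
# -> {'Coastal Flood': 'fl_wd:m', 'Strong Wind': 'PGWS_tcy:km/h'}
```

Running the same functions on Example 1 reports the storm surge trigger as "external", since the strong wind events are not part of that event set.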
odscjen commented 1 year ago

I see a thumbs up from @matamadio for @duncandewhurst's latest suggestion, @stufraser1 are you happy for us to go ahead with this modelling?

stufraser1 commented 1 year ago

Yes, I'm happy with this. To Duncan's question: yes, one hazard type can include >1 process type, e.g. flood can include pluvial and fluvial data.
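If process becomes an array to allow one hazard type with several processes, consumers may need to accept both the single-string form used in the examples above and a list. This is a minimal sketch under that assumption; as_process_list is a hypothetical helper, and the "Flood", "Pluvial Flood" and "Fluvial Flood" labels are illustrative rather than taken from the RDLS codelists.

```python
from typing import Union


def as_process_list(process: Union[str, list[str]]) -> list[str]:
    """Normalise a hazard's "process" value to a list of process types.

    Accepts a single string (as in the examples above) or a list
    (a hypothetical array form for hazards with several processes).
    """
    if isinstance(process, str):
        return [process]
    return list(process)


# Illustrative labels: one flood hazard covering pluvial and fluvial data.
flood = {"type": "Flood", "process": ["Pluvial Flood", "Fluvial Flood"]}
print(as_process_list(flood["process"]))  # -> ['Pluvial Flood', 'Fluvial Flood']
print(as_process_list("Storm Surge"))     # -> ['Storm Surge']
```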

