Closed stufraser1 closed 1 year ago
Attached is the standard format of an 'Occurrence' file in ODS which specifies a list of event occurrences with assigned Period (an integer representing a year) and date fields.
ODS field names: EventId, Period, Year, Month, Day
There may be more than one resource file representing different scenarios of event frequency/clustering/seasonality per event set. Therefore an id and description field for each resource file would be useful in metadata.
The total number of periods per occurrence file is also needed in metadata in order to derive loss metrics. This is because periods with no event occurrences do not appear in the file, so the overall range of periods covered is not clear from the file alone. In ODS this is a metadata field called 'NumberOfPeriods'.
E.g.
For stochastic event sets with Period in the range 1 to 10000, NumberOfPeriods = 10000.
For historical event sets with Period in the range 1951 to 2000, NumberOfPeriods = 50. (The Period range for historical event sets may also be 1 to 50, with the 'Year' field holding the real year; what matters is that the correct span of years is represented for annual loss metrics.)
occurrence_lt.csv
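To illustrate why NumberOfPeriods has to come from metadata rather than from the occurrence rows, here is a minimal sketch. The rows and the helper names `annual_event_rate` and `fraction_of_periods_with_events` are made up for illustration; only the EventId and Period field names come from the ODS format described above.

```python
# Hypothetical occurrence rows as (EventId, Period) pairs.
# Periods with no event occurrences are simply absent from the file.
occurrences = [
    (101, 1),
    (102, 1),
    (103, 4),
    (104, 9),
]


def annual_event_rate(rows, number_of_periods):
    """Average number of event occurrences per period."""
    if number_of_periods <= 0:
        raise ValueError("number_of_periods must be positive")
    return len(rows) / number_of_periods


def fraction_of_periods_with_events(rows, number_of_periods):
    """Share of all periods in which at least one event occurs."""
    if number_of_periods <= 0:
        raise ValueError("number_of_periods must be positive")
    periods_with_events = {period for _, period in rows}
    return len(periods_with_events) / number_of_periods


# Assumed to be read from metadata (NumberOfPeriods); it cannot be
# recovered from the rows, whose largest Period here is only 9.
number_of_periods = 10
largest_period_in_file = max(period for _, period in occurrences)

rate = annual_event_rate(occurrences, number_of_periods)
share = fraction_of_periods_with_events(occurrences, number_of_periods)
print(largest_period_in_file, rate, share)
```

Using the largest Period found in the file as the denominator would overstate both figures whenever the final periods happen to contain no events.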
This is already covered by event_set.time.span:
Title | Field name | Description | Type |
---|---|---|---|
Event set time | event_set.time | The modelled scenario may have a known start date, end date, duration, or reference year to which it refers. In some cases, not all of these fields will have known or relevant values. | object |
Event set start time | event_set.time.start | The earliest event start time covered by the modelled scenario(s) contained in the event set. | date-time |
Event set end time | event_set.time.end | The latest event end time covered by the modelled scenario(s) contained in the event set. | date-time |
Event set time span | event_set.time.span | The time period covered by the modelled scenario(s) included in the event set. | string |
Event set reference year | event_set.time.year | A general reference year to which the modelled scenario(s) refers (e.g. '2050'). | string |
It's a valid question whether event_set.time.span should be renamed event_set.time.period:
Title | Field name | Description | Type |
---|---|---|---|
Event set time period | event_set.time.period | The time period covered by the modelled scenario(s) included in the event set. | string |
Yes this would work and I'm indifferent to time.span versus time.period.
The ODS 'NumberOfPeriods' would go into time.span, and for a stochastic event set, the time.year could be 1 indicating the earliest Period in the occurrence file.
To align with https://github.com/GFDRR/rdl-standard/issues/54, https://github.com/GFDRR/rdl-standard/issues/67 and DCAT, I think the field should be named 'temporal'. If possible, we should reuse the modelling too, although we can add fields if needed.
> for a stochastic event set, the time.year could be 1 indicating the earliest Period in the occurrence file.

Is this the case where the earliest period in the occurrence file is actually 1AD? If not, what does '1' represent?
A couple more questions:
What is the difference between event_set.time.span (the time period covered by the modelled scenario(s) included in the event set) and event_set.time.start and event_set.time.end (the earliest event start time and latest event end time in the event set)? They seem semantically equivalent.
Should event_set.time.year be an array of years to allow for event sets that span more than one calendar year?

> Is this the case where the earliest period in the occurrence file is actually 1AD? If not, what does '1' represent?

For stochastic event sets, each period represents a possible sequence of events representing the near-term risk, i.e. what could happen over the next year. It's therefore not appropriate to relate them to a historical date, or to start from today's date and extend into the future. And there can be hundreds of thousands of years covered. The time span is needed to specify the total number of periods covered in order to calculate relative frequency/likelihood for outputs, but the start period is simply
Here is an example for historical cyclones in Bangladesh since 1991, which does have real calendar dates.
Thanks for the clarifications!
Based on that, we can reuse the modelling proposed in #67. However, I would replace span with duration and use the ISO 8601 duration format, e.g. P50Y for a stochastic event set covering 50 years without reference to specific calendar dates.
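As a rough illustration of the year-only ISO 8601 durations suggested here, the sketch below converts between a number of periods and a string such as P50Y. The function names are hypothetical, and it deliberately handles only the whole-years form (`P<n>Y`), not the full ISO 8601 duration grammar.

```python
import re

# Matches only whole-year durations such as "P50Y" or "P10000Y".
_YEARS_ONLY = re.compile(r"P(\d+)Y")


def duration_from_years(years: int) -> str:
    """Format a number of years (periods) as an ISO 8601 duration."""
    if years <= 0:
        raise ValueError("years must be a positive integer")
    return f"P{years}Y"


def years_from_duration(duration: str) -> int:
    """Read the number of years back out of a year-only duration."""
    match = _YEARS_ONLY.fullmatch(duration)
    if match is None:
        raise ValueError(f"not a year-only ISO 8601 duration: {duration!r}")
    return int(match.group(1))


print(duration_from_years(50))         # P50Y
print(years_from_duration("P10000Y"))  # 10000
```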
3. In my view it is useful as currently described and can't see a case for it being an array.
How would you populate year for the Bangladesh example, which covers 1991 to 2019?
Follow-up question on the Bangladesh example to make sure I'm understanding things correctly: why do the early rows conform to Period being an integer representing a year (per https://github.com/GFDRR/rdl-standard/issues/81#issuecomment-1583332508) but later rows don't?
PERIOD_NO | OCC_YEAR |
---|---|
1 | 1991 |
5 | 1995 |
7 | 1997 |
17 | 2007 |
17 | 2007 |
18 | 2008 |
19 | 2009 |
PERIOD_NO | OCC_YEAR |
---|---|
87 | 2019 |
88 | 1991 |
92 | 1995 |
94 | 1997 |
104 | 2007 |
> How would you populate year for the Bangladesh example, which covers 1991 to 2019?

I would use the values 1991 to 2019 in the Year field, but 1 to 29 in the Period field.

> Follow up question on the Bangladesh example to make sure I'm understanding things correctly: Why do the early rows conform to Period being an integer representing a year (per https://github.com/GFDRR/rdl-standard/issues/81#issuecomment-1583332508) but later rows don't?

(Sorry for the formatting, I don't know all the shortcuts.) Sorry, the file provided was a bad example: it is a historical ensemble, an extended set of scenarios of how the historical events might have played out differently had they started in different sea conditions (9 different versions of each). Hence we turned a 29-year historical period into 261 years to explore those different potential outcomes. A purely historical event occurrence set would look how you would expect, with Period starting at 1 and ending at 29. Please see the attached example: hc_oasis_occurrence_historical.csv
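The relationship between PERIOD_NO and OCC_YEAR in the ensemble file can be sketched as below. The wrap-around layout (each block of 29 consecutive periods replaying 1991 to 2019) is an assumption inferred from the sample rows quoted earlier in the thread, such as 87 mapping to 2019 and 88 mapping to 1991; the real file may be organised differently, and the function names are hypothetical.

```python
FIRST_YEAR = 1991
LAST_YEAR = 2019
YEARS_PER_MEMBER = LAST_YEAR - FIRST_YEAR + 1       # 29 historical years
ENSEMBLE_MEMBERS = 9                                # versions of each year
TOTAL_PERIODS = YEARS_PER_MEMBER * ENSEMBLE_MEMBERS  # 261 periods


def occurrence_year(period_no: int) -> int:
    """Calendar year replayed by a period (assumed wrap-around layout)."""
    if not 1 <= period_no <= TOTAL_PERIODS:
        raise ValueError(f"period_no must be in 1..{TOTAL_PERIODS}")
    return FIRST_YEAR + (period_no - 1) % YEARS_PER_MEMBER


def ensemble_member(period_no: int) -> int:
    """1-based index of the ensemble member a period belongs to."""
    if not 1 <= period_no <= TOTAL_PERIODS:
        raise ValueError(f"period_no must be in 1..{TOTAL_PERIODS}")
    return (period_no - 1) // YEARS_PER_MEMBER + 1


for period in (1, 17, 87, 88, 104):
    print(period, occurrence_year(period), ensemble_member(period))
```

For a purely historical occurrence set there is a single member, so Period runs from 1 to 29 and the same function maps it directly onto 1991 to 2019.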
> I would use the values 1991 to 2019 in the Year field, but 1 to 29 in the Period field

It wouldn't be appropriate to use year in this case, as there isn't a single reference year in the event_set.
Based on the suggestion in https://github.com/GFDRR/rdl-standard/issues/81#issuecomment-1598049005 (and incorporating other accepted changes from other issues) this would actually be:
{
  "event_set": {
    "temporal": {
      "start": "1991",
      "end": "2019",
      "duration": "P29Y"
    }
  }
}
For a stochastic event_set with no 'real' dates, duration would be the only field used from temporal.
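A small sketch of how the two cases might be assembled in practice is shown below. The helper `temporal_metadata` and its parameters are hypothetical; the field names start, end and duration follow the proposal in this thread, and the 10000-year stochastic set is only an illustrative value.

```python
import json


def temporal_metadata(duration_years, start_year=None, end_year=None):
    """Build the event_set.temporal object discussed in this thread.

    start and end are included only when real calendar years exist;
    a stochastic event set would carry the duration alone.
    """
    if duration_years <= 0:
        raise ValueError("duration_years must be positive")
    temporal = {}
    if start_year is not None:
        temporal["start"] = str(start_year)
    if end_year is not None:
        temporal["end"] = str(end_year)
    temporal["duration"] = f"P{duration_years}Y"
    return {"event_set": {"temporal": temporal}}


# Historical Bangladesh example: real calendar years are known.
historical = temporal_metadata(29, start_year=1991, end_year=2019)

# Stochastic example: no real dates, so only the duration is set.
stochastic = temporal_metadata(10000)

print(json.dumps(historical, indent=2))
print(json.dumps(stochastic, indent=2))
```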
@johcarter @stufraser1 are you both happy with using event_set.temporal as detailed in https://github.com/GFDRR/rdl-standard/issues/67#issuecomment-1596845345 to address this at the event_set level? Note that this same object also appears in resources.temporal to cover the period information for individual resources, addressing:

> There may be more than one resource file representing different scenarios of event frequency/clustering/seasonality per event set. Therefore an id and description field for each resource file would be useful in meta data.
Consider adding a metadata object that describes seasonality/clustering of events
Seasonality and clustering of multiple events in time are important in event frequency distributions, and the return period / event rate info does not capture them. One of my suggestions for capturing this in the upcoming ODS/RDL alignment project I am working on with Stu and co. will be an extra resource file which is a list of event occurrences across a span of years. This captures the seasonality and clustering aspect of event frequency within each year. Also, stochastic event catalogues in cat models are too large to be listed in metadata.
Originally posted by @johcarter in https://github.com/GFDRR/rdl-standard/issues/59#issuecomment-1559690009