Closed odscrachel closed 1 year ago
Replace reference_year with temporal, with start and end years, per #67?
Will having exposure.taxonomy in metadata and putting taxonomy_code in data provide more flexibility to represent the characteristics of assets using different taxonomies?
In previous versions, taxonomy_code was restricted to a single code string, as proposed in GED4ALL (example: MUR+ADO/HEX:1/RES, denoting a single-storey adobe residential structure; see https://platform.openquake.org/taxtweb/). This string would now appear in the data file, but not in the metadata.
In the Open Exposure Data (OED) Standard and other insurance industry models, these characteristics are separated into multiple columns. For example, for the same type:

OccupancyCode | ConstructionCode | NumberOfStoreys | YearBuilt |
---|---|---|---|
1050 | 5101 | 1 | 0 |

Using AIR codes, for the same type:

OccupancyCode | ConstructionCode | NumberOfStoreys | YearBuilt |
---|---|---|---|
301 | 112 | 1 | 0 |
In the data file, should we be specifying the structure (field names) to use, so that the string or numeric values (depending on the taxonomy used) can be validated? OED and GEM provide codelists that could be used for validation.
If we should specify the structure in the data file, can it be done with a taxonomy_code object that has the option to use a single string value (per GED4ALL) or multiple columns using OED field naming?
OED Spec: https://github.com/OasisLMF/ODS_OpenExposureData/tree/develop/OpenExposureData/Docs
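To make the question above concrete, here is a minimal Python sketch of a taxonomy_code value that accepts either form. It is only an illustration under stated assumptions: the function name classify_taxonomy_code and the OED_FIELDS set are hypothetical, the field names are simply those shown in the tables in this thread, and no real codelist is checked.

```python
from typing import Union

# Hypothetical set of allowed column names, copied from the OED-style tables
# in this thread. A real check would use the published OED / GEM codelists.
OED_FIELDS = {"OccupancyCode", "ConstructionCode", "NumberOfStoreys", "YearBuilt"}


def classify_taxonomy_code(value: Union[str, dict]) -> str:
    """Return "string" for a single code string, "columns" for a multi-column mapping.

    Raises ValueError for anything else. Code values themselves are not validated.
    """
    if isinstance(value, str):
        if not value.strip():
            raise ValueError("taxonomy_code string must not be empty")
        return "string"
    if isinstance(value, dict):
        if not value:
            raise ValueError("taxonomy_code mapping must not be empty")
        unknown = set(value) - OED_FIELDS
        if unknown:
            raise ValueError(f"unknown taxonomy fields: {sorted(unknown)}")
        return "columns"
    raise ValueError("taxonomy_code must be a string or a mapping of columns")
```

With this sketch, the GED4ALL string MUR+ADO/HEX:1/RES would be classified as "string", and a mapping with OccupancyCode, ConstructionCode, NumberOfStoreys and YearBuilt as "columns".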
My understanding was that RDLS isn't concerned with how the contents of resources are structured, so yes, removing taxonomy_code from the metadata and leaving it up to data creators to decide how to model the characteristics of assets in their datasets does allow more flexibility.
Regarding validation, in line with the above understanding, I thought that we were only concerned with validating the RDLS metadata rather than the contents of resources themselves.
replace reference_year with temporal with start and end years per https://github.com/GFDRR/rdl-standard/issues/67 ?
Regarding this suggestion, the field will need to be renamed and described. Suggest:
Field name | Title | Description | Field Type |
---|---|---|---|
reference_period | Reference period | A general reference period, to which the modelled exposure data or exposure scenario refers. | temporal object |
@stufraser1 is this okay?
Re. reference_year, the current PR has raised the following question:
"Unless there is a semantic difference between the concept of temporal coverage and the concept of 'a general reference period to which data refers', I would name this field temporal to be consistent with Resource.temporal. However, adding this field under Exposure is equivalent to adding it at the dataset level, but only for exposure datasets, is there a reason why we wouldn't just have a temporal field as part of the top-level metadata instead so that it can apply to any dataset?"
@stufraser1 do we need this field or can it be dropped?
Often we generate exposure scenarios to estimate growth in population / urban areas, for example at 2040, 2050, 2060, etc.
It is this we wanted to reflect in reference_year.
We also often need to reflect this when we project risk_data_type=hazard (e.g. future flood risk under future climate conditions).
This then passes through to the risk_data_type=loss component, to denote that losses are for, e.g., the 2050 or 2080 projection.
If we denote this at the top-level only, they would all be consistent and the information entered once. However, would that restrict us to creating a new dataset for every projection?
In risk_data_type=hazard, do we not enable temporal at event level, to enable different projections in an event_set? The same should apply to risk_data_type=exposure and risk_data_type=loss, I think.
Ah, okay, I think I see. At the moment exposure is not an array, i.e. all the fields there must apply across all the resources in the dataset. This is the same for loss and vulnerability too.
So, trying to think about all of this, we need to reference time at a dataset level for every risk_data_type and at a resource level for hazard.event_set, exposure and loss. These times can be real times, future times, or just durations. Different resources within a dataset can have different periods. I think the answer here might be:
- Temporal object at the top level (as per https://github.com/GFDRR/rdl-standard/issues/67)
- Temporal object at the resource level as well
- No separate temporal fields in hazard.event_set, exposure and loss, as the relevant temporal info will be in the associated top level object (as a summary) and the resource objects (for a specific resource).
- Drop reference_year and year, as they'll only be of relevance at the resource level and in that case start and end can be used.
- resources.temporal will be optional, so if that info is the same for all resources then it just won't need to be filled in.
For exposure this will mean we don't need to create a new dataset for each temporal scenario, as the specific reference year will be in each resource.
So using the above example values this will look like:
{
  "temporal": {
    "start": "2040",
    "end": "2060"
  },
  "risk_data_type": "exposure",
  "resources": [
    {
      "id": "1",
      "temporal": {
        "start": "2040",
        "end": "2040"
      }
    },
    {
      "id": "2",
      "temporal": {
        "start": "2050",
        "end": "2050"
      }
    },
    {
      "id": "3",
      "temporal": {
        "start": "2060",
        "end": "2060"
      }
    }
  ]
}
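Since resources.temporal would be optional, a consumer of this metadata needs a rule for resources that omit it. The following Python sketch shows one possible reading, assuming that a resource without its own temporal object inherits the top-level one. The function name effective_temporal is hypothetical and the fallback rule is an assumption, not something the standard defines.

```python
from typing import Optional


def effective_temporal(dataset: dict, resource_id: str) -> Optional[dict]:
    """Return the temporal object that applies to one resource.

    Assumption: a resource's own "temporal" wins; if it is absent, the
    dataset-level "temporal" is used (or None if neither is present).
    """
    for resource in dataset.get("resources", []):
        if resource.get("id") == resource_id:
            return resource.get("temporal") or dataset.get("temporal")
    raise KeyError(f"no resource with id {resource_id!r}")
```

Applied to the example above, resource "2" resolves to start and end of "2050", while a resource listed without a temporal object would resolve to the dataset-level period of "2040" to "2060".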
@stufraser1
We can try this. Let's put it in and test it when the JSON is complete.
Looks good to me!
What is the context or reason for the change?
Main changes proposed:
- Remove occupancy, occupancy_time and taxonomy_code from exposure metadata, as this is contained within the data or provided elsewhere. Discussed in #56.

Link to spreadsheet
What is your proposed change?
The exposure object, with description 'Information about the modelled exposure (assets and population) that could be affected by the hazard.', with the following fields:
- category
- taxonomy
- cost
- cost.type
- cost.unit
- reference_year