GFDRR / rdl-standard

The Risk Data Library Standard (RDLS) is an open data standard to make it easier to work with disaster and climate risk data. It provides a common description of the data used and produced in risk assessments, including hazard, exposure, vulnerability, and modelled loss, or impact, data.
https://docs.riskdatalibrary.org/
Creative Commons Attribution Share Alike 4.0 International

[Proposal] Exposure component #62

Closed odscrachel closed 1 year ago

odscrachel commented 1 year ago

What is the context or reason for the change?

Main changes proposed

Link to spreadsheet

What is your proposed change?

The exposure object with description 'Information about the modelled exposure (assets and population) that could be affected by the hazard.' with the following fields:

| Field name | Title | Description | Field type |
| --- | --- | --- | --- |
| category | Exposure category | The category of the assets described in the dataset. | String (codelist: Population, Buildings, Infrastructures, Agriculture, Natural environment) |
| taxonomy | Exposure taxonomy scheme | The name of the taxonomy scheme used to create descriptive individual asset feature strings within the dataset. | String |
| cost | Asset cost | The exposure costs associated with specific elements of assets detailed in the dataset. | Array of objects |
| cost.type | Cost type | The element of the asset that a cost value is being assigned to. | String (codelist: Structure, Content, Product, Disruption (Business Interruption)) |
| cost.unit | Cost unit | The unit in which the asset cost value is given. | |
| reference_year | Reference year | A general reference year, to which the modelled exposure data or exposure scenario refers (e.g. '2050'). | String |

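As a rough illustration of how the codelist-backed fields above could be checked, here is a minimal Python sketch. The function name `validate_exposure`, the error messages, and the exact spelling of the codelist values (for example, shortening "Disruption (Business Interruption)" to "Disruption") are assumptions made for illustration; they are not part of the proposal or of any published RDLS schema.

```python
# Codelist values copied from the proposed field table.
# The exact codes used in a final schema are an assumption.
CATEGORY_CODES = {
    "Population", "Buildings", "Infrastructures",
    "Agriculture", "Natural environment",
}
COST_TYPE_CODES = {"Structure", "Content", "Product", "Disruption"}


def validate_exposure(exposure):
    """Return a list of human-readable problems found in an exposure object."""
    errors = []

    category = exposure.get("category")
    if category is not None and category not in CATEGORY_CODES:
        errors.append(f"category: {category!r} is not in the codelist")

    taxonomy = exposure.get("taxonomy")
    if taxonomy is not None and not isinstance(taxonomy, str):
        errors.append("taxonomy: must be a string")

    cost = exposure.get("cost", [])
    if not isinstance(cost, list):
        errors.append("cost: must be an array of objects")
        cost = []
    for i, item in enumerate(cost):
        if not isinstance(item, dict):
            errors.append(f"cost[{i}]: must be an object")
            continue
        cost_type = item.get("type")
        if cost_type is not None and cost_type not in COST_TYPE_CODES:
            errors.append(f"cost[{i}].type: {cost_type!r} is not in the codelist")

    return errors


# Hypothetical example values; "GED4ALL" and "USD" are illustrative only.
example = {
    "category": "Buildings",
    "taxonomy": "GED4ALL",
    "cost": [{"type": "Structure", "unit": "USD"}],
}
problems = validate_exposure(example)
```

In practice this kind of check would more likely live in the JSON Schema itself (as enumerated values), with a sketch like this only useful for quick experiments on draft metadata.
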
stufraser1 commented 1 year ago

Replace reference_year with temporal, with start and end years, per #67?

stufraser1 commented 1 year ago

Will keeping exposure.taxonomy in the metadata and putting taxonomy_code in the data provide more flexibility to represent the characteristics of assets using different taxonomies?

In previous versions, taxonomy_code was restricted to a single code string as proposed in GED4ALL (for example, MUR+ADO/HEX:1/RES, denoting a single-storey adobe residential structure; see https://platform.openquake.org/taxtweb/). This string would now appear in the data file, but not in the metadata.

In the Open Exposure Data (OED) Standard and other insurance industry models, these characteristics are separated into multiple columns. For example, for the same type:

| OccupancyCode | ConstructionCode | NumberOfStoreys | YearBuilt |
| --- | --- | --- | --- |
| 1050 | 5101 | 1 | 0 |

Using AIR codes, for the same type:

| OccupancyCode | ConstructionCode | NumberOfStoreys | YearBuilt |
| --- | --- | --- | --- |
| 301 | 112 | 1 | 0 |

In the data file, should we specify the structure (field names) to use, so that the string or numeric values (depending on the taxonomy used) can be validated? OED and GEM provide codelists that could be used for validation. If we should specify the structure in the data file, can it be done with a taxonomy_code object that has the option to use either a single string value (per GED4ALL) or multiple columns using OED field naming? OED Spec: https://github.com/OasisLMF/ODS_OpenExposureData/tree/develop/OpenExposureData/Docs
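To make the two options concrete, here is a minimal Python sketch of how a data row could be classified as using either a single taxonomy string or the multi-column form. The column names come from the examples above; the function name `describe_taxonomy_code` and the "single_string" / "multi_column" labels are hypothetical and only illustrate the idea, not an agreed structure.

```python
# Column names taken from the OED/AIR examples in this comment.
OED_COLUMNS = ("OccupancyCode", "ConstructionCode", "NumberOfStoreys", "YearBuilt")


def describe_taxonomy_code(row):
    """Classify how a data row encodes asset characteristics.

    Returns a dict with a hypothetical "form" label and the extracted value.
    """
    # Option 1: a single code string, as in GED4ALL.
    code = row.get("taxonomy_code")
    if isinstance(code, str):
        return {"form": "single_string", "value": code}

    # Option 2: one column per characteristic, as in OED-style files.
    missing = [name for name in OED_COLUMNS if name not in row]
    if not missing:
        return {"form": "multi_column", "value": {n: row[n] for n in OED_COLUMNS}}

    raise ValueError(f"row has no taxonomy_code and is missing columns: {missing}")


ged4all_row = {"taxonomy_code": "MUR+ADO/HEX:1/RES"}
oed_row = {
    "OccupancyCode": 1050,
    "ConstructionCode": 5101,
    "NumberOfStoreys": 1,
    "YearBuilt": 0,
}
single = describe_taxonomy_code(ged4all_row)
multi = describe_taxonomy_code(oed_row)
```

Validating the values themselves against the OED or GEM codelists would be a separate step and is not attempted here.
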

duncandewhurst commented 1 year ago

My understanding was that RDLS isn't concerned with how the contents of resources are structured so yes, removing taxonomy_code from the metadata and leaving it up to data creators to decide how to model the characteristics of assets in their datasets does allow more flexibility.

Regarding validation, in line with the above understanding, I thought that we were only concerned with validating the RDLS metadata rather than the contents of the resources themselves.

odscjen commented 1 year ago

"replace reference_year with temporal with start and end years per https://github.com/GFDRR/rdl-standard/issues/67?"

Regarding this suggestion, the field will need to be renamed and described. Suggest:

| Field name | Title | Description | Field type |
| --- | --- | --- | --- |
| reference_period | Reference period | A general reference period, to which the modelled exposure data or exposure scenario refers. | temporal object |

@stufraser1 is this okay?

odscjen commented 1 year ago

Re. reference_year the current PR has raised the following question:

"Unless there is a semantic difference between the concept of temporal coverage and the concept of 'a general reference period to which data refers', I would name this field temporal to be consistent with Resource.temporal. However, adding this field under Exposure is equivalent to adding it at the dataset level, but only for exposure datasets, is there a reason why we wouldn't just have a temporal field as part of the top-level metadata instead so that it can apply to any dataset?"

@stufraser1 do we need this field or can it be dropped?

stufraser1 commented 1 year ago

Often we generate exposure scenarios to estimate growth in population or urban areas, for example at 2040, 2050, 2060, etc. It is this that we wanted to reflect in reference_year. We also often need to reflect this when we project risk_data_type=hazard (e.g. future flood risk under future climate conditions). This then passes through to the risk_data_type=loss component, to denote that losses are for, e.g., the 2050 or 2080 projection.

If we denote this at the top level only, they would all be consistent and the information would be entered once. However, would that restrict us to creating a new dataset for every projection? In risk_data_type=hazard, do we not enable temporal at event level, to allow different projections in an event_set? The same should apply to risk_data_type=exposure and risk_data_type=loss, I think.

odscjen commented 1 year ago

Ah, okay, I think I see. At the moment exposure is not an array, i.e. all the fields there must apply across all the resources in the dataset. This is the same for loss and vulnerability too.

So, trying to think about all of this: we need to reference time at a dataset level for every risk_data_type and at a resource level for hazard.event_set, exposure and loss. These times can be real times, future times, or just durations. Different resources within a dataset can have different periods. I think the answer here might be:

  1. have the whole Temporal object at the top level (as per https://github.com/GFDRR/rdl-standard/issues/67)
  2. have a Temporal object at the resource level as well
  3. remove all time references from hazard.event_set, exposure and loss as the relevant temporal info will be in the associated top level object (as a summary) and the resource objects (for a specific resource).
  4. remove reference_year and year, as they'll only be of relevance at the resource level, and in that case start and end can be used.

resources.temporal will be optional so if that info is the same for all resources then it just won't need to be filled in.

For exposure this will mean we don't need to create a new dataset for each temporal scenario as the specific reference year will be in each resource.

So using the above example values this will look like:

{
  "temporal": {
    "start": "2040",
    "end": "2060"
  },
  "risk_data_type": "exposure",
  "resources": [
    {
      "id": "1",
      "temporal": {
        "start": "2040",
        "end": "2040"
      }
    },
    {
      "id": "2",
      "temporal": {
        "start": "2050",
        "end": "2050"
      }
    },
    {
      "id": "3",
      "temporal": {
        "start": "2060",
        "end": "2060"
      }
    }
  ]
}
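A minimal Python sketch of how a consumer might read this structure: each resource uses its own temporal object if present and otherwise falls back to the top-level one. The function names are hypothetical, and the coverage check assumes that start and end are four-digit year strings (as in the example) and that the top-level period is meant to span all resource periods; neither assumption is settled in this thread, and durations are not handled.

```python
def effective_temporal(dataset):
    """Map each resource id to its temporal object.

    Falls back to the dataset-level temporal when a resource omits its own.
    """
    default = dataset.get("temporal")
    return {
        resource["id"]: resource.get("temporal", default)
        for resource in dataset.get("resources", [])
    }


def outside_dataset_period(dataset):
    """Return ids of resources whose period is not covered by the top-level period.

    Assumes "start" and "end" are year strings such as "2050".
    """
    top = dataset.get("temporal")
    if not top:
        return []
    outside = []
    for resource_id, period in effective_temporal(dataset).items():
        starts_early = int(period["start"]) < int(top["start"])
        ends_late = int(period["end"]) > int(top["end"])
        if starts_early or ends_late:
            outside.append(resource_id)
    return outside


# Shortened version of the example above; resource "2" omits temporal
# to show the fallback to the dataset-level period.
dataset = {
    "temporal": {"start": "2040", "end": "2060"},
    "risk_data_type": "exposure",
    "resources": [
        {"id": "1", "temporal": {"start": "2040", "end": "2040"}},
        {"id": "2"},
    ],
}
periods = effective_temporal(dataset)
uncovered = outside_dataset_period(dataset)
```
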

@stufraser1

stufraser1 commented 1 year ago

We can try this. Let's put it in and test it when the JSON is complete.

duncandewhurst commented 1 year ago

Looks good to me!