[Schema] Should the risk_data_type field be an array?

GFDRR / rdl-standard

The Risk Data Library Standard (RDLS) is an open data standard to make it easier to work with disaster and climate risk data. It provides a common description of the data used and produced in risk assessments, including hazard, exposure, vulnerability, and modelled loss, or impact, data.

https://docs.riskdatalibrary.org/

Creative Commons Attribution Share Alike 4.0 International


[Schema] Should the risk_data_type field be an array? #176

Closed. duncandewhurst closed this issue 12 months ago

duncandewhurst commented 1 year ago

@stufraser1 @matamadio, elsewhere I think we've established that a risk dataset can cover more than one type of risk data (hazard, exposure, vulnerability, loss). However, the top-level risk_data_type field is a string so it can only take one value from the risk_data_type codelist. Should we change the field's type to be an array so that it can take multiple values, e.g. hazard and exposure?
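For illustration, the change under discussion can be sketched as two schema fragments plus a small hand-rolled check. This is a minimal sketch, not the RDLS schema itself: the codes are the four named in this thread, `minItems` and `uniqueItems` are assumptions that were not discussed here, and `is_valid_risk_data_type` is a hypothetical helper that only understands the keywords used below.

```python
# Codes named in this thread; the authoritative codelist lives in the RDLS repo.
RISK_DATA_TYPES = ["hazard", "exposure", "vulnerability", "loss"]

# Current shape: a single string taken from the codelist.
CURRENT_FIELD = {"type": "string", "enum": RISK_DATA_TYPES}

# Proposed shape: an array of codes.
# ASSUMPTION: minItems and uniqueItems are illustrative, not agreed in this issue.
PROPOSED_FIELD = {
    "type": "array",
    "items": {"type": "string", "enum": RISK_DATA_TYPES},
    "minItems": 1,
    "uniqueItems": True,
}


def is_valid_risk_data_type(value, field):
    """Hypothetical check covering only the keywords used in the fragments above."""
    if field["type"] == "string":
        return isinstance(value, str) and value in field["enum"]
    if not isinstance(value, list):
        return False
    if len(value) < field.get("minItems", 0):
        return False
    allowed = field["items"]["enum"]
    # Check each item before testing uniqueness, so non-string items fail cleanly.
    if not all(isinstance(item, str) and item in allowed for item in value):
        return False
    if field.get("uniqueItems") and len(set(value)) != len(value):
        return False
    return True
```

With the string form, `["hazard", "exposure"]` is rejected; with the array form it is accepted, and a single-type dataset is still expressible as `["hazard"]`.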

matamadio commented 1 year ago

I think the idea was to split datasets by risk data type, hence just one type per dataset.

duncandewhurst commented 1 year ago

Hmm, from the discussion about the spreadsheet template, my understanding was that we wanted to support datasets covering multiple types of risk data.

The change (if needed) is straightforward so we can wait for @stufraser1 to return from leave to get his input.

johcarter commented 1 year ago

It was also my understanding that you could have one or more risk data types in one package. Certainly for a catastrophe model, you would have both hazard and vulnerability together under one dataset, and possibly also some exposure data for testing the model. If risk_data_type is not an array, the dataset info has to be repeated for each risk data type. It would seem more flexible to allow risk_data_type to be an array, as you can still separate the datasets by risk_data_type if you wished.

matamadio commented 1 year ago

Noted. Nothing should stop users from including multiple components in one dataset when they share exactly the same common metadata. However, when there are differences, e.g. year or spatial extent, multiple datasets should be preferred.
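The rule of thumb above (combine components that share identical common metadata, split them otherwise) could be sketched roughly as follows. The field names `title` and `year` and the helper `package_components` are hypothetical, chosen only for illustration, and the sketch assumes all metadata values are hashable.

```python
from collections import defaultdict


def package_components(components):
    """Merge components whose common metadata is identical.

    Each component is a dict with one "risk_data_type" code plus the
    common metadata fields being compared (hypothetical names here).
    """
    groups = defaultdict(list)
    for component in components:
        shared = {k: v for k, v in component.items() if k != "risk_data_type"}
        key = tuple(sorted(shared.items()))  # hashable fingerprint of the metadata
        if component["risk_data_type"] not in groups[key]:
            groups[key].append(component["risk_data_type"])
    # One dataset per distinct metadata fingerprint, with an array of types.
    return [{**dict(key), "risk_data_type": types} for key, types in groups.items()]


components = [
    {"title": "Cat model", "year": 2020, "risk_data_type": "hazard"},
    {"title": "Cat model", "year": 2020, "risk_data_type": "vulnerability"},
    {"title": "Cat model", "year": 2015, "risk_data_type": "exposure"},
]
datasets = package_components(components)
```

Here the hazard and vulnerability components share all metadata and end up in one dataset with two risk data types, while the exposure component differs by year and stays a separate dataset.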

stufraser1 commented 1 year ago

We would like to allow one or more risk data types in one package. Providing more-connected data packages was a core aim of the standard, so risk_data_type should be an array. Moving to agreed.