GFDRR / rdl-standard

The Risk Data Library Standard (RDLS) is an open data standard to make it easier to work with disaster and climate risk data. It provides a common description of the data used and produced in risk assessments, including hazard, exposure, vulnerability, and modelled loss, or impact, data.
https://docs.riskdatalibrary.org/
Creative Commons Attribution Share Alike 4.0 International
13 stars 1 forks source link

rdl_schema_0_1.json: update exposure object #101

Closed odscjen closed 1 year ago

odscjen commented 1 year ago

Resolves #62, adds new object in $defs for Cost which will be reused in Vulnerability and Loss.

References Temporal object but this hasn't been created yet.

odscjen commented 1 year ago

The error it's finding is just that Temporal is referenced by it's not been created yet, it's in other PR

duncandewhurst commented 1 year ago

When adding codelist links in schema descriptions, please use the version placeholders introduced in #10, e.g. https://rdl-standard.readthedocs.io/en/{{version}}/data_model/codelists#risk_data_type

odscjen commented 1 year ago

tests/test_schema.py::test_schema_valid[schema/rdl_schema_0.1.json-rdl_schema_0.1.json-data0] schema/rdl_schema_0.1.json is missing "id" in "items/properties" at /$defs/Cost (from /anyOf/1/exposure/properties/cost)

Not sure how to fix this one @duncandewhurst the other error is just complaining about a missing reference that will be changed/removed in other PRs

duncandewhurst commented 1 year ago

The tests expect definitions that appear in an array property's items to have an id field. That isn't always necessary, for example, when the definition has no array properties itself so we could add an exception for this case.

However, given that Cost.unit is always a currency and it is always possible to convert between currencies, I don't see the value of including that field so we could just remove the Cost definition and make Exposure.cost a string array using the cost_type.csv codelist. Sound good?

Regarding the other error, see my comment in https://github.com/GFDRR/rdl-standard/pull/105#issuecomment-1613825081 - I think the same thing has happened here.

duncandewhurst commented 1 year ago

Regarding the other error, see my comment in https://github.com/GFDRR/rdl-standard/pull/105#issuecomment-1613825081 - I think the same thing has happened here.

Having looked into it more, the issue is that the common_exp_category definition was removed in favour of Exposure.category referencing the exposure_category codelist, per my suggestion in https://github.com/GFDRR/rdl-standard/pull/101#discussion_r1244501949.

To avoid the schema being in an incoherent state we should either reinstate the common_exp_category definition until the other places that reference it are updated or we should update those other places to reference the exposure_category codelist as a part of this PR.

odscjen commented 1 year ago

However, given that Cost.unit is always a currency and it is always possible to convert between currencies, I don't see the value of including that field so we could just remove the Cost definition and make Exposure.cost a string array using the cost_type.csv codelist. Sound good?

I can imagine a case where there are costs for 2 types in different currencies though, in that case I think currency still needs to be provided? @stufraser1 thoughts?

duncandewhurst commented 1 year ago

However, given that Cost.unit is always a currency and it is always possible to convert between currencies, I don't see the value of including that field so we could just remove the Cost definition and make Exposure.cost a string array using the cost_type.csv codelist. Sound good?

I can imagine a case where there are costs for 2 types in different currencies though, in that case I think currency still needs to be provided? @stufraser1 thoughts?

In the case of a dataset containing costs in multiple currencies, even if the currencies were specified in the RDLS metadata, the currencies for each cost would also need to be specified in the dataset itself otherwise it would be impossible to know which cost was in which currency,

In the case of a dataset containing costs in a single currency, I had assumed that currencies would also be specified within the dataset itself so. If that's not the case, then there is an argument for putting currency in the metadata.

odscjen commented 1 year ago

I think that both cost.type and cost.unit are needed so I'll add in an id field which will a) future proof the schema incase any future objects need to refer to a cost item b) make flattening easier c) make the test pass

stufraser1 commented 1 year ago

In the case of a dataset containing costs in multiple currencies, even if the currencies were specified in the RDLS metadata, the currencies for each cost would also need to be specified in the dataset itself otherwise it would be impossible to know which cost was in which currency,

I agree. Usual practice to include currency code in the data.