GFDRR / rdl-standard

The Risk Data Library Standard (RDLS) is an open data standard to make it easier to work with disaster and climate risk data. It provides a common description of the data used and produced in risk assessments, including hazard, exposure, vulnerability, and modelled loss, or impact, data.
https://docs.riskdatalibrary.org/
Creative Commons Attribution Share Alike 4.0 International

[Schema] Refactor top-level anyOf for compatibility with Flatten Tool and CoVE, add missing type keywords #167

Closed: duncandewhurst closed this issue 1 year ago

duncandewhurst commented 1 year ago

Presently, the schema has a top-level anyOf containing one object for each component: hazard, exposure, vulnerability and loss.

This use of anyOf isn't compatible with Flatten Tool, so to generate the spreadsheet template shared in https://github.com/GFDRR/rdls-spreadsheet-template/issues/1, I had to pre-process the schema to move each component's properties into the top-level properties key. Using an anyOf like this is also likely to cause problems for error messages in CoVE, because of the way the underlying JSON Schema validation library works.
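The pre-processing step described above can be sketched in Python as follows. This is a minimal illustration, not the script actually used to produce the template: `hoist_anyof_properties` is a hypothetical name, and it assumes each top-level anyOf item contributes only a `properties` object (any other keyword in an item, such as `required`, is dropped along with the anyOf).

```python
import copy
import json


def hoist_anyof_properties(schema: dict) -> dict:
    """Hypothetical helper: return a copy of `schema` in which the properties
    of each top-level anyOf item are moved into the top-level "properties".

    Assumes each anyOf item only contributes a "properties" object; any other
    keywords in an item are discarded along with the anyOf itself.
    """
    result = copy.deepcopy(schema)  # leave the caller's schema untouched
    properties = result.setdefault("properties", {})
    for item in result.pop("anyOf", []):
        for name, subschema in item.get("properties", {}).items():
            if name in properties:
                # Merging would silently overwrite an existing definition.
                raise ValueError(f"duplicate property name: {name}")
            properties[name] = subschema
    return result


# Usage on a cut-down version of the current model.
current = {
    "properties": {"identifier": {}, "resources": {}},
    "anyOf": [
        {"properties": {"hazard": {"properties": {}}}},
        {"properties": {"exposure": {"properties": {}}}},
    ],
}
print(json.dumps(hoist_anyof_properties(current), indent=2))
```

Raising on a duplicate name, rather than letting the later definition win, makes the "property names are distinct" precondition explicit.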

Therefore, I think that we should move the properties for each component into the top-level properties key. Since the property names in each anyOf item are distinct and optional, this change will make no difference to the structure or validity of RDLS data from a publisher or user perspective; it is only a matter of how we organise the schema.
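One rough way to sanity-check that only the schema's organisation changes is to compare the property names each model declares at the top level, and to confirm that no component is required in either. The sketch below does this on stripped-down versions of the two models; `declared_names` and `required_names` are hypothetical helpers, and because they compare declared names only, this is not a substitute for running a JSON Schema validator over real RDLS data.

```python
COMPONENTS = {"hazard", "exposure", "vulnerability", "loss"}


def declared_names(schema: dict) -> set:
    """Property names declared at the top level or in any top-level anyOf item."""
    names = set(schema.get("properties", {}))
    for item in schema.get("anyOf", []):
        names |= set(item.get("properties", {}))
    return names


def required_names(schema: dict) -> set:
    """Names listed under "required" at the top level or in any top-level anyOf item."""
    names = set(schema.get("required", []))
    for item in schema.get("anyOf", []):
        names |= set(item.get("required", []))
    return names


# Cut-down stand-ins for the current and proposed models.
current_model = {
    "properties": {"identifier": {}, "resources": {}},
    "anyOf": [{"properties": {name: {"properties": {}}}} for name in sorted(COMPONENTS)],
}
proposed_model = {
    "properties": {
        "identifier": {},
        "resources": {},
        **{name: {"properties": {}} for name in sorted(COMPONENTS)},
    }
}

print(declared_names(current_model) == declared_names(proposed_model))  # → True
print(required_names(current_model) & COMPONENTS, required_names(proposed_model) & COMPONENTS)  # → set() set()
```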

I also noticed that the type keyword is missing from each of hazard, exposure, vulnerability and loss, so we can add it when making the above changes.
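Adding the missing keyword is mechanical once the components sit under the top-level properties key. A minimal sketch in Python, assuming (as the `properties` key on each component implies) that every component describes a JSON object; `add_missing_object_types` and `COMPONENTS` are hypothetical names used only for illustration.

```python
import copy

# The four RDLS components named in this issue.
COMPONENTS = ("hazard", "exposure", "vulnerability", "loss")


def add_missing_object_types(schema: dict) -> dict:
    """Hypothetical helper: return a copy of `schema` in which each component
    under the top-level "properties" that lacks a "type" gets "type": "object".

    Assumes the components have already been moved to the top level, as in the
    proposed model, and that each one describes a JSON object.
    """
    result = copy.deepcopy(schema)
    properties = result.get("properties", {})
    for name in COMPONENTS:
        component = properties.get(name)
        if component is not None and "type" not in component:
            # Put "type" first so it reads naturally in the serialised schema.
            properties[name] = {"type": "object", **component}
    return result


proposed = {
    "properties": {
        "identifier": {},
        "hazard": {"properties": {}},
        "loss": {"properties": {}},
    }
}
print(add_missing_object_types(proposed)["properties"]["hazard"])
# → {'type': 'object', 'properties': {}}
```

Components that already declare a type are left alone, so the helper is safe to run more than once.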

I've sketched out how this would look from a schema perspective below. @stufraser1 @matamadio please could you confirm that you're happy with this change?

cc @odscjames @radix0000 for awareness of schema changes.

Current model

{
  "properties": {
    "identifier": {},
    "resources": {},
    ... // Other properties relevant to all datasets
  },
  "anyOf": [
    {
      "properties": {
        "hazard": {
          "properties": {
            // properties relevant to hazard datasets
          }
        }
      }
    },
    {
      "properties": {
        "exposure": {
          "properties": {
            // properties relevant to exposure datasets
          }
        }
      }
    },
    {
      "properties": {
        "vulnerability": {
          "properties": {
            // properties relevant to vulnerability datasets
          }
        }
      }
    },
    {
      "properties": {
        "loss": {
          "properties": {
            // properties relevant to loss datasets
          }
        }
      }
    }
  ]
}

Proposed model

{
  "properties": {
    "identifier": {},
    "resources": {},
    ... // Other properties relevant to all datasets
    "hazard": {
      "properties": {
        // Properties relevant to hazard datasets
      }
    },
    "exposure": {
      "properties": {
        // Properties relevant to exposure datasets
      }
    },
    "vulnerability": {
      "properties": {
        // Properties relevant to vulnerability datasets
      }
    },
    "loss": {
      "properties": {
        // Properties relevant to loss datasets
      }
    }
  }
}
stufraser1 commented 1 year ago

Confirming I'm happy with this, on the understanding that it has no real bearing on how we use the JSON file: if we want one component we can still remove the others, and if we want to include more than one component we still can. Go ahead.

duncandewhurst commented 1 year ago

That's correct @stufraser1. I'll wait for @matamadio to confirm he's happy before making the changes.

matamadio commented 1 year ago

Yes, fine with these changes.