GFDRR / rdl-standard

The Risk Data Library Standard (RDLS) is an open data standard to make it easier to work with disaster and climate risk data. It provides a common description of the data used and produced in risk assessments, including hazard, exposure, vulnerability, and modelled loss, or impact, data.
https://docs.riskdatalibrary.org/
Creative Commons Attribution Share Alike 4.0 International

[Docs update] Examples to be included #135

Open matamadio opened 1 year ago

matamadio commented 1 year ago

List of examples to be produced and included in Docs

Please add any subject that requires an example (figure, table, other) to be explained properly in the docs.

Aims:

  1. Have examples ready to demonstrate the range of capabilities in the RDLS while promoting uptake
  2. Provide illustration and downloadable template / JSON example for more complex cases
  3. Prefer a larger number of simple, constrained examples over fewer complex examples that each show multiple concepts.

Hazard

Exposure - examples to show multiple data types

Vulnerability

Loss

matamadio commented 1 year ago

I can start producing the example data maps. I'll propose a layout to maintain throughout the docs. It would be similar to that used for CCDR docs, please let me know if ok or suggest edits.

matamadio commented 1 year ago

The aim is to have:

Hazard examples

matamadio commented 1 year ago

Exposure examples

Figure example for each data type (random locations):

see spreadsheet and json metadata for Central Asia residential exposure - current future scenarios

matamadio commented 1 year ago

Vulnerability examples

matamadio commented 1 year ago

Loss examples

Note: Use Central Asia SFRARR project / Africa R5 as examples

matamadio commented 1 year ago

Should we attach a download link for each of the datasets shown in the example? E.g. OSM data for the city shown, hazard layer, etc. Should the file be hosted in github in some /downloads/ folder?

duncandewhurst commented 1 year ago

My understanding is that the purpose of the examples is to help readers to understand how RDLS metadata can be used to describe different aspects of risk datasets. I think that we should aim for the text and screenshots for each example to provide sufficient information about the relevant aspects of the datasets. Otherwise, it would be a lot of extra work for readers to download each example and open it in an appropriate software package.

matamadio commented 1 year ago

@odscjen is it ok to provide examples as this (markdown-html), or should it be turned into json?

odscjen commented 1 year ago

Ultimately we'll want to provide them in both markdown-html AND in JSON. For now markdown is fine and once the spreadsheet template and CoVE are up and running we can convert them into JSON as well.

odscjen commented 1 year ago

@matamadio an important thing when creating these examples is to ensure you're using the field titles and codelist values (can use the labels rather than the codes for ease of reading) from the schema and include all of the required fields. Looking at the Hazard examples you've got so far, there are a few errors:

Deterministic layers examples (maps) to show documentation of index values

Figure Metadata
Title: Global landslide susceptibility layer
Description: Deterministic map of mean landslide hazard occurrence frequency.
Spatial scale: global
Risk Data type: Hazard
Hazard type: Landslide
Source name: LHASA
Source type: model
Analysis type: Deterministic
Frequency distribution: Susceptibility
Calculation method: Inferred
Deterministic frequency intensity measure: Index
Index criteria: Combination of climatology and observed empirical events.
License: Open (CC-BY)

Frequency distribution is a closed codelist, so it has to be either 'poisson', 'negative binomial' or 'user defined' (I wasn't sure which one Susceptibility would translate to?) Unless this should actually be a different field?
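A minimal sketch of what a closed codelist means in practice, assuming the three values quoted above are the complete list. The labels are copied from this comment, so the actual codes in the schema may be spelled differently, and the function name is hypothetical.

```python
# Values copied from the comment above; treat them as an assumption about the codelist.
FREQUENCY_DISTRIBUTION = {"poisson", "negative binomial", "user defined"}


def check_closed_code(value: str, codelist: set[str]) -> bool:
    """Return True only if the value is exactly one of the codes in a closed codelist."""
    return value in codelist


print(check_closed_code("Susceptibility", FREQUENCY_DISTRIBUTION))  # False
print(check_closed_code("poisson", FREQUENCY_DISTRIBUTION))         # True
```

With a closed codelist there is no fallback: a value such as 'Susceptibility' either maps to one of the listed codes or belongs in a different field.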

Describe historical event set - see https://github.com/GFDRR/rdl-standard/issues/81#issuecomment-1607310343 Empirical scenario footprint to show use of GLIDE number and event dates

Figure Metadata
Title: Satellite detected water extent
Description: Satellite-detected surface waters in Shabelle Zone, Somali Region of Ethiopia and Beledweyne District, Hiraan Region of Somalia as observed from a Sentinel-2 image acquired on 14 April 2023 at 07:28 UTC.
Countries: Somalia; Ethiopia
Risk Data type: Hazard
Hazard type: Flood
Source name: ESA
Source type: model
Analysis type: Empirical
Calculation method: Inferred
Temporal: 2023-09-04 (start); 2023-04-14 (end)
Disaster identifier: FL20230327SOM
License: Open (CC-BY)

Dates should be in YYYY-MM-DD format.
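As a rough illustration of the date format check, here is a small sketch using only the Python standard library. The function name is hypothetical and this is not how CoVE or the schema performs the check.

```python
import re
from datetime import date, datetime

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_iso_date(value: str) -> date:
    """Parse a strict YYYY-MM-DD string, raising ValueError otherwise."""
    if not ISO_DATE.fullmatch(value):
        raise ValueError(f"not in YYYY-MM-DD format: {value!r}")
    # strptime also rejects impossible dates such as 2023-02-30.
    return datetime.strptime(value, "%Y-%m-%d").date()


print(parse_iso_date("2023-04-14"))  # 2023-04-14
```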

Set of hazard maps, to show one of the most common use cases

Figure Metadata
Title: Global flood hazard layer
Description: Probabilistic maps of flood hazard occurrence frequency by return period.
Spatial scale: Global
Risk Data type: Hazard
Hazard type: Flood
Hazard processes: Fluvial flood; Pluvial flood
Source name: FATHOM
Source type: model
Analysis type: Probabilistic
Frequency distribution: Return periods
Occurrence range: once in 10 to 1,000 years
Calculation method: Simulated
Intensity measure: Flood water depth [m]
License: Commercial

'River flood' isn't in the process_type codelist, this should be 'fluvial_flood' as this codelist is closed.

Frequency distribution is a closed codelist, so it has to be either 'poisson', 'negative binomial' or 'user defined' (I wasn't sure which one Return periods would translate to?) Suspect this should actually be a different field?

Set with current and future climate projected hazard data to show how temporal objects are used

Figure Metadata
Title: Aqueduct flood hazard maps
Description: Probabilistic maps of coastal flood hazard occurrence frequency by return period.
Spatial scale: Global
Risk Data type: Hazard
Hazard type: Coastal flood
Hazard processes: Storm surge
Source name: Aqueduct
Source type: model
Temporal: 2015, 2030, 2050, 2080
Analysis type: Probabilistic
Frequency distribution: Return periods
Occurrence range: once in 5 to 1,000 years
Calculation method: Simulated
Intensity measure: Flood water depth [m]
License: Open (CC-BY)

For all the examples where analysis_type = 'Probabilistic', occurrence.probabilistic.probability.span is a required field if you're including any event level data.

matamadio commented 1 year ago

Thanks Jen, fixed examples but missing the last comment: still unsure on how I should indicate occurrence probability in the most common case (return period scenarios 1/n).

This is the case for the flood models where Analysis type: Probabilistic. E.g. in the Fathom dataset example, we have 3 layers in the dataset: 1/n1, 1/n2, 1/n3. The probabilistic range is 1/n1 to 1/n3, and there is no specific period span to specify.

odscjen commented 1 year ago

Sorry, I had misread the schema in that final comment! span is only required if you're using event.occurrence.probabilistic.probability

I think there are 2 options here:

  1. you just use occurrence_range which sits in event_set to list all 3 probabilities. The description of this field makes it clear that it's only for probabilistic values so that should be clear to the users what the values given are.
  2. each of the 3 values relates to a separate event within the event_set and you put the values in return_period which sits in event.occurrence.probabilistic and you don't use .probability at all.

matamadio commented 1 year ago

For the sake of quick example, I would pick option 1.
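The two options discussed above can be sketched as Python dictionaries that serialise to JSON. The key names follow the field paths mentioned in this thread (occurrence_range on the event set, return_period under event.occurrence.probabilistic). The surrounding structure, the value formats and the three return periods are assumptions for illustration only, so check them against the schema before reuse.

```python
import json

# Hypothetical return periods; the thread only refers to them as 1/n1, 1/n2, 1/n3.
return_periods = [10, 100, 1000]

# Option 1: one value on the event set summarising the whole range.
option_1 = {
    "event_sets": [
        {
            "id": "1",
            "analysis_type": "probabilistic",
            "occurrence_range": f"once in {min(return_periods)} to {max(return_periods):,} years",
        }
    ]
}

# Option 2: one event per layer, each carrying its own return period.
option_2 = {
    "event_sets": [
        {
            "id": "1",
            "analysis_type": "probabilistic",
            "events": [
                {"id": str(i), "occurrence": {"probabilistic": {"return_period": rp}}}
                for i, rp in enumerate(return_periods, start=1)
            ],
        }
    ]
}

print(json.dumps(option_1, indent=2))
```

Option 1 keeps the metadata short, while option 2 makes each layer individually addressable at the cost of a longer file.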

duncandewhurst commented 1 year ago

From today's check-in call with @matamadio and @odscrachel, we agreed that @matamadio will prepare examples using the spreadsheet template using only the relevant fields (i.e. not full RDLS metadata files). We can then convert those into JSON format to store in the repository which should give us the flexibility to present them in the documentation as needed (e.g. using field titles rather than JSON paths).

matamadio commented 1 year ago

Spreadsheet example about the Fathom global dataset.

Figure Metadata
Title: Global flood hazard layer
Description: Probabilistic maps of flood hazard occurrence frequency by return period.
Spatial extent: Global
Risk Data type: Hazard
Hazard type: Flood
Hazard processes: Fluvial flood; Pluvial flood
Source model: FATHOM
Analysis type: Probabilistic
Occurrence range: once in 10 to 1,000 years
Calculation method: Simulated
Intensity measure: Water depth [m]
License: Commercial

matamadio commented 1 year ago

About the example panel:

duncandewhurst commented 1 year ago

About the example panel:

* would it be possible to switch between (or show together) metadata list (or table) and the underlying json visualisation?

Yep. Given the length of some of the field values, I think it's best to show each in a separate tab. I've tested this out by adding the Fathom hazard example in https://github.com/GFDRR/rdl-standard/pull/196.

Please take a look and let me know what you think: https://rdl-standard.readthedocs.io/en/135-examples/reference/schema/#hazard (below the schema reference table).

In particular, it would be good to get your feedback on:

  1. Whether to present the tabular format as separate tables.
  2. Whether to include identifiers in the tabular example.

The advantages of using separate tables and including identifiers are:

The downside is that it makes the tabular example longer than presenting all the values in the same table and without identifiers.

If you're happy with the general approach, then I think the best workflow is for you to do the initial preparation of the examples using the spreadsheet template, we can then convert them to JSON to add to the standard repository and the pre-commit script will handle creating the human-friendly CSVs for display in the documentation. For ongoing maintenance, it will be easiest to edit the JSON files directly.

matamadio commented 1 year ago

Please take a look and let me know what you think: https://rdl-standard.readthedocs.io/en/135-examples/reference/schema/#hazard (below the schema reference table).

Yes, I like this. Separated tables are good. Hiding identifiers would get a cleaner view of key attributes; but I agree it is good to have 1:1 representation of the json.

I'll produce additional examples to add in the gdrive folder, nametag _docsample

matamadio commented 1 year ago

See example for exposure: built-up surface (GHS): rdls_exp-GHS_docsample.xlsx

Figure:

image

Note 1: different from the real example provided about Thailand, this one indicates the whole global dataset and not a derived national subset. Also attribution is different.

Note 2: needs exposure metric specification, see #194.

Note 3: there are 2 references for the same resource

matamadio commented 1 year ago

Example for Vulnerability: rdls_vln-FL_JRC

Can be used both as a docs snippet and as a full example.

Figure (one of many possible):

matamadio commented 1 year ago

New example for probabilistic hazard (Floods and Coastal floods) using open data layers:

rdls_hzd-AQD

rdls_hzd-AQD_docsample

Figure Metadata
Title: Aqueduct Floods Hazard Maps
Description: Probabilistic maps of flood hazard occurrence frequency by return period.
Spatial extent: Global
Risk Data type: Hazard
Hazard type: Flood
Hazard processes: Fluvial floods; Coastal floods
Publisher: World Resources Institute
Project: Aqueduct
Analysis type: Probabilistic
Occurrence range: once in 2 to 1,000 years
Calculation method: Simulated
Intensity measure: Water depth [m]
License: CC-BY-4.0
Resource: Download

duncandewhurst commented 1 year ago

Please take a look and let me know what you think: https://rdl-standard.readthedocs.io/en/135-examples/reference/schema/#hazard (below the schema reference table).

Yes, I like this. Separated tables are good. Hiding identifiers would get a cleaner view of key attributes; but I agree it is good to have 1:1 representation of the json.

I'll produce additional examples to add in the gdrive folder, nametag _docsample

Great!

@odscjen would you be able to pick up work on https://github.com/GFDRR/rdl-standard/pull/196 and add the examples that Mat is preparing? The workflow for each example is:

  1. Convert the example to JSON format and validate it using the commands in https://github.com/GFDRR/rdls-spreadsheet-template/issues/4
  2. Check that the example is semantically correct
  3. Fix any validation or semantic errors
  4. Save the valid JSON file and figure in examples/{component}/{title}
  5. Run ./manage.py pre-commit to generate the CSV files

We can perhaps hold fire on actually adding the examples to docs/reference/schema.md until they are all ready as we might need to think a bit more about layout depending on the number and length of the examples.

odscjen commented 1 year ago

Started working through the new examples, I'll update both the json and spreadsheet in the shared drive where possible, I'll note any corrections in this issue

rdls_exp-GHS_docsample.xlsx

rdls_exp-GHS-THA.xlsx

odscjen commented 1 year ago

rdls_hzd-AQD.xlsx

odscjen commented 1 year ago

rdls_hzd-AQD_docsample.xlsx

odscjen commented 1 year ago

rdls_hzd-FTH-THA.xlsx

rdls_hzd-FTH_docsample.xlsx

odscjen commented 1 year ago

rdls_vln-FL_JRC.xlsx

matamadio commented 1 year ago

Thanks for the feedback, sorry for the missing/wrong input!

rdls_exp-GHS-THA.xlsx: The resource.url links to a page where the default download is Download the global GHS_BUILT_S_E2030_GLOBE_R2023A_54009_100_V1_0 dataset in a single file which seems to be for 2023 not 2020 as given in resources.temporal. I couldn't figure out how to get that to change to 2020. @matamadio can you make it select 2020 or if not we can just change resources.temporal in the example to be 2023.

URL for this example to be replaced with specific resource data (zip to be hosted in GH docs/_datasamples or similar). The full dataset includes a range of years; this specific subset is for year 2020, for Thailand extent. I could also publish on DDH, but not immediately (need to wait for project completion).

rdls_hzd-AQD.xlsx: event_set id = "2" has no hazards but this is required in the schema. I think what's happened is some confusion with the identifiers in the spreadsheet. In 'hazard_event_sets_hazards' there are 2 hazard objects both linked to event_set 1. But the events in event_set 1 only match the first of these event_set.hazards. BUT the hazards in 'hazard_event_sets_events' for 'event_sets/0/id' 2 don't match the second of the event_set.hazards, with the difference in the hazard.type: in 'hazard_event_sets_hazards' for the second hazard the .type = "flood" but in 'hazard_event_sets_events' the .type = "coastal_flood". @matamadio is the second of the event_set hazards supposed to be linked to the second event_set?

Commenting in the excel file

gazetteerEntries.id should be the actual code from the scheme, so in this case it should be 'TH' as this is the ISO 3166-2 code for Thailand. So I've moved this from .description and replaced .description with 'Thailand'.

Thanks, this needs to be explained in the description. Please note this (and other country examples) uses ISO3166-1-alpha2: first level unit (country), two-letter code.

resource.url is missing. This has been discussed previously (https://github.com/GFDRR/rdls-spreadsheet-template/issues/3#issuecomment-1682027617) so to make the validation pass I've added some dummy url's as this is a commercial product so it's not going to be possible to provide a proper url to the actual data.

Else the url could point to the existing datacatalog page (from where resource can be requested).

events in event_set 1 are missing hazard.type and hazard_process so I've just copied them in from the event_set.hazard values. I've done the same for the other 2 event_sets and given them all local ids.

no license, so I just put in 'commercial' so that it'll pass validation (and this is essentially correct)

Sorry - they are all hazard type: flood; 1 and 2 process is fluvial flood, while 3 is pluvial flood.

missing required fields from vulnerability: .taxonomy and .spatial.scale - used 'global' for the latter. I had a quick skim through the methodology report for the resource and I couldn't work out what, if any, taxonomy they'd used for classifying the assets so I put it in as 'internal', @matamadio let me know if you know of the actual taxonomy used.

I would put taxonomy as optional here. Originally these were based on Corine Land Cover classes (CLC), but in the end they use their own general taxonomy for splitting curve types. So "internal" is ok.

duncandewhurst commented 1 year ago

rdls_exp-GHS_docsample.xlsx

* @duncandewhurst I think there must be a mistake in the template as `links.rel` is prepopulating with 'describedby' and not 'describedBy'

'describedby' is correct. It is an IANA link relation type, which are all lowercase.

odscjen commented 1 year ago

'describedby' is correct. It is an IANA link relation type, which are all lowercase.

ah, okay, this is getting reported as an error in every JSON conversion

odscjen commented 1 year ago

Else the url could point to the existing datacatalog page (from where resource can be requested).

this link for me just goes to a world bank login page (which I obviously can't login to) so I don't think it's an appropriate link to use as it doesn't show anything of the actual data. I think at the moment as these are just examples using a dummy url is the better option.

duncandewhurst commented 1 year ago

'describedby' is correct. It is an IANA link relation type, which are all lowercase.

ah, okay, this is getting reported as an error in every JSON conversion

Please can you share the data and command(s) that you're using in a new issue? I converted and tested rdls_hzd-AQD.xlsx using the commands in https://github.com/GFDRR/rdls-spreadsheet-template/issues/4 and there were no validation errors.

odscjen commented 1 year ago

@duncandewhurst I used the flatten tool command from that issue but I was using https://www.jsonschemavalidator.net/ for the validation. The schema is definitely the current dev branch schema but I get the following error message

Message: String 'describedby' does not match regex pattern '^(?!(describedby))'. Schema path: https://raw.githubusercontent.com/GFDRR/rdl-standard/0__2__0/schema/rdls_schema.json#/properties/links/items/properties/rel/pattern

duncandewhurst commented 1 year ago

Ah, so as I mentioned in the issue description:

You can also ignore the error relating to the regex pattern for links.rel. I think that's a false positive due to that validator only supporting JSON Schema draft 2019-09 so it should be resolved in CoVE, which uses draft 2020-12.

As expected, there are no errors when validating against draft 2020-12 using check-jsonschema.
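To see what the online validator was complaining about, the pattern from the error message can be tried on its own with Python's re module. This only illustrates the regular expression in isolation. It does not reproduce the draft-specific behaviour described above, and it says nothing about which links the schema applies the pattern to.

```python
import re

# Pattern copied from the error message: a negative lookahead at the start of the string.
pattern = re.compile(r"^(?!(describedby))")

print(bool(pattern.search("describedby")))  # False: the string starts with 'describedby'
print(bool(pattern.search("related")))      # True: any other prefix passes
```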

stufraser1 commented 1 year ago

Else the url could point to the existing datacatalog page (from where resource can be requested). ... this link for me just goes to a world bank login page (which I obviously can't login to) so I don't think it's an appropriate link to use as it doesn't show anything of the actual data. I think at the moment as these are just examples using a dummy url is the better option.

We need to make sure, when linking to the datacatalog, we are NOT using https://datacatalog.worldbank.org/int/search/..., which is internal only (and the default when Mat, Pierre or I copy a link). Make sure to remove the 'int/' to make it visible externally: https://datacatalog.worldbank.org/search/...
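If this needs doing for many links, a small helper along these lines could strip the internal prefix. It is a sketch only: the function name is hypothetical, and the example path after 'search/' is made up because the real paths are elided in this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def externalise(url: str) -> str:
    """Drop a leading 'int/' path segment from a datacatalog.worldbank.org URL."""
    parts = urlsplit(url)
    if parts.netloc == "datacatalog.worldbank.org" and parts.path.startswith("/int/"):
        parts = parts._replace(path=parts.path[len("/int"):])
    return urlunsplit(parts)


# Hypothetical path after 'search/'; the thread elides the real one.
print(externalise("https://datacatalog.worldbank.org/int/search/dataset/example"))
```

URLs that are already external, or that point to another host, are returned unchanged.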

stufraser1 commented 1 year ago

Yes, I like this. Separated tables are good. Hiding identifiers would get a cleaner view of key attributes; but I agree it is good to have 1:1 representation of the json.

I agree, following the example for hazard, rather than for exposure looks much better. Easy to tab between each representation of the example, and very clear where to find the examples.

odscjen commented 1 year ago

@matamadio do you have a _docsample version of the vln-FL_JRC example? Also is there one yet for Loss?

stufraser1 commented 1 year ago

Set of hazard maps, to show one of the most common use cases

I also created, as a test, a sheet containing 6 zipped resources containing flood hazard map geotiffs. I created a single event set (it's a regional analysis), 6 resources, and one event per country per return period (50 events) and one footprint per event. This differs from the Fathom data example, which has one event per hazard type (3, PLU, FLU Def, FLU Undef) and no footprints. The necessary information gets across to the user either way but I'm not sure which is better. I created it this way because that is how we've packaged it in the dataset on DDH but this is not necessarily the best way, please feel free to suggest a better way - though we're unlikely to reconfigure the dataset on DDH now.

sheet json

duncandewhurst commented 1 year ago

With the exception of RDLS_full_SFRARR_fluvialhazardmaps.json, I've added all of the examples in the JSON conversions folder to the schema reference documentation in https://github.com/GFDRR/rdl-standard/pull/196. I'm sharing a summary of key changes and design decisions below:

I updated the JSON files to reflect the latest version of the schema, but I haven't updated the spreadsheets that were used to generate them. I also corrected one semantic error in spatial.gazetteerEntries in the Central Asia exposure examples, see the commit for details: https://github.com/GFDRR/rdl-standard/pull/196/commits/0273914dda2649c17ebb4cafb4f4ca02e30ecabc. I also put the two Central Asia exposure dataset examples in separate JSON files for ease of comprehension.

To reduce the length of the schema reference page, I've nested the examples with collapsible drop-downs.

image

Where there is more than one example for a component, only the first example is uncollapsed. If there is no figure for an example, it is collapsed. I couldn't find a suitable figure for the Central Asia exposure examples, but I took a screenshot from the global flood depth-damage functions PDF to use as a figure for that example:

image

The row titles in the tabular examples now include the titles of intermediary objects so that it is possible to distinguish between, for example, publisher name and creator name (previously they were both titled 'name'):

image

To reduce the amount of screen space taken up by the JSON examples, they are now collapsible, with objects and arrays collapsed by default:

image

matamadio commented 1 year ago

Very nice, thanks. Would it be possible to limit the horizontal scroll of the table view, as in the codelists (#161)?

matamadio commented 1 year ago

@matamadio do you have a _docsample version of the vln-FL_JRC example? Also is there one yet for Loss?

The vln-FL_JRC is ok to use in docs as well, it doesn't include too many attributes anyway. The one for loss is still to be produced.

duncandewhurst commented 1 year ago

Would it be possible to limit the horizontal scroll of the table view, as in the codelists (#161)?

Addressed in https://github.com/GFDRR/rdl-standard/pull/214.

The vln-FL_JRC is ok to use in docs as well, it doesn't include too many attributes anyway.

Added in https://github.com/GFDRR/rdl-standard/pull/196.

I don't think there's anything else to do for this issue until the loss example is ready. Let me know if that's wrong!

duncandewhurst commented 1 year ago

@matamadio and @stufraser1 to discuss and prepare loss examples.

matamadio commented 1 year ago

One example of loss data (results of the analysis) from CCDR:

Download THA_RSK.xlsx

This represents one specific country, but the same template applies to any country I've been working on. The dataset consists of one excel file, made of several tabs:

The tabular data for the ADM scores is also provided as geospatial (gpkg). It does not have an explicit loss curve chart, but has all the elements to build it.

@stufraser1 should it fit in the schema in the current state, or do you have any suggestion for better formatting? This is key as I'm just now setting the default for the new year analytics.

stufraser1 commented 1 year ago

I would say there are sheets in there that wouldn't normally go into the loss component:

My preference for describing these files in RDL Loss would be to include this as a dataset, and give each sheet as its own resource (.csv), rather than an xlsx book so users can see the list of resource descriptions per dataset, rather than them navigating in many sheets, but I see it could be described in metadata using the existing structure with the workbook as a single resource.

matamadio commented 1 year ago

I have some questions about the loss schema. See simplified CCDR output example in the Gdrive folder.

THA_CCDR_RSK_ADM1.xlsx describes loss output for 2 hazards (river floods and coastal floods) over 2 exposure categories. The complete standard output would include 5-6 hazards and 3 exposed categories.

Metadata spreadsheet has loss attributes at the dataset level, so I have to create 4 dataset rows.

image

But all this information is actually in just one file.

image

Should I use the same dataset ID all along? Or should we rather move all loss attributes into an array?

stufraser1 commented 1 year ago

Good catch. We want to be able to include multiple loss curves in one dataset, which would be possible by having a 'loss' object under the dataset level. I think this could also contain the contents of loss_cost, since I don't think a layer of nesting for loss cost is required beyond the loss object. I don't think anything else would need nesting: one level should suffice. @odscrachel please could you advise if we can process this quickly / overnight with @duncandewhurst ?

stufraser1 commented 1 year ago

I also have a couple of issues testing with a return period dataset:

Here is the loss metadata file for use in the loss example: json xlsx image: tabulated data, so no image provided

stufraser1 commented 1 year ago

images for exposure examples: Central Asia residential current Central Asia residential projected

duncandewhurst commented 1 year ago

Should I use the same dataset ID all along? Or should we rather move all loss attributes into an array?

Each row in the datasets sheet represents a dataset, so if there are rows with the same id, the JSON output will be a single dataset with the values from the final row, i.e. the values from the earlier rows will be overwritten. Therefore, the Thailand CCDR example does point to the need for an array of losses.
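The overwriting behaviour described above can be mimicked in a few lines of Python. This is not the conversion tool's actual code, only a sketch of the outcome, and the column names and values are hypothetical.

```python
# Hypothetical rows from a 'datasets' sheet: same id, different loss attributes.
rows = [
    {"id": "THA_CCDR", "loss/hazard_type": "flood", "loss/category": "population"},
    {"id": "THA_CCDR", "loss/hazard_type": "coastal_flood", "loss/category": "buildings"},
]

# One object per dataset id: a later row replaces the values of an earlier one.
datasets = {}
for row in rows:
    datasets.setdefault(row["id"], {}).update(row)

print(len(datasets))                          # 1
print(datasets["THA_CCDR"]["loss/category"])  # buildings

# With an array of losses, each row becomes its own entry instead.
losses_by_dataset = {}
for row in rows:
    entry = {k.split("/", 1)[1]: v for k, v in row.items() if k.startswith("loss/")}
    losses_by_dataset.setdefault(row["id"], []).append(entry)

print(len(losses_by_dataset["THA_CCDR"]))     # 2
```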

I've drafted a PR for the changes proposed in https://github.com/GFDRR/rdl-standard/issues/135#issuecomment-1708577671 and https://github.com/GFDRR/rdl-standard/issues/135#issuecomment-1708617812:

I'll hold off preparing a PR to add the loss examples until we have decided what to do about the schema as if the schema changes the examples will need to be updated. I've also left some comments on the SFRARR example spreadsheet where I think some fields may have been populated incorrectly.


@stufraser1 I've shared my feedback on your other questions and suggestions below.

I also have a couple of issues testing with a return period dataset:

  • loss/impact/unit does not include impact_unit code for monetary losses, so where I've got a monetary asset_loss, I have to leave loss/impact/unit blank

This was discussed at some length in https://github.com/GFDRR/rdl-standard/issues/75, but the conversation in that issue took a different direction so I don't think it was fully resolved.

My preferred approach is not to worry about units and instead to model the kind of quantity being measured (currency, in this case) since users can convert between units of the same quantity kind. That is the approach we settled on for exposure metrics and I think it would make sense to have consistent modelling for exposure metrics and impact metrics. However, that is quite a significant change to consider at this stage for 0.2.

The alternative solution that I proposed was to add an Impact.currency field for monetary losses. The reasons for separating unit and currency are twofold:

  1. Completeness: The complete list of currencies is well-defined and all currencies are of more-or-less equal relevance to RDLS so it makes sense to have a comprehensive (closed) currency codelist. Whereas the complete list of non-currency units is less well defined and many non-currency units are totally irrelevant to RDLS so it makes sense to have a representative (open) codelist of the most relevant units.
  2. Usability: There are very many currencies so it is much harder for a publisher to see which non-currency units are available if they are mixed in with the long list of currencies.

The separation of currencies and non-currency units is in keeping with QUDT which is the source we're using for unit codes. It models currencies and non-currency units as separate vocabularies so we should keep them separate too in order to avoid the risk of clashing codes in the event that a currency and non-currency unit share the same code.
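A rough sketch of how separate unit and currency fields might behave, assuming a closed currency codelist and an open unit codelist as argued above. The class, its field names other than unit and currency, and the example values are hypothetical; the currency set is a tiny illustrative subset of ISO 4217, not the proposed codelist.

```python
from dataclasses import dataclass
from typing import Optional

# Tiny illustrative subset of ISO 4217; a real closed codelist would be complete.
CURRENCIES = {"USD", "EUR", "THB"}


@dataclass
class Impact:
    """Hypothetical impact object with separate unit and currency fields."""

    metric: str
    unit: Optional[str] = None      # open codelist: any string is accepted
    currency: Optional[str] = None  # closed codelist: must be a known code

    def __post_init__(self) -> None:
        if self.currency is not None and self.currency not in CURRENCIES:
            raise ValueError(f"unknown currency code: {self.currency}")


monetary = Impact(metric="asset_loss", currency="USD")
non_monetary = Impact(metric="example_metric", unit="example_unit")
print(monetary)
print(non_monetary)
```

Keeping the two fields apart means a currency code can never be confused with a non-currency unit that happens to share the same code.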

So the options are:

  1. Do nothing
  2. Add Impact.currency
  3. Try to align the modelling of impact metrics with the modelling of exposure metrics

If needed, we can do option 2 for the 0.2 release and work on option 3 for the next release. Let me know what you want to do.

  • loss/approach is more relevant for vulnerability, and I think duplicates what we include in loss/impact/base_data_type - could be removed?

It seems to me that there is a lot of crossover, but also some differences. For example, the data_calculation_type codelist referenced in loss.impact.base_data_type has a code for 'observed' (Post-event observation data such as post-event damage surveys), which I interpret as indicating "actual" loss data rather than predictions or forecasts. That doesn't fit the semantics of any of the codes in the function_approach codelist referenced in loss.approach. The nearest fit is 'empirical', but its definition mentions regression analysis, which implies predictions or forecasts rather than "actual" data.

I think that this warrants further investigation, but I don't think we'll resolve it in time for 0.2.

  • spreadsheet template does not contain a link to the gazetteer location scheme, and the link in documentation 'The gazetteer from which the entry is drawn, from the open location gazetteers codelist.' leads to an error.

Regarding the spreadsheet template, I can see a link in the template and in the rdls_template_loss_SFRARR_eqrisk.xlsx (see below). Where is it missing from?

image

Good catch on the broken link in the documentation. This was because some codelist links in the schema included .html, which was working in the schema browser, but not in the schema reference tables, for some reason. I've fixed them in https://github.com/GFDRR/rdl-standard/pull/244.

  • There is a mismatch in loss/cost/0/dimension and loss/cost/0/unit - dimension includes population but the unit requires a currency code.

This is because Cost is intended only to be used for monetary costs, but the codelist for Cost.dimension is shared with Metric.dimension.

  • sources/0/id is tied to dataset ID, so I can't add more than one unique source IDs

In rdls_template_loss_SFRARR_eqrisk.xlsx, it looks like you might've copy-pasted the value from the id column into the sources/0/id column, which has also copied the data validation rules. That's how copy-pasting behaves in Google Sheets and Excel unless you paste values only (Ctrl+Shift+V). Looking at the blank template in the spreadsheet template repository, there are no validation rules on the sources/0/id column.

Here is the loss metadata file for use in the loss example: json xlsx image: tabulated data, so no image provided