matamadio opened 1 year ago
I can start producing the example data maps. I'll propose a layout to maintain throughout the docs. It would be similar to that used for CCDR docs; please let me know if that's ok or suggest edits.
The aim is to have:
[ ] Deterministic layers examples (maps) to show documentation of index values
Figure | Metadata |
---|---|
*(figure)* | Title: Global landslide susceptibility layer <br> Description: Deterministic map of mean landslide hazard occurrence frequency. <br> Spatial extent: Global <br> Risk Data type: Hazard <br> Hazard type: Landslide <br> Source model: LHASA <br> Analysis type: Deterministic <br> Calculation method: Inferred <br> Intensity measure: Index <br> Index criteria: Combination of climatology and observed empirical events. <br> License: Open (CC-BY) |
[ ] Describe historical event set - see https://github.com/GFDRR/rdl-standard/issues/81#issuecomment-1607310343 Empirical scenario footprint to show use of GLIDE number and event dates (combined)
Figure | Metadata |
---|---|
*(figure)* | Title: Satellite detected water extent <br> Description: Satellite-detected surface waters in Shabelle Zone, Somali Region of Ethiopia and Beledweyne District, Hiraan Region of Somalia as observed from a Sentinel-2 image acquired on 14 April 2023 at 07:28 UTC. <br> Spatial extent: Somalia; Ethiopia <br> Risk Data type: Hazard <br> Hazard type: Flood <br> Source model: ESA <br> Analysis type: Empirical <br> Calculation method: Inferred <br> Reference period: 2023-4-9 (start); 2023-4-14 (end) <br> GLIDE number: FL20230327SOM <br> License: Open (CC-BY) |
[ ] Set of hazard maps, to show one of the most common use cases (spreadsheet example)
Figure | Metadata |
---|---|
*(figure)* | Title: Global flood hazard layer <br> Description: Probabilistic maps of flood hazard occurrence frequency by return period. <br> Spatial extent: Global <br> Risk Data type: Hazard <br> Hazard type: Flood <br> Hazard processes: Fluvial flood; Pluvial flood <br> Source model: FATHOM <br> Analysis type: Probabilistic <br> Occurrence range: once in 10 to 1,000 years <br> Calculation method: Simulated <br> Intensity measure: Water depth [m] <br> License: Commercial |
[ ] Set with current and future climate projected hazard data to show how temporal objects are used
Figure | Metadata |
---|---|
*(figure)* | Title: Aqueduct flood hazard maps <br> Description: Probabilistic maps of coastal flood hazard occurrence frequency by return period. <br> Spatial extent: Global <br> Risk Data type: Hazard <br> Hazard type: Coastal flood <br> Hazard processes: Storm surge <br> Source model: Aqueduct <br> Period(s): 2015, 2030, 2050, 2080 <br> Analysis type: Probabilistic <br> Occurrence range: once in 5 to 1,000 years <br> Calculation method: Simulated <br> Intensity measure: Water depth [m] <br> License: Open (CC-BY) |
[ ] Example using Event_set > events > footprint cascade to show a core capability on footprint uncertainty or multiple types of intensity footprint (e.g. EQ event with SA, pga, pgd)
Figure | Metadata |
---|---|
3 earthquake layers with different IMT | Risk Data category: Hazard Source model: ... ... |
[ ] Demonstrating specification of trigger events to show how to code this core capability
Figure | Metadata |
---|---|
TBD | Risk Data category: Hazard Source model: ... ... |
[ ] Stochastic event set Oasis hazard files example - see #44 and OpenQuake example (SFRARR data) to show how tabulated event data, rather than maps, can be stored
Figure | Metadata |
---|---|
Table data | Risk Data category: Hazard Source model: Fathom ... |
Figure example for each data type (random locations):
[ ] Building aggregated data and footprint data
Figure | Metadata |
---|---|
WSF/GHS builtup | Risk Data category: Exposure Source model: ... ... |
Figure | Metadata |
---|---|
OSM footprint | Risk Data category: Exposure Source model: ... ... |
[ ] Land cover data
Figure | Metadata |
---|---|
*(figure)* | Title: WorldCover <br> Description: Global land cover map <br> Spatial extent: Global <br> Spatial resolution: 10 m <br> Risk Data type: Exposure <br> Exposure category: Buildings; Natural environment <br> Source model: ESA <br> Reference period: 2020 <br> License: Open (CC-BY) |
[ ] Infrastructure network data
Figure | Metadata |
---|---|
Central Asia road network exposure | Risk Data category: Exposure Source model: ... ... |
[ ] Population data
Figure | Metadata |
---|---|
*(figure)* | Title: Global Human Settlement Layer <br> Description: Global population density map from remote sensing interpretation <br> Spatial extent: Global <br> Spatial resolution: 100 m <br> Risk Data type: Exposure <br> Exposure category: Population <br> Source model: JRC <br> Reference period: 2020 <br> License: Open (CC-BY) |
[ ] Set with current and future projected data to show how temporal objects are used to describe exposure projections
Figure | Metadata |
---|---|
Central Asia residential exposure - current future scenarios | Title: Central Asia residential building exposure <br> Description: Simulated residential exposure distribution and replacement costs for the Central Asia region <br> Spatial extent: Global <br> Spatial resolution: 500 m <br> Risk Data type: Exposure <br> Exposure category: Buildings <br> Source model: RED/OGS <br> Reference period: 2020 and 2080 <br> License: Open (CC-BY-4.0) |
see spreadsheet and json metadata for Central Asia residential exposure - current future scenarios
[ ] Vulnerability curves examples
Figure | Metadata |
---|---|
TBD | Risk Data category: Vulnerability Source model: ... ... |
[ ] Fatality / mortality curves
Figure | Metadata |
---|---|
TBD | Risk Data category: Vulnerability Source model: ... ... |
[ ] Fragility curves / damage functions
Figure | Metadata |
---|---|
*(figure)* | Title: Global flood depth-damage functions <br> Description: Flood impact functions over land cover categories <br> Spatial extent: Global <br> Risk Data type: Vulnerability <br> Primary hazard: Flood <br> Source model: JRC <br> Reference period: 2015 <br> License: Open (CC-BY) <br> Details: A globally consistent database of depth-damage curves depicting fractional damage as a function of water depth, as well as maximum damage values, for a variety of assets and land use classes. Based on an extensive literature survey, concave damage curves have been developed for each continent, while differentiation in flood damage between countries is established by determining maximum damage values at the country scale. |
[ ] Socioeconomic vulnerability Indexes
Figure | Metadata |
---|---|
TBD | Risk Data category: Vulnerability Source model: ... ... |
Note: Use Central Asia SFRARR project / Africa R5 as examples
[ ] Loss dataset linking to E/H/V data used to show how to add 'full linked datasets' - to demo core capability using dataset IDs
Figure | Metadata |
---|---|
TBD | Risk Data category: Loss Source model: ... ... |
[ ] Probabilistic Monetary Losses - show maps and tables - to show a core type of analytical output data
Figure | Metadata |
---|---|
TBD | Risk Data category: Loss Source model: ... ... |
[ ] Probabilistic Non-monetary loss maps and tables to show capability on non-monetary damages
Figure | Metadata |
---|---|
TBD | Risk Data category: Loss Source model: ... ... |
[ ] Probabilistic Event Loss Table / Year Loss Table outputs
Figure | Metadata |
---|---|
TBD | Risk Data category: Loss Source model: ... ... |
[ ] Results of an exposure analysis to show outputs in terms of 'count' rather than loss
Figure | Metadata |
---|---|
TBD | Risk Data category: Loss Source model: ... ... |
[ ] Scenario / empirical Monetary Losses - show maps and tables - to show a core type of analytical output data
Figure | Metadata |
---|---|
TBD | Risk Data category: Loss Source model: ... ... |
[ ] Scenario / empirical Non-monetary loss maps and tables to show capability on non-monetary damages
Figure | Metadata |
---|---|
TBD | Risk Data category: Loss Source model: ... ... |
Should we attach a download link for each of the datasets shown in the example? E.g. OSM data for the city shown, hazard layer, etc. Should the file be hosted in github in some /downloads/ folder?
My understanding is that the purpose of the examples is to help readers to understand how RDLS metadata can be used to describe different aspects of risk datasets. I think that we should aim for the text and screenshots for each example to provide sufficient information about the relevant aspects of the datasets. Otherwise, it would be a lot of extra work for readers to download each example and open it in an appropriate software package.
@odscjen is it ok to provide examples like this (markdown-html), or should they be turned into json?
Ultimately we'll want to provide them in both markdown-html AND in JSON. For now markdown is fine and once the spreadsheet template and CoVE are up and running we can convert them into JSON as well.
@matamadio an important thing when creating these examples is to ensure you're using the field titles and codelist values (you can use the labels rather than the codes for ease of reading) from the schema, and include all of the required fields. Looking at the Hazard examples you've got so far, there are a few errors:
Deterministic layers examples (maps) to show documentation of index values
Figure | Metadata |
---|---|
*(figure)* | Title: Global landslide susceptibility layer <br> Description: Deterministic map of mean landslide hazard occurrence frequency. <br> Spatial scale: global <br> Risk Data type: Hazard <br> Hazard type: Landslide <br> Source name: LHASA <br> Source type: model <br> Analysis type: Deterministic <br> Frequency distribution: <br> Calculation method: Inferred <br> Deterministic frequency intensity measure: Index <br> Index criteria: Combination of climatology and observed empirical events. <br> License: Open (CC-BY) |
`Frequency distribution` is a closed codelist, so it has to be either 'poisson', 'negative binomial' or 'user defined' (I wasn't sure which one Susceptibility would translate to?). Unless this should actually be a different field?
Describe historical event set - see https://github.com/GFDRR/rdl-standard/issues/81#issuecomment-1607310343 Empirical scenario footprint to show use of GLIDE number and event dates
Figure | Metadata |
---|---|
*(figure)* | Title: Satellite detected water extent <br> Description: Satellite-detected surface waters in Shabelle Zone, Somali Region of Ethiopia and Beledweyne District, Hiraan Region of Somalia as observed from a Sentinel-2 image acquired on 14 April 2023 at 07:28 UTC. <br> Countries: Somalia; Ethiopia <br> Risk Data type: Hazard <br> Hazard type: Flood <br> Source name: ESA <br> Source type: model <br> Analysis type: Empirical <br> Calculation method: Inferred <br> Temporal: 2023-04-09 (start); 2023-04-14 (end) <br> Disaster identifier: FL20230327SOM <br> License: Open (CC-BY) |
Dates should be in YYYY-MM-DD format.
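For instance, a date range in the required format might be stored as follows (the `temporal` nesting here is illustrative, not the exact RDLS structure):

```json
{
  "temporal": {
    "start": "2023-04-09",
    "end": "2023-04-14"
  }
}
```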
Set of hazard maps, to show one of the most common use cases
Figure | Metadata |
---|---|
*(figure)* | Title: Global flood hazard layer <br> Description: Probabilistic maps of flood hazard occurrence frequency by return period. <br> Spatial scale: Global <br> Risk Data type: Hazard <br> Hazard type: Flood <br> Hazard processes: Fluvial flood; Pluvial flood <br> Source name: FATHOM <br> Source type: model <br> Analysis type: Probabilistic <br> Frequency distribution: <br> Occurrence range: once in 10 to 1,000 years <br> Calculation method: Simulated <br> Intensity measure: Flood water depth [m] <br> License: Commercial |
'River flood' isn't in the `process_type` codelist; this should be 'fluvial_flood' as this codelist is closed.
`Frequency distribution` is a closed codelist, so it has to be either 'poisson', 'negative binomial' or 'user defined' (I wasn't sure which one Return periods would translate to?). I suspect this should actually be a different field?
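To illustrate the closed-codelist point: the stored values must be the codes, even where documentation tables display the labels. A hypothetical fragment (the field nesting is illustrative, not the exact schema):

```json
{
  "hazard": {
    "type": "flood",
    "processes": ["fluvial_flood", "pluvial_flood"]
  }
}
```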
Set with current and future climate projected hazard data to show how temporal objects are used
Figure | Metadata |
---|---|
*(figure)* | Title: Aqueduct flood hazard maps <br> Description: Probabilistic maps of coastal flood hazard occurrence frequency by return period. <br> Spatial scale: Global <br> Risk Data type: Hazard <br> Hazard type: Coastal flood <br> Hazard processes: Storm surge <br> Source name: Aqueduct <br> Source type: model <br> Temporal: 2015, 2030, 2050, 2080 <br> Analysis type: Probabilistic <br> Frequency distribution: <br> Occurrence range: once in 5 to 1,000 years <br> Calculation method: Simulated <br> Intensity measure: Flood water depth [m] <br> License: Open (CC-BY) |
For all the examples where `analysis_type` = 'Probabilistic', `occurrence.probabilistic.probability.span` is a required field if you're including any `event` level data.
Thanks Jen, fixed the examples, but on the last comment I'm still unsure how I should indicate occurrence probability in the most common case (return period scenarios 1/n).
This is the case for the flood models where Analysis type: Probabilistic. E.g. in the Fathom dataset example we have 3 layers in the dataset: 1/n1, 1/n2, 1/n3. The probabilistic range is 1/n1 to 1/n3, and there is no specific period span to specify.
Sorry, in that final comment I had misread the schema! `span` is only required if you're using `event.occurrence.probabilistic.probability`.

I think there are 2 options here:

1. Use `occurrence_range`, which sits in `event_set`, to list all 3 probabilities. The description of this field makes it clear that it's only for probabilistic values, so it should be clear to the users what the values given are.
2. Create an `event` within the `event_set` and put the values in `return_period`, which sits in `event.occurrence.probabilistic`, and don't use `.probability` at all.

For the sake of a quick example, I would pick option 1.
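If it helps, a rough sketch of option 1 (field names as discussed above; the nesting and value format are assumptions, not a validated RDLS file):

```json
{
  "event_sets": [
    {
      "id": "1",
      "analysis_type": "probabilistic",
      "occurrence_range": "once in 10 to 1,000 years",
      "hazards": [
        {"id": "1", "type": "flood", "processes": ["fluvial_flood"]}
      ]
    }
  ]
}
```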
From today's check-in call with @matamadio and @odscrachel, we agreed that @matamadio will prepare examples using the spreadsheet template using only the relevant fields (i.e. not full RDLS metadata files). We can then convert those into JSON format to store in the repository which should give us the flexibility to present them in the documentation as needed (e.g. using field titles rather than JSON paths).
Spreadsheet example for the Fathom global dataset.
Figure | Metadata |
---|---|
*(figure)* | Title: Global flood hazard layer <br> Description: Probabilistic maps of flood hazard occurrence frequency by return period. <br> Spatial extent: Global <br> Risk Data type: Hazard <br> Hazard type: Flood <br> Hazard processes: Fluvial flood; Pluvial flood <br> Source model: FATHOM <br> Analysis type: Probabilistic <br> Occurrence range: once in 10 to 1,000 years <br> Calculation method: Simulated <br> Intensity measure: Water depth [m] <br> License: Commercial |
About the example panel:
would it be possible to switch between (or show together) metadata list (or table) and the underlying json visualisation?
Figure | Metadata | Json schema |
---|---|---|
*(figure)* | Title: Global flood hazard layer <br> Description: Probabilistic maps of flood hazard occurrence frequency by return period. <br> Spatial extent: Global <br> Risk Data type: Hazard <br> Hazard type: Flood <br> Hazard processes: Fluvial flood; Pluvial flood <br> Source model: FATHOM <br> Analysis type: Probabilistic <br> Occurrence range: once in 10 to 1,000 years <br> Calculation method: Simulated <br> Intensity measure: Water depth [m] <br> License: Commercial | Corresponding json |
> About the example panel:
> * would it be possible to switch between (or show together) metadata list (or table) and the underlying json visualisation?
Yep. Given the length of some of the field values, I think it's best to show each in a separate tab. I've tested this out by adding the Fathom hazard example in https://github.com/GFDRR/rdl-standard/pull/196.
Please take a look and let me know what you think: https://rdl-standard.readthedocs.io/en/135-examples/reference/schema/#hazard (below the schema reference table).
In particular, it would be good to get your feedback on:
The advantages of using separate tables and including identifiers are:
The downside is that it makes the tabular example longer than presenting all the values in the same table and without identifiers.
If you're happy with the general approach, then I think the best workflow is for you to do the initial preparation of the examples using the spreadsheet template, we can then convert them to JSON to add to the standard repository and the pre-commit script will handle creating the human-friendly CSVs for display in the documentation. For ongoing maintenance, it will be easiest to edit the JSON files directly.
> Please take a look and let me know what you think: https://rdl-standard.readthedocs.io/en/135-examples/reference/schema/#hazard (below the schema reference table).
Yes, I like this. Separated tables are good. Hiding identifiers would get a cleaner view of key attributes; but I agree it is good to have 1:1 representation of the json.
I'll produce additional examples to add in the gdrive folder, nametag _docsample
See example for exposure: built-up surface (GHS): rdls_exp-GHS_docsample.xlsx
Figure:
Note 1: unlike the real example about Thailand, this one describes the whole global dataset and not a derived national subset. Also, attribution is different.
Note 2: needs exposure metric specification, see #194.
Note 3: there are 2 references for the same resource.
Example for Vulnerability: rdls_vln-FL_JRC
Can be used both as a docs snippet and as a full example.
Figure (one of many possible):
New example for probabilistic hazard (Floods and Coastal floods) using open data layers:
Figure | Metadata |
---|---|
*(figure)* | Title: Aqueduct Floods Hazard Maps <br> Description: Probabilistic maps of flood hazard occurrence frequency by return period. <br> Spatial extent: Global <br> Risk Data type: Hazard <br> Hazard type: Flood <br> Hazard processes: Fluvial floods; Coastal floods <br> Publisher: World Resources Institute <br> Project: Aqueduct <br> Analysis type: Probabilistic <br> Occurrence range: once in 2 to 1,000 years <br> Calculation method: Simulated <br> Intensity measure: Water depth [m] <br> License: CC-BY-4.0 <br> Resource: Download |
> Please take a look and let me know what you think: https://rdl-standard.readthedocs.io/en/135-examples/reference/schema/#hazard (below the schema reference table).
> Yes, I like this. Separated tables are good. Hiding identifiers would get a cleaner view of key attributes; but I agree it is good to have 1:1 representation of the json.
> I'll produce additional examples to add in the gdrive folder, nametag _docsample
Great!
@odscjen would you be able to pick up work on https://github.com/GFDRR/rdl-standard/pull/196 and add the examples that Mat is preparing? The workflow for each example is:

1. Add the JSON file under `examples/{component}/{title}`
2. Run `./manage.py pre-commit` to generate the CSV files

We can perhaps hold fire on actually adding the examples to `docs/reference/schema.md` until they are all ready, as we might need to think a bit more about layout depending on the number and length of the examples.
Started working through the new examples. I'll update both the json and spreadsheet in the shared drive where possible, and I'll note any corrections in this issue.
rdls_exp-GHS_docsample.xlsx

* `links.rel` is prepopulating with 'describedby' and not 'describedBy'
* used the `name` that matches the `attributions.entity.name` whose `.role` is 'author'
* removed `spatial.bbox` as it was incorrect (should be 4 values only) and as this is a global dataset it's not necessary
* `referenced_by.authorNames`
* `cost.unit` has a value not from the closed codelist

rdls_exp-GHS-THA.xlsx
* `links.rel` and semi-colon + space
* `referenced_by.datePublished` should be a full date not just the year. Looking at the referenced url there is a full date given, so I've replaced this field with that value.
* `resource` says that the data is "expressed as the number of square meters", so the `cost.unit` (once #194 is resolved) should be 'm2' rather than 'AREA'. @matamadio let me know if I'd misunderstood this
* `resource.url` links to a page where the default download is "Download the global GHS_BUILT_S_E2030_GLOBE_R2023A_54009_100_V1_0 dataset in a single file", which seems to be for 2023 not 2020 as given in `resources.temporal`. I couldn't figure out how to get that to change to 2020. @matamadio can you make it select 2020 or if not we can just change `resources.temporal` in the example to be 2023.
* `resources.coordinate_system` is supposed to use the EPSG codes from https://epsg.org. The value given in this example is "54009" which is the value used in the linked url (which states that it's an EPSG code), but this isn't in the EPSG database. It is in https://epsg.io, but this site has both EPSG and ESRI coordinate systems and this one (Mollweide 54009) appears to be an ESRI code not an EPSG code. I've added an issue to the board about how to handle this: https://github.com/GFDRR/rdl-standard/issues/199

rdls_hzd-AQD.xlsx
* `links.rel`
* no `creator.name`, which is required; have used the `name` from the `attribution` sheet, 'Water Resource Institute'
* no `contact_point`, which is required; have used Mattia's name and email from other examples
* `event_set` `id` = "2" has no `hazards` but this is required in the schema. I think what's happened is some confusion with the identifiers in the spreadsheet. In 'hazard_event_sets_hazards' there are 2 `hazard` objects both linked to `event_set` 1. But the `events` in `event_set` 1 only match the first of these `event_set.hazards`. BUT the `hazards` in 'hazard_event_sets_events' for 'event_sets/0/id' 2 don't match the second of the `event_set.hazards`, with the difference being in the `hazard.type`: in 'hazard_event_sets_hazards' for the second `hazard` the `.type` = "flood" but in 'hazard_event_sets_events' the `.type` = "coastal_flood". @matamadio is the second of the event_set hazards supposed to be linked to the second event_set?

rdls_hzd-AQD_docsample.xlsx
* no `resources` so I've just copied over from rdls_hzd-AQD.xlsx
* `links.rel`
* no `creator.name`, which is required; have used the `name` from the attribution sheet, 'Water Resource Institute'
* no `contact_point`, which is required; have used Mattia's name and email from other examples

rdls_hzd-FTH-THA.xlsx
* `gazetteerEntries.id` should be the actual code from the scheme, so in this case it should be 'TH' as this is the ISO 3166-2 code for Thailand. So I've moved this from `.description` and replaced `.description` with "Thailand".
* `links.rel`
* `resource.url` is missing. This has been discussed previously (https://github.com/GFDRR/rdls-spreadsheet-template/issues/3#issuecomment-1682027617) so to make the validation pass I've added some dummy urls, as this is a commercial product so it's not going to be possible to provide a proper url to the actual data.
* `events` in `event_set` 1 are missing `hazard.type` and `hazard_process` so I've just copied them in from the `event_set.hazard` values. And done the same for the other 2 `event_sets` and given them all local `id`s
* no `license` so I just put in 'commercial' so that it'll pass validation (and this is essentially correct)

rdls_hzd-FTH_docsample.xlsx
* no `resources` so copied from rdls_hzd-FTH-THA.xlsx and updated the `dataset.id`
* `links.rel`

rdls_vln-FL_JRC.xlsx
* `contact_point.name` and `creator.name` missing, so used Mattia's name for `contact_point` and the `publisher.name` for `creator`
* `spatial` was missing, used `.scale` = 'global'
* missing required fields from `vulnerability`: `.taxonomy` and `.spatial.scale` - used 'global' for the latter. I had a quick skim through the methodology report for the `resource` and I couldn't work out what, if any, taxonomy they'd used for classifying the assets so I put it in as 'internal'. @matamadio let me know if you know of the actual taxonomy used.

Thanks for the feedback, sorry for the missing/wrong input!
> rdls_exp-GHS-THA.xlsx: The resource.url links to a page where the default download is Download the global GHS_BUILT_S_E2030_GLOBE_R2023A_54009_100_V1_0 dataset in a single file which seems to be for 2023 not 2020 as given in resources.temporal. I couldn't figure out how to get that to change to 2020. @matamadio can you make it select 2020 or if not we can just change resources.temporal in the example to be 2023.
URL for this example to be replaced with a specific resource (zip to be hosted in GH docs/_datasamples or similar). The full dataset includes a range of years; this specific subset is for year 2020, for the Thailand extent. I could also publish on DDH, but not immediately (need to wait for project completion).
> rdls_hzd-AQD.xlsx: event_set id = "2" has no hazards but this is required in the schema. I think what's happened is some confusion with the identifiers in the spreadsheet. In 'hazard_event_sets_hazards' there are 2 hazard objects both linked to event_set 1. But the events in event_set 1 only match the first of these event_set.hazards. BUT the hazards in 'hazard_event_sets_events' for 'event_sets/0/id' 2 don't match the second of the event_set.hazards, with the difference in the hazard.type, in 'hazard_event_sets_hazards' for the second hazard the .type = "flood" but in 'hazard_event_sets_events' the .type = "coastal_flood". @matamadio is the second of the event_set hazards supposed to be linked to the second event_set?
Commenting in the excel file
> gazetteerEntries.id should be the actual code from the scheme, so in this case it should be 'TH' as this is the ISO 3166-2 code for Thailand. So I've moved this from .description and replaced .description with "Thailand".
Thanks, this needs to be explained in the description. Please note this (and other country examples) uses ISO 3166-1 alpha-2: first-level unit (country), 2-letter code.
> resource.url is missing. This has been discussed previously (https://github.com/GFDRR/rdls-spreadsheet-template/issues/3#issuecomment-1682027617) so to make the validation pass I've added some dummy url's as this is a commercial product so it's not going to be possible to provide a proper url to the actual data.
Else the url could point to the existing datacatalog page (from where the resource can be requested).
> events in event_set 1 are missing hazard.type and hazard_process so I've just copied them in from the event_set.hazard values. And done the same for the other 2 event_sets and given them all local ids; no license so I just put in 'commercial' so that it'll pass validation (and this is essentially correct)
Sorry - they are all hazard type: flood; 1 and 2 process is fluvial flood, while 3 is pluvial flood.
> missing required from vulnerability, .taxonomy and .spatial.scale - used 'global' for the latter. I had a quick skim through the methodology report for the resource and I couldn't work out what, if any, taxonomy they'd used for classifying the assets so I put it in as 'internal', @matamadio let me know if you know of the actual taxonomy used.
I would put taxonomy as optional here. Originally these were based on Corine Land Cover classes (CLC), but in the end they use their own general taxonomy for splitting curve types. So "internal" is ok.
> rdls_exp-GHS_docsample.xlsx
> * @duncandewhurst I think there must be a mistake in the template as `links.rel` is prepopulating with 'describedby' and not 'describedBy'
'describedby' is correct. It is an IANA link relation type, which are all lowercase.
> 'describedby' is correct. It is an IANA link relation type, which are all lowercase.
ah, okay, this is getting reported as an error in every JSON conversion
> Else the url could point to the existing datacatalog page (from where resource can be requested).
this link for me just goes to a world bank login page (which I obviously can't log in to) so I don't think it's an appropriate link to use as it doesn't show anything of the actual data. I think at the moment, as these are just examples, using a dummy url is the better option.
> 'describedby' is correct. It is an IANA link relation type, which are all lowercase.
> ah, okay, this is getting reported as an error in every JSON conversion
Please can you share the data and command(s) that you're using in a new issue? I converted and tested rdls_hzd-AQD.xlsx using the commands in https://github.com/GFDRR/rdls-spreadsheet-template/issues/4 and there were no validation errors.
@duncandewhurst I used the flatten-tool command from that issue, but I was using https://www.jsonschemavalidator.net/ for the validation. The schema is definitely the current dev branch schema, but I get the following error message:
Message: String 'describedby' does not match regex pattern '^(?!(describedby))'. Schema path: https://raw.githubusercontent.com/GFDRR/rdl-standard/0__2__0/schema/rdls_schema.json#/properties/links/items/properties/rel/pattern
Ah, so as I mentioned in the issue description:

> You can also ignore the error relating to the regex pattern for `links.rel`. I think that's a false positive due to that validator only supporting JSON Schema draft 2019-09, so it should be resolved in CoVE, which uses draft 2020-12.

As expected, there are no errors when validating against draft 2020-12 using check-jsonschema.
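As an aside, the reported mismatch is reproducible with a plain regex check. This minimal sketch only shows why that validator flags the value when it applies the pattern directly; it does not reproduce the full schema logic that draft 2020-12 validators evaluate correctly:

```python
import re

# Pattern copied verbatim from the validation error message: a negative
# lookahead that rejects any string starting with 'describedby'.
pattern = re.compile(r"^(?!(describedby))")

assert pattern.match("describedby") is None   # rejected -> the reported error
assert pattern.match("cite-as") is not None   # other IANA rel values pass
```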
> Else the url could point to the existing datacatalog page (from where resource can be requested). ... this link for me just goes to a world bank login page (which I obviously can't login to) so I don't think it's an appropriate link to use as it doesn't show anything of the actual data. I think at the moment as these are just examples using a dummy url is the better option.
We need to make sure, when linking to the datacatalog, that we are NOT using https://datacatalog.worldbank.org/int/search/..., which is internal only (and the default when Mat, Pierre, or I copy a link); make sure to remove the 'int/' to make it visible externally: https://datacatalog.worldbank.org/search/...
> Yes, I like this. Separated tables are good. Hiding identifiers would get a cleaner view of key attributes; but I agree it is good to have 1:1 representation of the json.
I agree, following the example for hazard, rather than for exposure looks much better. Easy to tab between each representation of the example, and very clear where to find the examples.
@matamadio do you have a _docsample version of the vln-FL_JRC example? Also is there one yet for Loss?
> Set of hazard maps, to show one of the most common use cases
I also created, as a test, a sheet containing 6 zipped resources of flood hazard map geotiffs. I created a single event set (it's a regional analysis), 6 resources, one event per country per return period (50 events), and one footprint per event. This differs from the Fathom data example, which has one event per hazard type (3: PLU, FLU Def, FLU Undef) and no footprints. The necessary information gets across to the user either way, but I'm not sure which is better. I created it this way because that is how we've packaged it in the dataset on DDH, but this is not necessarily the best way; please feel free to suggest a better way - though we're unlikely to reconfigure the dataset on DDH now.
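The packaging described above (one event set, one event per country per return period, one footprint per event) could be sketched roughly as follows; identifiers and field nesting are illustrative assumptions rather than a validated RDLS file:

```json
{
  "event_sets": [
    {
      "id": "1",
      "analysis_type": "probabilistic",
      "events": [
        {
          "id": "KAZ-RP50",
          "description": "Kazakhstan, 1-in-50-year fluvial flood",
          "occurrence": {"probabilistic": {"return_period": 50}},
          "footprints": [{"id": "KAZ-RP50-fp1"}]
        }
      ]
    }
  ]
}
```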
With the exception of RDLS_full_SFRARR_fluvialhazardmaps.json, I've added all of the examples in the JSON conversions folder to the schema reference documentation in https://github.com/GFDRR/rdl-standard/pull/196. I'm sharing a summary of key changes and design decisions below:

I updated the JSON files to reflect the latest version of the schema, but I haven't updated the spreadsheets that were used to generate them. I also corrected one semantic error in `spatial.gazetteerEntries` in the Central Asia exposure examples, see the commit for details: https://github.com/GFDRR/rdl-standard/pull/196/commits/0273914dda2649c17ebb4cafb4f4ca02e30ecabc. I also put the two Central Asia exposure dataset examples in separate JSON files for ease of comprehension.
To reduce the length of the schema reference page, I've nested the examples with collapsible drop-downs.
Where there is more than one example for a component, only the first example is uncollapsed. If there is no figure for an example, it is collapsed. I couldn't find a suitable figure for the Central Asia exposure examples, but I took a screenshot from the global flood depth-damage functions PDF to use as a figure for that example:
The row titles in the tabular examples now include the titles of intermediary objects so that it is possible to distinguish between, for example, publisher name and creator name (previously they were both titled 'name'):
To reduce the amount of screen space taken up by the JSON examples, they are now collapsible, with objects and arrays collapsed by default:
Very nice, thanks. Would it be possible to limit the horizontal scroll of the table view, as in the codelists (#161)?
> @matamadio do you have a _docsample version of the vln-FL_JRC example? Also is there one yet for Loss?
The vln-FL_JRC is ok to use in docs as well; it doesn't include too many attributes anyway. The one for loss is still to be produced.
> Would it be possible to limit the horizontal scroll of the table view, as in the codelists (#161)?
Addressed in https://github.com/GFDRR/rdl-standard/pull/214.
> The vln-FL_JRC is ok to use in docs as well, it doesn't include too many attributes anyway.
Added in https://github.com/GFDRR/rdl-standard/pull/196.
I don't think there's anything else to do for this issue until the loss example is ready. Let me know if that's wrong!
@matamadio and @stufraser1 to discuss and prepare loss examples.
One example of loss data (results of the analysis) from CCDR:
Download THA_RSK.xlsx
This represents one specific country, but the same template applies to any country I've been working on. The dataset consists of one Excel file, made up of several tabs:
The tabular data for the ADM scores is also provided as geospatial data (gpkg). The file does not include an explicit loss curve chart, but it has all the elements needed to build one.
@stufraser1 should it fit in the schema in the current state, or do you have any suggestions for better formatting? This is key, as I'm just now setting the defaults for the new year's analytics.
I would say there are sheets in there that wouldn't normally go into the loss component:
My preference for describing these files in RDL Loss would be to include this as a dataset and give each sheet as its own resource (.csv), rather than an xlsx workbook, so users can see the list of resource descriptions per dataset rather than navigating many sheets. That said, I see it could be described in metadata using the existing structure, with the workbook as a single resource.
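The "one resource per sheet" idea could be sketched as below; all ids, titles, and field names here are illustrative assumptions, not taken from the actual workbook or the final schema:

```python
# Hypothetical sketch: the workbook's tabs are exported as CSVs and each is
# described as its own resource in the dataset metadata. Names are illustrative.
dataset = {
    "id": "THA_RSK",
    "title": "Thailand CCDR risk analysis",
    "resources": [
        {"id": "adm_scores", "title": "ADM risk scores", "format": "csv"},
        {"id": "loss_exceedance", "title": "Loss exceedance data", "format": "csv"},
    ],
}

# Users see one resource description per sheet instead of a single workbook.
for resource in dataset["resources"]:
    print(resource["id"], resource["format"])
```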
I have some questions about the loss schema. See simplified CCDR output example in the Gdrive folder.
THA_CCDR_RSK_ADM1.xlsx describes loss output for 2 hazards (river floods and coastal floods) over 2 exposure categories. The complete standard output would include 5-6 hazards and 3 exposure categories.
The metadata spreadsheet has loss attributes at the dataset level, so I have to create 4 dataset rows. But all of this information is actually in just one file.
Should I use the same dataset ID all along? Or should we rather move all loss attributes into an array?
Good catch. We want to be able to include multiple loss curves in one dataset, which would mean having a 'loss' array under the dataset level. I think this could also contain the contents of loss_cost, since I don't think a layer of nesting for loss cost is required beyond the loss object. I don't think anything else would need nesting: one level should suffice. @odscrachel please could you advise whether we can process this quickly / overnight with @duncandewhurst?
I also have a couple of issues testing with a return period dataset:

- `loss/impact/unit` does not include an `impact_unit` code for monetary losses, so where I've got a monetary asset_loss, I have to leave `loss/impact/unit` blank
- There is a mismatch in `loss/cost/0/dimension` and `loss/cost/0/unit` - dimension includes population but the unit requires a currency code.

Here is the loss metadata file for use in the loss example: json xlsx image: tabulated data, so no image provided
images for exposure examples: Central Asia residential current Central Asia residential projected
> Should I use the same dataset ID all along? Or should we rather move all loss attributes into an array?

Each row in the `datasets` sheet represents a dataset, so if there are rows with the same `id`, the JSON output will be a single dataset with the values from the final row, i.e. the values from the earlier rows will be overwritten. Therefore, the Thailand CCDR example does point to the need for an array of losses.
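The overwriting behaviour described above can be demonstrated with a minimal sketch (the column names are illustrative, not the real template's):

```python
# Two spreadsheet rows share a dataset id, so the second row's values replace
# the first's when the rows are merged into one JSON dataset object.
rows = [
    {"id": "THA_RSK", "hazard": "river_flood", "exposure": "population"},
    {"id": "THA_RSK", "hazard": "coastal_flood", "exposure": "buildings"},
]

datasets = {}
for row in rows:
    # update() keeps only the last value seen for each key
    datasets.setdefault(row["id"], {}).update(row)

# Only the final row survives, which is why an array of losses is needed.
print(datasets["THA_RSK"]["hazard"])
```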
I've drafted a PR for the changes proposed in https://github.com/GFDRR/rdl-standard/issues/135#issuecomment-1708577671 and https://github.com/GFDRR/rdl-standard/issues/135#issuecomment-1708617812:
- Made `loss` an array
- Made `loss.cost` an object

@stufraser1 @matamadio I will leave it up to you to decide whether you want to merge this PR for inclusion in the 0.2 release or leave it for later. My sense is that the modelling for loss metadata warrants further exploration (I'll open an issue), but the changes in the PR are an improvement over the current model, so I would merge it.
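For illustration, a hedged sketch of the resulting shape, built here as a Python dict; field names and values are illustrative, not real schema output:

```python
# "loss" is an array so one dataset can hold several loss curves, and each
# entry's "cost" is a single object rather than an array. Illustrative only.
dataset = {
    "id": "THA_CCDR_RSK_ADM1",
    "risk_data_type": "loss",
    "loss": [
        {
            "hazard_type": "flood",
            "exposure_category": "population",
            "cost": {"dimension": "structure", "unit": "USD"},
        },
        {
            "hazard_type": "coastal_flood",
            "exposure_category": "buildings",
            "cost": {"dimension": "structure", "unit": "USD"},
        },
    ],
}

print(len(dataset["loss"]))
```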
I'll hold off preparing a PR to add the loss examples until we have decided what to do about the schema as if the schema changes the examples will need to be updated. I've also left some comments on the SFRARR example spreadsheet where I think some fields may have been populated incorrectly.
@stufraser1 I've shared my feedback on your other questions and suggestions below.
> I also have a couple of issues testing with a return period dataset:
>
> - `loss/impact/unit` does not include an `impact_unit` code for monetary losses, so where I've got a monetary asset_loss, I have to leave `loss/impact/unit` blank
This was discussed at some length in https://github.com/GFDRR/rdl-standard/issues/75, but the conversation in that issue took a different direction so I don't think it was fully resolved.
My preferred approach is not to worry about units and instead to model the kind of quantity being measured (currency, in this case) since users can convert between units of the same quantity kind. That is the approach we settled on for exposure metrics and I think it would make sense to have consistent modelling for exposure metrics and impact metrics. However, that is quite a significant change to consider at this stage for 0.2.
The alternative solution that I proposed was to add an `Impact.currency` field for monetary losses. The reasons for separating `unit` and `currency` are twofold:
The separation of currencies and non-currency units is in keeping with QUDT, which is the source we're using for unit codes. It models currencies and non-currency units as separate vocabularies, so we should keep them separate too, in order to avoid the risk of clashing codes in the event that a currency and a non-currency unit share the same code.
So the options are:

1. Leave `loss/impact/unit` blank for monetary losses
2. Add an `Impact.currency` field
3. Model the kind of quantity being measured, as for exposure metrics
If needed, we can do option 2 for the 0.2 release and work on option 3 for the next release. Let me know what you want to do.
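For illustration, option 2 might look like the sketch below; everything except the `Impact.currency` field name is an assumption made for this example:

```python
# A separate "currency" field on Impact keeps "unit" reserved for
# non-monetary QUDT units. Codes and field names are illustrative.
monetary_impact = {"type": "asset_loss", "currency": "USD"}       # no unit
physical_impact = {"type": "displaced_people", "unit": "PERSON"}  # no currency

def impact_is_consistent(impact):
    # An impact should carry exactly one of currency or unit, never both
    return ("currency" in impact) != ("unit" in impact)

print(impact_is_consistent(monetary_impact), impact_is_consistent(physical_impact))
```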
> - `loss/approach` is more relevant for vulnerability, and I think duplicates what we include in `loss/impact/base_data_type` - could be removed?
It seems to me that there is a lot of crossover, but also some differences. For example, the data_calculation_type codelist referenced in `loss.impact.base_data_type` has a code for 'observed' (post-event observation data such as post-event damage surveys), which I interpret as indicating "actual" loss data rather than predictions or forecasts. That doesn't fit the semantics of any of the codes in the function_approach codelist referenced in `loss.approach`. The nearest fit is 'empirical', but its definition mentions regression analysis, which implies predictions or forecasts rather than "actual" data.
I think that this warrants further investigation, but I don't think we'll resolve it in time for 0.2.
> - spreadsheet template does not contain a link to the gazetteer location scheme, and the link in documentation 'The gazetteer from which the entry is drawn, from the open location gazetteers codelist.' leads to an error.
Regarding the spreadsheet template, I can see a link in the template and in the rdls_template_loss_SFRARR_eqrisk.xlsx (see below). Where is it missing from?
Good catch on the broken link in the documentation. This was because some codelist links in the schema included `.html`, which worked in the schema browser but not in the schema reference tables, for some reason. I've fixed them in https://github.com/GFDRR/rdl-standard/pull/244.
> - There is a mismatch in `loss/cost/0/dimension` and `loss/cost/0/unit` - dimension includes population but the unit requires a currency code.
This is because `Cost` is intended only to be used for monetary costs, but the codelist for `Cost.dimension` is shared with `Metric.dimension`.
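A minimal sketch of the consistency check this implies, assuming a purely illustrative subset of monetary dimension codes (not the real codelist):

```python
# Cost is monetary-only, so a cost whose dimension is a non-monetary code
# (e.g. "population") cannot take the currency unit the schema requires.
MONETARY_DIMENSIONS = {"structure", "content", "product"}  # illustrative subset

def cost_is_consistent(cost):
    # A monetary unit is only valid with a monetary dimension
    return cost["dimension"] in MONETARY_DIMENSIONS

print(cost_is_consistent({"dimension": "structure", "unit": "USD"}))
print(cost_is_consistent({"dimension": "population", "unit": "USD"}))
```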
> - `sources/0/id` is tied to dataset ID, so I can't add more than one unique source ID
In rdls_template_loss_SFRARR_eqrisk.xlsx, it looks like you might've copy-pasted the value from the `id` column into the `sources/0/id` column, which has also copied the data validation rules. That's how copy-pasting behaves in Google Sheets and Excel unless you paste values only (Ctrl+Shift+V). Looking at the blank template in the spreadsheet template repository, there are no validation rules on the `sources/0/id` column.
List of examples to be produced and included in Docs
Please add any subject that requires an example (figure, table, other) to be explained properly in the docs.
Aims:
- Hazard
- Exposure - examples to show multiple data types
- Vulnerability
- Loss