GFDRR / rdl-standard

The Risk Data Library Standard (RDLS) is an open data standard to make it easier to work with disaster and climate risk data. It provides a common description of the data used and produced in risk assessments, including hazard, exposure, vulnerability, and modelled loss, or impact, data.
https://docs.riskdatalibrary.org/
Creative Commons Attribution Share Alike 4.0 International

[Docs update] Write content for Guides/RDL metadata page #149

Closed: odscjen closed this issue 1 year ago

odscjen commented 1 year ago

There is currently no content for the RDL metadata page in the Guidance section. This will be created as part of the work on developing the validation and spreadsheet tools.

odscjen commented 1 year ago

Based on some of the examples that have been created recently (see https://github.com/GFDRR/rdls-spreadsheet-template/issues/3#issuecomment-1682027617), do we want a specific section of guidance describing any differences in how RDLS is used depending on whether the user is publishing data to an open data catalogue or to an internal or access-restricted catalogue?

duncandewhurst commented 1 year ago

Sounds good. Do you want to propose some content?

We can perhaps include some guidance to explain that certain fields like resource URLs might only be populated once data has actually been added to a catalogue.
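As a minimal sketch of that point, a publisher could list which resources still lack URLs before the data is added to a catalogue. This assumes resources sit in a `resources` array and that each has an `id` and an optional `download_url`; both names are assumptions for illustration, not confirmed by this thread.

```python
def resources_missing_urls(metadata: dict) -> list[str]:
    """Return the ids of resources that have no download URL yet.

    Assumes a hypothetical layout: metadata["resources"] is a list of dicts,
    each with an "id" and an optional "download_url".
    """
    missing = []
    for resource in metadata.get("resources", []):
        # An absent key and an empty string are both treated as "not yet populated".
        if not resource.get("download_url"):
            missing.append(resource.get("id", "<no id>"))
    return missing


example = {
    "resources": [
        {"id": "flood_depth_100yr", "download_url": "https://example.org/flood.tif"},
        {"id": "exposure_buildings"},  # not yet uploaded to a catalogue
    ]
}
print(resources_missing_urls(example))  # ['exposure_buildings']
```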

odscjen commented 1 year ago

Great, I'll make a start on that today (Friday) or on Monday.

odscrachel commented 1 year ago

Some ideas for what a skeleton might look like, although some of the topics under creating a JSON file are generic and could warrant a more general advice heading:

How to publish RDLS metadata

Adoption of the metadata schema

Metadata enables datasets to be found by human and machine searches, so that users can easily identify their contents. Anyone uploading a risk dataset online is strongly encouraged to prepare metadata and upload it alongside the dataset.

The Risk Data Library Standard defines metadata in JSON format, but this can be translated into a tabular format (CSV/Excel). WIP

How to assign a dataset identifier

See #184

Creating a JSON file

Link to package schema - see #203
Clarify the purpose of links - see #187
Ensuring non-metadata is included within the datasets
Resource URLs - see comment

Validate your metadata

See #203

Using the RDLS spreadsheet input template

Possibly add the README contents

Publishing to an open data catalogue

World Bank data catalogue

File-sharing tips and specific advice for DDH

Sharing your data

It would be helpful to include some pointers on sharing and promoting data to encourage its use.

Publishing to an internal or access-restricted catalogue
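On the skeleton's note that RDLS metadata defined in JSON can be translated into a table (CSV/Excel): one common approach is to flatten nested objects into column headers built from JSON paths. The sketch below assumes a `/` separator and numeric indices for array items; it only illustrates the idea and is not necessarily the convention used by the RDLS spreadsheet template. The field names are hypothetical.

```python
import csv
import io


def flatten(value, prefix=""):
    """Flatten nested dicts/lists into a {"path/to/field": value} mapping.

    The "/" separator and numeric array indices are assumptions for illustration.
    """
    rows = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}/{key}" if prefix else key
            rows.update(flatten(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            path = f"{prefix}/{index}" if prefix else str(index)
            rows.update(flatten(child, path))
    else:
        rows[prefix] = value
    return rows


# Hypothetical metadata; check the RDLS schema for real field names.
metadata = {
    "id": "example-dataset",
    "title": "Example flood hazard dataset",
    "spatial": {"countries": ["PHL", "VNM"]},
}
flat = flatten(metadata)

# Write the flattened record as one CSV row with a header.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(flat))
writer.writeheader()
writer.writerow(flat)
print(buffer.getvalue())
```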

duncandewhurst commented 1 year ago

Thanks, @odscrachel.

For ease of editing and review, I've copied the skeleton into a Google Doc and restructured it into an overview of the process for publishing RDLS metadata (prepare, check, publish) and how-to guides for specific topics.

@odscjen I've assigned you a couple of comments for sections relevant to your suggestion in https://github.com/GFDRR/rdl-standard/issues/149#issuecomment-1682028428

odscjen commented 1 year ago

Noting here a comment from https://github.com/GFDRR/rdl-standard/pull/207#pullrequestreview-1595527033: the guidance should state that, when creating JSON, values in the various coordinate fields must be separated with a comma, not a semicolon as in the spreadsheet template.
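To illustrate the difference, here is a minimal sketch that converts a semicolon-separated coordinate cell, as entered in the spreadsheet template, into a JSON array whose values are separated by commas. The `bbox` field name and the four-number layout are assumptions for illustration only.

```python
import json


def cell_to_coordinates(cell: str) -> list[float]:
    """Split a spreadsheet cell such as "-10.5;35.0;5.2;45.1" into floats."""
    # Empty segments (e.g. from a trailing ";") are ignored.
    return [float(part) for part in cell.split(";") if part.strip()]


# Hypothetical field name; check the RDLS schema for the real coordinate fields.
location = {"bbox": cell_to_coordinates("-10.5;35.0;5.2;45.1")}
print(json.dumps(location))  # {"bbox": [-10.5, 35.0, 5.2, 45.1]}
```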

odscjen commented 1 year ago

Linking in https://github.com/GFDRR/rdl-standard/issues/56: as part of the "metadata or not" review, a lot of fields were removed from vulnerability, and it was suggested that the guidance should mention somewhere that users should still include these values in their data even though they are not captured in the RDLS metadata. Looking at how the guidance is currently structured, it's unclear where this would fit. For now I've added a section titled 'Non-RDLS metadata' under 'Prepare your metadata'.

duncandewhurst commented 1 year ago

I think this is probably best addressed by adding a sentence at the end of the second paragraph of https://rdl-standard.readthedocs.io/en/dev/rdl/what/ along the lines of:

RDLS does not specify which fields to include within risk datasets. You ought to make sure that your risk datasets include the fields needed to fulfil their intended uses.

Edit: If there's a need to list the specific fields from #56, I think the right place would be a new "what to include in risk datasets" page under how to publish risk datasets.

duncandewhurst commented 1 year ago

@odscjen let me know when your updates are ready for review.

odscjen commented 1 year ago

@duncandewhurst please go ahead and review the google doc :)

stufraser1 commented 1 year ago

Possible workflow diagram for this guides page, showing users how the templates and validation tool work together. https://docs.google.com/presentation/d/1pKpDUlZ1QlhLx6PgiZDWCzda7O5N3zBI/edit#slide=id.g27aa981260d_0_0

duncandewhurst commented 1 year ago

I mentioned it briefly in https://github.com/GFDRR/rdl-standard/pull/147#pullrequestreview-1551271337, but to reiterate and expand on the reasoning - I strongly suggest that we do not encourage implementers to author JSON data by hand, even using a template.

Even for people who are very familiar with JSON, authoring data by hand is very time-consuming and error-prone. In a standards context, this means that implementers and the people supporting them waste lots of time trying to fix basic JSON errors (missing brackets, commas, incorrect nesting etc.) that have nothing to do with the standard itself.
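To show the kind of basic error meant here, the sketch below parses hand-written text with Python's standard `json` module, which rejects a missing comma as a syntax error before any RDLS-specific check can run. `REQUIRED_FIELDS` is a hypothetical stand-in: the authoritative list of required fields is the RDLS schema itself, and full validation should use a schema-aware tool.

```python
import json

# Hypothetical subset of required top-level fields, for illustration only.
REQUIRED_FIELDS = {"id", "title", "resources"}


def check_metadata(text: str) -> list[str]:
    """Return a list of problems found in a JSON metadata string."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as error:
        # Syntax problems (missing commas, brackets, quotes) stop here.
        return [f"JSON syntax error: {error.msg} (line {error.lineno}, column {error.colno})"]
    if not isinstance(data, dict):
        return ["Top-level value must be a JSON object"]
    missing = sorted(REQUIRED_FIELDS - set(data))
    return [f"Missing required field: {name}" for name in missing]


hand_written = '{"id": "example-dataset" "title": "Example"}'  # missing comma
print(check_metadata(hand_written))
```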

JSON is an appropriate format for exporting data from an existing system or generating data programmatically, but it is not well-suited to authoring data by hand, especially for large and complex data such as RDLS metadata. For implementers who are authoring data by hand, the spreadsheet template is the best approach currently available so we should encourage that.
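As a minimal sketch of generating JSON programmatically rather than by hand, assuming the metadata already lives in an existing system as Python records, `json.dumps` takes care of brackets, commas and quoting. The record structure, the `to_metadata` helper and the output field names (including the top-level `datasets` array) are all hypothetical.

```python
import json

# Hypothetical records exported from an existing catalogue or database.
records = [
    {"id": "flood-hazard-2023", "title": "Flood hazard maps", "files": ["depth_100yr.tif"]},
    {"id": "building-exposure", "title": "Building exposure", "files": []},
]


def to_metadata(record: dict) -> dict:
    """Map one source record onto a hypothetical RDLS-like structure."""
    return {
        "id": record["id"],
        "title": record["title"],
        "resources": [{"id": name} for name in record["files"]],
    }


document = {"datasets": [to_metadata(record) for record in records]}
text = json.dumps(document, indent=2)

# Round-tripping confirms the generated text is syntactically valid JSON.
assert json.loads(text) == document
```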

I've explained this in the guidance on how to prepare RDLS metadata.

We can certainly add a diagram showing the relationship between the spreadsheet template, JSON data and the RDLS Convertor, but I don't think we should promote an RDLS JSON template as an 'option'.
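For readers unfamiliar with that relationship, the sketch below shows the general idea behind converting a spreadsheet row into nested JSON: column headers are treated as `/`-separated paths. This is an illustrative assumption, not the RDLS Convertor's actual implementation or header format; it handles nested objects only (no arrays), and the field names are hypothetical.

```python
import json


def unflatten(row: dict) -> dict:
    """Build nested objects from {"a/b/c": value} style column headers."""
    result: dict = {}
    for header, value in row.items():
        if value in ("", None):
            continue  # skip empty cells rather than emitting empty strings
        *parents, leaf = header.split("/")
        node = result
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return result


# Hypothetical spreadsheet row keyed by column header.
row = {"id": "example-dataset", "publisher/name": "Example Org", "license": ""}
print(json.dumps(unflatten(row)))
```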