GFDRR / rdls-spreadsheet-template

A template for entering Risk Data Library Standard (RDLS) metadata in spreadsheet format
3 stars 0 forks source link

Conversion to JSON and validation #4

Closed duncandewhurst closed 10 months ago

duncandewhurst commented 11 months ago

@odscjen you can use the following Flatten Tool command to convert the latest version of the spreadsheet template to JSON format:

flatten-tool unflatten -f xlsx --metatab-name Meta --id-name id --root-id id -s https://rdl-standard.readthedocs.io/en/dev/rdls_schema.json --convert-wkt data.xlsx > data.json

Note that this command uses the schema from https://github.com/GFDRR/rdl-standard/pull/181 per the note in https://github.com/GFDRR/rdls-spreadsheet-template/issues/3.

Until we have a version of CoVE up and running you can use https://www.jsonschemavalidator.net/ to validate the JSON data against the schema. Note that you'll need to omit the outer object and main array from Flatten Tool's output in order to validate it.

There is a bug in Flatten Tool which affects arrays of strings and numbers so you can ignore the the 'Invalid type. Expected X but got Array' errors for now. You can also ignore the error relating to the regex pattern for links.rel. I think that's a false positive due to that validator only supporting JSON Schema draft 2019-09 so it should be resolved in CoVE, which uses draft 2020-12.

duncandewhurst commented 11 months ago

@odscjen I forgot to mention, when using https://www.jsonschemavalidator.net/, you'll need to paste in the schema from https://github.com/GFDRR/rdl-standard/pull/181 rather than the dev branch

duncandewhurst commented 11 months ago

There is a bug in Flatten Tool which affects arrays of strings and numbers so you can ignore the the 'Invalid type. Expected X but got Array' errors for now.

This isn't a bug. I had erroneously used a comma to separate the array items, when I should've used a semi-colon as documented here: https://flatten-tool.readthedocs.io/en/latest/unflatten/#plain-lists-unsupported

duncandewhurst commented 11 months ago

To aid testing, I've added a fully-populated XLSX template to the repository and the equivalent JSON file: fixtures/complete.xlsx and fixtures/complete.json.

I generated the JSON file using the following command. Note that the command uses jq to remove the outer object and main array from Flatten Tool's output:

flatten-tool unflatten -f xlsx --metatab-name Meta --id-name id --root-id id -s https://rdl-standard.readthedocs.io/en/dev/rdls_schema.json --convert-wkt fixtures/complete.xlsx | jq .main[0] > fixtures/complete.json

I used checkjsonschema, which supports JSON Schema 2020-12, to check that the data is valid:

check-jsonschema --schemafile https://rdl-standard.readthedocs.io/en/dev/rdls_schema.json fixtures/complete.json
duncandewhurst commented 11 months ago

The spreadsheet-fixes branch on the rdl-standard repo is now merged into dev so I've updated the commands in this issue to use the schema from the dev branch.

duncandewhurst commented 10 months ago

Noting that the Flatten Tool command might need updating depending on the outcome of https://github.com/GFDRR/rdls-spreadsheet-template/issues/9

duncandewhurst commented 10 months ago

Noting that the Flatten Tool command might need updating depending on the outcome of #9

We decided on using WKT so I've updated the commands in this issue.

Now that we have a dev instance of CoVE up and running, we should be using that for conversion and validation so I'm going to close this issue.