OpenDataServices / flatten-tool

Tools for generating CSV and other flat versions of the structured data
http://flatten-tool.readthedocs.io/en/latest/
MIT License

flatten-tool and JSON 2020-12 #447

Open kathryn-ods opened 3 months ago

kathryn-ods commented 3 months ago

Previously we've used flatten-tool to create BODS templates.

Now that BODS has been upgraded to JSON Schema 2020-12, I think this no longer works.

Following the instructions for https://flatten-tool.readthedocs.io/en/latest/usage-bods/#create-template

I am now getting the error

Error while resolving urn:components#/$defs/UnspecifiedRecord: URLError: <urlopen error unknown url type: urn>

kd-ods commented 2 months ago

@rhiaro will you have a chance to look at this over the next week or two?

radix0000 commented 2 days ago

Currently flatten-tool does not support urn: links between schema files, where "the schema" is a directory rather than a single file. To update BODS Cove to 0.4 I have worked around this with a branch (https://github.com/OpenDataServices/flatten-tool/tree/handle_bods_0.4) that lets BODS Cove call flatten-tool with the 0.4 schema. The branch allows the (Python) caller to pass a dict containing the schema (with minimal, generic changes to flatten-tool), so the loading and resolving of links can be done in cove_bods.

The issue is that there are actually multiple schema changes here: the move to Draft 2020-12 of JSON Schema itself, but also the use of urn: references to link the various schema files. Currently I am using the same approach as the data standard tests themselves, namely the Registry construct that jsonschema uses (from the referencing library), within lib-cove-bods. This works well and has allowed BODS 0.4 to be added without cascading changes propagating out into repositories we don't control. However, this does mean there is no single schema file, which the command line tool expects.

There is a question about whether, in the long term, we want to support passing a schema to flatten-tool as a directory. This would require some changes to the CLI, as it would be necessary to pass not only the directory name but also the root file (e.g. statement.json in the case of BODS 0.4), and to add code for building a Registry construct out of the files in a robust manner (not just for BODS 0.4).
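As a rough illustration of what directory-based loading might involve, here is a minimal standard-library sketch. It is not the code in the branch or in lib-cove-bods, and it does not use the real `Registry` class from the `referencing` library. The helper names `load_schema_dir` and `resolve_ref` are hypothetical, and it assumes that each schema file declares its URN in a top-level `$id`.

```python
import json
import tempfile
from pathlib import Path


def load_schema_dir(schema_dir):
    """Index every *.json file in schema_dir by its "$id" (assumed to hold the URN)."""
    index = {}
    for path in sorted(Path(schema_dir).glob("*.json")):
        schema = json.loads(path.read_text(encoding="utf-8"))
        # Files without a top-level "$id" cannot be referenced by URN, so skip them.
        if isinstance(schema, dict) and schema.get("$id"):
            index[schema["$id"]] = schema
    return index


def resolve_ref(index, ref):
    """Resolve a reference such as "urn:components#/$defs/Thing" against the index."""
    base, _, pointer = ref.partition("#")
    node = index[base]  # KeyError here means the URN is not in the directory
    for token in pointer.split("/")[1:]:
        # Undo JSON Pointer escaping (~1 is "/", ~0 is "~").
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


# Usage with two tiny stand-in files (not the real BODS 0.4 schema).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "statement.json").write_text(json.dumps({
        "$id": "urn:statement",
        "properties": {
            "recordDetails": {"$ref": "urn:components#/$defs/UnspecifiedRecord"}
        },
    }), encoding="utf-8")
    Path(tmp, "components.json").write_text(json.dumps({
        "$id": "urn:components",
        "$defs": {"UnspecifiedRecord": {"type": "object"}},
    }), encoding="utf-8")

    index = load_schema_dir(tmp)
    # The caller still has to say which file is the root, as noted above.
    root = json.loads(Path(tmp, "statement.json").read_text(encoding="utf-8"))
    resolved = resolve_ref(index, root["properties"]["recordDetails"]["$ref"])
    print(resolved)
```

A robust version would also need to handle relative references, `$anchor`, and nested `$id` values, which is part of why reusing an existing registry implementation is attractive.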

Updating compile-to-json-schema to build a single schema file where there are urn: links would be an alternative way to go. I have avoided this so far for a number of reasons. It might involve either forking jsonref to add urn: support or rewriting compile-to-json-schema so that it does not rely on jsonref. Neither of these seems a particularly palatable alternative.
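For a sense of what the single-file route could look like without jsonref, here is a hedged sketch. It is not how compile-to-json-schema works today; the `bundle` function is hypothetical. It assumes every cross-file reference has the form `urn:<name>#/$defs/<Definition>`, that each file declares its URN in `$id`, and that definition names do not clash between files.

```python
import copy


def bundle(root, others):
    """Return a copy of root with $defs from others merged in and urn: refs made local."""
    bundled = copy.deepcopy(root)
    defs = bundled.setdefault("$defs", {})
    known_ids = {schema["$id"] for schema in others if "$id" in schema}
    known_ids.add(bundled.get("$id"))

    # Copy every definition from the other files into the root's $defs.
    for schema in others:
        for name, definition in copy.deepcopy(schema.get("$defs", {})).items():
            if name in defs:
                raise ValueError(f"definition name clash: {name}")
            defs[name] = definition

    def rewrite(node):
        # Walk the whole document and turn "urn:x#/$defs/Y" into "#/$defs/Y".
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str):
                base, _, pointer = ref.partition("#")
                if base in known_ids and pointer.startswith("/$defs/"):
                    node["$ref"] = "#" + pointer
            for value in node.values():
                rewrite(value)
        elif isinstance(node, list):
            for item in node:
                rewrite(item)

    rewrite(bundled)
    return bundled


# Usage with stand-in schemas (not the real BODS 0.4 files).
statement = {
    "$id": "urn:statement",
    "properties": {
        "recordDetails": {"$ref": "urn:components#/$defs/UnspecifiedRecord"}
    },
}
components = {
    "$id": "urn:components",
    "$defs": {"UnspecifiedRecord": {"type": "object"}},
}
single = bundle(statement, [components])
print(single["properties"]["recordDetails"]["$ref"])
```

References to a whole file (a bare `urn:<name>` with no fragment) are left untouched by this sketch, so a real implementation would need more care than this.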