gbv / validation-server

Web service to validate data with support of multiple schema languages
https://format.gbv.de/validate/
MIT License

Automatic update of formats and schemas #8

Open nichtich opened 2 years ago

nichtich commented 2 years ago

New and modified formats and schemas from format.gbv.de should be included regularly, maybe with a cronjob.

nichtich commented 2 years ago

Extract formats with JSON Schema schemas from format.gbv.de:

jq 'select(.schemas)|select(.schemas[]|select(.type=="json-schema"))|{id,title,short,homepage,schemas,base,for}|del(..|nulls)'
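
For readers more at home in Node.js than in jq, the same selection could be sketched roughly as follows. This is an approximation under assumptions, not code from the repository: the function name and the sample records are made up, nulls are only dropped at the top level (the jq filter removes them recursively), and each format is emitted once even if it has several JSON Schema schemas.

```javascript
// Fields kept by the jq filter above.
const FIELDS = ["id", "title", "short", "homepage", "schemas", "base", "for"]

// Keep formats that have at least one schema of type "json-schema",
// reduced to FIELDS with null/undefined values dropped (top level only).
function selectJsonSchemaFormats(formats) {
  return formats
    .filter(f => Array.isArray(f.schemas) && f.schemas.some(s => s && s.type === "json-schema"))
    .map(f => Object.fromEntries(
      FIELDS.filter(key => f[key] != null).map(key => [key, f[key]])
    ))
}

// Made-up sample records, not real data from format.gbv.de
const sample = [
  { id: "a", title: "A", short: null, schemas: [{ type: "json-schema", url: "https://example.org/a.json" }] },
  { id: "b", title: "B", schemas: [{ type: "other", url: "https://example.org/b.txt" }] },
  { id: "c", title: "C" },
]

console.log(JSON.stringify(selectJsonSchemaFormats(sample), null, 2))
```

Only the first sample record passes the filter, and its null `short` field is dropped.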

nichtich commented 2 years ago

Malformed schema files can easily break the application so

stefandesu commented 2 years ago

Validation of JSON Schema files is included in this service, isn't it? Can't we just use that to validate the updated files? I would suggest that if a schema file is malformed, the old one is kept, and that we get notified of the malformed schema so we can fix it.
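
A minimal sketch of the "keep the old file, report the problem" idea, with loud assumptions: `checkSchemaText` and `chooseSchema` are hypothetical names, and the check only tests that the text parses as JSON and that the top level is an object or a boolean (the two forms a JSON Schema can take in recent drafts). A real check would compile the schema with whatever validator the service uses.

```javascript
// Minimal well-formedness check for the text of a JSON Schema file.
// Assumption: this does NOT compile or fully validate the schema.
function checkSchemaText(text) {
  let schema
  try {
    schema = JSON.parse(text)
  } catch (err) {
    return { ok: false, error: `not valid JSON: ${err.message}` }
  }
  const isObject = schema !== null && typeof schema === "object" && !Array.isArray(schema)
  if (!isObject && typeof schema !== "boolean") {
    return { ok: false, error: "a JSON Schema must be an object or a boolean" }
  }
  return { ok: true }
}

// Decide which version to keep: the candidate if it passes the check,
// otherwise the old one together with a problem description to report.
function chooseSchema(oldText, candidateText) {
  const result = checkSchemaText(candidateText)
  return result.ok
    ? { text: candidateText, updated: true }
    : { text: oldText, updated: false, problem: result.error }
}

// A truncated download is rejected and the old schema stays in place.
console.log(chooseSchema('{"type":"object"}', '{"type":').updated)
```

The `problem` string is what a notification (mail, log entry, issue) would carry.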

nichtich commented 2 years ago

Three levels of updates:

  1. Update of configuration. The core part of configuration is only modified manually, so configuration errors should break the application (#14)

  2. Update of the list of formats, referenced as a file in the configuration. This is part of the configuration, but we want to update it automatically. Possible solution:

    • get new formats file
    • start a new instance with new formats file (or a small script that runs await require("./lib/formats")(config))
    • override formats file on success, create notification on failure

  3. Update of schema files, referenced by URL in the list of formats

With configuration "update": "missing", schema files are never updated (unless the local formatsDirectory, acting as a cache, is purged), so this is safe. Nevertheless, external schema files may change, so we need to refresh the local copies from time to time.

In summary, this can be solved with a helper script to check configuration.
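
The second level (check the new formats file, override on success, notify on failure) could look roughly like this. It is a sketch under assumptions: `updateFormatsFile`, `check`, `notify` and `parsesAsJson` are hypothetical names, and the stand-in check only parses the file as JSON. In this project the check would more likely load the candidate through `require("./lib/formats")(config)` as suggested above.

```javascript
const fs = require("node:fs/promises")

// Replace `target` with `candidate` only if `check(candidate)` resolves.
// `check` and `notify` are hypothetical callbacks supplied by the caller.
async function updateFormatsFile({ candidate, target, check, notify }) {
  try {
    await check(candidate)
  } catch (err) {
    await notify(`formats file not updated: ${err.message}`)
    return false
  }
  // Copy next to the target, then rename: a rename within one directory
  // replaces the file in a single step on POSIX file systems.
  const staging = `${target}.new`
  await fs.copyFile(candidate, staging)
  await fs.rename(staging, target)
  return true
}

// Stand-in check: rejects if the file does not contain valid JSON.
async function parsesAsJson(file) {
  JSON.parse(await fs.readFile(file, "utf8"))
}
```

A caller would pass its own `notify`, for example one that writes to a log that is monitored, so that a rejected formats file does not go unnoticed while the old one stays in service.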

nichtich commented 2 years ago

The script ./bin/update.js can be used, e.g. when the formats file is config/formats.json and the formats directory is config/formats (not tested yet):

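set -e                          # abort on the first failing command
rm -rf tmp; mkdir tmp
curl ... > tmp/formats.json     # fetch the new formats file
npm run update -- -f tmp/formats.json -d tmp/formats
# only reached if the update succeeded
cp tmp/formats.json config/formats.json
rm -rf config/formats
mv tmp/formats config/formats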

This should be run as a daily cronjob. Success could be checked via the web interface if we expose the modification timestamp of the formats file.
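
Exposing the modification timestamp could be as small as the following sketch. The function name, the shape of the returned object and the 36-hour threshold are assumptions chosen for illustration; how the result would be wired into the web interface is left open.

```javascript
const fs = require("node:fs")

// Report when the formats file was last modified and whether that is
// older than `maxAgeHours` (36 is an arbitrary margin for a daily job).
function formatsFileStatus(file, now = new Date(), maxAgeHours = 36) {
  const modified = fs.statSync(file).mtime
  const ageHours = (now - modified) / 3600000
  return { modified: modified.toISOString(), stale: ageHours > maxAgeHours }
}
```

This is only meaningful if the formats file is overwritten solely after a successful update, so that its modification time marks the last successful run.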