OpenEnergyPlatform / oemetadata

Repository for the Open Energy Family metadata. Contains metadata templates, examples and schemas. For metadata conversion see https://github.com/OpenEnergyPlatform/omi
https://openenergyplatform.github.io/oemetadata/
MIT License

Remove "additionalProperties" constraint from the schema #132

Open areleu opened 1 year ago

areleu commented 1 year ago

Description of the issue

Remove the "additionalProperties" constraint from the schema, or set it to "true".

Context

Currently the metadata is technically compatible with the frictionless framework, meaning that a package consisting of data + metadata can be read into frictionless without errors, as shown here:

import frictionless as fl

PROFILE = "https://raw.githubusercontent.com/OpenEnergyPlatform/oemetadata/develop/metadata/v160/schema.json"

package = fl.Package(source="datapackage.json", profile=PROFILE)

# This works!
for r in package.resources[0].read_rows():
    print(r)

Things get bumpy when validating the package, though. Calling the following produces errors:

package.validate()

Example output:

{'valid': False,
 'stats': {'tasks': 0, 'errors': 3, 'warnings': 0, 'seconds': 0.049},
 'warnings': [],
 'errors': [{'type': 'package-error',
             'title': 'Package Error',
             'description': 'A validation cannot be processed.',
             'message': 'The data package has an error: Additional properties '
                        "are not allowed ('csv' was unexpected) at property "
                        "'resources/0/dialect'",
             'tags': [],
             'note': "Additional properties are not allowed ('csv' was "
                     "unexpected) at property 'resources/0/dialect'"},
            {'type': 'package-error',
             'title': 'Package Error',
             'description': 'A validation cannot be processed.',
             'message': 'The data package has an error: Additional properties '
                        "are not allowed ('mediatype', 'scheme', 'type' were "
                        "unexpected) at property 'resources/0'",
             'tags': [],
             'note': "Additional properties are not allowed ('mediatype', "
                     "'scheme', 'type' were unexpected) at property "
                     "'resources/0'"},
            {'type': 'package-error',
             'title': 'Package Error',
             'description': 'A validation cannot be processed.',
             'message': 'The data package has an error: Additional properties '
                        "are not allowed ('profile' was unexpected)",
             'tags': [],
             'note': "Additional properties are not allowed ('profile' was "
                     'unexpected)'}],
 'tasks': []}

As you can see, frictionless adds new properties ['mediatype', 'scheme', 'type'] to each resource, nests the dialect settings under a 'csv' key, and adds a 'profile' field to the package itself.
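To see why exactly these three errors appear, here is a minimal, stdlib-only sketch of what "additionalProperties": false does: at each object level, any key not listed under "properties" is rejected. The allowed-key sets below are a hypothetical, heavily trimmed subset of the OEMetadata v1.6 schema, and the descriptor only mimics the shape frictionless writes; neither is copied from the real files.

```python
from typing import Any

# Hypothetical, trimmed allowed-key sets; the real OEMetadata schema lists many more.
ALLOWED_PACKAGE = {"name", "title", "resources", "metaMetadata"}
ALLOWED_RESOURCE = {"name", "path", "format", "encoding", "schema", "dialect"}
ALLOWED_DIALECT = {"delimiter", "decimalSeparator"}


def additional_property_errors(obj: dict[str, Any], allowed: set[str], at: str) -> list[str]:
    """Mimic "additionalProperties": false for a single object level."""
    extra = sorted(set(obj) - allowed)
    return [f"unexpected {extra} at '{at}'"] if extra else []


def check_package(pkg: dict[str, Any]) -> list[str]:
    """Apply the check to the package, each resource, and each resource dialect."""
    errors = additional_property_errors(pkg, ALLOWED_PACKAGE, "")
    for i, res in enumerate(pkg.get("resources", [])):
        errors += additional_property_errors(res, ALLOWED_RESOURCE, f"resources/{i}")
        if "dialect" in res:
            errors += additional_property_errors(
                res["dialect"], ALLOWED_DIALECT, f"resources/{i}/dialect"
            )
    return errors


# Hypothetical descriptor shaped like the one frictionless produced in the report above.
descriptor = {
    "name": "example",
    "profile": "data-package",
    "resources": [{
        "name": "table",
        "path": "table.csv",
        "format": "csv",
        "encoding": "utf-8",
        "mediatype": "text/csv",
        "scheme": "file",
        "type": "table",
        "dialect": {"csv": {"delimiter": ","}},
    }],
}

for err in check_package(descriptor):
    print(err)
```

Running it yields one error per level, matching the three package errors in the report: 'profile' at the top, 'mediatype'/'scheme'/'type' on the resource, and 'csv' inside the dialect.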

Ideas of solution

I would offer two possible solutions:

a. Tell anyone using frictionless that they should watch out for these extra properties.

b. Set "additionalProperties" to true (or remove it completely, since it defaults to true anyway) so that validation with jsonschema tools is less strict.

Whether to go with the second option depends on how much your infrastructure relies on this constraint. I see the value in validating inputs, but it can be a pain in other contexts.
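For option (a), a user could strip the frictionless-specific fields before validating against the strict OEMetadata schema. The helper below is hypothetical (`to_oemetadata_shape` is not part of frictionless or omi) and only handles the three cases from the report above: the package-level 'profile', the resource-level 'mediatype'/'scheme'/'type', and the nested 'csv' block in 'dialect', whose contents it lifts one level up. It assumes the keys inside the 'csv' block already use OEMetadata's names (e.g. 'delimiter'); anything else frictionless writes there would still need mapping.

```python
import copy
import json
from typing import Any

# Keys frictionless added in the validation report above (not an exhaustive list).
FRICTIONLESS_PACKAGE_KEYS = {"profile"}
FRICTIONLESS_RESOURCE_KEYS = {"mediatype", "scheme", "type"}


def to_oemetadata_shape(descriptor: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a frictionless-written descriptor without the extra keys."""
    out = copy.deepcopy(descriptor)  # leave the caller's dict untouched
    for key in FRICTIONLESS_PACKAGE_KEYS:
        out.pop(key, None)
    for res in out.get("resources", []):
        for key in FRICTIONLESS_RESOURCE_KEYS:
            res.pop(key, None)
        dialect = res.get("dialect")
        # Flatten {"csv": {...}} into the dialect object itself.
        if isinstance(dialect, dict) and isinstance(dialect.get("csv"), dict):
            dialect.update(dialect.pop("csv"))
    return out


written = {
    "name": "example",
    "profile": "data-package",
    "resources": [{
        "name": "table",
        "path": "table.csv",
        "mediatype": "text/csv",
        "scheme": "file",
        "type": "table",
        "dialect": {"csv": {"delimiter": ";"}},
    }],
}
print(json.dumps(to_oemetadata_shape(written), indent=2))
```

The cleaned dict could then go through the unchanged strict schema. Option (b) removes the need for this step, at the cost of no longer flagging unknown keys.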


jh-RLI commented 1 year ago

It's nice to see this working with Frictionless, as I haven't tested this in a long time.

I will bring this up at the next developer meeting this week. As for our infrastructure, the schema is mainly used in the Open Energy Platform and we need to update the integration anyway.

I think the constraint is implemented to ensure that a validated metadata string matches the expected fields from the Oemetadata specification.
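The trade-off can be sketched with a tiny, stdlib-only stand-in for a JSON Schema validator that honours only the "properties" and "additionalProperties" keywords at one level. With the constraint in place, a misspelled key (here a made-up typo 'lisences' for 'licenses') is reported; relaxed to true, the typo passes silently, exactly like the keys frictionless adds. The `validate_keys` function and both schemas are illustrative assumptions, not taken from the OEMetadata schema.

```python
from typing import Any


def validate_keys(instance: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Check only the 'additionalProperties' keyword for a single object level."""
    if schema.get("additionalProperties", True) is not False:
        # true, absent, or a sub-schema (not modelled here): allow every extra key
        return []
    known = set(schema.get("properties", {}))
    return [f"'{k}' was unexpected" for k in instance if k not in known]


strict = {
    "type": "object",
    "properties": {"name": {}, "licenses": {}},
    "additionalProperties": False,
}
lenient = {**strict, "additionalProperties": True}

metadata = {"name": "example", "lisences": []}  # typo for "licenses"

print(validate_keys(metadata, strict))   # ["'lisences' was unexpected"]
print(validate_keys(metadata, lenient))  # []
```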

jh-RLI commented 1 year ago

In the meeting we decided that we need to run some tests to find the best solution for our system (as you guessed). In general, it seems feasible for us to remove the constraint from the schema.

We plan to restructure Oemetadata (including the schema) so that metadata can be created for a collection of tables, rather than a separate metadata document for each table. We will likely update the schema as part of that revision.

Until then, I will add some information to the README soon.