OpenEnergyPlatform / omi

Repository for the Open Metadata Integration (OMI). For metadata definition see metadata repo:
https://github.com/OpenEnergyPlatform/metadata
GNU Affero General Public License v3.0

Include the metadata validation scripts from the oedatamodel #62

Closed Ludee closed 1 year ago

Ludee commented 2 years ago

@areleu developed functions to validate metadata. This should be included in OMI: https://github.com/OpenEnergyPlatform/oedatamodel/pull/55

@jh-RLI please get in contact to coordinate the merge and implementation.

jh-RLI commented 2 years ago

Yes, this is great @areleu. There were already efforts to implement this in omi, but there was a long discussion about how to use the oemetadata schema, and that's why the feature never got finished. Short summary:

I think we can use the GitHub links like [1], but personally I think it would be even better if we provided a separate REST API endpoint via the OEP website where one can load the schema for all versions. But since we have too many construction sites already, the GitHub solution will do for me.

[1] Tagged: https://raw.githubusercontent.com/OpenEnergyPlatform/metadata/v1.0.1/metadata/v140/schema.json
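
For illustration, the tagged-link approach could be wrapped in a small helper. This is only a sketch: `schema_url` and `load_schema` are hypothetical names, and the URL pattern is taken solely from link [1]. How release tags map to version directories is not settled in this thread, so both are passed in explicitly.

```python
import json
import urllib.request

RAW_BASE = "https://raw.githubusercontent.com/OpenEnergyPlatform/metadata"


def schema_url(tag: str, version_dir: str) -> str:
    """Build the raw GitHub URL of schema.json for a release tag and version directory."""
    return f"{RAW_BASE}/{tag}/metadata/{version_dir}/schema.json"


def load_schema(tag: str, version_dir: str, timeout: float = 10.0) -> dict:
    """Download and parse the schema (requires network access)."""
    with urllib.request.urlopen(schema_url(tag, version_dir), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `schema_url("v1.0.1", "v140")` reproduces the link in [1]; a REST endpoint on the OEP could later replace `RAW_BASE` without changing the callers.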

The implementation that was once started in omi is also a little different, using jsonschema and not jsonschema-rs. If we want to maximize performance, it seems to make sense to use jsonschema validation in omi, since we use smaller json files? But the performance difference is pretty small. jsonschema-rs outperforms jsonschema when using larger json files.

henhuy commented 2 years ago

I would appreciate seeing this validation in OMI, as I want to use it in the oem2orm tool. Note: please integrate the latest metadata version v151 as well, since I need the latest version. A quick release afterwards would also be nice, as I prefer to import OMI from PyPI instead of from the GitHub repo. Looking at the current implementation (see https://github.com/OpenEnergyPlatform/omi/blob/dev/src/omi/dialects/oep/parser.py#L1021), errors are only printed; for me it would make more sense to raise errors explicitly! Thank you both!

jh-RLI commented 2 years ago

I agree, this has been on my list for some time now, but I had no time to implement the proper exceptions. There is also an issue that asks for a report of all errors instead of one error at a time. I will implement it so we can get both.
The exception you mentioned is part of a very deprecated assertion functionality. We will replace it with jsonschema.validate() or jsonschema_rs.validate(), based on the oemetadata schema.json for each oemetadata version. This should be fairly quick and I can do the release afterwards. My goal is to have this implemented and released by the end of next week.
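
To make the "raise explicitly, but report everything" idea concrete, here is a minimal standard-library sketch. It is not OMI's implementation and not a JSON Schema validator: `MetadataValidationError`, `collect_missing_keys`, `validate_or_raise` and the sample keys are hypothetical, and the check only covers missing keys. With the jsonschema package, the same collect-then-raise pattern is available by looping over a validator's `iter_errors()` instead of calling `validate()`.

```python
class MetadataValidationError(Exception):
    """Raised once, carrying every problem that was found (hypothetical name)."""

    def __init__(self, errors):
        self.errors = list(errors)
        report = "\n".join(self.errors)
        super().__init__(f"{len(self.errors)} validation error(s):\n{report}")


def collect_missing_keys(metadata, required, path=""):
    """Return one message per missing key instead of stopping at the first.

    `required` maps each required key to a nested dict of required sub-keys
    (an empty dict means the key has no required children).
    """
    errors = []
    for key, children in required.items():
        here = f"{path}/{key}"
        if key not in metadata:
            errors.append(f"{here}: missing required key")
        elif children and isinstance(metadata[key], dict):
            errors.extend(collect_missing_keys(metadata[key], children, here))
        elif children:
            # Sub-keys are required, but the value is not an object.
            errors.append(f"{here}: expected an object")
    return errors


def validate_or_raise(metadata, required):
    """Raise a single exception that lists all collected errors."""
    errors = collect_missing_keys(metadata, required)
    if errors:
        raise MetadataValidationError(errors)
```

A caller such as oem2orm could then catch `MetadataValidationError` and inspect `exc.errors`, rather than parsing printed output.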

areleu commented 2 years ago

I don't think this should be that complicated. The metadata should get an IRI that redirects to either the GitHub pages or some other place on the OEP where the metadata lives. This redirection already works with the Ontology, so I guess the infrastructure is already there.

nesnoj commented 2 years ago

OT sidenote: I just used OMI (0.0.7) for validating some metadata for a publication (v1.4.1) and the results are somewhat inconsistent. Some missing keys are not detected, etc. Dunno if things work out better in v1.5.x. I wish I could say: "As those data are OMI-approved they are OEP meta compliant", but that's simply not true (at least for v1.4.1).

Sorry for making (somewhat OT) waves, but it took me too much time to manually fiddle around with the metadata.

jh-RLI commented 2 years ago

That is the goal! The validation is broken/non-existent ATM and the keys are not verified. I think a lot of these issues will be resolved as soon as we have the json schema validation implemented.

I also want OMI to provide that level of trust :D I'll work on it over the next few days and try to work on it some more next week. It took me some time to adapt to the current codebase. I think if we were to recreate omi, we wouldn't write all the code that currently exists, and we would provide some documentation :O

Since v1.4.1 is outdated (but of course still supported), there will be a function to convert metadata (v1.4 to v1.5). It is not released yet and I just noticed a small bug, but it is planned for the next version (omi v.0.8.0). At least you can easily update the metadata keys to oemetadata v1.5 and possibly use the new validation with the next version of omi.

henhuy commented 2 years ago

Personally, I vote for jsonschema instead of jsonschema_rs, as it seems to be the one the community uses (more stars) and additionally it is the one used by django-jsonforms which is used in the meta_tool (thus switching to jsonschema_rs in backend would lead to two required packages jsonschema AND jsonschema_rs).

jh-RLI commented 2 years ago

Good point! I wasn't aware of the relation to django-jsonforms! @areleu provided an implementation using jsonschema_rs to validate the oedatamodel's datapackage json files against the oemetadata json schema. Do you think it is worth migrating that implementation to jsonschema? @henhuy

henhuy commented 2 years ago

Maybe not needed, as oedatamodel package will not be integrated into OEP backend, right? It's more a collection of datapackages around scenario data.

areleu commented 2 years ago

> Good point! I wasn't aware of the relation to django-jsonforms! @areleu provided an implementation using jsonschema_rs to validate the oedatamodel's datapackage json files against the oemetadata json schema. Do you think it is worth migrating that implementation to jsonschema? @henhuy

I think translating from jsonschema_rs to jsonschema is trivial. The former has more friendly interfaces (and has better performance in some aspects) but functionality-wise they should be interchangeable.
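
As a rough illustration of how thin the difference is, a small adapter could hide the backend choice. This is a sketch under assumptions: `make_validator` is a hypothetical helper, and it relies on the top-level `jsonschema.validate(instance, schema)` and `jsonschema_rs.validate(schema, instance)` functions; note that the two take their arguments in opposite order, and each raises its own `ValidationError` type.

```python
from typing import Callable


def make_validator(backend: str) -> Callable[[dict, dict], None]:
    """Return a validate(instance, schema) callable for the chosen backend.

    The backend package is imported lazily, so only the one that is
    actually requested needs to be installed.
    """
    if backend == "jsonschema":
        import jsonschema  # pip install jsonschema

        return lambda instance, schema: jsonschema.validate(instance=instance, schema=schema)
    if backend == "jsonschema_rs":
        import jsonschema_rs  # pip install jsonschema-rs

        # Swapped argument order: schema first, then instance.
        return lambda instance, schema: jsonschema_rs.validate(schema, instance)
    raise ValueError(f"unknown backend: {backend!r}")
```

Usage would look like `validate = make_validator("jsonschema")` followed by `validate(metadata, schema)`; code that catches validation errors would still need to handle the backend-specific exception class.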

henhuy commented 2 years ago

Any progress here? Once this is implemented, I want to integrate OMI validation into oem2orm

jh-RLI commented 2 years ago

Wasn't able to work on this as I was sick the last two days and didn't have much time last week. Will work on it today.

henhuy commented 2 years ago

> Wasn't able to work on this as I was sick the last two days and didn't have much time last week. Will work on it today.

No problem! And thank you :)

jh-RLI commented 2 years ago

The validation works to some extent now, but needs further testing. I still need to add some documentation and to add this to the omi cli.

For now you can have a look at the branch: feature/omi-oem-validation-jsonschema#20

Install omi from the local checkout and install oemetadata from test.pypi:

    pip install -U ./
    pip install -i https://test.pypi.org/simple/ oemetadata==1.5.1a2

and use the validation like this:

import json

from omi.dialects.oep.parser import JSONParser
from metadata.latest.schema import OEMETADATA_LATEST_SCHEMA  # other schemas can be used


def validate_oemetadata():
    parser = JSONParser()
    _input_file = "path/metadata_v15.json"

    with open(_input_file, "rb") as inp:
        file = json.load(inp)

    # Creates a report of all errors; creates nothing if there is no error.
    # Change the schema to validate against other oemetadata versions.
    parser.validate(file, [OEMETADATA_LATEST_SCHEMA])
    # return parser.is_valid(file)  # creates a report with misleading errors, because old schemas are checked

I wanted to make it possible to test a metadata file against all available schemas (from oemetadata), but this does not work properly so far (a lot of errors are added to the report). Also, the official pip version of oemetadata was misconfigured and did not include the json files. I fixed this, but the fix is currently only published on test.pypi.org. We have coupled the oemetadata version and the pip release, so I don't think I can fix the release until the next official oemetadata release.
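
One possible way around the noisy all-schemas report might be to pick a single schema from the version the file declares about itself. The sketch below is hypothetical: `declared_version` and `pick_schema` are made-up helpers, the registry is a plain dict supplied by the caller, and it assumes the version string lives under `metaMetadata.metadataVersion`, which should be checked against the oemetadata schema before relying on it.

```python
def declared_version(metadata: dict) -> str:
    """Read the version string the metadata file claims for itself.

    Assumption: the version is stored under metaMetadata.metadataVersion.
    """
    try:
        return metadata["metaMetadata"]["metadataVersion"]
    except (KeyError, TypeError) as exc:
        raise ValueError("metadata does not declare a metadataVersion") from exc


def pick_schema(metadata: dict, schemas: dict) -> dict:
    """Select the one schema matching the declared version.

    `schemas` maps version strings to already loaded schema dicts.
    """
    version = declared_version(metadata)
    try:
        return schemas[version]
    except KeyError:
        raise ValueError(f"no schema registered for {version!r}") from None
```

The selected schema could then be passed as the single entry of the list given to `parser.validate(...)`, so the report only contains errors for the version the file actually targets.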