Possibilty of schema update for the review process

OpenEnergyPlatform / oemetadata

Repository for the Open Energy Family metadata. Contains metadata templates, examples and schemas. For metadata conversion see https://github.com/OpenEnergyPlatform/omi

https://openenergyplatform.github.io/oemetadata/

MIT License

21 stars 3 forks source link

Possibilty of schema update for the review process #112

Closed steull closed 1 year ago

steull commented 1 year ago

Update schema.js with updated description and add further fields.

[x] Description
[x] Example
[x] Badge
[x] ✕ Review field stronglyRecommended (rejected)

steull commented 1 year ago

@christian-rli The schema is now updated. For the stronglyRecommended field, a final decision should be made whether this is only the Iron Badges or also other fields.

christian-rli commented 1 year ago

Thank you @steull . In my opinion the field should take a boolean value. Also it should be independent of the badges. The iron badge signifies a minimum amount of information for an upload.

In contrast, the idea behind the information in stronglyRecommended was that a reviewer gets the option to refuse a user input. If my overview is correct, this should only be the case if an incompatible or no license is provided and if a technical necessity would break things in other ways. I suppose that the naming stronglyRecommended can be misinterpreted here, as it suggests something similar to the iron badge. So I propose to rename it "redactable", "sensitive", "platformCritical".

So I suggest adding the boolean value "true" for:

license (due to the OEP open license clause)
name (for table name due to postgres table naming restrictions)
name (for column name due to postgres column naming restrictions)
type (for column type due to postgres column type restrictions)

Here"s where I'm unsure:

primaryKey (breaks table if wrong, however this should be set correctly in the upload anyway)
foreignKeys (see above)

@jh-RLI @Ludee @steull could you please give feedback to my assessment for

The character of what is now called "stronglyRecommended"
Renaming it and what to rename it to
The fields that would get a "true"

chrwm commented 1 year ago

I'm not familiar with the details of the original idea of the field stronglyRecommended.

the idea behind the information in stronglyRecommended was that a reviewer gets the option to refuse a user input.

So for each OEMetadata key the reviewer can signal via "stronglyRecommended" if the field was filled appropriately or not? If so, I'd name it review, type: str.

The importance of each OEMetadata key is described by the batch level information for me.

The iron badge signifies a minimum amount of information for an upload

minimun -> mandatory If one aspect is to signal mandatory fields for the upload, having the information "badge": "Iron" could be enough? The idea of iron is to indicate the technical minimun standard, isn't it? If yes, one sentence in the docs & various info texts should be fine.

christian-rli commented 1 year ago

Thank you for your thoughts @chrwm ! As you quote, the original idea for the strongly recommended field was to give the reviewer a reject option for certain inputs. Not all fields are created equal. Some are a technical necessity for upload, like column descriptions, some have legal significance, like the license necessary to publish data on the OEP.

Our motivation was to generate the review process page entirely from the schema. Since the iron badge only signifies the technically necessary fields for an upload, we wanted a way to include a way to label other no-go-input.

After more discussions, especially with @jh-RLI however, we concluded that this concern really only applies to the license information. It's possible to implement that exception in the review process without the schema, which is less elegant, but not a huge drama either. This would keep the schema clean and won't clutter it for a platform-specific use case. So I decided against the field stronglyRecommended after all. I removed all occurences with my latest commit.

christian-rli commented 1 year ago

During my work at the peer review I parsed through the schema with an my metadata string in order to call context information to all supplied fields. In doing so, I noticed that the schema structure did not conform to the example string and to the way the string is supplied by the platform.

Specifically, behind the key "subject" there is a list of objects. The schema however merely defines an object (without the list). Since example and platform worked in harmony, I think the schema is

wrong
not in use for anything where it's structurally critical is at play, because otherwise somebody would've noticed this already

I therefore took the liberty to adapt the schema and to make it conform to the way it's handled in platform and example.

@jh-RLI and @chrwm can you please confirm that this will in fact fix something and not in turn break something? Thank you.

christian-rli commented 1 year ago

I encountered another issue. The use of the keys description, example, type, badge and title is not consistent.

(a) Every simple object has all five of them.

(b) In objects that contain a simple list (e.g. languages and keywords), type, badge and title go one level deeper into an items object. type and badge disappear from the parent object, while title is duplicated. The example given in the object contains a list with several entries. This isn't wrong, strictly speaking. However, we don't have examples with lists that contain other objects, which would be a logical consequence. The example entry should therefore go into the lowest level and be a single entry.

(c) The same movement of keys holds true for objects that contain a list of other objects (e.g. licenses). However, the lower object does have a new description and since there are new keys in the lower object, there are new descriptions, badges and titles for them as well. So at the lowest level there are all five keys again

For consistency and cleaner code, I need/want access to all five keys on one level. This is possible in (a) and (c). (b) ought to be repaired. Fundamentally, I see two ways to achieve this:

Add a description and example to the lowest level (in items) of every simple list (optionally remove example list in the parent object)
Bring badge one level up (deal with type separately later, as it's not strictly important now)

I strongly advocate for 1, since it's cleaner and more consistent to always have all descriptive keys available on the lowest level. I will implement it in this way for the review process and then update here accordingly unless there are any objections in the mean time @jh-RLI @chrwm @Ludee

jh-RLI commented 1 year ago

Are you sure that you are using the latest versin of the schmea.json? The changes in this commit ca1c1fd add changes that are already in the latest schema.json in the develop branch.

christian-rli commented 1 year ago

Thank you for the quick response! I didn't think to check. It seems not. If I am not mistaken, In creating this branch @steull merged two ontology update branches and did not start from the develop branch. It's hard to trace back where precisely the schema diverged or came from. Maybe "subject" is the only difference, but it's hard to tell, as other changes add a lot of clutter that make a direct comparison difficult. Unfortunately there's now this very large commit in this branch's history with all added badges and a lot of updated examples and descriptions. I could:

start with a new branch from the current state of develop and redo all the changes
merge develop into this branch and presumably deal with a (big) merge conflict, or worse no conflict but the uneasy feeling that something doesn't quite add up

1 would have the advantage of a cleaner history, but it'd be a lot of manual labor to redo all of @steull 's work. It would also hide his contributions in the history (in a dead branch in git, but it would be referenced in this issue). 2 would feel a bit messy, both in the history as well as in the result (will the schema.py test if i did anything wrong in my merge?) Also it's a bit of a gamble how complex the conflict will be.

Two suboptimal solutions, but I'm wiling to go through any. I have a slight tendency to go with 1. What do you suggest @jh-RLI ?

PS: I was glad to discover at least that my changes were in line with how things should be

jh-RLI commented 1 year ago

At some point, I didn't look closely at what he was doing. Maybe I missed that he was working on an outdated branch. I also think it would be best to close the current PR and create a new branch and try to incorporate the changes there. I mean, if you are sure that you can fix the merge conflicts, then that is also a good solution. However, you will lose some data that you will have to reinsert afterwards anyway. Also, I don't think the commits are small enough, and then you'd have to manually set yourself which lines in the conflict get fixed and which can stay. So seems to me like that would be a lot of extra work for little added value. We can mention @steull as a contributor in the changelog or PR.

PS: Yes good catch!

PSS: I already commented on this in another oeplatform issue. Wanted to point out to you here again that the oemetadata now has a python package. You can update that very easily by posting a release here on github. Then an automated pipeline is triggered that uploads the current content of the master branch to PyPi. Then you can import the oemeta data as a python module and don't have to manually copy json files and keep them up to date.

christian-rli commented 1 year ago

... We can mention @steull as a contributor in the changelog or PR.

I implemented all changes and will mention him in the PR

PSS: I already commented on this in another oeplatform issue. Wanted to point out to you here again that the oemetadata now has a python package. You can update that very easily by posting a release here on github. Then an automated pipeline is triggered that uploads the current content of the master branch to PyPi. Then you can import the oemeta data as a python module and don't have to manually copy json files and keep them up to date.

Thanks for the info, It does seem like a cleaner way forward. However, (how) is it possible to import a certain version? Otherwise it would add another dependency whenever we make a code update. Let's discuss this in our meeting. If possible, I will implement an import once there's a working new release.

jh-RLI commented 1 year ago

You can import any version or just use the latest like this (same for template, example ....):

from metadata.latest.schema import OEMETADATA_LATEST_SCHEMA