OpenEnergyPlatform / data-preprocessing

Repository for data formatting, import of data, data and metadata review, and data curation.
GNU Affero General Public License v3.0

Setup and test review process for data #2

Closed Ludee closed 4 years ago

Ludee commented 6 years ago

Develop a GitHub-based review process for data and metadata. It should have a maximal degree of transparency. Find and document the components and criteria used for evaluation. Decide on a "reward" (e.g. a badge) for when all tests pass.

Ludee commented 6 years ago

Collection of criteria:

general:

metadata:

data:

Ludee commented 6 years ago

Use a branch with a pull request to accept the review next time.

nesnoj commented 6 years ago

Thanks for starting this topic!

Please consider the DatabaseRules from the openmod wiki, especially the naming conventions. To my mind, it's a good idea to move the criteria from this issue to the wiki page and provide a link here (even better: create a new page on the OEP).

Could you provide your current idea of an appropriate workflow here? E.g.

  1. Write table to sandbox
  2. Make sure it follows the rules described in the DatabaseRules-Wiki
  3. Create an issue with tags ... and assign it to ...
  4. ...

It'd be nice to have a running workflow on data review to make life easier for upcoming projects (e.g. demandRegio). I'll try to support this..

Ludee commented 6 years ago

Combined the material from the openmod wiki and this issue on the wiki page. Feel free to update and improve the workflow.

nesnoj commented 6 years ago

Great @Ludee! Could you please add me to the OEP organization so that I can make amendments? Cheers

nesnoj commented 6 years ago

I made some amendments to the wiki..

klarareder commented 6 years ago

Is there a need for an expert/human to review the data? In that case we could think about points like on Stack Overflow: the person who performs a review gets credit in points. The badge system could then indicate with an additional star (or anything else) that an expert has verified the data.

klarareder commented 6 years ago

I just looked at how others have dealt with this topic. Here is an example: https://help.author.envato.com/hc/en-us/articles/360000471943-A-step-by-step-guide-to-the-upload-process

What we could include are topics like: tags, upload limits, notes to the reviewer.


Maybe a question to understand this topic better:

  1. Do we write a guide on how to best use the database and what the review will show if something is done wrong (for the user)?
  2. Or do we just write down the review process (for ourselves)?

klarareder commented 6 years ago

A comment on: Naming Conventions -> Table Name -> with resolution [tupel] (e.g. per_mun)

What is meant by: resolution = temporal or spatial?

per_mun = ? (I have no idea what that could be, municipality?) Maybe we could have a standard list of common abbreviations somewhere, so that if I don't know what mun means I can look it up.
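A minimal sketch of what such a lookup could look like. The table contents and the function name `expand_table_name` are hypothetical, and "mun" = "municipality" is only the guess from the question above, not a confirmed convention:

```python
# Hypothetical abbreviation table. The single entry is the guess from this
# thread ("mun" = municipality), not an agreed OEP naming convention.
ABBREVIATIONS = {"mun": "municipality"}


def expand_table_name(name):
    """Split a table name on underscores and expand known abbreviations.

    Parts that are not in the table are returned unchanged.
    """
    return [ABBREVIATIONS.get(part, part) for part in name.split("_")]


print(expand_table_name("per_mun"))
```

Keeping the abbreviations in one shared table would give both reviewers and uploaders a single place to look them up.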

klarareder commented 6 years ago

Collection of Criteria -> Data -> We could add something like the following, depending on how it is done:

  1. Check that no errors occurred during the upload.
  2. Make sure all columns contain only the specified data type, e.g. if double is required => '12.20' and not '12.20 kW'.
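The data type check mentioned above could be sketched roughly as follows. This assumes the column values arrive as strings before upload; the function name `non_numeric_values` is made up for illustration and is not part of any existing OEP tooling:

```python
def non_numeric_values(values):
    """Return the entries that cannot be parsed as a plain float.

    Assumes values are strings (or numbers); anything float() rejects,
    such as a number with a unit attached, is reported.
    """
    bad = []
    for value in values:
        try:
            float(value)
        except (TypeError, ValueError):
            bad.append(value)
    return bad


column = ["12.20", "12.20 kW", "7"]
print(non_numeric_values(column))  # the entry with the unit is reported
```

Note that `float()` also accepts strings such as "nan" or "inf", so a real check would probably need to be stricter than this sketch.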

christian-rli commented 5 years ago

"Setup and test a review process for data" - User Feedback and part of the requirement specification.

Make sure to resolve https://github.com/OpenEnergyPlatform/data-preprocessing/issues/8 as well before closing this issue. Also take note of https://github.com/OpenEnergyPlatform/data-preprocessing/issues/11.

RequirementSpecificationID=54

christian-rli commented 4 years ago

closed with #11 in #68