OpenEnergyPlatform / oemetadata

Repository for the Open Energy Family metadata. Contains metadata templates, examples and schemas. For metadata conversion see https://github.com/OpenEnergyPlatform/omi
https://openenergyplatform.github.io/oemetadata/
MIT License

Add property for documentation of data transformation to sources #134

Open areleu opened 1 year ago

areleu commented 1 year ago

Description of the issue

We currently have no reasonable way of documenting the transformation steps applied to the referenced sources. So far I have been adding my transformation scripts and software as additional sources. I don't think this is very useful, as there is no way of associating the added scripts with their respective source.

Some data sources are in formats that are well documented and standardized, for example tabular data and RDF graphs. These can be transformed using query languages such as SQL and SPARQL, and the transformations can be documented using the languages themselves! For unstructured data such as Excel files, the documentation can be done by adding URLs to the repositories that transform them; such a repository can contain a Python script, for example.
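To illustrate the point about a query documenting itself, here is a minimal sketch using Python's built-in sqlite3 module. The table `plants`, its columns and the sample rows are made up for the example; only the idea (the query text is both the documentation and the executable step) comes from the description above.

```python
import sqlite3

# The query text itself is the documentation of the transformation:
# anyone can read it, and anyone can re-run it against the source table.
# Table and column names are hypothetical.
TRANSFORMATION_SQL = """
SELECT region, SUM(capacity_mw) AS total_capacity_mw
FROM plants
GROUP BY region
ORDER BY region
"""

# Build a tiny in-memory stand-in for the original tabular source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plants (region TEXT, capacity_mw REAL)")
conn.executemany(
    "INSERT INTO plants VALUES (?, ?)",
    [("north", 10.0), ("north", 5.0), ("south", 7.5)],
)

# Applying the documented transformation reproduces the derived data.
rows = conn.execute(TRANSFORMATION_SQL).fetchall()
conn.close()
print(rows)
```

Storing the string in `TRANSFORMATION_SQL` next to the source would let a reader see exactly how the derived table was produced.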

Ideas of solution

I propose adding a new property to the sources items, namely transformations or something similar, that refers to the operations done to convert the original resource. The transformations property should be a list of items with the properties path, name, title, description, query/code and resource, where each item should have either a path or a query/code property. When more than one item is provided, it is understood that the output of the first item is given to the second, and that the last item produces a resource in the current metadata; the latter should be referred to by its name.
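A rough sketch of what such a sources item could look like, written as a Python dict that maps directly to JSON. Nothing here is part of a released OEMetadata schema: the key names follow the proposal above, "query/code" is assumed to mean either a `query` or a `code` key, and the URLs, names and the helper `validate_transformations` are hypothetical.

```python
import json

# Hypothetical source item with the proposed "transformations" list.
source = {
    "title": "Example raw dataset",
    "path": "https://example.org/raw_data.csv",
    "transformations": [
        {
            "name": "filter_year",
            "title": "Keep 2020 rows",
            "description": "Drop every row outside the reference year.",
            "query": "SELECT * FROM raw_data WHERE year = 2020",
        },
        {
            "name": "aggregate",
            "title": "Aggregate by region",
            "description": "Sum capacity per region.",
            "path": "https://example.org/scripts/aggregate.py",
            # The last step names the resource it produces.
            "resource": "capacity_by_region",
        },
    ],
}


def validate_transformations(items, resource_names):
    """Return a list of problems found in a chain of transformation items."""
    problems = []
    # Each item needs a path or a query/code property.
    for i, item in enumerate(items):
        if not any(key in item for key in ("path", "query", "code")):
            problems.append(f"item {i}: needs 'path' or 'query'/'code'")
    # The last item must point at a resource of the current metadata.
    if items:
        target = items[-1].get("resource")
        if target not in resource_names:
            problems.append(f"last item: unknown resource {target!r}")
    return problems


problems = validate_transformations(
    source["transformations"], {"capacity_by_region"}
)
print(json.dumps(source, indent=2))
print(problems)
```

The items are kept in a list so that their order encodes the chain; whether intermediate steps should also be allowed to name a resource is left open here.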

I do not know how risky it is to add SQL to an instance of the OEMetadata. That can be discussed; if it is really a problem, we can take the necessary precautions.

Workflow checklist

areleu commented 1 year ago

The reason I closed #84 was that, although referencing software and its versions is nice, not having concrete steps on what was done with the software leaves a party trying to interpret the dataset without any information on how the original data was modified.