Develop a general scenario data format

Ludee commented 4 years ago

There is a data format from the MODEX project FlexMex which is used in open_MODEX and SzenarienDB.

Discuss and develop a common datapackage format for scenario data.

Existing ideas:

Ludee commented 4 years ago

First version contains:

scalar.csv, timeseries.csv, datapackage.json, erm.graphml
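To make the file list more concrete, here is a minimal sketch of what a `datapackage.json` for the two CSV files might look like, assuming the package follows the Frictionless Data Package layout (`name`, `resources`, `schema.fields`). The package name and all column names are hypothetical placeholders, not the fields agreed on in this issue.

```python
import json

# Hypothetical descriptor: the package name and every field name are assumptions.
descriptor = {
    "name": "scenario-data-example",
    "resources": [
        {
            "name": "scalar",
            "path": "scalar.csv",
            "profile": "tabular-data-resource",
            "schema": {
                "fields": [
                    {"name": "id", "type": "integer"},
                    {"name": "scenario", "type": "string"},
                    {"name": "region", "type": "string"},
                    {"name": "year", "type": "integer"},
                    {"name": "value", "type": "number"},
                ],
                "primaryKey": "id",
            },
        },
        {
            "name": "timeseries",
            "path": "timeseries.csv",
            "profile": "tabular-data-resource",
            "schema": {
                "fields": [
                    {"name": "id", "type": "integer"},
                    {"name": "scenario", "type": "string"},
                    {"name": "region", "type": "string"},
                    {"name": "timeindex", "type": "datetime"},
                    {"name": "value", "type": "number"},
                ],
            },
        },
    ],
}

# Serialize the descriptor as it would be written to datapackage.json.
descriptor_text = json.dumps(descriptor, indent=2)
print(descriptor_text)
```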

Ludee commented 4 years ago

A first test with the time series data shows that the data size blows up because values are repeated. Implement an improved relational structure that separates the general info from the time series.
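One possible reading of this split, as a rough sketch: keep the descriptive columns once per series in an info table and store only a key, the time index and the value for each time step. The column names (`scenario`, `region`, `parameter`, `unit`, `timeindex`, `value`), the helper `split_long_rows` and the sample values are assumptions for illustration, not the format used in FlexMex.

```python
# Hypothetical long-format rows: the metadata is repeated for every time step.
long_rows = [
    {"scenario": "A", "region": "DE", "parameter": "demand", "unit": "MW",
     "timeindex": "2030-01-01T00:00", "value": 10.0},
    {"scenario": "A", "region": "DE", "parameter": "demand", "unit": "MW",
     "timeindex": "2030-01-01T01:00", "value": 11.5},
    {"scenario": "A", "region": "DE", "parameter": "demand", "unit": "MW",
     "timeindex": "2030-01-01T02:00", "value": 9.8},
    {"scenario": "B", "region": "FR", "parameter": "demand", "unit": "MW",
     "timeindex": "2030-01-01T00:00", "value": 7.2},
]

META_KEYS = ("scenario", "region", "parameter", "unit")


def split_long_rows(rows, meta_keys):
    """Split long-format rows into an info table and a value table.

    The info table holds each distinct metadata combination once; the value
    table references it through series_id instead of repeating the metadata.
    """
    series_ids = {}   # metadata tuple -> series_id
    info_rows = []
    value_rows = []
    for row in rows:
        key = tuple(row[k] for k in meta_keys)
        if key not in series_ids:
            series_ids[key] = len(series_ids) + 1
            info_rows.append({"series_id": series_ids[key],
                              **{k: row[k] for k in meta_keys}})
        value_rows.append({"series_id": series_ids[key],
                           "timeindex": row["timeindex"],
                           "value": row["value"]})
    return info_rows, value_rows


info_rows, value_rows = split_long_rows(long_rows, META_KEYS)
print(len(info_rows), "info rows,", len(value_rows), "value rows")
```

With this layout the metadata is stored once per series, so the per-step rows shrink to three columns regardless of how many descriptive columns the info table carries.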

Ludee commented 4 years ago

Discuss and consider: https://help.github.com/en/github/managing-files-in-a-repository/rendering-csv-and-tsv-data

christian-rli commented 4 years ago

(Edit update: singular of timeseries is timeseries)

Ludee commented 4 years ago

The updated ERM looks like this. Putting them together would be the next logical step. But then an entry could have both a value and a timeseries. Would that be OK?

[Image: oep-scenario-data_datapackage_erm]

jh-RLI commented 4 years ago

IMO it looks good, but I don't think we are aiming for a proper relational data model with this. As far as I know, there are cases where a scalar applies to several scenarios/regions/years. Storing them in one field would violate the rule of atomic values, and adding several columns instead would introduce the redundancies that normalization is meant to avoid. We could think about introducing more relations, where each relation holds the attributes that functionally belong together. One large data model might be less complex, but why not follow general best practice?

Scalar problem, case 1: non-atomic entities

| id | scenario | region | year | ... |
|----|----------|--------|------|-----|
| 1 | A,B,C or ALL | DE,... or ALL | 2030,2050 or ALL | |

Case 2: redundant PK lines

| id | scenario | region | year |
|----|----------|--------|------|
| 1 | A | DE | 2030 |
| 1 | B | FR | 2050 |
| 1 | C | .. | .. |

As far as I understand, we can solve this by adding a new relation and foreign keys. An easy way to identify a practical data model is to model the relations (1:1, 1:n, m:n) in the ERM as seen below, but I'm not quite sure how to group the year and region, since it's not clear to me whether they belong to the scenario. If they don't, we need even more relations for those two.
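A minimal sketch of the "new relation and foreign keys" idea, using Python's built-in `sqlite3`. All table and column names are hypothetical, and placing `year` in the link table is just one assumption, since the grouping of year and region is still open here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: one row per scalar, with an m:n link table for its scope.
conn.executescript("""
CREATE TABLE scenario (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE region   (id INTEGER PRIMARY KEY, code TEXT UNIQUE NOT NULL);
CREATE TABLE scalar   (id INTEGER PRIMARY KEY, parameter TEXT NOT NULL,
                       value REAL, unit TEXT);
CREATE TABLE scalar_scope (
    scalar_id   INTEGER NOT NULL REFERENCES scalar(id),
    scenario_id INTEGER NOT NULL REFERENCES scenario(id),
    region_id   INTEGER NOT NULL REFERENCES region(id),
    year        INTEGER NOT NULL,
    PRIMARY KEY (scalar_id, scenario_id, region_id, year)
);
""")

conn.executemany("INSERT INTO scenario (id, name) VALUES (?, ?)", [(1, "A"), (2, "B")])
conn.executemany("INSERT INTO region (id, code) VALUES (?, ?)", [(1, "DE"), (2, "FR")])
conn.execute("INSERT INTO scalar (id, parameter, value, unit) VALUES (1, 'capacity', 5.0, 'GW')")

# Scalar 1 is stored once; each scenario/region/year combination is one atomic link row.
conn.executemany(
    "INSERT INTO scalar_scope (scalar_id, scenario_id, region_id, year) VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 2030), (1, 2, 2, 2050)],
)

scope_rows = conn.execute("""
    SELECT s.name, r.code, sc.year
    FROM scalar_scope AS sc
    JOIN scenario AS s ON s.id = sc.scenario_id
    JOIN region   AS r ON r.id = sc.region_id
    WHERE sc.scalar_id = 1
    ORDER BY s.name
""").fetchall()
print(scope_rows)
```

This avoids both problem cases: no field holds a list such as `A,B,C`, and the scalar's primary key appears only once in its own table. The cost is an extra join, and the CSV datapackage would need a matching file per relation.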

FYI: https://stackoverflow.com/a/7296873/10489845

What do you think about this? @Ludee

Otherwise, we could stick with the current approach and review this after receiving some practical feedback.

Ludee commented 4 years ago

That's a good thought. Keep in mind this is not only for the database. The solution must be valid for the OEP and a datapackage (CSV).

Ludee commented 4 years ago

We just decided to create a new OEP repo called oedatamodel (oedm) to develop and publish a data model template.

OpenEnergyPlatform / oedatamodel

Develop a general scenario data format #1