Ludee closed this issue 4 years ago
First version contains:
scalar.csv, timeseries.csv, datapackage.json, erm.graphml
A first test with the time series data shows that the data size blows up due to repeating values. Implement an improved relational structure with general info and the time series.
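To illustrate the size problem, here is a minimal sketch of splitting a flat table into a general-info table and a time series table. The column names (`scenario`, `region`, `parameter`, `timeindex`, `value`, `general_id`) and the sample values are assumptions for illustration only, not the actual layout of timeseries.csv.

```python
# Hypothetical flat layout: the metadata columns repeat on every time step.
flat_rows = [
    {"scenario": "A", "region": "DE", "parameter": "demand",
     "timeindex": "2030-01-01T00:00", "value": 41.0},
    {"scenario": "A", "region": "DE", "parameter": "demand",
     "timeindex": "2030-01-01T01:00", "value": 39.5},
    {"scenario": "B", "region": "FR", "parameter": "demand",
     "timeindex": "2030-01-01T00:00", "value": 52.0},
    {"scenario": "B", "region": "FR", "parameter": "demand",
     "timeindex": "2030-01-01T01:00", "value": 50.5},
]

META_KEYS = ("scenario", "region", "parameter")


def split_flat_rows(rows, meta_keys):
    """Split flat rows into a general-info table and a time series table."""
    meta_ids = {}  # metadata tuple -> surrogate id
    general = []   # one row per distinct metadata combination
    series = []    # one row per time step, pointing at the general row
    for row in rows:
        key = tuple(row[k] for k in meta_keys)
        if key not in meta_ids:
            meta_ids[key] = len(meta_ids) + 1
            general.append({"id": meta_ids[key], **dict(zip(meta_keys, key))})
        series.append({
            "general_id": meta_ids[key],
            "timeindex": row["timeindex"],
            "value": row["value"],
        })
    return general, series


general, series = split_flat_rows(flat_rows, META_KEYS)
```

With this split, the metadata is stored once per series instead of once per time step, so the saving grows with the number of time steps.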
(Edit: the singular of timeseries is timeseries.)
The updated ERM looks like this. Putting them together would be the next logical step. But could an entry then have both a value and a timeseries? Would that be OK?
IMO it looks good, but I don't think we are quite aiming for a correct relational data model with this. As far as I know, there are cases where a scalar has several scenarios, regions, or years. Storing these in a single field would violate the rule of atomic values, and if several columns are added instead, redundancies would occur, which conflicts with normalized data. We could think about introducing more relations. Each relation should hold all functional entities and so on. One large data model might be less complex, but why not follow general best practice?
Scalar problem, case 1: non-atomic entities
id | scenario | region | year | ... |
---|---|---|---|---|
1 | A,B,C or ALL | DE,... or ALL | 2030,2050 or ALL | |
Case 2: redundant PK lines
id | scenario | region | year | ... |
---|---|---|---|---|
1 | A | DE | 2030 | |
1 | B | FR | 2050 | |
1 | C | .. | .. | |
As far as I understand, we can solve this by adding a new relation and foreign keys. An easy way to identify a practical data model is to model the relations (1:1, 1:n, m:n) in the ERM, as seen below. However, I'm not quite sure how to group the year and region, since it's not clear to me whether they belong to the scenario. If they don't, we need even more relations for those two.
FYI: https://stackoverflow.com/a/7296873/10489845
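A rough sketch of what "a new relation and foreign keys" could look like, using SQLite from the Python standard library. It assumes, purely for illustration, that region and year belong to the scenario and that scalars relate to scenarios m:n through a link table; the table names, column names, and sample values (`scalar_scenario`, `capacity`, `10.0`) are hypothetical and not part of the proposed model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled

conn.executescript("""
CREATE TABLE scenario (
    id     INTEGER PRIMARY KEY,
    name   TEXT    NOT NULL,
    region TEXT    NOT NULL,   -- assumption: region belongs to the scenario
    year   INTEGER NOT NULL,   -- assumption: year belongs to the scenario
    UNIQUE (name, region, year)
);
CREATE TABLE scalar (
    id        INTEGER PRIMARY KEY,
    parameter TEXT NOT NULL,
    value     REAL NOT NULL
);
-- m:n link table: one scalar can apply to several scenarios and vice versa
CREATE TABLE scalar_scenario (
    scalar_id   INTEGER NOT NULL REFERENCES scalar(id),
    scenario_id INTEGER NOT NULL REFERENCES scenario(id),
    PRIMARY KEY (scalar_id, scenario_id)
);
""")

conn.executemany(
    "INSERT INTO scenario (id, name, region, year) VALUES (?, ?, ?, ?)",
    [(1, "A", "DE", 2030), (2, "B", "FR", 2050)],
)
conn.execute("INSERT INTO scalar (id, parameter, value) VALUES (1, 'capacity', 10.0)")
conn.executemany(
    "INSERT INTO scalar_scenario (scalar_id, scenario_id) VALUES (?, ?)",
    [(1, 1), (1, 2)],
)

# Reassemble the "case 2" view without storing a repeated primary key.
rows = conn.execute("""
    SELECT s.id, sc.name, sc.region, sc.year
    FROM scalar AS s
    JOIN scalar_scenario AS l ON l.scalar_id = s.id
    JOIN scenario AS sc ON sc.id = l.scenario_id
    ORDER BY sc.name
""").fetchall()
```

Each cell stays atomic and the scalar's `id` appears once in its own table; the repeated combinations live only in the link table. If region and year turn out not to belong to the scenario, they would need their own relations instead.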
What do you think about this? @Ludee
Otherwise, we could stick with the current approach and review this after receiving some practical feedback.
That's a good thought. Keep in mind this is not only for the database. The solution must be valid for the OEP and a datapackage (CSV).
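For the datapackage side, the same kind of relation can be expressed in a Table Schema with `primaryKey` and `foreignKeys`. The following is only a sketch: the package name, the extra resource `scenario.csv`, and all field names are assumptions, not the content of the real datapackage.json.

```python
import json

# Hypothetical descriptor: scalar.csv points at a separate scenario.csv.
descriptor = {
    "name": "oedatamodel-sketch",
    "resources": [
        {
            "name": "scenario",
            "path": "scenario.csv",
            "schema": {
                "fields": [
                    {"name": "id", "type": "integer"},
                    {"name": "name", "type": "string"},
                    {"name": "region", "type": "string"},
                    {"name": "year", "type": "integer"},
                ],
                "primaryKey": ["id"],
            },
        },
        {
            "name": "scalar",
            "path": "scalar.csv",
            "schema": {
                "fields": [
                    {"name": "id", "type": "integer"},
                    {"name": "scenario_id", "type": "integer"},
                    {"name": "parameter", "type": "string"},
                    {"name": "value", "type": "number"},
                ],
                "primaryKey": ["id"],
                "foreignKeys": [
                    {
                        "fields": ["scenario_id"],
                        "reference": {"resource": "scenario", "fields": ["id"]},
                    }
                ],
            },
        },
    ],
}


def dangling_references(package):
    """Return foreign-key targets that name a resource missing from the package."""
    names = {res["name"] for res in package["resources"]}
    missing = []
    for res in package["resources"]:
        for fk in res["schema"].get("foreignKeys", []):
            target = fk["reference"]["resource"]
            if target not in names:
                missing.append((res["name"], target))
    return missing


missing = dangling_references(descriptor)
datapackage_json = json.dumps(descriptor, indent=2)
```

The helper only checks that referenced resources exist; validating the CSV contents against the schema would need a dedicated datapackage tool.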
We just decided to create a new OEP repo called oedatamodel (oedm) to develop and publish a data model template.
There is a data format from the MODEX project FlexMex which is used in open_MODEX and SzenarienDB.
Discuss and develop a common datapackage format for scenario data.
Existing ideas: