UNECE / GSIMRevision

Dimensional Data #50

Open Ygor1970 opened 1 year ago

Ygor1970 commented 1 year ago

Due to the flexibility of the model, I found it quite difficult to understand how dimensional data should be stored.

I found the relationship between Instance Variable and Data Point required a lot of repetition for each Data Point. The addition of Universe in v1.2 has enabled the model to be simplified to allow more reuse.

Taking the example in https://statswiki.unece.org/download/attachments/260408186/GSIM%20e-training%20presentation%20-%20Structure%20Group_Scanu_Karling.pptx?version=1&modificationDate=1572518476054&api=v2 :

The Datum 28793 is related to the Data Point 'Italy, 2016, annual average household income, public transfers income, two components'.

The Population of the Data Point is 'Italy, 2016'. However, if the Dataset contains data for other time periods or other countries, the Population of the corresponding Instance Variable could be 'EU, 2000-2023', describing the coverage of the Instance Variable. The Instance Variable can then be reused for other Time Periods and Geographies in the Dataset.

The measure is the Represented Variable 'annual average household income', which can be reused for different Universes rather than creating new Represented Variables and Instance Variables.

The Universe is 'public transfers income, two components'. As Universe is a subtype of Concept, it can be related to the two Nodes 'public transfers income' and 'two components'.
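The reuse described above can be sketched roughly as follows. This is only an illustration: the class names mirror GSIM terms, but the attributes, and the choice to attach the Universe, Geography and Reference Time Period directly to the Data Point, are my assumptions rather than GSIM definitions. The second Data Point is hypothetical and deliberately has no Datum.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RepresentedVariable:
    name: str


@dataclass(frozen=True)
class Universe:
    # Assumption: a Universe is described by the Nodes it relates to.
    nodes: Tuple[str, ...]


@dataclass(frozen=True)
class InstanceVariable:
    represented_variable: RepresentedVariable
    population: str  # coverage of the whole Instance Variable


@dataclass(frozen=True)
class DataPoint:
    instance_variable: InstanceVariable
    universe: Universe
    geography: str
    reference_time_period: int
    datum: Optional[int]


income = RepresentedVariable("annual average household income")
coverage = InstanceVariable(income, population="EU, 2000-2023")
public_transfers = Universe(("public transfers income", "two components"))

# The Data Point from the example, carrying the Datum 28793.
italy_2016 = DataPoint(coverage, public_transfers, "Italy", 2016, 28793)

# Hypothetical second Data Point: same Instance Variable and Universe,
# different Reference Time Period, no Datum supplied.
italy_2017 = DataPoint(coverage, public_transfers, "Italy", 2017, None)

print(italy_2016.instance_variable is italy_2017.instance_variable)
```

Only the Geography and Reference Time Period differ between the two Data Points; nothing about the variable or the Universe has to be repeated.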

To position the Geography and Time Period within a data structure, I created two new subtypes of Data Structure Component: Geographic Component and Reference Time Period Component. These would not have a Represented Variable. Each would have an axis, order and level to position it, but its values would come from the Data Record. This is another reason to normalise Population into Geography and Reference Time Period.
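A minimal sketch of these two subtypes, assuming a Data Record can be treated as a simple mapping. The key names `geography` and `reference_time_period` and the `value` method are hypothetical; the axis, order and level values are the ones I give for this Data Point in my next comment.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DataStructureComponent:
    # Positioning information shared by every component.
    axis: int
    order: int
    level: int


@dataclass
class MeasureComponent(DataStructureComponent):
    # A measure still points at a Represented Variable.
    represented_variable: str


@dataclass
class GeographicComponent(DataStructureComponent):
    # No Represented Variable: the value is read from the Data Record.
    def value(self, record: Dict[str, object]) -> object:
        return record["geography"]


@dataclass
class ReferenceTimePeriodComponent(DataStructureComponent):
    # No Represented Variable: the value is read from the Data Record.
    def value(self, record: Dict[str, object]) -> object:
        return record["reference_time_period"]


record = {"geography": "Italy", "reference_time_period": 2016, "datum": 28793}

geo = GeographicComponent(axis=2, order=3, level=2)
period = ReferenceTimePeriodComponent(axis=1, order=2, level=2)

print(geo.value(record), period.value(record))
```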

Drawing1 (14)

To position the Nodes within the data structure, I turned the many-to-many relationship between Universe and Node into an actual entity, 'Universe Node'. This would be related to a 'Dimension Variable', e.g. 'Households main income source', and could be referenced by a new Data Structure Component subtype, Dimension Component.
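As a rough sketch, the 'Universe Node' entity is an ordinary association record between Universe and Node. The field names and the two lookup helpers are hypothetical. Pairing 'public transfers income' with 'Households main income source' is my assumption for illustration, and the Dimension Variable for 'two components' is left empty because I have not named one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class UniverseNode:
    # One row per (Universe, Node) pair, resolving the many-to-many link.
    universe: str
    node: str
    dimension_variable: Optional[str] = None


UNIVERSE = "public transfers income, two components"

universe_nodes = [
    UniverseNode(UNIVERSE, "public transfers income", "Households main income source"),
    UniverseNode(UNIVERSE, "two components"),  # Dimension Variable not named
]


def nodes_for(universe: str, links: List[UniverseNode]) -> List[str]:
    """Return every Node related to the given Universe."""
    return [link.node for link in links if link.universe == universe]


def universes_for(node: str, links: List[UniverseNode]) -> List[str]:
    """Return every Universe that uses the given Node."""
    return [link.universe for link in links if link.node == node]


print(nodes_for(UNIVERSE, universe_nodes))
print(universes_for("two components", universe_nodes))
```

A Dimension Component would then reference one `UniverseNode` row rather than the Universe or the Node directly.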

Drawing1 (13)

The subtypes of Variable are Represented Variable, Attribute Variable (see https://github.com/UNECE/GSIMRevision/issues/49) and Dimension Variable.

Ygor1970 commented 1 year ago

For the Data Point 'Italy, 2016, annual average household income, public transfers income, two components':

Geographic Component - axis=2, order=3, level=2 - value from Data Record (Italy)
Reference Time Period Component - axis=1, order=2, level=2 - value from Data Record (2016)
Measure Component - axis=3, order=1, level=1, represented variable=annual average household income
Dimension Component - axis=1, order=2, level=1, node value=two
Dimension Component - axis=2, order=3, level=1, node value=public transfers income
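To show how these positions could be read, here is a small sketch that groups the five components by axis and nests them by level. Treating level 1 as the outer level within an axis is my assumption, as is reading the Measure Component's axis as 3. The values are copied from the list above.

```python
from collections import defaultdict

# (component type, axis, order, level, value shown at that position)
components = [
    ("Geographic Component", 2, 3, 2, "Italy"),
    ("Reference Time Period Component", 1, 2, 2, "2016"),
    ("Measure Component", 3, 1, 1, "annual average household income"),
    ("Dimension Component", 1, 2, 1, "two"),
    ("Dimension Component", 2, 3, 1, "public transfers income"),
]

# Collect the components that share an axis.
by_axis = defaultdict(list)
for kind, axis, order, level, value in components:
    by_axis[axis].append((level, kind, value))

# Within each axis, list values from level 1 (assumed outer) downwards.
layout = {
    axis: [value for _level, _kind, value in sorted(items)]
    for axis, items in sorted(by_axis.items())
}

for axis, values in layout.items():
    print(axis, values)
```

Under these assumptions, each Dimension Component sits at level 1 of its axis, with the Reference Time Period or Geography value from the Data Record nested beneath it at level 2.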

Ygor1970 commented 1 year ago

I hadn't realised the relationship between Concept and Node had been removed in 2.0.

As Universe is a subtype of Concept, I believe what I've called a Universe Node is actually a Designation, so there is no need for a Dimension Variable.

image

JALinnerud commented 1 year ago

We are currently looking at all feedback to the GSIM Revision. It might be more useful to have a dialogue (email, Teams) about all your feedback rather than comments on individual issues. Please let me know here if you agree. Jenny Linnerud, Statistics Norway.