Swirrl / table2qb

A generic pipeline for converting tabular data into rdf data cubes
Eclipse Public License 1.0

Use measure column(s) not a value column #101

Open Robsteranium opened 5 years ago

Robsteranium commented 5 years ago

Having measures as columns (#23) gives us the possibility of getting rid of the magic Value column (i.e. any column defined with no component type).

The examples below contrast how we might declare a cube using the multi-measure and measure-dimension approaches outlined in the RDF Data Cube specification. A Value column only pertains to the measure-dimension approach. The choice, then, is whether to keep the Value column or to declare the values in measure columns instead.

On the one hand, removing the Value column would be cleaner for machine-processing, as then the observation-csv is tidy: one row per observation, one column per component. This also removes the potential for confusion that has arisen around how the measure property is determined (i.e. currently the cell values in Measure Type need to correspond to the column titles of configurations for measure columns, even though those columns never actually appear in the csv).

On the other hand, owing to the de-normalisation of the measure type dimension in cubes, without a Value column, you end up with a measure-columns <> Measure Type dependency and thus integrity problems (dependent updates between cells) and redundancy (all but 1 measure column will be empty in any given row). Having a Value column is definitely neater for human-readers, and potentially for machine-writers (no need to synchronise dependent updates).
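To make the integrity problem concrete, here is a minimal sketch of the check a writer would have to satisfy once the Value column is gone: in every row, exactly one measure column is filled and it is the one named in Measure Type. The function name `check_measure_rows` and its parameters are hypothetical and not part of table2qb.

```python
import csv
import io


def check_measure_rows(csv_text, measure_columns, measure_type_column="Measure Type"):
    """Return (row_number, message) pairs for rows that break the
    measure-columns <> Measure Type dependency.

    Hypothetical helper for illustration only; not a table2qb API.
    """
    problems = []
    # skipinitialspace drops the padding after each comma in hand-aligned csv
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    for row_number, row in enumerate(reader, start=1):
        declared = (row[measure_type_column] or "").strip()
        # Measure columns that actually hold a value in this row
        filled = [c for c in measure_columns if (row.get(c) or "").strip()]
        if filled != [declared]:
            problems.append(
                (row_number,
                 f"Measure Type is {declared!r} but filled measure columns are {filled}")
            )
    return problems
```

With a Value column this check is unnecessary, since there is only one cell that can hold the observed value.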

If we do want to remove the Value column then we'd also need to think about backward compatibility, although we could use ons-table2qb to act as a compatibility layer (it would convert a Value column into measure columns before passing the results to table2qb).
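As a rough illustration of what such a compatibility layer could do, the sketch below rewrites a csv with a Value column into one with measure columns. It is not taken from ons-table2qb; the function name `value_to_measure_columns` is hypothetical, and it assumes the default column titles `Measure Type` and `Value` and that no measure name collides with an existing column title.

```python
import csv
import io


def value_to_measure_columns(csv_text, measure_type_column="Measure Type",
                             value_column="Value"):
    """Convert a Value-column csv into a measure-columns csv.

    Hypothetical sketch; not the ons-table2qb implementation.
    """
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    # Strip the padding used to align cells by hand
    rows = [{k: (v or "").strip() for k, v in row.items()} for row in reader]

    # One new column per distinct measure, in order of first appearance
    measures = list(dict.fromkeys(row[measure_type_column] for row in rows))
    fieldnames = [f for f in reader.fieldnames if f != value_column] + measures

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        value = row.pop(value_column)
        for measure in measures:
            # Only the column named by Measure Type receives the value
            row[measure] = value if measure == row[measure_type_column] else ""
        writer.writerow(row)
    return out.getvalue()
```

Because the measure columns are derived from the Measure Type cells, the output always satisfies the dependency between the two.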


one measure

multi-measure approach

Date,Count
2011,    1 
2012,    4

measure-dimension approach (with measure columns)

Date,Measure Type,Count
2011, Count      ,    1 
2012, Count      ,    4

measure-dimension approach (with a Value column)

Date,Measure Type,Value
2011, Count      ,    1 
2012, Count      ,    4

many measures

multi-measure approach

Date,Count,GBP Total
2011,    1,1000000000
2012,    4,1000000010

measure-dimension approach (with measure columns)

Date,Measure Type,Count,GBP Total
2011, Count      ,    1, 
2011, GBP Total  ,     ,1000000000
2012, Count      ,    4, 
2012, GBP Total  ,     ,1000000010

measure-dimension approach (with a Value column)

Date,Measure Type,Value
2011, Count      ,         1 
2011, GBP Total  ,1000000000
2012, Count      ,         4
2012, GBP Total  ,1000000010
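
The multi-measure and Value-column layouts above carry the same observations, which can be shown by unpivoting one into the other. The following is only an illustrative sketch: `multi_measure_to_value_rows` is a hypothetical name, and the caller is assumed to say which columns are measures.

```python
import csv
import io


def multi_measure_to_value_rows(csv_text, measure_columns):
    """Unpivot a multi-measure csv into Measure Type / Value rows.

    Hypothetical helper for illustration; not part of table2qb.
    """
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    # Every column that is not a measure is treated as a dimension
    dimension_columns = [f for f in reader.fieldnames if f not in measure_columns]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(dimension_columns + ["Measure Type", "Value"])
    for row in reader:
        dims = [(row[d] or "").strip() for d in dimension_columns]
        # Emit one output row per measure in the input row
        for measure in measure_columns:
            writer.writerow(dims + [measure, (row[measure] or "").strip()])
    return out.getvalue()
```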