Swirrl / ons-data-export

Temporary repo to keep track of the extraction of data between the PMD3 backed alpha for the COGS project, and the PMD4 staging server.

Multiple measure values in large datasets - validating IC-17 IC #55

Closed jennet closed 4 years ago

jennet commented 4 years ago
  1. Go to alcohol duty dataset: http://cogs-staging-web.swirrlstaging.com/dataset/cube?uri=http%3A%2F%2Fgss-data.org.uk%2Fdata%2Fgss_data%2Fhealth%2Fhmrc_alcohol_bulletin-catalog-entry
  2. Change alcohol duty filter to “beer”
  3. Sort by “Beer clearances” (measure type)

The table disappears as that measure type does not appear in the slice of data selected by the filters.

After debugging this in PMD4, we think the data in this case must be invalid under IC-17:

In a qb:DataSet which uses a Measure dimension then if there is a Observation for some combination of non-measure dimensions then there must be other Observations with the same non-measure dimension values for each of the declared measures.

i.e. if a dataset has multiple measure types and there is an observation for some combination of non-measure dimension values, there must be corresponding observations, with the same dimension values, for all of the other measures defined for that dataset.

If the data is valid, then any filter on the non-measure dimensions that selects at least one observation also selects at least one observation for each of the measures in the dataset, and so could not result in an empty observations table.
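To make the rule above concrete, here is a minimal in-memory sketch of the check in Python. It is not the PMD4 implementation or the SPARQL query from the spec. It assumes each observation is a plain dict of dimension name to value, with a hypothetical `measure_type` key standing in for qb:measureType; the dimension and measure names in the example are made up for illustration.

```python
from collections import defaultdict

# Hypothetical key standing in for the qb:measureType dimension.
MEASURE_TYPE = "measure_type"


def missing_measures(observations, declared_measures):
    """Map each combination of non-measure dimension values to the
    declared measures that have no observation for that combination."""
    seen = defaultdict(set)
    for obs in observations:
        # The key is every dimension value except the measure type.
        key = frozenset((d, v) for d, v in obs.items() if d != MEASURE_TYPE)
        seen[key].add(obs[MEASURE_TYPE])
    declared = set(declared_measures)
    return {
        key: declared - measures
        for key, measures in seen.items()
        if declared - measures
    }


# Illustrative data only, not taken from the alcohol duty dataset.
declared = {"beer-clearances", "duty-receipts"}
observations = [
    {"period": "2019", "duty": "beer", "measure_type": "beer-clearances"},
    {"period": "2019", "duty": "beer", "measure_type": "duty-receipts"},
    {"period": "2019", "duty": "wine", "measure_type": "duty-receipts"},
]
result = missing_measures(observations, declared)
```

In this example the "wine" combination has no "beer-clearances" observation, so it is reported as a violation. A dataset that passes the check returns an empty dict.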

jennet commented 4 years ago

The IC-17 query often times out on large datasets, or on datasets with many (non-measure) dimensions. It has to find every combination of dimension values that has an observation for some measure, and then check that the same combination also has an observation for every other measure type.
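One possible way to keep each check small, offered here only as an untested idea and not something from the thread, is to validate the dataset one slice at a time. All observations that share the same non-measure dimension values necessarily share the same value of any single non-measure dimension, so partitioning on one such dimension never splits a combination across batches. The sketch below does this in memory with a hypothetical `measure_type` key and a caller-chosen `partition_dim`; against a triple store, each partition would presumably be a separate, smaller query.

```python
from collections import defaultdict

# Hypothetical key standing in for the qb:measureType dimension.
MEASURE_TYPE = "measure_type"


def violations_by_partition(observations, declared_measures, partition_dim):
    """Count, per value of partition_dim, the dimension combinations that
    lack an observation for at least one declared measure."""
    # Split the observations into independent batches.
    partitions = defaultdict(list)
    for obs in observations:
        partitions[obs[partition_dim]].append(obs)

    declared = set(declared_measures)
    report = {}
    for value, batch in partitions.items():
        # Each batch is checked on its own; nothing is shared between batches.
        seen = defaultdict(set)
        for obs in batch:
            key = frozenset((d, v) for d, v in obs.items() if d != MEASURE_TYPE)
            seen[key].add(obs[MEASURE_TYPE])
        bad = sum(1 for measures in seen.values() if declared - measures)
        if bad:
            report[value] = bad
    return report


# Illustrative data only.
declared = {"beer-clearances", "duty-receipts"}
observations = [
    {"period": "2019", "duty": "beer", "measure_type": "beer-clearances"},
    {"period": "2019", "duty": "beer", "measure_type": "duty-receipts"},
    {"period": "2019", "duty": "wine", "measure_type": "duty-receipts"},
    {"period": "2020", "duty": "wine", "measure_type": "beer-clearances"},
]
report = violations_by_partition(observations, declared, "duty")
```

The total work is the same as a single pass, but each batch only needs to hold the combinations for one dimension value, and a failing batch points directly at the slice where the table would come up empty.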

BillSwirrl commented 4 years ago

Closing this because there is an equivalent issue in the cogs-issues repo: see https://github.com/Swirrl/cogs-issues/issues/32