GSS-Cogs / family-trade

ONS-Quarterly-country-and-regional-GDP #85

Closed ajtucker closed 3 years ago

ajtucker commented 4 years ago

https://github.com/GSS-Cogs/family-trade/tree/master/datasets/ONS-Quarterly-country-and-regional-GDP

CharlesRendle commented 4 years ago

I was unsure which values to assign as units, as each tab contains multiple measures and the unit is not immediately obvious.

Each dataframe still contains multiple measures, which I expect to break up in stage 2.

tab_title still contains superscripts.

rossbowen commented 3 years ago

Failing Jenkins tests atm - looks like a lot of the data is duplicated in the .csv output, so csvlint is throwing exceptions.

https://ci.floop.org.uk/job/GSS_data/job/Trade/job/ONS-Quarterly-country-and-regional-GDP/1/console

robons commented 3 years ago

I've made some updates to better specify the measures and units, as well as setting some codelists. A few more DE changes are still needed to bring it in line with what we're doing elsewhere.

Note that this is a multi-measure, multi-unit dataset with a (now) blended codelist referencing existing Concept URIs.

@CharlesRendle or @Shannon95 I don't know whether one of you wants to pick this up?

rossbowen commented 3 years ago

@Tracey-B Ready for review: https://staging.gss-data.org.uk/cube/explore?uri=http%3A%2F%2Fgss-data.org.uk%2Fdata%2Fgss_data%2Ftrade%2Fons-quarterly-country-and-regional-gdp-catalog-entry&filters-drawer=open&show-uris=true

robons commented 3 years ago

@rossbowen - There's a bit of unexpected behaviour here. It's partitioning observations by the measure. No observation has all four values - this is a problem with how we show the unit IMO. It's unpleasant.

rossbowen commented 3 years ago

@robons you're right - though I believe this is the PMD4 platform making a decision on how to display the data rather than any particular issue with the underlying RDF - if we think this is a problem in the UX space (or that it should be squashed) we can raise an issue.

robons commented 3 years ago

I think there is a UX issue here, but we're also coining unique URIs depending on the observation's measure type - which we have to do to meet the qb specification. So I'm not sure how PMD would know which observations should be grouped together. I guess they could group observations by dimension (excluding the one which defines the measure)?
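A minimal sketch of that grouping idea, in Python rather than anything PMD actually runs: observations are keyed on every dimension value except the measure type, and each key becomes one table row. The dict layout, the field names (`dimensions`, `measure_type`, `value`) and the `pivot_by_dimensions` helper are all hypothetical.

```python
from collections import defaultdict

def pivot_by_dimensions(observations, measure_dimension="measure_type"):
    """Group observations that share every dimension value except the measure type."""
    rows = defaultdict(dict)
    for obs in observations:
        # Key on the sorted (dimension, value) pairs, leaving the measure dimension out.
        key = tuple(sorted(
            (dim, val) for dim, val in obs["dimensions"].items()
            if dim != measure_dimension
        ))
        # One column per measure type within the grouped row.
        rows[key][obs["dimensions"][measure_dimension]] = obs["value"]
    return dict(rows)

# Illustrative observations only, not taken from the real cube.
observations = [
    {"dimensions": {"category": "some-concept", "measure_type": "height"}, "value": 56.7},
    {"dimensions": {"category": "some-concept", "measure_type": "width"}, "value": 20.1},
]
table = pivot_by_dimensions(observations)
```

This only yields sensible rows if the remaining dimensions identify each observation uniquely per measure; if two observations share both key and measure type, the later one silently overwrites the earlier.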

But perhaps we need to start defining slices of data grouping these observations together and telling PMD that this is how they should be displayed? I guess this will affect all multi-measure datasets.

Any thoughts @ajtucker?

rossbowen commented 3 years ago

Something like this?

PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?obs ?measure_type_label ?value
WHERE { 
  ?obs qb:dataSet <http://gss-data.org.uk/data/gss_data/trade/ons-quarterly-country-and-regional-gdp#dataset> .
  ?obs qb:measureType ?measure_type .
  ?measure_type rdfs:label ?measure_type_label .
  ?obs ?measure ?value .
  ?measure a qb:MeasureProperty .
}

robons commented 3 years ago

I'm not too sure what you're getting at here. I was thinking something like this:

<https://some-uri/this-dataset/obs-1-height> 
    a qb:Observation;
    <https://some-uri/this-dataset#dimension/category> <https://some-uri/this-dataset#concept/category/some-concept>;
    <https://some-uri/this-dataset#measure/height> 56.7;
    qb:measureType <https://some-uri/this-dataset#measure/height>;
    sdmxAttribute:unitMeasure <https://some-uri/this-dataset#units/meters>.

<https://some-uri/this-dataset/obs-1-width> 
    a qb:Observation;
    <https://some-uri/this-dataset#dimension/category> <https://some-uri/this-dataset#concept/category/some-concept>;
    <https://some-uri/this-dataset#measure/width> 20.1;
    qb:measureType <https://some-uri/this-dataset#measure/width>;
    sdmxAttribute:unitMeasure <https://some-uri/this-dataset#units/meters>.

<https://some-uri/this-dataset#slice/obs-1> 
    a qb:Slice;
    qb:sliceStructure <https://some-uri/this-dataset#slice-type/slice-across-measures>;
    <https://some-uri/this-dataset#dimension/category> <https://some-uri/this-dataset#concept/category/some-concept>;
    qb:observation <https://some-uri/this-dataset/obs-1-height>, <https://some-uri/this-dataset/obs-1-width>.

Then the slice would map to a row in the table. We'd have to find some other way of displaying the unit though.

Tracey-B commented 3 years ago

@rossbowen BA check complete with the following comments:

Robsteranium commented 3 years ago

I wonder if we could find ONS Geography codes instead of coining new URIs for the geographies. Indeed I'd suggest we ask ONS Geography to coin codes for any that are missing.

Robsteranium commented 3 years ago

In case you've not seen it, the spec offers two ways of handling multiple measures, multi-measure observations vs the measure (type) dimension.

We've ended up settling on the measures dimension approach because it makes it possible to mix attribute properties. With multi-measures, all the measures are attached to the same observation so the attribute properties need to be shared. It's not so much a problem with how we show the unit (or the observation-status marker) as with how we're able to model it.
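As a rough illustration of the difference, the sketch below splits one multi-measure observation into one observation per measure, so that each can carry its own unit. The field names, the `to_measure_dimension` helper and all the values are made up for the example and do not come from the real cube.

```python
def to_measure_dimension(observation, units):
    """Split one multi-measure observation into one observation per measure."""
    split = []
    for measure, value in observation["measures"].items():
        split.append({
            **observation["dimensions"],
            "measure_type": measure,   # the measure becomes a dimension value
            "value": value,
            "unit": units[measure],    # each observation can now carry its own unit
        })
    return split

# Hypothetical multi-measure observation: both measures hang off one observation,
# so any attribute attached to it would have to apply to both.
multi = {
    "dimensions": {"period": "2020-Q1", "region": "north-east"},
    "measures": {"index": 101.2, "rate": 0.4},
}
units = {"index": "index-points", "rate": "percent"}
long_form = to_measure_dimension(multi, units)
```

The cost is the one discussed in this thread: a single logical row is now spread over several observations that a viewer has to stitch back together.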

I agree the tidy view does have some UX drawbacks (but I think the simplicity is worth it). We explored an alternate tidyish UI that rolled-up measures as part of the alpha. This leads to a lot of other problems. There's a thread about display options on cogs-issues to discuss this.

There are further problems with this cube however, so it's not just a UX problem in this case...

I'm not sure how PMD would know which observations should be grouped together

In theory all the ?cube qb:structure/qb:dimension ?dimension minus qb:measureType should identify an observation with each of its measures. This is what distinguishes dimensions from attributes - the former should serve to locate observations uniquely within the cube.

That condition appears to be violated in this cube. This query suggests that there are multiple observations distinguished only by attribute:

[...]
WHERE {
  ?obs qb:dataSet <http://gss-data.org.uk/data/gss_data/trade/ons-regional-gross-domestic-product-city-regions#dataset>;
    qb:measureType ?measure;
    sdmxa:unitMeasure ?unit;
    .

  OPTIONAL {
    ?obs sdmxa:obsStatus ?status;
  }
}
GROUP BY ?measure ?unit ?status
ORDER BY ?measure ?unit ?status

Were it not to time out, I would expect IC-12 to fail.
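The uniqueness condition can be sketched outside SPARQL as a simple count over dimension-value tuples. This is an in-memory stand-in for the idea behind IC-12, not the constraint query from the spec; the `find_duplicate_keys` helper, the field names and the sample values are hypothetical.

```python
from collections import Counter

def find_duplicate_keys(observations, dimensions):
    """Return dimension-value combinations shared by more than one observation."""
    counts = Counter(
        tuple(obs[dim] for dim in dimensions) for obs in observations
    )
    return {key: n for key, n in counts.items() if n > 1}

# Made-up observations: the first two differ only in their unit attribute.
observations = [
    {"period": "2020-Q1", "measure_type": "cvm", "unit": "gbp-million", "value": 100.0},
    {"period": "2020-Q1", "measure_type": "cvm", "unit": "index", "value": 98.5},
    {"period": "2020-Q2", "measure_type": "cvm", "unit": "gbp-million", "value": 101.0},
]
duplicates = find_duplicate_keys(observations, dimensions=["period", "measure_type"])
```

In this toy data the clash disappears once `unit` is added to the list of dimensions, at the price of a sparser cube.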

We could resolve this by making a unit dimension although that would lead to sparseness (worsening the UX).

Alternatively we could extend the cvm measure to have sub-measures (rate, index, value) - potentially keeping just one of the orders of magnitude for value.

Another option would be to make rate and index attributes themselves (this seems reasonable given the CVM methodology).

We could also partition the cube into subsets with coherent measures (shifting this to a discoverability/cataloguing problem).

A further option would be to have the measure property point at a resource instead of a literal. This resource could itself contain arbitrary values and attributes. This would mean a clean data model and a foundation for cell annotation for a cleaner UI, but it wouldn't solve the problem for tabular downloads (we'd need to denormalise it again, and now the csv wouldn't match the UI).

ajtucker commented 3 years ago

Some recent fixes to CSV2RDF mean that we need to update our CSV-W JSON files that use URI templates like {uri} to use {+uri} instead.
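For context, the two forms differ in how RFC 6570 expands them: `{uri}` percent-encodes reserved characters such as `:` and `/`, while `{+uri}` leaves them alone. The sketch below approximates both with `urllib.parse.quote`. It is not how CSV2RDF implements templates, the example URI is invented, and unlike the RFC it does not pass already percent-encoded triplets through untouched in the reserved case.

```python
from urllib.parse import quote

# Reserved characters from RFC 3986 (gen-delims plus sub-delims).
RESERVED = ":/?#[]@!$&'()*+,;="

def expand_simple(value):
    """Approximate {var}: percent-encode everything except unreserved characters."""
    return quote(value, safe="")

def expand_reserved(value):
    """Approximate {+var}: additionally let reserved characters through unencoded."""
    return quote(value, safe=RESERVED)

# Hypothetical cell value holding a full URI.
uri = "http://example.org/def/concept#example"
simple = expand_simple(uri)
reserved = expand_reserved(uri)
```

With a column that already holds complete URIs, the simple form yields an encoded string that no longer works as a URI, which is why the reserved form is the one wanted here.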

JasonHowell commented 3 years ago

Closing this issue, as the points still open will be dealt with separately.

This has been published to the Beta env.