Swirrl / ons-data-export

Temporary repo to keep track of the extraction of data from the PMD3-backed alpha for the COGS project to the PMD4 staging server.

Stale data in PMD3 - flow dimension #22

Closed: jennet closed this issue 4 years ago

jennet commented 4 years ago

Some data issues are caused by stale data in the PMD3 database: the error has been fixed in the configuration, but the fix did not remove the erroneous triples from the underlying database, so they are still pulled through into PMD4.

jennet commented 4 years ago

The dimension http://gss-data.org.uk/def/dimension/flow includes a code list that was attached in error and should not be in the data: http://gss-data.org.uk/def/concept-scheme/migration-directions

I cannot find any reference to it in components.csv.

There also appear to be two different dimensions, Flow and Flow Directions, that use the same code list. I'm not sure whether this is intentional.
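A query along these lines could show whether Flow and Flow Directions, or any other properties, share a code list. It is only a sketch: it assumes the component definitions are visible in the default graph of the store being queried. A shared code list is not necessarily an error, so the results are candidates for review rather than confirmed problems.

```sparql
PREFIX qb: <http://purl.org/linked-data/cube#>

# List every code list attached (via qb:codeList) to more than one
# property, together with the properties that share it.
# Assumes the component triples are in the default graph.
SELECT ?cl (GROUP_CONCAT(DISTINCT STR(?dim); separator=", ") AS ?dims)
WHERE {
  ?dim qb:codeList ?cl .
}
GROUP BY ?cl
HAVING (COUNT(DISTINCT ?dim) > 1)
```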

Robsteranium commented 4 years ago

@ajtucker mentioned this morning that this was likely caused by Jenkins not flushing the components graph when that pipeline was re-run.
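A minimal sketch of the fix, assuming the pipeline writes the components to a single named graph: drop that graph before loading the regenerated triples, so anything removed from the configuration also disappears from the store. The graph IRI below is hypothetical, and the example triple only stands in for whatever the pipeline actually generates.

```sparql
# Hypothetical graph IRI: the real components graph name is not given in this thread.
# Dropping the graph first means triples removed from the configuration do not
# survive a re-run; SILENT avoids an error if the graph does not exist yet.
DROP SILENT GRAPH <http://example.org/graph/components> ;

INSERT DATA {
  GRAPH <http://example.org/graph/components> {
    # Freshly generated component triples go here, for example:
    <http://gss-data.org.uk/def/dimension/flow>
        a <http://purl.org/linked-data/cube#DimensionProperty> .
  }
}
```

Whether the DROP and the INSERT are applied as one atomic change depends on the triple store in use.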

ajtucker commented 4 years ago

We should only be using flow-direction in the trade datasets; see GSS-Cogs/family-trade#62.

RedWalters commented 4 years ago

All trade datasets have been updated to use Flow Directions rather than Flow.

jennet commented 4 years ago

The migration-directions code list is still coming through in the extract, attached to the flow dimension.
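To check whether the extract is still affected, something like the following could be run against PMD3, or against the extracted data once it is loaded into a store. It is a sketch that assumes the qb:codeList triple on the flow dimension is visible in the default graph; an empty result would mean the stale code list is no longer attached.

```sparql
PREFIX qb: <http://purl.org/linked-data/cube#>

# Datasets whose structure uses the flow dimension while that dimension
# still carries the migration-directions code list.
SELECT DISTINCT ?d
WHERE {
  ?d a qb:DataSet ;
     qb:structure / qb:component / qb:dimension <http://gss-data.org.uk/def/dimension/flow> .
  <http://gss-data.org.uk/def/dimension/flow>
     qb:codeList <http://gss-data.org.uk/def/concept-scheme/migration-directions> .
}
ORDER BY ?d
```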

The extraction scripts retrieve the migration-directions code list and attach it to the flow dimension as an additional code list on these datasets:

RedWalters commented 4 years ago
ajtucker commented 4 years ago

I think I've now found and removed all the stragglers. We're not currently using multiple code lists per dimension in the trade datasets. This query lists any trade dataset with a dimension that has more than one code list:

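PREFIX qb: <http://purl.org/linked-data/cube#>
# Find trade datasets with a dimension that has more than one code list.
# STR(?cl1) < STR(?cl2) reports each pair of code lists once rather than twice.
SELECT DISTINCT ?d ?dim ?cl1 ?cl2 WHERE {
  ?d a qb:DataSet ;
     qb:structure / qb:component / qb:dimension ?dim .
  ?dim qb:codeList ?cl1, ?cl2 .
  FILTER (STR(?cl1) < STR(?cl2) && CONTAINS(STR(?d), "trade"))
}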