ScotGovAnalysis / opendatascot

An R package to pull data from statistics.gov.scot into R
https://scotgovanalysis.github.io/opendatascot/
MIT License

More efficiently list unique concepts of a dataset #88

Closed RickMoynihan closed 3 years ago

RickMoynihan commented 5 years ago

If this query made use of the information in the dataset's DSD (data structure definition), it could list all the codes used in a dataset for a given dimension more efficiently, which would allow it to work on much larger datasets.

Obviously when you port to R you should replace the BINDs I've put in the SPARQL query with the string substitutions you're doing:

PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

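SELECT ?code WHERE {
  # Placeholders: swap these BINDs for string substitution when porting to R.
  BIND(<http://statistics.gov.scot/data/self-employment> AS ?ds)
  BIND(<http://purl.org/linked-data/sdmx/2009/dimension#refArea> AS ?dim)

  # Walk the dataset's DSD to the component spec for the chosen dimension,
  # then list the labels of the members of its code list.
  ?ds qb:structure/qb:component ?compspec .
  ?compspec qb:dimension ?dim .
  ?compspec qb:codeList ?codelist .
  ?codelist skos:member/rdfs:label ?code .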
}
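
A minimal sketch of that substitution, written in Python purely for illustration (the package itself is R, and this is not its code). The names `CODES_QUERY` and `build_codes_query` are hypothetical, and the URI check is an assumption about what a caller might want, not something the package is known to do. The dataset and dimension URIs are placed straight into the triple patterns instead of being bound with BIND:

```python
from string import Template

# Hypothetical template: same query as above, with the two BINDs replaced
# by placeholders that are substituted directly into the patterns.
CODES_QUERY = Template("""\
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?code WHERE {
  <$dataset> qb:structure/qb:component ?compspec .
  ?compspec qb:dimension <$dimension> .
  ?compspec qb:codeList ?codelist .
  ?codelist skos:member/rdfs:label ?code .
}
""")

# Characters that cannot appear inside a SPARQL IRI reference (<...>).
_FORBIDDEN = set('<>"{}|^`\\')


def build_codes_query(dataset_uri: str, dimension_uri: str) -> str:
    """Return the code-listing query for one dataset and one dimension."""
    for uri in (dataset_uri, dimension_uri):
        # Reject anything that could break out of the <...> delimiters.
        if any(ch in _FORBIDDEN or ch.isspace() for ch in uri):
            raise ValueError(f"not a plain URI: {uri!r}")
    return CODES_QUERY.substitute(dataset=dataset_uri, dimension=dimension_uri)


if __name__ == "__main__":
    print(build_codes_query(
        "http://statistics.gov.scot/data/self-employment",
        "http://purl.org/linked-data/sdmx/2009/dimension#refArea",
    ))
```

The same idea should carry over to R with whatever string substitution the package already uses.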

BillSwirrl commented 5 years ago

Hi @thomascrines and all - just to note that Liam Cavin asked the Swirrl team to take a look at this and see if we had any suggestions to make, hence this issue and one or two others that Rick has opened. Give us a shout if anything in the issues isn't clear. Cheers

GordonBryden commented 5 years ago

Thanks both, we appreciate the insight. I'll try this out.

GordonBryden commented 5 years ago

I've tested this approach, and it seems to be slower than the current method on the "earnings" dataset.

The example you provided fails in the sparql-beta interface. I've no idea why that would be, but it feels like it might be a red flag.

GordonBryden commented 3 years ago

On further review, this seems to work where my query times out, so I'll be implementing it as the new standard SPARQL query for ods_concept.

GordonBryden commented 3 years ago

Implemented and working as intended. Thanks, both.