Swirrl / cubiql

CubiQL: A GraphQL service for querying multidimensional Linked Data Cubes
Eclipse Public License 1.0

Adjust get observations query to the CubiQL request #143

Closed zeginis closed 6 years ago

zeginis commented 6 years ago

The SPARQL query to get the observations is the same regardless of the CubiQL request.

For example:

{
  cubiql {
    dataset_births {
      observations {
        page(first: 20) {
          observation {
            count
          }
        }
      }
    }
  }
}

And

{
  cubiql {
    dataset_births {
      observations {
        page(first: 20000) {
          observation {
            count
            gender
            measure_type
            time_period
            uri
          }
        }
      }
    }
  }
}

Both requests use the same SPARQL query:

PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT * WHERE {
  ?obs a qb:Observation .
  ?obs qb:dataSet <http://statistics.gov.scot/data/births> .
  ?obs <http://purl.org/linked-data/cube#measureType> ?mp .
  ?obs ?mp ?mv .
  ?obs <http://purl.org/linked-data/sdmx/2009/dimension#refArea> ?dim1 .
  ?obs <http://purl.org/linked-data/sdmx/2009/dimension#refPeriod> ?dim2 .
  ?obs <http://purl.org/linked-data/cube#measureType> ?dim3 .
  ?obs <http://statistics.gov.scot/def/dimension/gender> ?dim4 .
  ?obs <http://statistics.gov.scot/def/dimension/timePeriod> ?dim5 .
}

Adding a LIMIT, or binding only the dimensions that were requested, could improve performance.
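To illustrate the idea, here is a minimal Python sketch (not CubiQL's actual implementation, which is not shown in this thread) that builds the observation query from the requested fields. The function name, the `BIRTHS_DIMENSIONS` mapping from CubiQL field names to dimension URIs, and the `measure_fields` parameter are all hypothetical; the URIs are copied from the query above. It assumes a well-formed cube, where dropping a dimension pattern does not change which observations match.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical mapping from CubiQL field names to dimension URIs
# (URIs taken from the generated query shown in this issue).
BIRTHS_DIMENSIONS: Dict[str, str] = {
    "measure_type": "http://purl.org/linked-data/cube#measureType",
    "gender": "http://statistics.gov.scot/def/dimension/gender",
    "time_period": "http://statistics.gov.scot/def/dimension/timePeriod",
}


def build_observation_query(
    dataset_uri: str,
    dimension_uris: Dict[str, str],
    requested_fields: Iterable[str],
    measure_fields: Iterable[str] = (),
    limit: Optional[int] = None,
    offset: int = 0,
) -> str:
    """Build a SPARQL query that binds only what the request asked for."""
    requested = set(requested_fields)
    lines: List[str] = [
        "PREFIX qb: <http://purl.org/linked-data/cube#>",
        "SELECT * WHERE {",
        "  ?obs a qb:Observation .",
        f"  ?obs qb:dataSet <{dataset_uri}> .",
    ]
    # Bind the measure value only if a measure field (e.g. count) was requested.
    if requested & set(measure_fields):
        lines.append("  ?obs qb:measureType ?mp .")
        lines.append("  ?obs ?mp ?mv .")
    # One triple pattern per requested dimension; unrequested ones are skipped.
    index = 0
    for field, uri in dimension_uris.items():
        if field in requested:
            index += 1
            lines.append(f"  ?obs <{uri}> ?dim{index} .")
    lines.append("}")
    # Push the page size down into the query instead of fetching everything.
    if limit is not None:
        lines.append(f"LIMIT {limit} OFFSET {offset}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_observation_query(
        "http://statistics.gov.scot/data/births",
        BIRTHS_DIMENSIONS,
        requested_fields=["count"],
        measure_fields=["count"],
        limit=20,
    ))
```

With this shape, the first request above (only `count`, `first:20`) would produce a query with no dimension patterns and a `LIMIT 20`, while the second would bind the three requested dimensions.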

zeginis commented 6 years ago

The above has been partially solved. The SPARQL queries now include a LIMIT based on the page parameter, e.g. first:20.

However, I found that some SPARQL queries are executed even when their results are not used in the response, e.g.

The CubiQL query:

{
  cubiql {
    dataset_earnings {
      observations {
        page {
          observation {
            population_group
            median
            gender
          }
        }
      }
    }
  }
}

This results in the execution of 3 SPARQL queries:

  1. Get cube metadata -> metadata are not requested by the CubiQL query
    PREFIX qb: <http://purl.org/linked-data/cube#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT distinct * WHERE {
      VALUES ?ds { <http://statistics.gov.scot/data/earnings> }
      ?ds a qb:DataSet .
      { ?ds <http://www.w3.org/2000/01/rdf-schema#label> ?title . }
      UNION { ?ds rdfs:comment ?description . }
      UNION { ?ds dcterms:issued ?issued . }
      UNION { ?ds dcterms:publisher ?publisher . }
      UNION { ?ds dcterms:license ?licence . }
      UNION { SELECT ?modified WHERE { ?ds dcterms:modified ?modified . } ORDER BY DESC(?modified) LIMIT 1 }
    }
  2. Get total matches -> total matches are not requested by the CubiQL query
    PREFIX qb: <http://purl.org/linked-data/cube#>
    SELECT (COUNT(*) AS ?c) WHERE {  
    ?obs a qb:Observation .  
    ?obs qb:dataSet <http://statistics.gov.scot/data/earnings> .
    ?obs <http://purl.org/linked-data/cube#measureType> ?mp . 
    ?obs ?mp ?mv . 
    ?obs <http://purl.org/linked-data/sdmx/2009/dimension#refArea> ?dim1 .
    ?obs <http://purl.org/linked-data/sdmx/2009/dimension#refPeriod> ?dim2 .
    ?obs <http://purl.org/linked-data/cube#measureType> ?dim3 . 
    ?obs <http://statistics.gov.scot/def/dimension/gender> ?dim4 . 
    ?obs <http://statistics.gov.scot/def/dimension/populationGroup> ?dim5 .}
  3. Get observation -> this is the only required query
    PREFIX qb: <http://purl.org/linked-data/cube#>
    SELECT * WHERE {  
    ?obs  a qb:Observation .  
    ?obs  qb:dataSet <http://statistics.gov.scot/data/earnings> .
    ?obs <http://purl.org/linked-data/cube#measureType> ?mp .
    ?obs ?mp ?mv . 
    ?obs <http://purl.org/linked-data/sdmx/2009/dimension#refArea> ?dim1 . 
    ?obs <http://purl.org/linked-data/sdmx/2009/dimension#refPeriod> ?dim2 .
    ?obs <http://purl.org/linked-data/cube#measureType> ?dim3 . 
    ?obs <http://statistics.gov.scot/def/dimension/gender> ?dim4 . 
    ?obs <http://statistics.gov.scot/def/dimension/populationGroup> ?dim5 .} 
    LIMIT 10 OFFSET 0

Executing only the SPARQL queries that the request actually needs would improve performance.
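One way to picture this is a small planning step that decides which of the three queries to run from the fields present in the CubiQL request. The sketch below is an assumption-laden illustration, not CubiQL's code: the `QUERY_TRIGGERS` table, the field names in it (guessed from the SPARQL variable names and the "total matches" description above), and both function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical table: which request fields make each SPARQL query necessary.
QUERY_TRIGGERS: Dict[str, set] = {
    "metadata": {"title", "description", "issued", "publisher", "licence", "modified"},
    "total_matches": {"total_matches"},
    "observations": {"page"},
}


def plan_queries(requested_fields: Iterable[str]) -> List[str]:
    """Return the names of the queries needed for the requested fields."""
    requested = set(requested_fields)
    return [name for name, triggers in QUERY_TRIGGERS.items() if requested & triggers]


def run_planned(
    requested_fields: Iterable[str],
    runners: Dict[str, Callable[[], object]],
) -> Dict[str, object]:
    """Execute only the planned queries; the other runners are never called."""
    return {name: runners[name]() for name in plan_queries(requested_fields)}


if __name__ == "__main__":
    # Fields from the dataset_earnings request above: only "page" triggers a query.
    fields = ["observations", "page", "observation", "population_group", "median", "gender"]
    print(plan_queries(fields))  # prints ['observations']
```

Passing the runners as zero-argument callables keeps the SPARQL calls lazy, so a query that is not planned costs nothing.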