Swirrl / cubiql

CubiQL: A GraphQL service for querying multidimensional Linked Data Cubes
Eclipse Public License 1.0

Search datasets by dimension/attribute values #30

Open zeginis opened 6 years ago

zeginis commented 6 years ago

Here we need to find datasets that have specific values for given dimensions or attributes.

Expected GraphQL queries:

# "or" may be replaced by "and"
{ datasets(data: {
    or: [ { dimension: "http://statistics.gov.scot/def/dimension/populationGroup"
            value: "http://statistics.gov.scot/def/concept/population-group/breastfed" }
          { dimension: "http://statistics.gov.scot/def/dimension/populationGroup"
            value: "http://statistics.gov.scot/def/concept/population-group/children" } ] }) {
    title
}}

# "greater" may be replaced by "smaller"
{ datasets(data: {
    greater: [ { dimension: "http://purl.org/linked-data/sdmx/2009/dimension#refPeriod"
                 value: "http://reference.data.gov.uk/id/year/2015" } ] }) {
    title
}}

# "and" may be replaced by "or"
{ datasets(data: {
    and: [ { attribute: "http://purl.org/linked-data/sdmx/2009/attribute#unitMeasure"
             value: "http://statistics.gov.scot/def/concept/measure-units/pounds-gbp" }
           { attribute: "http://purl.org/linked-data/sdmx/2009/attribute#unitMeasure"
             value: "http://statistics.gov.scot/def/concept/measure-units/million-pounds-gbp" } ] }) {
    title
}}

Required changes to the schema:

:queries
 {:datasets
  {:type    (list :dataset)
   :resolve :resolve-datasets
   :args
   {:dimensions {:type :filter}
    :data       {:type :filter}   ; --> add
    :uri        {:type :uri}}}}

{:filter
 {:fields
  {:or      {:type        (list :uri)
             :description "List of URIs for which at least one must be contained within matching datasets."}
   :and     {:type        (list :uri)
             :description "List of URIs which must all be contained within matching datasets."}
   :greater {:type        (list :uri)                                                       ; --> add
             :description "List of URIs which matching datasets must have greater values."}  ; --> add
   :smaller {:type        (list :uri)                                                       ; --> add
             :description "List of URIs which matching datasets must have smaller values."}  ; --> add
   }}}

We may also need to modify the and/or operators in the schema so that they take (dimension, value) pairs as input instead of plain URIs.
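To make the intended and/or semantics over (dimension, value) pairs concrete, here is a minimal Python sketch. It is not CubiQL code: the `matches` function and the in-memory representation of a dataset (a dict from component URI to the set of value URIs it uses) are assumptions made purely for illustration. Treating a missing or empty `or` list as "no constraint" is also an assumption.

```python
from typing import Dict, List, Set, Tuple

# (dimension-or-attribute URI, value URI)
Pair = Tuple[str, str]


def matches(dataset_values: Dict[str, Set[str]],
            data_filter: Dict[str, List[Pair]]) -> bool:
    """Return True if a dataset satisfies the 'or' and 'and' parts of a filter.

    dataset_values is a hypothetical representation: it maps each component
    URI to the set of value URIs that the dataset uses for that component.
    """
    def has(pair: Pair) -> bool:
        component, value = pair
        return value in dataset_values.get(component, set())

    or_pairs = data_filter.get("or", [])
    and_pairs = data_filter.get("and", [])

    # 'or': at least one pair must be present (assumption: empty list = no constraint).
    or_ok = any(has(p) for p in or_pairs) if or_pairs else True
    # 'and': every pair must be present (vacuously true for an empty list).
    and_ok = all(has(p) for p in and_pairs)
    return or_ok and and_ok


POP_GROUP = "http://statistics.gov.scot/def/dimension/populationGroup"
BREASTFED = "http://statistics.gov.scot/def/concept/population-group/breastfed"
CHILDREN = "http://statistics.gov.scot/def/concept/population-group/children"

# A hypothetical dataset that only uses the "breastfed" population group.
example_dataset = {POP_GROUP: {BREASTFED}}
```

With this example dataset, an `or` filter over both population groups matches, while the corresponding `and` filter does not, because "children" is not used by the dataset.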

zeginis commented 6 years ago

At the implementation level there are two options for searching by dimension/attribute values. I currently use Option 2.

Option 1: Search the dataset observations, e.g.

PREFIX qb: <http://purl.org/linked-data/cube#>

select distinct ?ds where {
  ?obs qb:dataSet ?ds.
  ?ds a qb:DataSet.
  ?obs <http://purl.org/linked-data/sdmx/2009/dimension#refPeriod>
       <http://reference.data.gov.uk/id/year/2012>.
}

Pros: generic; works with every dataset, even if no code lists are defined.
Cons: slow; may lead to timeouts.
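As a rough sketch of how Option 1 could be generated from a list of (dimension, value) pairs with "and" semantics, the Python below builds the query string. The function names `observation_query` and `_iri` are hypothetical and not part of CubiQL. One design assumption is worth flagging: each pair gets its own observation variable, so two values of the same dimension can both be required even though no single observation carries both.

```python
from typing import List, Tuple

QB = "http://purl.org/linked-data/cube#"


def _iri(uri: str) -> str:
    """Wrap a URI in <...>, rejecting characters that are not allowed there.

    Hypothetical helper: a minimal guard against malformed input,
    not a full IRI validator.
    """
    if any(ch in uri for ch in '<>"{}|^`\\') or any(ch.isspace() for ch in uri):
        raise ValueError(f"not a usable IRI: {uri!r}")
    return f"<{uri}>"


def observation_query(pairs: List[Tuple[str, str]]) -> str:
    """Build an Option 1 query: datasets having, for every pair,
    at least one observation with that dimension/attribute value."""
    lines = [
        f"PREFIX qb: <{QB}>",
        "SELECT DISTINCT ?ds WHERE {",
        "  ?ds a qb:DataSet .",
    ]
    for i, (component, value) in enumerate(pairs):
        # One observation variable per pair (assumption, see lead-in).
        lines.append(f"  ?obs{i} qb:dataSet ?ds ; {_iri(component)} {_iri(value)} .")
    lines.append("}")
    return "\n".join(lines)


query = observation_query([
    ("http://purl.org/linked-data/sdmx/2009/dimension#refPeriod",
     "http://reference.data.gov.uk/id/year/2012"),
])
```

The generated query still scans observations, so the performance concern above applies unchanged.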

Option 2: Search the dataset structure. This option can be used with PublishMyData, e.g.

PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

select distinct ?ds where {
  ?ds a qb:DataSet.
  ?ds qb:structure ?dsd.
  ?dsd qb:component ?comp.
  ?comp qb:dimension <http://purl.org/linked-data/sdmx/2009/dimension#refPeriod>.
  ?comp qb:codeList ?cl.
  ?cl skos:member <http://reference.data.gov.uk/id/year/2012>.
}

Pros: fast, since it searches only the structure; there is no need to iterate over all observations.
Cons: requires a code list that contains ONLY the values used in the dataset, so separate code lists must be defined for different datasets.
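For "or" semantics, the structure-based query could be generated with a SPARQL 1.1 VALUES block, as in the Python sketch below. This is only an illustration under assumptions: `structure_query` and `_iri` are hypothetical names, the sketch handles dimensions only (attributes would need qb:attribute in place of qb:dimension), and it relies on the code list being attached to the component, as in the query above.

```python
from typing import List, Tuple

PREFIXES = (
    "PREFIX qb: <http://purl.org/linked-data/cube#>\n"
    "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
)


def _iri(uri: str) -> str:
    """Wrap a URI in <...>, rejecting characters that are not allowed there.

    Hypothetical helper: a minimal guard, not a full IRI validator.
    """
    if any(ch in uri for ch in '<>"{}|^`\\') or any(ch.isspace() for ch in uri):
        raise ValueError(f"not a usable IRI: {uri!r}")
    return f"<{uri}>"


def structure_query(pairs: List[Tuple[str, str]]) -> str:
    """Build an Option 2 query: datasets whose structure lists at least one
    of the given (dimension, value) pairs in a component code list."""
    if not pairs:
        # An empty VALUES block would match nothing, so fail loudly instead.
        raise ValueError("at least one (dimension, value) pair is required")
    rows = " ".join(f"({_iri(dim)} {_iri(value)})" for dim, value in pairs)
    return (
        PREFIXES
        + "SELECT DISTINCT ?ds WHERE {\n"
        + f"  VALUES (?dim ?value) {{ {rows} }}\n"
        + "  ?ds a qb:DataSet ;\n"
        + "      qb:structure ?dsd .\n"
        + "  ?dsd qb:component ?comp .\n"
        + "  ?comp qb:dimension ?dim ;\n"
        + "        qb:codeList ?cl .\n"
        + "  ?cl skos:member ?value .\n"
        + "}"
    )


query = structure_query([
    ("http://statistics.gov.scot/def/dimension/populationGroup",
     "http://statistics.gov.scot/def/concept/population-group/breastfed"),
    ("http://statistics.gov.scot/def/dimension/populationGroup",
     "http://statistics.gov.scot/def/concept/population-group/children"),
])
```

Each row of the VALUES block is tried independently, so a dataset is returned as soon as any one pair is found in its code lists.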

The same options apply to "Search hierarchical data" #31.