Open zeginis opened 7 years ago
I have created a Google Doc to document the search functionality we want to implement. It contains the GraphQL queries that should be supported.
https://docs.google.com/document/d/1Trw9NM_gUM_qA6aM7t_NaWUCfuQ1ZQeyH7Q8DvcIAy4/edit?usp=sharing
We could consider using Lucene search operations over literals, where supported (e.g. http://www.stardog.com/docs/#_search)
I have added separate issues for each piece of functionality; see #28, #29, #30, #31, #32.
I think a Lucene (text) search against title, description, theme would be very useful - as Ric says, not every SPARQL endpoint will necessarily support it, and different databases might have different SPARQL extensions for this purpose, so it would not be very 'standard'.
Searching datasets by a text search on the labels of dimensions and/or dimension values is something Robin previously identified as useful, e.g. find me datasets that have information about 'Manchester' or 'working age' or 'ethnicity'. Often the user won't know the specific URI.
I agree with you that a free text search is also required, e.g. find datasets about 'working age adults'. In this case a literal search will return datasets that contain 'working age' in the title, the comment, or a dimension label (e.g. http://statistics.gov.scot/data/qualifications-working-age-people).
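To make the free text case concrete, here is a minimal in-memory sketch in Python. The records, the example.org URIs and the field names (`title`, `comment`, `dimension_labels`) are all made up for illustration; a real implementation would query the triple store, and possibly a Lucene index where the endpoint supports one.

```python
from typing import Dict, List

# Hypothetical, hand-written metadata records; not taken from any real endpoint.
DATASETS: List[Dict] = [
    {
        "uri": "http://example.org/data/qualifications",
        "title": "Qualifications of working age people",
        "comment": "Highest qualification held.",
        "dimension_labels": ["Reference Period", "Gender"],
    },
    {
        "uri": "http://example.org/data/poverty",
        "title": "Poverty",
        "comment": "Relative poverty rates.",
        "dimension_labels": ["Population Group", "Working age adults"],
    },
    {
        "uri": "http://example.org/data/road-length",
        "title": "Road network length",
        "comment": "Length of public roads.",
        "dimension_labels": ["Road type"],
    },
]


def free_text_search(datasets: List[Dict], phrase: str) -> List[str]:
    """Return URIs of datasets whose title, comment or any
    dimension label contains the phrase (case-insensitive)."""
    needle = phrase.casefold()
    hits = []
    for ds in datasets:
        # Collect every literal field we want the search to cover.
        fields = [ds["title"], ds["comment"], *ds["dimension_labels"]]
        if any(needle in field.casefold() for field in fields):
            hits.append(ds["uri"])
    return hits
```

Calling `free_text_search(DATASETS, "working age")` matches the first record through its title and the second through a dimension label, which is the behaviour described above: the user does not need to know any URI.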
This complements a more structured type of search, where the user or the client program knows the specific URI. For example, get datasets that have the value 'working age adults' for the dimension 'Population Group' (e.g. http://statistics.gov.scot/data/poverty).
This structured type of search is required in order to get datasets with a 'similar' structure that can be processed together, e.g. by a machine learning component.
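As a rough sketch of the structured case, the Python helper below builds a SPARQL query over the RDF Data Cube vocabulary (`qb:dataSet` links an observation to its dataset). The function name and the example.org URIs are hypothetical, and it assumes the dimension value is a URI attached directly to observations; it does no escaping or validation of its inputs.

```python
QB = "http://purl.org/linked-data/cube#"


def build_structured_query(dimension_uri: str, value_uri: str) -> str:
    """Build a SPARQL query selecting datasets that have at least one
    observation where the given dimension takes the given value."""
    return (
        f"PREFIX qb: <{QB}>\n"
        "SELECT DISTINCT ?dataset WHERE {\n"
        "  ?obs qb:dataSet ?dataset ;\n"
        f"       <{dimension_uri}> <{value_uri}> .\n"
        "}"
    )


# Hypothetical URIs standing in for 'Population Group' / 'working age adults'.
query = build_structured_query(
    "http://example.org/def/dimension/populationGroup",
    "http://example.org/def/concept/working-age-adults",
)
print(query)
```

Because this uses only standard SPARQL and the Data Cube vocabulary, it should not depend on any vendor-specific text search extension, unlike the Lucene-based option.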
The following is a case where a combination of free text search and structured search is required.
Search for datasets that contain the year 2013. This can be translated to the query: "Give me the datasets that have the value 2013 for the dimension refPeriod".
However, 2013 can be represented in different ways in the dataset, e.g.:
It would be good to have functionality that enables searching of datasets: