SANSA-Stack / Archived-SANSA-Query

SANSA Query Layer
Apache License 2.0

Add possibility to add Variable Mapping to the result set of Sparqlify approach #13

Closed: GezimSejdiu closed this issue 4 years ago

GezimSejdiu commented 6 years ago

Since the SANSA query API also provides the possibility to write queries directly, without exposing an endpoint:

// read the RDF file into an RDD of triples (lang is the serialization, e.g. Lang.NTRIPLES)
val triples = spark.rdf(lang)(path)
val query = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"
// execute the query with the Sparqlify-based engine
val result = triples.sparql(query)

Here we get the result as a data frame of bindings; it would be great to provide a wrapper that maps the query's variables onto the result set.
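
A minimal sketch of what such a wrapper could look like (the object, the helper name, and the assumption that the raw bindings arrive as a DataFrame whose columns can simply be renamed are all hypothetical, not the actual Sparqlify output format):

import org.apache.spark.sql.DataFrame

object ResultSetOps {
  // Hypothetical helper: rename the raw binding columns to the
  // variable names projected by the query, e.g. "s", "p", "o".
  def withVariableNames(df: DataFrame, vars: Seq[String]): DataFrame =
    df.toDF(vars: _*)
}

// usage: ResultSetOps.withVariableNames(result, Seq("s", "p", "o")).show()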

Best

Aklakan commented 6 years ago

So the clean way to get the result as a table would be for the SPARQL query execution to essentially yield an (RDB2RDF) mapping, i.e.:

The API could look like this:

val datasetMapping = triples.sparql(query)

val naturalDataset: Dataset[Row] = datasetMapping.dataset
val mapping: Map[Var, Expr] = datasetMapping.mapping
val bindingDataset: Dataset[Binding] = datasetMapping.asBindings
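
A sketch of the container such an API could return; the case class itself is hypothetical, while Var, Expr and Binding are Jena ARQ types:

import org.apache.jena.sparql.core.Var
import org.apache.jena.sparql.engine.binding.Binding
import org.apache.jena.sparql.expr.Expr
import org.apache.spark.sql.{Dataset, Row}

// Hypothetical container pairing the tabular result with the
// RDB2RDF-style mapping that records how each variable is derived.
case class DatasetMapping(dataset: Dataset[Row], mapping: Map[Var, Expr]) {
  // Would evaluate each variable's defining expression per row to
  // reconstruct SPARQL bindings; left unimplemented in this sketch.
  def asBindings: Dataset[Binding] = ???
}
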
Aklakan commented 5 years ago

Maybe we can solve this issue with a simple variable substitution: the final mapping comprises (a) the SQL query (or its algebra expression) and (b) the (multi-)mapping of each SPARQL variable to a set of defining expressions:

?s = {uri(?foo), plainLiteral(?bar), ...}
?o = {typedLiteral(?baz, my:datatype) }

So by analyzing this mapping, we can decide which columns are needed, and apply a substitution on the SQL expression to tidy up variable names. Ontop seems to use nice variable names from the ground up; maybe the substitution is not needed there.
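
A hedged sketch of that analysis step, assuming the multi-mapping is available as a Map from each variable to its set of defining expressions (the function name is made up; ExprVars is Jena's utility for collecting the variables an expression mentions):

import org.apache.jena.sparql.core.Var
import org.apache.jena.sparql.expr.{Expr, ExprVars}

import scala.collection.JavaConverters._

// For each SPARQL variable, collect the column variables that its
// defining expressions mention, e.g. ?s -> {?foo} for ?s = {uri(?foo)}.
// Columns mentioned nowhere can be pruned; the rest can be renamed
// after the variables they define.
def columnsPerVariable(mapping: Map[Var, Set[Expr]]): Map[Var, Set[Var]] =
  mapping.map { case (v, defs) =>
    v -> defs.flatMap(e => ExprVars.getVarsMentioned(e).asScala)
  }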

Aklakan commented 4 years ago

Please continue discussion at #47