SPARQL-Anything / sparql.anything

SPARQL Anything is a system for Semantic Web re-engineering that allows users to ... query anything with SPARQL.
https://sparql-anything.cc/
Apache License 2.0

csv option proposal: fx:csv.triple-patterns #347

Open justin2004 opened 1 year ago

justin2004 commented 1 year ago

Tarql binds values to variables without the need to explicitly express a triple pattern to match/capture the value.

To allow an easy transition (for users) from Tarql to SPARQL Anything, what if we add an option for CSV files that would do the following? Given this file:

justin@parens$ cat proposal.csv 
name,age,dog
bob,32,fido
jane,,sammy

In order to capture the values we currently need to express a triple pattern for each column like:

PREFIX fx:  <http://sparql.xyz/facade-x/ns/>
PREFIX xyz: <http://sparql.xyz/facade-x/data/>

SELECT  *
WHERE
  { SERVICE <x-sparql-anything:>
      { fx:properties
                  fx:location     "/app/proposal.csv" ;
                  fx:csv.headers  "true" ;
                  fx:csv.null-string  "" .
        optional { ?row xyz:name ?name . }
        optional { ?row xyz:age ?age . }
        optional { ?row xyz:dog ?dog . }
      }
  }

which yields:

row,name,age,dog
_:b0,jane,,sammy
_:b1,bob,32,fido

The proposal is to allow this query:

SELECT  *
WHERE
  { SERVICE <x-sparql-anything:>
      { fx:properties
                  fx:location     "/app/proposal.csv" ;
                  fx:csv.headers  "true" ;
                  fx:csv.null-string  "" ;
                  fx:csv.triple-patterns  "true" .
      }
  } 

to produce this:

row,name,age,dog
_:b0,jane,,sammy
_:b1,bob,32,fido

So that means fx:csv.triple-patterns "true" causes these triple patterns to be inserted implicitly behind the scenes:

       optional { ?row xyz:name ?name .}
       optional { ?row xyz:age ?age . }
       optional { ?row xyz:dog ?dog . }
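
To make the proposed expansion concrete, here is a minimal Python sketch of how one OPTIONAL triple pattern per column could be derived from the CSV header. It only illustrates the idea and is not SPARQL Anything code: the function name `implicit_patterns` and the `row_var` parameter are hypothetical, and it assumes every header is already usable as both a property local name and a variable name.

```python
import csv
import io


def implicit_patterns(csv_text: str, row_var: str = "row") -> list[str]:
    """Return one OPTIONAL triple pattern per column named in the CSV header.

    Hypothetical helper: assumes each header is a valid SPARQL variable
    name and a valid local name under the xyz: prefix.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    return [
        f"OPTIONAL {{ ?{row_var} xyz:{name} ?{name} . }}"
        for name in header
    ]


sample = "name,age,dog\nbob,32,fido\njane,,sammy\n"
for pattern in implicit_patterns(sample):
    print(pattern)
```

For the proposal.csv file above, this prints the same three patterns, one each for name, age and dog.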

rjyounes commented 1 year ago

Along with this, it would be nice to automatically replace spaces with underscores in the incoming column headers; this is what TARQL does.

justin2004 commented 1 year ago

@rjyounes what if, in a single csv, one column is "state_city" and another is "state city" ? how does TARQL handle the collision?

rjyounes commented 1 year ago

Good question. I haven't ever encountered it. Possibly some hand-correction is required.

enridaga commented 1 year ago

Along with this, it would be nice to automatically replace spaces with underscores in the incoming column headers; this is what TARQL does.

Indeed, currently we just make those strings URL-safe, which results in an unintuitive %20 appearing. Maybe we can think about adding an option to treat them as web page slugs, but even then there can be cases where the result is not intuitive (letter case, special characters, etc.).

@rjyounes what if, in a single csv, one column is "state_city" and another is "state city" ? how does TARQL handle the collision?

We already have this problem: some CSVs repeat the same column name several times. We just append _1, etc., which is not great but intuitive enough.
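
The two naming strategies discussed here can be compared with a small Python sketch. It is an assumption-laden illustration, not SPARQL Anything's or TARQL's actual code: `percent_encode` and `underscore_names` are hypothetical helpers, and the numeric suffix scheme only loosely follows the "_1 etc." convention mentioned above.

```python
from urllib.parse import quote


def percent_encode(header: str) -> str:
    """Make a header URL-safe; a space becomes %20."""
    return quote(header, safe="")


def underscore_names(headers: list[str]) -> list[str]:
    """Replace spaces with underscores, then resolve collisions.

    Assumed scheme: a name that is already taken gets the first free
    numeric suffix (_1, _2, ...).
    """
    seen = set()
    result = []
    for header in headers:
        base = header.strip().replace(" ", "_")
        candidate = base
        n = 0
        while candidate in seen:
            n += 1
            candidate = f"{base}_{n}"
        seen.add(candidate)
        result.append(candidate)
    return result


print(percent_encode("state city"))
print(underscore_names(["state_city", "state city"]))
```

With this scheme the "state_city" / "state city" collision is resolved deterministically by column order, so the second column becomes state_city_1. Whether that is intuitive enough for users is the open question in this thread.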

enridaga commented 1 year ago

Tarql binds values to variables without the need to explicitly express a triple pattern to match/capture the value.

OK, now on the main point. I like the idea of providing a default triple pattern. It's interesting how you would get the same behaviour with the following:

{ fx:properties
                  fx:location     "/app/proposal.csv" ;
                  fx:csv.headers  "true" .
       [] xyz:name ?name ;
          xyz:age ?age ;
          xyz:dog ?dog .
      }

Without headers, we would need to add a convention for the variable names (?col_1, etc.):

{ fx:properties
                  fx:location     "/app/proposal.csv" ;
                  fx:csv.headers  "false" .
       [] rdf:_1 ?col_1 ;
          rdf:_2 ?col_2 ;
          rdf:_3 ?col_3 .
      }

justin2004 commented 1 year ago

@enridaga and we'd need to wrap each of the triple patterns in an OPTIONAL to get the Tarql behavior.

enridaga commented 1 year ago

@enridaga and we'd need to wrap each of the triple patterns in an OPTIONAL to get the Tarql behavior.

Even if we remove the null-string option?

justin2004 commented 1 year ago

Even if we remove the null-string option?

Oh, if we don't assert the null-string option then that might be the Tarql behavior.

But I do know that my team likes using the null-string option with the SPARQL Anything OPTIONAL triple patterns (as they transition from Tarql to SPARQL Anything).
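
The interaction between the null-string option and OPTIONAL can be sketched with a toy Python model. This is not how SPARQL Anything is implemented. It assumes that a cell equal to the null string produces no triple, and that without the option an empty cell becomes an empty-string value. The helper names `triplify` and `match` are hypothetical.

```python
import csv
import io


def triplify(csv_text, null_string=None):
    """Model each row as a dict of column -> value.

    Cells equal to null_string are skipped, mimicking "no triple is
    produced for this cell" (an assumption of this sketch).
    """
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        rows.append({k: v for k, v in record.items() if v != null_string})
    return rows


def match(rows, columns, optional):
    """Bind one variable per column.

    Required patterns drop any row that lacks a value for some column;
    OPTIONAL patterns keep the row and leave that variable unbound (None).
    """
    solutions = []
    for row in rows:
        if not optional and any(c not in row for c in columns):
            continue
        solutions.append({c: row.get(c) for c in columns})
    return solutions


sample = "name,age,dog\nbob,32,fido\njane,,sammy\n"
cols = ["name", "age", "dog"]

print(match(triplify(sample, null_string=""), cols, optional=False))
print(match(triplify(sample, null_string=""), cols, optional=True))
print(match(triplify(sample), cols, optional=False))
```

In this model, required patterns combined with the null-string option drop jane's row. OPTIONAL keeps it with age unbound, and omitting the null-string option keeps it with age bound to an empty string. That is why generated patterns would need OPTIONAL whenever the null-string option is in use.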