CLARIAH / grlc

grlc builds Web APIs using shared SPARQL queries
http://grlc.io
MIT License

Invalid queries if variable name is substring of another variable name #361

Open jaw111 opened 3 years ago

jaw111 commented 3 years ago

Given a query in which one variable name is a substring of another variable name:

select *
where {
  [] rdfs:label ?__label ;
    skos:prefLabel ?__label2 .
}

If the request URL includes the label parameter, e.g. ?label=foo, then the resulting query is invalid, because every occurrence of the string ?__label is replaced by "foo", including the one inside ?__label2:

select *
where {
  [] rdfs:label "foo" ;
    skos:prefLabel "foo"2 .
}

To account for this, the logic for rewriting queries should be more robust than simply replacing strings in the query text.
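
For illustration, the behaviour can be reproduced with plain Python string replacement. This is only a minimal sketch: naive_replace is a hypothetical stand-in for text substitution, not grlc's actual rewriting code.

```python
query = """select *
where {
  [] rdfs:label ?__label ;
    skos:prefLabel ?__label2 .
}"""


def naive_replace(query, name, value):
    # Hypothetical helper: substitutes the text "?__<name>" wherever it occurs,
    # without checking whether it is a complete variable name.
    return query.replace("?__" + name, '"' + value + '"')


rewritten = naive_replace(query, "label", "foo")
print(rewritten)
```

The printed query contains "foo"2 where ?__label2 used to be, which is not valid SPARQL.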

c-martinez commented 3 years ago

Hi @jaw111! Interesting issue, I don't think we've ever come across this sort of use case before. We've been thinking for a while that the variable replacement code should be upgraded, to overcome issues such as #230, so maybe this is something to be taken into account as well.

The only thing I can think of is to do the string replacement starting from the longest variable name (?__label2 in your example above). So something along these lines should do the trick:

def doReplace(s, vals):
    # Start replacing longest variable names
    for key in sorted(vals.keys(), key=len, reverse=True):
        s = s.replace(key, vals[key])
    return s

This is not very sophisticated, but if you are aware of any more elegant algorithm to address this issue, we are open to suggestions :-)
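
One caveat with longest-first replacement: if vals only holds the parameters supplied in the request (e.g. just ?__label), then ?__label2 would still be corrupted. A possible refinement, sketched here under assumptions and not taken from grlc, is to match complete variable names only. The function name replace_whole_variables and the layout of vals (variable name including "?" mapped to an already-serialized term) are hypothetical.

```python
import re


def replace_whole_variables(query, vals):
    """Replace only complete variable names listed in vals.

    vals maps variable names (e.g. "?__label") to already-serialized
    SPARQL terms (e.g. '"foo"'). This layout is an assumption.
    """
    if not vals:
        return query
    # Longest names first, so the alternation prefers ?__label2 over ?__label.
    names = sorted(vals, key=len, reverse=True)
    # The negative lookahead rejects a match that is followed by another
    # name character. [A-Za-z0-9_] is a simplification of SPARQL's VARNAME
    # rule, which also allows some non-ASCII characters.
    pattern = re.compile(
        "(?:" + "|".join(re.escape(n) for n in names) + ")(?![A-Za-z0-9_])"
    )
    # A function is used as the replacement so that backslashes in the
    # values are inserted literally.
    return pattern.sub(lambda m: vals[m.group(0)], query)


example = "[] rdfs:label ?__label ; skos:prefLabel ?__label2 ."
only_label = replace_whole_variables(example, {"?__label": '"foo"'})
both = replace_whole_variables(
    example, {"?__label": '"foo"', "?__label2": '"bar"'}
)
```

This still works on raw text, so a variable name appearing inside a string literal or a comment would be replaced as well.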

jaw111 commented 3 years ago

@c-martinez I like your suggestion, it's nice and simple :)

My other thought was that, since the query is already being translated into a SPARQL algebra expression with rdflib, it should be possible to manipulate that expression programmatically, replacing the variable with the RDF term and reserializing the result back to text. That might be more complex than manipulating the query text as a string, but it should be a more robust approach.

Another approach would be to construct a VALUES clause with the bindings for the relevant variables and simply append that to the query text as suggested in #332.
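
A rough sketch of the VALUES idea, for illustration only. SPARQL 1.1 allows a trailing VALUES clause after a query, so the query text itself stays untouched and ?__label2 is never affected. The helper names sparql_string and append_values are hypothetical, and every value is treated as a plain string literal, which is an assumption (IRIs, numbers and language tags would need their own serialization).

```python
def sparql_string(value):
    # Serialize a Python str as a plain SPARQL string literal,
    # escaping backslashes, double quotes and line breaks.
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def append_values(query, bindings):
    """Append a trailing VALUES clause with one row of bindings.

    bindings maps variable names without the leading "?" to str values
    (an assumed layout, not grlc's internal representation).
    """
    if not bindings:
        return query
    names = list(bindings)
    variables = " ".join("?" + n for n in names)
    row = " ".join(sparql_string(bindings[n]) for n in names)
    return f"{query.rstrip()}\nVALUES ({variables}) {{ ({row}) }}"


query = """select *
where {
  [] rdfs:label ?__label ;
    skos:prefLabel ?__label2 .
}"""
with_values = append_values(query, {"__label": "foo"})
print(with_values)
```

With ?label=foo this leaves the where block as written and adds the line VALUES (?__label) { ("foo") } at the end.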