RDFLib / prez

Prez is a data-configurable Linked Data API framework that delivers profiles of Knowledge Graph data according to the Content Negotiation by Profile standard.
BSD 3-Clause "New" or "Revised" License

Fix: Prez SPARQL endpoint's handling of url encoding/decoding of queries in GET requests #182

Closed: edmondchuc closed this 10 months ago

edmondchuc commented 10 months ago

For some reason, the previous implementation doesn't work when the query parameter is passed to the httpx.URL object: the result is a double URL-encoded value. This happens when we send a SPARQL query from Postman, but not when we send it from the Yasgui client.

Postman:

Checking the value with https://meyerweb.com/eric/tools/dencoder/ shows that it has to be decoded twice to recover the original query.

b'query=PREFIX%2520rdf:%2520%253Chttp:%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%2523%253E%250APREFIX%2520skos:%2520%253Chttp:%2F%2Fwww.w3.org%2F2004%2F02%2Fskos%2Fcore%2523%253E%250ASELECT%2520*%2520WHERE%2520%7B?s%2520?p%2520?obj%2520.FILTER%2520(?p%2520IN%2520(skos:prefLabel)%2520)FILTER%2520regex(str(?s),%2520%2522id%2FLexicon%2FNamedRockUnit%2522)FILTER%2520(regex(lcase(str(?obj)),%2520%2522slate%2522))%7D'
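The symptom can be imitated with the standard library alone. This is a rough sketch, assuming a shortened stand-in query; it encodes the value twice with `urllib.parse.quote` to show the `%2520` pattern seen above, and does not reproduce what Postman or httpx do internally.

```python
from urllib.parse import quote, unquote

# Shortened, hypothetical stand-in for the SPARQL query in the capture above.
query = 'SELECT * WHERE { ?s ?p "slate" }'

# First pass: what a client might send (space -> %20).
once = quote(query, safe="")

# Second pass: encoding the already-encoded value turns every "%" into "%25",
# so "%20" becomes "%2520".
twice = quote(once, safe="")

# One decode only gets back to the singly encoded value...
assert unquote(twice) == once
# ...and a second decode is needed to recover the original query.
assert unquote(unquote(twice)) == query
```

The `%25` prefix in front of another escape sequence is the quickest way to spot a double-encoded value in a log or a captured request.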

After letting FastAPI/Starlette decode the parameter and then percent-encoding it with urllib.parse.quote_plus:

b'query=PREFIX+rdf%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0APREFIX+skos%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2004%2F02%2Fskos%2Fcore%23%3E%0ASELECT+%2A+WHERE+%7B%3Fs+%3Fp+%3Fobj+.FILTER+%28%3Fp+IN+%28skos%3AprefLabel%29+%29FILTER+regex%28str%28%3Fs%29%2C+%22id%2FLexicon%2FNamedRockUnit%22%29FILTER+%28regex%28lcase%28str%28%3Fobj%29%29%2C+%22slate%22%29%29%7D'

This PR fixes the issue by having FastAPI/Starlette always decode the incoming query parameter and then encoding it ourselves with urllib.parse.quote_plus. With this change, httpx.URL no longer appears to double-encode the value.
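The decode-then-encode step can be sketched as follows. This is a minimal illustration rather than the code in this PR: `encode_sparql_query` is a hypothetical helper name, the query is shortened, and `urllib.parse.parse_qs` stands in for the decoding that FastAPI/Starlette performs on the incoming request.

```python
from urllib.parse import parse_qs, quote_plus, unquote_plus


def encode_sparql_query(decoded_query: str) -> bytes:
    """Encode an already-decoded SPARQL query exactly once (hypothetical helper)."""
    return b"query=" + quote_plus(decoded_query).encode("ascii")


# A partially encoded query string, similar in style to what a client may send.
raw = "query=SELECT%20*%20WHERE%20%7B?s%20?p%20?obj%7D"

# Stand-in for the framework's decoding of the incoming query parameter.
decoded = parse_qs(raw)["query"][0]

# Encode exactly once before handing the value on.
encoded = encode_sparql_query(decoded)

print(decoded)  # SELECT * WHERE {?s ?p ?obj}
print(encoded)  # b'query=SELECT+%2A+WHERE+%7B%3Fs+%3Fp+%3Fobj%7D'
```

Because the value is always decoded first, the result is the same whether the client sent spaces as `%20`, as `+`, or left some characters unencoded.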

I've also refactored the code slightly to keep all concerns with incoming HTTP in the route handler.

recalcitrantsupplant commented 10 months ago

@lalewis1 could you please test this locally? You can use Postman (I use a similar service called Hoppscotch). Could you run a SPARQL query that includes something like:

{
  VALUES ?EscapedSearchTerm {"\"Illumina HiSeq 4000\"" "Illumina HiSeq 4000"
  }

Could you also add a test_endpoints_sparql.py file with queries in different query= formats?
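One possible shape for such a test is sketched below. It is an assumption-heavy outline, not the test added to Prez: it does not call the SPARQL endpoint, `query_string_variants` and `normalise` are hypothetical helpers, and `parse_qs` again stands in for the framework's decoding. It only checks that differently encoded `query=` values collapse to the same singly encoded bytes, using a query with escaped quotes like the one suggested above.

```python
from urllib.parse import parse_qs, quote, quote_plus

# Query with escaped double quotes inside a literal (raw string keeps the backslashes).
QUERY = r'SELECT * WHERE { VALUES ?EscapedSearchTerm { "\"Illumina HiSeq 4000\"" "Illumina HiSeq 4000" } }'


def query_string_variants(query: str) -> dict[str, str]:
    """Return the same query in several query= encodings (hypothetical helper)."""
    return {
        "percent_encoded": "query=" + quote(query, safe=""),          # spaces as %20
        "plus_encoded": "query=" + quote_plus(query),                  # spaces as +
        "partially_encoded": "query=" + quote(query, safe="*?{}"),    # some characters left raw
    }


def normalise(raw_query_string: str) -> bytes:
    """Decode the incoming value, then encode it exactly once (hypothetical helper)."""
    decoded = parse_qs(raw_query_string)["query"][0]
    return b"query=" + quote_plus(decoded).encode("ascii")


def test_all_formats_normalise_to_same_bytes() -> None:
    expected = b"query=" + quote_plus(QUERY).encode("ascii")
    for name, raw in query_string_variants(QUERY).items():
        # Every variant must decode back to the original query...
        assert parse_qs(raw)["query"][0] == QUERY, name
        # ...and re-encode to identical, singly encoded bytes.
        assert normalise(raw) == expected, name


if __name__ == "__main__":
    test_all_formats_normalise_to_same_bytes()
```

A real endpoint test would send each variant through the application's test client and compare the responses; the variants dictionary could be reused for that.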