Swirrl / drafter

A Clojure service and a client to it for exposing data management operations to PMD

URIs are resolved against the Drafter process working directory within draftset queries #668

Open lkitching opened 1 year ago

lkitching commented 1 year ago

The query rewriter used within draftsets parses incoming query strings with Jena ARQ, and then replaces any live graph URIs with their corresponding draft graph URIs. The Jena query parser resolves all URIs against a base URI during the parsing step. If a base URI is not explicitly specified, the URI of the process's current working directory is used.

The URI of the working directory looks something like file:///opt/drafter. This contains a file scheme, an empty authority and a path of /opt/drafter.

The URI specification (RFC 3986) describes how URI references should be resolved against a base URI. According to section 5.2.2, if the 'relative' URI defines a scheme, the resulting URI should take its scheme, authority, path and query from the relative URI. Jena does not respect this behaviour and instead uses the scheme and (empty) authority from the base URI.

This means that a query such as

SELECT *
WHERE {
  <file:/example-files/out/4g-coverage.csv#obs/e06000047,2022-09%40geographic-area-with-4g-coverage-by-at-least-one-provider> ?p ?o .
}
LIMIT 10

is parsed as

SELECT *
WHERE {
  <file:///example-files/out/4g-coverage.csv#obs/e06000047,2022-09%40geographic-area-with-4g-coverage-by-at-least-one-provider> ?p ?o .
}
LIMIT 10

before being rewritten.
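
The difference can be reproduced outside Jena with a small sketch of the RFC 3986 section 5.2.2 transform. The RFC allows a non-strict parser to ignore the reference's scheme when it is identical to the base URI's scheme; the sketch below exposes that as a `strict` flag. This is Python for illustration only, not Jena's code, and the helper names (`split_uri`, `resolve`, etc.) are hypothetical. Whether Jena's resolver takes the non-strict path internally is an assumption; the sketch only shows that the non-strict rule yields the same output as reported above.

```python
import re
from typing import NamedTuple, Optional

# Component-splitting regex from RFC 3986 appendix B (non-capturing outer groups).
_URI_RE = re.compile(
    r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$"
)


class Parts(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]  # "" means defined but empty, None means undefined
    path: str
    query: Optional[str]
    fragment: Optional[str]


def split_uri(uri: str) -> Parts:
    return Parts(*_URI_RE.match(uri).groups())


def remove_dot_segments(path: str) -> str:
    # RFC 3986 section 5.2.4
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = "/" + path[3:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            idx = path.find("/", 1)
            if idx == -1:
                idx = len(path)
            out.append(path[:idx])
            path = path[idx:]
    return "".join(out)


def merge(base: Parts, ref_path: str) -> str:
    # RFC 3986 section 5.2.3
    if base.authority is not None and base.path == "":
        return "/" + ref_path
    return base.path[: base.path.rfind("/") + 1] + ref_path


def resolve(base_uri: str, reference: str, strict: bool = True) -> str:
    # RFC 3986 section 5.2.2, followed by the recomposition in section 5.3.
    base, r = split_uri(base_uri), split_uri(reference)
    scheme, authority, path, query = r.scheme, r.authority, r.path, r.query
    if not strict and scheme == base.scheme:
        scheme = None  # non-strict: ignore a scheme identical to the base's
    if scheme is not None:
        path = remove_dot_segments(path)
    else:
        if authority is not None:
            path = remove_dot_segments(path)
        else:
            if path == "":
                path = base.path
                if query is None:
                    query = base.query
            elif path.startswith("/"):
                path = remove_dot_segments(path)
            else:
                path = remove_dot_segments(merge(base, path))
            authority = base.authority
        scheme = base.scheme
    out = ""
    if scheme is not None:
        out += scheme + ":"
    if authority is not None:
        out += "//" + authority
    out += path
    if query is not None:
        out += "?" + query
    if r.fragment is not None:
        out += "#" + r.fragment
    return out


BASE = "file:///opt/drafter"
REF = ("file:/example-files/out/4g-coverage.csv#obs/e06000047,2022-09%40"
       "geographic-area-with-4g-coverage-by-at-least-one-provider")

print(resolve(BASE, REF, strict=True))   # reference is returned unchanged
print(resolve(BASE, REF, strict=False))  # gains the base's empty authority: file:///...
```

In strict mode the reference keeps its own (undefined) authority, so the URI is unchanged. In non-strict mode the scheme is dropped before the transform, the empty authority is inherited from the working-directory base, and recomposition emits `file:///`.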

Ideally we would specify a base URI such as https://drafter.publishmydata.com for such queries instead of relying on the default Jena behaviour. We should also process live queries the same way instead of submitting them directly to Stardog.
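
As a rough illustration of why an https base would help, assume the rewriting comes from the non-strict rule in RFC 3986 section 5.2.2, where a reference's scheme is ignored only when it equals the base URI's scheme. Under that assumption, the minimal Python sketch below checks whether a given base would put a URI at risk. The function names are hypothetical and this is not Jena's logic.

```python
import re
from typing import Optional

# A scheme is a letter followed by letters, digits, "+", "-" or "." (RFC 3986 section 3.1).
_SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")


def scheme_of(uri: str) -> Optional[str]:
    m = _SCHEME_RE.match(uri)
    return m.group(1).lower() if m else None


def scheme_would_be_ignored(base_uri: str, reference: str) -> bool:
    # Non-strict rule: the reference's scheme is dropped only if it matches the base's.
    ref_scheme = scheme_of(reference)
    return ref_scheme is not None and ref_scheme == scheme_of(base_uri)


uri = "file:/example-files/out/4g-coverage.csv"
print(scheme_would_be_ignored("file:///opt/drafter", uri))                # True
print(scheme_would_be_ignored("https://drafter.publishmydata.com", uri))  # False
```

With the working-directory base, the `file` schemes match and the URI is treated as relative. With the https base they differ, so a `file:` URI would be left as written. An `https:` URI without an authority could still be affected by the same rule, but URIs of the usual `https://host/...` form carry their own authority and resolve to themselves.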

RickMoynihan commented 1 year ago

Is the Jena issue a bug then? If so, should we file it upstream, regardless of whether it will fix the issue for us?

lkitching commented 1 year ago

It looks like a Jena bug, although the implementation does appear to follow the logic in the URI spec, so maybe there's something we can do to make the first condition return true.