Open jpmccu opened 7 years ago
TBL agrees with me: https://www.w3.org/DesignIssues/Relative
Best Practice
Because an application developer is very likely to find it valuable to use relative URIs, any software libraries which serialize web file formats such as HTML and Turtle must provide the option (or the default) to serialize using relative URIs.
When data is stored in files, for example on a web server, it is good practice to store it using relative URIs.
When data is sent across the net, also, such as in HTTP, relative URIs should be used.
Yes, there are cases when people want to design systems without this property, so absolute URIs should be an option.
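To make the "serialize using relative URIs" recommendation concrete, here is a minimal sketch using only the Python standard library. The `relativize` helper and the `example.org` URLs are hypothetical, not part of rdflib, and the helper ignores edge cases such as a first path segment containing a colon.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def relativize(uri: str, base: str) -> str:
    """Return `uri` as a reference relative to `base` when they share an origin.

    Hypothetical helper for illustration; not an rdflib API.
    """
    u, b = urlsplit(uri), urlsplit(base)
    if (u.scheme, u.netloc) != (b.scheme, b.netloc):
        return uri  # different scheme or host: keep the absolute form

    # Relative references resolve against the "directory" of the base document.
    base_dir = posixpath.dirname(b.path) or "/"
    rel = posixpath.relpath(u.path or "/", start=base_dir)
    if u.path.endswith("/") and rel != ".":
        rel += "/"  # relpath drops a trailing slash; put it back
    if u.query:
        rel += "?" + u.query
    if u.fragment:
        rel += "#" + u.fragment
    return rel


base = "http://example.org/data/graph.ttl"
for absolute in (
    "http://example.org/data/Alice",
    "http://example.org/other/x",
    "http://www.w3.org/ns/prov#Person",
):
    relative = relativize(absolute, base)
    # Resolving the relative form against the same base recovers the original.
    print(relative, "->", urljoin(base, relative))
```

The point of the round trip is that nothing is lost by writing the short form: a consumer that knows the document's location can always recover the absolute URI, while the file itself stays movable.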
hehe, it's funny that the N3 parser's original CWM code mentions him as an author ;) IIRC we defused its URI absolutizing tendencies a couple of times already and i fully agree, that relative URIs should be supported...
It looks like the culprit is in notation3.py:1866. The same code crops up in trig.py as well:
baseURI = graph.absolutize(
    source.getPublicId() or source.getSystemId() or "")
p = SinkParser(sink, baseURI=baseURI, turtle=turtle)
I'm guessing that one of those always returns CWD if there's nothing else.
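A rough model of the behavior being guessed at, using only the standard library: if the base falls back to the current working directory, a bare reference such as `<Alice>` ends up as a `file:` URI. The function name `absolutize_against_cwd` is made up for this sketch, and it is an approximation of the effect, not rdflib's actual code.

```python
from pathlib import Path
from urllib.parse import urljoin


def absolutize_against_cwd(reference: str, base: str = "") -> str:
    """Resolve `reference` against `base`, falling back to the CWD.

    Hypothetical model of the suspected behavior; not copied from rdflib.
    """
    # The working directory expressed as a file: URI, with a trailing slash
    # so that references resolve *inside* the directory.
    cwd_base = Path.cwd().as_uri() + "/"
    # An empty base leaves the CWD base unchanged; an absolute base replaces it.
    effective_base = urljoin(cwd_base, base)
    return urljoin(effective_base, reference)


# With no public or system id, the relative reference becomes a file: URI.
print(absolutize_against_cwd("Alice"))
# With an explicit base, the CWD is never consulted.
print(absolutize_against_cwd("Alice", "http://example.org/data/"))
```

The first result depends on where the process happens to be running, which is why the same Turtle file can parse to different graphs on different machines.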
@nicholascar, is this something worth working on to get into 6.0? I'd like to sort it out if possible.
This is quite critical to me also, and it also raises the question, if we do resolve relative URIs, what should be the base? Currently rdflib takes the source file name as the base which seems to me like it does not necessarily make a lot of sense.
> if we do resolve relative URIs, what should be the base
I think TBL's point, referenced above by @jpmccu, is that relative URIs should be left untouched so that they remain relative. The base is therefore whatever it was originally set to be, and baseURI should not be set to the CWD or anything else.
Here's the FileInputSource place that sets system_id to the file path: https://github.com/RDFLib/rdflib/blob/e30e386cdf33f839a7feed6458731a6cd6fd8ceb/rdflib/parser.py#L310. This is probably affecting most of the cases since most uses of relative URIs are probably in files that end up calling this Source.
The SAX XmlReader could set it too, if the parser was fed data (see https://github.com/RDFLib/rdflib/blob/e30e386cdf33f839a7feed6458731a6cd6fd8ceb/rdflib/parser.py#L100) but I suspect this hardly ever/never occurs?
So, we should supply no base and/or not override base with the absolute file location.
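One way to picture "supply no base and/or not override base" is a resolver where the base is truly optional. This is a sketch under assumptions: `resolve` and its `base` parameter are hypothetical names, not an existing rdflib signature.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit


def resolve(reference: str, base: Optional[str] = None) -> str:
    """Resolve `reference` only when the caller explicitly supplies a base.

    Hypothetical helper for illustration; not an rdflib API.
    """
    if base is None:
        return reference  # no base given: leave relative references untouched
    if urlsplit(reference).scheme:
        return reference  # already absolute: nothing to resolve
    return urljoin(base, reference)


print(resolve("Alice"))                              # stays relative
print(resolve("Alice", "http://example.org/data/"))  # resolved on request
print(resolve("http://www.w3.org/ns/prov#Person", "http://example.org/"))
```

The design choice here is that the file location is never substituted silently: a caller who wants absolute URIs has to say which base to use.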
> Currently rdflib takes the source file name as the base which seems to me like it does not necessarily make a lot of sense.
Agreed, except it is the source file path, not the name, right? So I think we just need one change to stop this happening?
Can we have some example data here please? I don't ever/often use relative URIs so could @jpmccu or @aucampia please put in an example?
I generally work in Turtle, so here's an example in that:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix prov: <http://www.w3.org/ns/prov#>.
<Alice> a prov:Person.
<Alice> rdfs:label "Alice Smith".
<spreadsheet> a prov:Entity;
    prov:wasGeneratedBy [ a prov:Activity; prov:wasAssociatedWith <Alice> ];
    prov:wasAttributedTo <Alice>.
That should exercise relative and absolute URIs, blank nodes, literals, subclauses and direct statements in Turtle.
Right now, relative URIs are resolved to their absolute form, in some cases using the CWD if there's no other information. In many situations, this is purely an annoyance. Relative URIs are still URIs. It's possible to create RDFLib graphs in place that use relative URIs, but serializing such a graph and reading it back in produces a different graph (for many use cases). It should be possible to preserve relative URIs as an option on parse().