RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License
2.15k stars, 555 forks

Support relative URIs #677

jpmccu opened this issue 7 years ago (status: open)

jpmccu commented 7 years ago

Right now, relative URIs are resolved to their absolute form, in some cases using the CWD if there's no other information. In many situations this is purely an annoyance: relative URIs are still URIs. It's possible to create RDFLib graphs in place that use relative URIs, but serializing such a graph and reading it back in results in a different graph (for many use cases). It should be possible to preserve relative URIs as an option on parse().
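A stdlib-only illustration of why the round trip breaks (it does not call rdflib): under RFC 3986 resolution the same relative reference produces different absolute URIs depending on the base it is resolved against. Both base URIs below are made-up examples.

```python
from urllib.parse import urljoin

# The relative reference as it appears in a serialized file, e.g. <Alice>.
relative_ref = "Alice"

# Two hypothetical bases: the directory someone happens to run from,
# and the location the file is eventually published at.
cwd_base = "file:///home/user/project/"
published_base = "https://example.org/data/"

from_cwd = urljoin(cwd_base, relative_ref)
from_published = urljoin(published_base, relative_ref)

print(from_cwd)        # file:///home/user/project/Alice
print(from_published)  # https://example.org/data/Alice

# Neither result equals the relative term the in-memory graph started with.
assert relative_ref not in (from_cwd, from_published)
```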

jpmccu commented 7 years ago

TBL agrees with me: https://www.w3.org/DesignIssues/Relative

Best Practice

Because an application developer is very likely to find it valuable to use relative URIs, any software libraries which serialize web file formats such as HTML and Turtle must provide the option (or the default) to serialize using relative URIs.

When data is stored in files, for example on a web server, it is good practice to store it using relative URIs.

When data is sent across the net, also, such as in HTTP, relative URIs should be used.

Yes, there are cases when people want to design systems without these properties, so absolute URIs should be an option.
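The serializer side of this practice amounts to writing each URI relative to the document's base when that is safe. Below is a minimal sketch, assuming a hypothetical `relativize()` helper (not rdflib's API; rdflib's `Graph.serialize()` does take a `base` argument, but this is not its implementation). It only handles URIs in the base's own directory or below, and it checks that every emitted reference resolves back to the original URI.

```python
from urllib.parse import urljoin

def relativize(uri: str, base: str) -> str:
    """Hypothetical helper: return `uri` as a reference relative to `base`
    when it sits in the base's directory (or below), else unchanged.

    A complete RFC 3986 relativizer would also produce "../" segments.
    """
    # Keep everything up to and including the last "/" of the base,
    # e.g. "https://example.org/data/doc.ttl" -> "https://example.org/data/".
    base_dir = base[: base.rfind("/") + 1]
    if not uri.startswith(base_dir):
        return uri
    rest = uri[len(base_dir):]
    # Refuse forms that would resolve differently: empty, starting with
    # "/", "?" or "#", or with a colon in the first segment (looks like a scheme).
    if not rest or rest[0] in "/?#" or ":" in rest.split("/")[0]:
        return uri
    return rest


base = "https://example.org/data/doc.ttl"
for uri in ["https://example.org/data/Alice", "http://www.w3.org/ns/prov#Person"]:
    ref = relativize(uri, base)
    # Whatever gets written out must resolve back to the original URI.
    assert urljoin(base, ref) == uri
    print(ref)
```

Prefix stripping keeps the sketch short. The guard on `rest` matters: `#x` written relative to `doc.ttl` would resolve to `doc.ttl#x`, not to the directory's `#x`.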

joernhees commented 7 years ago

hehe, it's funny that the N3 parser's original CWM code lists him as an author ;) IIRC we've defused its URI-absolutizing tendencies a couple of times already, and I fully agree that relative URIs should be supported...

jpmccu commented 7 years ago

It looks like the culprit is in notation3.py:1866. The same code crops up in trig.py as well:

        baseURI = graph.absolutize(
            source.getPublicId() or source.getSystemId() or "")
        p = SinkParser(sink, baseURI=baseURI, turtle=turtle)

I'm guessing that one of those always returns CWD if there's nothing else.
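A rough stdlib approximation of what this produces when neither ID is set. The assumption, not checked against rdflib's source here, is that `graph.absolutize()` joins its argument onto the CWD as a `file:` URI. `absolutize_like_rdflib()` is a hypothetical stand-in. Because `urljoin()` returns the base unchanged when the reference is empty, the `""` fallback turns into the CWD.

```python
from pathlib import Path
from urllib.parse import urljoin

def absolutize_like_rdflib(uri: str) -> str:
    """Hypothetical stand-in for graph.absolutize(): resolve `uri`
    against the current working directory expressed as a file: URI."""
    cwd_base = Path.cwd().as_uri() + "/"
    return urljoin(cwd_base, uri)

# With no public ID and no system ID, the parser code above passes "",
# and urljoin() returns the base itself, so baseURI becomes the CWD.
base_uri = absolutize_like_rdflib("")
print(base_uri)
```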

jpmccu commented 3 years ago

@nicholascar, is this something worth working on to get into 6.0? I'd like to sort it out if possible.

aucampia commented 2 years ago

This is quite critical to me too, and it raises the question: if we do resolve relative URIs, what should the base be? Currently rdflib takes the source file name as the base, which doesn't necessarily make a lot of sense to me.

nicholascar commented 2 years ago

if we do resolve relative URIs, what should be the base

I think TBL's point, referenced above by @jpmccu, is that relative URIs should be left untouched so that they stay relative. So the base is whatever it was originally set to be, and baseURI should not be set to the CWD or anything else.

Here's the place in FileInputSource that sets system_id to the file path: https://github.com/RDFLib/rdflib/blob/e30e386cdf33f839a7feed6458731a6cd6fd8ceb/rdflib/parser.py#L310. This probably affects most cases, since most uses of relative URIs are probably in files that end up going through this Source.
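A small stdlib sketch of the effect, assuming the system ID ends up as a `file:` URI for the file's absolute path and is then used as the base. Both file paths are hypothetical. `PurePosixPath` keeps the output the same on every OS.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin

# Hypothetical path of the Turtle file being parsed.
file_path = PurePosixPath("/data/prov/example.ttl")

# Assumption: this file: URI becomes the system ID, and from there the base.
system_id = file_path.as_uri()
print(system_id)  # file:///data/prov/example.ttl

# A relative reference such as <Alice> resolves against the file's directory.
print(urljoin(system_id, "Alice"))  # file:///data/prov/Alice

# A copy of the same file in another directory yields different URIs.
moved = PurePosixPath("/home/user/copy/example.ttl").as_uri()
print(urljoin(moved, "Alice"))  # file:///home/user/copy/Alice
```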

The SAX XmlReader could set it too if the parser were fed data (see https://github.com/RDFLib/rdflib/blob/e30e386cdf33f839a7feed6458731a6cd6fd8ceb/rdflib/parser.py#L100), but I suspect this rarely, if ever, happens?

So we should supply no base, and/or not override the base with the absolute file location.
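One way to picture the proposed option is a resolution step that only resolves when an explicit base exists. Here that means an `@base` in the document or a `publicID` passed by the caller. `resolve_reference()` is hypothetical and not part of rdflib. It is a minimal sketch of the behaviour under discussion, not a patch.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit

def resolve_reference(ref: str, base: Optional[str]) -> str:
    """Hypothetical resolution step for a parser with a 'preserve relative
    URIs' option: resolve only when an explicit base is known."""
    if urlsplit(ref).scheme:
        # Already absolute: nothing to do.
        return ref
    if base is None:
        # No @base and no publicID: keep the reference exactly as written,
        # instead of falling back to the file path or the CWD.
        return ref
    return urljoin(base, ref)


print(resolve_reference("Alice", None))                         # Alice
print(resolve_reference("Alice", "https://example.org/data/"))  # https://example.org/data/Alice
```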

Currently rdflib takes the source file name as the base which seems to me like it does not necessarily make a lot of sense.

Agreed, except it's the source file path, not the name, right? So I think we just need one change to stop this happening?

Can we have some example data here, please? I rarely, if ever, use relative URIs, so could @jpmccu or @aucampia please post an example?

jpmccu commented 2 years ago

I generally work in Turtle, so here's an example in that:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix prov: <http://www.w3.org/ns/prov#>.

<Alice> a prov:Person.
<Alice> rdfs:label "Alice Smith".

<spreadsheet> a prov:Entity;
   prov:wasGeneratedBy [ a prov:Activity; prov:wasAssociatedWith <Alice> ];
   prov:wasAttributedTo <Alice>.

That should exercise relative and absolute URIs, blank nodes, literals, subclauses and direct statements in Turtle.
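To see which IRIs in the sample would be affected, you can split its IRI references into relative and absolute ones. This is a stdlib sketch with a naive regex rather than a Turtle parser: it would also match `<...>` inside string literals. It treats anything without a scheme as relative. The copy of the sample below separates the blank node's two predicates with `;`.

```python
import re
from urllib.parse import urlsplit

turtle = """
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix prov: <http://www.w3.org/ns/prov#>.

<Alice> a prov:Person.
<Alice> rdfs:label "Alice Smith".

<spreadsheet> a prov:Entity;
   prov:wasGeneratedBy [ a prov:Activity; prov:wasAssociatedWith <Alice> ];
   prov:wasAttributedTo <Alice>.
"""

# Naive IRIREF matcher: good enough for this sample only.
iri_refs = re.findall(r"<([^<>\s]*)>", turtle)

# No scheme means a relative reference that a parser would resolve against its base.
relative = sorted({ref for ref in iri_refs if not urlsplit(ref).scheme})
absolute = sorted({ref for ref in iri_refs if urlsplit(ref).scheme})

print(relative)  # ['Alice', 'spreadsheet']
print(absolute)  # ['http://www.w3.org/2000/01/rdf-schema#', 'http://www.w3.org/ns/prov#']
```

With the current behaviour, `Alice` and `spreadsheet` are the terms whose identity changes when the file is parsed; the prefixed names expand from absolute namespace IRIs and are unaffected.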