RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Given a Graph, can it be used as default graph for a Dataset? #319

Open uholzer opened 11 years ago

uholzer commented 11 years ago

Imagine you are given a Graph, created with Graph() or backed by some arbitrary store. For some reason you need a Dataset, and you want your Graph to be the default graph of this Dataset. The Dataset should not, and will not need to, contain any other graphs.

Is there a straightforward way to achieve this without copying all triples? As far as I know, Dataset needs a context_aware and graph_aware store, so it is not possible to just create a Dataset backed by the same store. Graham Klyne is interested in this because he wants to provide a SPARQL endpoint for a given rdflib.Graph, but my implementation of a SPARQL endpoint requires a Dataset. I would rather not implement support for plain graphs, so I wonder whether there is a simple solution.

Also, I wonder whether it would be useful to have a true union of several graphs backed by different stores.

gromgull commented 11 years ago

There is no way to do it currently, but it would be easy enough to add.

In most cases the underlying store WILL be context_aware, since most of our stores are. Even if it isn't, we could implement a special "single graph dataset" that throws an exception if you try to get any other graph. And actually, the Dataset is very similar to a graph: how would your endpoint implementation break if it were just handed a graph?

For the actual SPARQL calls, I made an effort to work with both ConjunctiveGraph and Dataset (or rather, with graph_aware and not graph_aware stores) for the bits that require graphs, and even with a non context-aware graph/store for everything else.

The true-union of graphs from different stores is easy to do naively and with poor performance, and probably impossible to make really scalable (if you have 1000 graphs ... ) It's probably another issue though :)

uholzer commented 11 years ago

Thinking about it again, some fixes to my implementation should indeed make it compatible with rdflib.Graph.

uholzer commented 11 years ago

There is a discrepancy between Graph and Dataset (note that the parsed triple is missing from the serialization):

>>> ds = rdflib.Dataset()
>>> ds.parse(data='<a> <b> <c>.', format='turtle')
>>> print(ds.serialize(format='turtle'))
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

>>> for c in ds.contexts(): print(c)
... 
[a rdfg:Graph;rdflib:storage [a rdflib:Store;rdfs:label 'IOMemory']].
<urn:x-rdflib:default> a rdfg:Graph;rdflib:storage [a rdflib:Store;rdfs:label 'IOMemory'].

It is clear to me why this happens. ConjunctiveGraph parses into a fresh graph and Dataset inherits this behaviour. For ConjunctiveGraphs one does not observe the above, because the default is the union and hence contains the fresh graph.

Is this behaviour intended? (It doesn't bother me much, I just wanted to note it.)

iherman commented 11 years ago

Well...

One would have to look at the turtle parser behaviour to understand what is going on. But it is also an rdflib design decision. Formally, a turtle file returns a graph. Not a dataset; a graph. This means the situation below is unclear at a certain level: what happens if one parses a turtle file (i.e. a graph) into a dataset? I guess the obvious answer is that it should be parsed into the default graph, but either the turtle parser is modified to do that explicitly in the case of a Dataset, or an extra trick should be done in the Dataset object. And, of course, any modification to the turtle parser should be made to all other parsers as well, which is a bit of a pain (though it may be a much cleaner solution!).

B.t.w., Gunnar has the pen for the dataset stuff. I have not touched it for a long time (I was on vacation), and I am not sure I can look at it in the coming period. Maybe I can, but I am not sure whether Gunnar has made any changes while I was away...

Ivan


uholzer commented 11 years ago

@iherman I hope you had a great vacation. Don't worry, there is no rush, and it is okay for me if the current behaviour is kept.