RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Trig format cannot be processed by RDFLib #2958

Open floresbakker opened 3 weeks ago

floresbakker commented 3 weeks ago

Data in Trig format cannot be processed by RDFLib.

Let us assume the following data, which includes named graphs (example copied from the RDFLib documentation):

GraphString = '''

PREFIX eg: <http://example.com/person/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

eg:graph-1 {
    eg:drewp a foaf:Person .
    eg:drewp eg:says "Hello World" .
}

eg:graph-2 {
    eg:nick a foaf:Person .
    eg:nick eg:says "Hi World" .
}
'''

Next, let's parse this data into a Graph object:

from rdflib import Graph

someGraph = Graph()
someGraph.parse(data=GraphString, format="trig")

Let us query the graph:

someQuery = someGraph.query('''

select ?s

where  {
         ?s ?p ?o     
       }   
''')   

If we then go through the result set, there is unexpectedly nothing:

for row in someQuery:
    print(str(row.s))

The loop prints nothing, whereas I would expect the following bindings for the variable ?s:

http://example.com/person/drewp
http://example.com/person/nick
http://example.com/person/drewp
http://example.com/person/nick

If I prepare the data differently by removing the explicit graphs, I do get the expected results:

GraphString = '''

PREFIX eg: <http://example.com/person/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    eg:drewp a foaf:Person .
    eg:drewp eg:says "Hello World" .

    eg:nick a foaf:Person .
    eg:nick eg:says "Hi World" .
'''

Result of the query:

http://example.com/person/drewp
http://example.com/person/nick
http://example.com/person/drewp
http://example.com/person/nick

Perhaps I am mistaken, and RDFLib graph objects that hold TriG data need to be handled differently in Python, but from a pure triples and SPARQL point of view this looks like incorrect behavior.

ajnelson-nist commented 3 weeks ago

I've been curious about this behavior, too. I think it's consistent with the SPARQL specification to behave differently when a quads graph is used vs. a triples graph.

The SPARQL 1.1 grammar has the term QuadsNotTriples (item 51), which indicates a syntax difference in the WHERE clause.

Your query ...

someQuery = someGraph.query('''

select ?s

where  {
         ?s ?p ?o     
       }   
''') 

... would need to become:

 someQuery = someGraph.query('''

 select ?s

 where  {
+         GRAPH ?g {
          ?s ?p ?o     
+         }
        }   
 ''')

I only think this because I was trying to figure out some nuances of the JSON-LD @graph keyword. My queries written for triples graphs stopped returning results when I gave the @graph JSON key a sibling @id key. I got results again when I added that GRAPH ?g { ... } wrapper.

I'm not sure offhand where in the SPARQL specification this gets spelled out, though. The word "graph" appears a few hundred times in the document. So I'm curious to see how this thread goes.

floresbakker commented 3 weeks ago

You are referring to rules that deal explicitly with UPDATE and DELETE statements in SPARQL. Those production rules form part of the abstract syntax tree of SPARQL, so they should be read as leaves and branches of a tree, not as nodes that stand on their own (51 > 50 > 48/49 > 38/39/40). See also note 8 in section 19.8.

My issue deals with a SELECT statement. It would be destructive to the SPARQL specification if a graph could no longer be queried without a GRAPH clause. Fortunately, that is not the case.

I have tested this issue with four engines: RDFLib, Speedy, Virtuoso and Jena. Only RDFLib breaks; the other engines give me the expected bindings.

WhiteGobo commented 3 weeks ago

It doesn't work for Dataset either. I would have expected Graph() to hide data in its store that belongs to graphs with a different identifier. I wouldn't expect that behaviour from Dataset, so I would expect the query on the dataset to work:

from rdflib import Dataset

# Reuse the store that someGraph was parsed into.
anotherGraphSameData = Dataset(store=someGraph.store)
someQuery = anotherGraphSameData.query('''

select ?s

where  {
         ?s ?p ?o
       }
''')
for row in someQuery:
    print(str(row.s))
# still returns nothing

As a side note, the data is parsed correctly; only the query on the dataset doesn't work. So this prints the data from GraphString:

print(anotherGraphSameData.serialize(format="trig"))

WhiteGobo commented 3 weeks ago

The given query also works when the Graph itself has the identifier of one of the named graphs. So if you use:

someGraph = Graph(identifier=URIRef("http://example.com/person/graph-1"))

then you will get:

http://example.com/person/drewp
http://example.com/person/drewp

But there doesn't seem to be any option for the SPARQL processor to ignore graph identifiers. See https://github.com/RDFLib/rdflib/blob/b0d7a7dc272bd6c87bbf807d017932b37c1257f7/rdflib/plugins/sparql/processor.py#L117-L124

floresbakker commented 3 weeks ago

I first noticed this behavior when I wanted to run PyShacl on TriG data in RDFLib. I could never get that to work, even though PyShacl is able to handle TriG files. I suspected the issue might be in RDFLib, so I created the example above. My workaround is to transform the TriG file into a Turtle file and offer that to RDFLib/PyShacl instead. This is inconvenient, though, because I have to transform the source every time I want to process some data.

WhiteGobo commented 3 weeks ago

The SPARQL query works if you use Dataset with the option default_union. @ashleysommer gave some more background info at #2959

from rdflib import Dataset

GraphString = '''
PREFIX eg: <http://example.com/person/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

eg:graph-1 {
    eg:drewp a foaf:Person .
    eg:drewp eg:says "Hello World" .
}

eg:graph-2 {
    eg:nick a foaf:Person .
    eg:nick eg:says "Hi World" .
}

eg:ash a foaf:Person .
eg:ash eg:says "Default" .
'''

ds = Dataset(default_union=True)
ds.parse(data=GraphString, format="trig")

someQuery = ds.query('''

select ?s

where  {
         ?s ?p ?o
       }
''')

for row in someQuery:
    print(str(row.s))