RDFLib / sparqlwrapper

A wrapper for a remote SPARQL endpoint
https://sparqlwrapper.readthedocs.io/

DESCRIBE queries to Fuseki fail if output format set to turtle #132

Closed nicholascar closed 2 years ago

nicholascar commented 5 years ago

Reported by @alex-ip:

Fuseki won't accept 'turtle' as an output format and fails - returns 500 - if it is set. Is this a known bug?

A quick hack fix to use text/turtle instead of turtle is presented below.

For file https://github.com/RDFLib/sparqlwrapper/blob/master/SPARQLWrapper/Wrapper.py, function _query()

    def _query(self):
        """Internal method to execute the query. Returns the output of the
        C{urllib2.urlopen} method of the standard Python library

        @return: tuples with the raw request plus the expected format.
        @raise QueryBadFormed: If the C{HTTP return code} is C{400}.
        @raise Unauthorized: If the C{HTTP return code} is C{401}.
        @raise EndPointNotFound: If the C{HTTP return code} is C{404}.
        @raise URITooLong: If the C{HTTP return code} is C{414}.
        @raise EndPointInternalError: If the C{HTTP return code} is C{500}.
        """
        request = self._createRequest()

        try:
            #===================================================================
            # if self.timeout:
            #     response = urlopener(request, timeout=self.timeout)
            # else:
            #     response = urlopener(request)
            # return response, self.returnFormat
            #===================================================================
            try:
                if self.timeout:
                    response = urlopener(request, timeout=self.timeout)
                else:
                    response = urlopener(request)
                return response, self.returnFormat
            except urllib.error.HTTPError as e:
                # Hack to overcome issue where Fuseki won't accept 'turtle' as an output format
                if self.queryType == DESCRIBE:

                    if self.method == GET: 
                        request.full_url = re.sub('output=([^\\&]+)', 'output=application/\\1', request.full_url)
                        request.full_url = re.sub('application/n3', 'application/n-triples', request.full_url)
                    if self.method == POST:
                        request.data = re.sub('output=([^\\&]+)', 'output=application/\\1', request.data.decode('utf-8')).encode('utf-8')
                        request.data = re.sub('application/n3', 'application/n-triples', request.data.decode('utf-8')).encode('utf-8')

                    if self.timeout:
                        response = urlopener(request, timeout=self.timeout)
                    else:
                        response = urlopener(request)
                    return response, self.returnFormat
                else:
                    raise e
        except urllib.error.HTTPError as e:
            if e.code == 400:
                raise QueryBadFormed(e.read())
            elif e.code == 404:
                raise EndPointNotFound(e.read())
            elif e.code == 401:
                raise Unauthorized(e.read())
            elif e.code == 414:
                raise URITooLong(e.read())
            elif e.code == 500:
                raise EndPointInternalError(e.read())
            else:
                raise e
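
A tidier variant of the retry hack above might rewrite the `output` parameter from an explicit alias table rather than a regular expression. This is only a sketch: `ALIAS_TO_MIME` and `rewrite_output_param` are hypothetical names, not part of SPARQLWrapper, and which values a given Fuseki version accepts in `output` is an assumption that would need checking against the endpoint.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical alias table. The MIME types are the registered ones, but
# whether a given Fuseki version accepts them in `output` is an assumption.
ALIAS_TO_MIME = {
    "turtle": "text/turtle",
    "rdf+xml": "application/rdf+xml",
    "json-ld": "application/ld+json",
}


def rewrite_output_param(url, aliases=ALIAS_TO_MIME):
    """Return url with a short `output` alias replaced by its MIME type.

    Values that are not in the alias table are passed through unchanged.
    """
    parts = urlsplit(url)
    params = [
        (key, aliases.get(value, value) if key == "output" else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(params)))


rewritten = rewrite_output_param(
    "http://localhost:3030/ds/sparql?query=DESCRIBE+%3Curn%3Ax%3E&output=turtle"
)
```

Parsing the query string keeps the change confined to the `output` parameter, so a query that happens to contain the text `output=` is left alone.
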
dayures commented 5 years ago

Hi @nicholascar, quick question. Could you share the endpoint where this issue is raised? If this is not possible, could you tell us the version of Fuseki? Thanks!

alex-ip commented 5 years ago

Apologies for the delay in responding to this.

The Apache Jena Fuseki instance we are using is Version 3.10.0.

If you email me at Alex(dot)Ip(at)ga(dot)gov(dot)au, I can send you through a working endpoint and some credentials to access it.

Cheers,

Alex

dayures commented 5 years ago

Thanks @alex-ip !

From my notes (available at the top of the code of Wrapper.py), "turtle" is not a valid alias for the output for Apache Jena Fuseki2.

In this case, maybe you can try setOnlyConneg() method.

I have tested it with an open endpoint of Apache Jena Fuseki (version 3.6.0) and it looks like it is working. Please, find the example below.

from SPARQLWrapper import SPARQLWrapper, TURTLE
from rdflib import Graph

sparql = SPARQLWrapper("http://agrovoc.uniroma2.it:3030/agrovoc/sparql") # Fuseki 3.6.0 (Fuseki2)

sparql.setQuery("""
    DESCRIBE <http://aims.fao.org/aos/agrovoc/c_aca7ac6d>
""")

sparql.setReturnFormat(TURTLE)
sparql.setOnlyConneg(True)
results = sparql.query().convert()
g = Graph()
g.parse(data=results, format="turtle")
print(g.serialize(format='turtle'))
alex-ip commented 5 years ago

Thanks, @dayures!

It looks like the setOnlyConneg fix only "kind of" worked with Apache Jena Fuseki version 3.10.0. It works for turtle, json-ld and rdf+xml, but requesting "n3" for a describe query returns a turtle response.

At least it doesn't error out.

Cheers,

Alex

nicholascar commented 5 years ago

@alex-ip n3 pretty much is Turtle. What were you expecting to get from using n3 other than Turtle?

If this is OK, can the issue be closed?

alex-ip commented 5 years ago

G'day @nicholascar - I reckon that returning turtle when n3 is specified is probably not ideal behaviour. We are trying to offer RDF content from Jena-Fuseki triple-store in multiple formats, so I would prefer to obtain the result of a "describe" query in the required format rather than having to convert it client-side. Cheers, Alex

dayures commented 5 years ago

@alex-ip Maybe it is an issue in Fuseki.

Using a REST testing app (Advanced REST Client), I queried a Fuseki 3.6.0 endpoint with a GET request, a DESCRIBE query, and the Accept header value text/rdf+n3;application/n3;text/n3

http://agrovoc.uniroma2.it:3030/agrovoc/sparql?query=DESCRIBE <http://aims.fao.org/aos/agrovoc/c_aca7ac6d>

The response header was content-type: text/turtle; charset=utf-8
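
The same check can be reproduced without a REST client. Below is a minimal sketch using only the Python standard library; `build_describe_request` and `negotiated_content_type` are hypothetical helper names, and the Accept value is written with comma separators (the usual HTTP list syntax) rather than the semicolons quoted above, which is an assumption about what was intended.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_describe_request(endpoint, resource_iri, accept):
    """Build a GET request for a DESCRIBE query with an explicit Accept header."""
    query = f"DESCRIBE <{resource_iri}>"
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": accept})


def negotiated_content_type(request, timeout=30):
    """Send the request and return the Content-Type header the server chose."""
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type")


# Building the request does not touch the network.
request = build_describe_request(
    "http://agrovoc.uniroma2.it:3030/agrovoc/sparql",
    "http://aims.fao.org/aos/agrovoc/c_aca7ac6d",
    "text/rdf+n3, application/n3, text/n3",
)

# Uncomment to contact the endpoint (it may no longer be reachable):
# print(negotiated_content_type(request))
```

Comparing the returned Content-Type with the requested media types shows whether the server honoured the Accept header or fell back to another serialization.
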

namedgraph commented 4 years ago

N3 is not a standard syntax.

namedgraph commented 4 years ago

Why is SPARQLWrapper sending triplestore-specific URL parameters such as output and results? Content negotiation should be the default mode, as it is the only standard one.

danielbakas commented 4 years ago

sparql.setOnlyConneg(True) should be the default setting... >:(

nicholascar commented 2 years ago

This issue is now resolved, perhaps due to Fuseki updates, because this code now works (with an updated AGROVOC SPARQL endpoint):

from SPARQLWrapper import SPARQLWrapper, TURTLE
from rdflib import Graph

sparql = SPARQLWrapper("https://agrovoc.fao.org/sparql")  # Fuseki 3.6.0 (Fuseki2)

sparql.setQuery("""
    DESCRIBE <http://aims.fao.org/aos/agrovoc/c_aca7ac6d>
""")

sparql.setReturnFormat(TURTLE)
sparql.setOnlyConneg(True)
results = sparql.query().convert()
g = Graph()
g.parse(data=results, format="turtle")
print(g.serialize())

sparql.setOnlyConneg(True) must be included.