RDFLib / sparqlwrapper

A wrapper for a remote SPARQL endpoint
https://sparqlwrapper.readthedocs.io/

Better debug support for JSON decoding errors #171

Open WolfgangFahl opened 3 years ago

WolfgangFahl commented 3 years ago

I couldn't make anything of the exception

Expecting property name enclosed in double quotes: line 406865 column 8 (char 11883759)

that I got when running a SPARQL query against a Wikidata endpoint, so I modified the _convertJSON function to make it possible to inspect the JSON string that is the culprit:

def _convertJSON(self):
    """
    Convert a JSON result into a Python dict. This method can be overwritten in a subclass
    for a different conversion method.

    :return: converted result.
    :rtype: dict
    """
    # Read the whole response up front so the raw text is still available on failure.
    jsonStr = self.response.read().decode("utf-8")
    try:
        return json.loads(jsonStr)
    except json.decoder.JSONDecodeError:
        # Dump the offending payload for inspection, then re-raise the original error.
        jsonFileName = "/tmp/sparqlerror.json"
        with open(jsonFileName, "w", encoding="utf-8") as jsonFile:
            jsonFile.write(jsonStr)
        raise

Inspecting the /tmp/sparqlerror.json file revealed:

tail -120 sparqlerror.json
        "type" : "uri",
        "value" : "http://www.wikidata.org/entity/Q183"
      },
      "cityId" : {
        "type" : "uri",
        "value" : "http://www.wikidata.org/entity/Q98225433"
      }
    }, {
      "city" : {
        "xml:lang" : "en",
        "type" : "literal",
        "value" : "Cultural heritage D-4-5834-0143 in Weißenbrunn"
      },
      "cityCoord" : {
        "datatype" : "http://www.opengis.net/ont/geosparql#wktLiteral",
        "type" : "literal",
    SPARQL-QUERY: queryStr=
...

java.util.concurrent.TimeoutException
    at java.util.concurrent.FutureTask.get(FutureTask.java:205)
    at com.bigdata.rdf.sail.webapp.BigdataServlet.submitApiTask(BigdataServlet.java:292)

So it looks like the query timed out and the timeout information was written directly into the JSON stream, without being caught in any other way by the library. It would therefore be good to be able to inspect the string that causes the JSON decoder to choke as a standard feature.
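To illustrate what such a standard feature might report, here is a minimal sketch that uses only the standard library. It relies on the `pos`, `lineno` and `colno` attributes of `json.JSONDecodeError` to quote the text around the failure. The names `json_error_context` and `loads_with_context` are made up for this example and are not part of SPARQLWrapper.

```python
import json


def json_error_context(text, err, width=80):
    """Return the text surrounding the position where decoding failed."""
    start = max(err.pos - width, 0)
    end = min(err.pos + width, len(text))
    return text[start:end]


def loads_with_context(text, width=80):
    """Decode JSON; on failure, raise an error that quotes the offending text."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        snippet = json_error_context(text, err, width)
        raise ValueError(
            f"{err.msg} at line {err.lineno} column {err.colno}; near: {snippet!r}"
        ) from err


# A result cut short by a server-side stack trace, modeled on the dump above.
broken = (
    '{"type" : "literal",\n'
    "    SPARQL-QUERY: queryStr=\n"
    "java.util.concurrent.TimeoutException"
)
try:
    loads_with_context(broken)
except ValueError as exc:
    print(exc)
```

With a message like this, the TimeoutException would have been visible in the traceback itself, without having to dump an 11 MB response to disk first.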

The query that caused this behavior is shown below. For our purposes it is run once per region, and it succeeds in some 3000+ cases, covering most regions of the world. For some regions with a very high number of human settlements known to Wikidata, however, we observed the timeout:

regionId   regionIsoCode  # of settlements
Q980       DE-BY          ?
Q18677983  FR-GES         19345
Q21        GB-ENG         ?
Q1356      IN-WB          42346

For FR-GES and IN-WB a second attempt was successful, while the other two seem to time out systematically.

# get cities by region for geograpy3
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>

SELECT distinct (?cityQ as ?cityId) ?city ?geoNameId ?gndId ?regionId ?countryId ?cityCoord ?cityPopulation WHERE { 
  VALUES ?hsType {
      wd:Q1549591 wd:Q3957 wd:Q5119 wd:Q15284 wd:Q62049 wd:Q515 wd:Q1637706 wd:Q1093829 wd:Q486972 wd:Q532
  }

  VALUES ?region {
         wd:Q980
  }

  # region the city should be in
  ?cityQ wdt:P131* ?region.
  # type of human settlement to try
  ?hsType ^wdt:P279*/^wdt:P31 ?cityQ.

  # label of the City
  ?cityQ rdfs:label ?city filter (lang(?city) = "en").

  # geoName Identifier
  OPTIONAL {
      ?cityQ wdt:P1566 ?geoNameId.
  }

  # GND-ID
  OPTIONAL { 
      ?cityQ wdt:P227 ?gndId. 
  }

  OPTIONAL{
     ?cityQ wdt:P625 ?cityCoord .
  }

  # region this city belongs to
  OPTIONAL {
    ?cityQ wdt:P131 ?regionId .     
  }

  OPTIONAL {
     ?cityQ wdt:P1082 ?cityPopulation
  }

  # country this city belongs to
  OPTIONAL {
      ?cityQ wdt:P17 ?countryId .
  }
}


I think the library should have a built-in option to analyze the JSON result further. The docstring note "This method can be overwritten in a subclass for a different conversion method." already hints that a specialized version of the standard approach might be possible. How would the overriding be achieved?
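As a starting point for the discussion, the override pattern itself could look roughly like this. The sketch deliberately does not import the library: `ResultBase` is a hypothetical stand-in for whichever class defines `_convertJSON`, modeled only on the snippet above (a `response` attribute with a `read()` method), and `DebugResult` and `dump_path` are names invented here. With the real library, the code that builds the result object would also have to create the subclass instead of the stock class, which is the part that still needs an answer.

```python
import io
import json
import os
import tempfile


class ResultBase:
    """Hypothetical stand-in for the class that defines _convertJSON."""

    def __init__(self, response):
        self.response = response  # file-like object whose read() returns bytes

    def _convertJSON(self):
        return json.loads(self.response.read().decode("utf-8"))


class DebugResult(ResultBase):
    """Subclass that keeps the raw payload when JSON decoding fails."""

    def _convertJSON(self):
        raw = self.response.read().decode("utf-8")
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            # Write the payload to a fresh temporary file instead of a fixed path.
            fd, self.dump_path = tempfile.mkstemp(prefix="sparqlerror-", suffix=".json")
            with os.fdopen(fd, "w", encoding="utf-8") as dump:
                dump.write(raw)
            raise


# Simulated endpoint response: JSON cut short by a Java stack trace.
payload = b'{"head": {}, \njava.util.concurrent.TimeoutException'
result = DebugResult(io.BytesIO(payload))
try:
    result._convertJSON()
except json.JSONDecodeError as err:
    print(f"decode failed ({err.msg}); raw payload saved to {result.dump_path}")
```

Using a temporary file avoids the hard-coded /tmp path, which would not work on every platform and would be overwritten by concurrent queries.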

I am willing to create a pull request based on the outcome of the discussion of this issue.

WolfgangFahl commented 3 years ago

The same issue arises if an endpoint is not set properly, e.g. if you try to use https://query.wikidata.org/ as the endpoint, which returns HTML and not JSON.
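For this second case a cheap check before decoding might already produce a much clearer error. The following is only a heuristic sketch with an invented name (`classify_payload`). It looks at the first characters of the body and assumes that a SPARQL JSON result starts with an opening brace.

```python
def classify_payload(text):
    """Guess what kind of body an endpoint returned; a heuristic, not a parser."""
    head = text.lstrip()[:200].lower()
    if head.startswith("{"):
        return "json"
    if head.startswith("<!doctype html") or head.startswith("<html"):
        return "html"
    if head.startswith("<?xml") or head.startswith("<sparql"):
        return "xml"
    return "unknown"


print(classify_payload('{"head": {"vars": []}}'))        # json
print(classify_payload("<!DOCTYPE html><html></html>"))  # html
```

A caller could then raise something like "endpoint returned HTML, check the endpoint URL" instead of passing the page on to the JSON decoder.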