RDFLib / sparqlwrapper

A wrapper for a remote SPARQL endpoint
https://sparqlwrapper.readthedocs.io/

Test is too slow #177

Closed eggplants closed 2 years ago

eggplants commented 2 years ago

Some endpoints, such as https://live.dbpedia.org/sparql and https://dbpedia.org/sparql, are very slow, and the unit tests take too long to finish because of them. By contrast, https://ja.dbpedia.org/sparql responds very quickly. I think the difference comes from the size of the dataset: the one behind https://ja.dbpedia.org is probably smaller than the one behind https://dbpedia.org. So I suggest switching the test endpoint to https://ja.dbpedia.org or to another endpoint backed by a smaller database.

eggplants commented 2 years ago

```shellsession
$ grep -r '"http' test/*py
test/4store__v1_1_5__agroportal__test.py:endpoint = "http://sparql.agroportal.lirmm.fr/sparql/"
test/agrovoc-allegrograph_on_hold.py:endpoint = "http://202.45.139.84:10035/catalogs/fao/repositories/agrovoc"
test/allegrograph__v4_14_1__mmi__test.py:endpoint = "https://mmisw.org/sparql"
test/blazegraph__wikidata__test.py:endpoint = "https://query.wikidata.org/sparql"
test/fuseki2__v3_6_0__agrovoc__BROKEN.py:endpoint = "http://agrovoc.uniroma2.it:3030/agrovoc/sparql"
test/fuseki2__v3_8_0__stw__test.py:endpoint = "http://zbw.eu/beta/sparql/stw/query"
test/graphdbEnterprise__v8_9_0__rs__BROKEN.py:endpoint = "http://rs.ontotext.com/repositories/ff-news"
test/lov-fuseki_on_hold.py:endpoint = "https://lov.linkeddata.es/dataset/lov/sparql/"
test/rdf4j__geosciml__test.py:endpoint = "http://vocabs.ands.org.au/repository/api/sparql/csiro_international-chronostratigraphic-chart_2018-revised-corrected"
test/stardog__lindas__test.py:endpoint = "https://lindas.admin.ch/query"
test/virtuoso__v7_20_3230__dbpedia__test.py:endpoint = "http://dbpedia.org/sparql"
test/virtuoso__v8_03_3313__dbpedia__test.py:endpoint = "https://live.dbpedia.org/sparql"
test/wrapper_test.py:        sparql = SPARQLWrapper("http://example.org/sparql")
```

eggplants commented 2 years ago

```shell
grep -Er 'endpoint = "http[^"]+"' test/*py |
  sed 's/ #.*//' | tr -d '"' |
  awk -F 'py:endpoint = ' '$0=$2' |
  while read i; do
    echo "[$i]"
    time rqw -e "$i" \
             -Q "select ?x {?x ?y ?z} limit 1"
  done |&> log
```

eggplants commented 2 years ago

Some endpoints are currently dead.

log

```shellsession
$ cat log
[http://sparql.agroportal.lirmm.fr/sparql/]
{
    "head": {
        "vars": [
            "x"
        ]
    },
    "results": {
        "bindings": [
            {
                "x": {
                    "type": "uri",
                    "value": "http://aims.fao.org/aos/agrovoc/xl_de_9a31229d"
                }
            }
        ]
    }
}
rqw -e "$i" -Q "select ?x {?x ?y ?z} limit 1"  0.18s user 0.07s system 2% cpu 11.785 total
[http://202.45.139.84:10035/catalogs/fao/repositories/agrovoc]
Traceback (most recent call last):
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/urllib/request.py", line 1346, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/http/client.py", line 1285, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/http/client.py", line 1331, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/http/client.py", line 1280, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/http/client.py", line 1040, in _send_output
    self.send(msg)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/http/client.py", line 980, in send
    self.connect()
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/http/client.py", line 946, in connect
    self.sock = self._create_connection(
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/socket.py", line 844, in create_connection
    raise err
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/socket.py", line 832, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 60] Operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/eggplants/.pyenv/versions/3.9.9/bin/rqw", line 33, in <module>
    sys.exit(load_entry_point('SPARQLWrapper', 'console_scripts', 'rqw')())
  File "/Users/eggplants/prog/sparqlwrapper/SPARQLWrapper/main.py", line 111, in main
    results = sparql.query().convert()
  File "/Users/eggplants/prog/sparqlwrapper/SPARQLWrapper/Wrapper.py", line 789, in query
    return QueryResult(self._query())
  File "/Users/eggplants/prog/sparqlwrapper/SPARQLWrapper/Wrapper.py", line 755, in _query
    response = urlopener(request)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/urllib/request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/urllib/request.py", line 517, in open
    response = self._open(req, data)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/urllib/request.py", line 534, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/urllib/request.py", line 1375, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/urllib/request.py", line 1349, in do_open
    raise URLError(err)
urllib.error.URLError:
rqw -e "$i" -Q "select ?x {?x ?y ?z} limit 1"  0.19s user 0.07s system 0% cpu 1:15.29 total
[https://mmisw.org/sparql]
{
    "head": {
        "vars": [
            "x"
        ]
    },
    "results": {
        "bindings": [
            {
                "x": {
                    "type": "uri",
                    "value": "https://mmisw.org/ont/~mjuckes/cmip_variables_alpha/rsdcs4co2"
                }
            }
        ]
    }
}
rqw -e "$i" -Q "select ?x {?x ?y ?z} limit 1"  0.22s user 0.10s system 44% cpu 0.711 total
[https://query.wikidata.org/sparql]
{
    "head": {
        "vars": [
            "x"
        ]
    },
    "results": {
        "bindings": [
            {
                "x": {
                    "type": "uri",
                    "value": "http://wikiba.se/ontology#Dump"
                }
            }
        ]
    }
}
rqw -e "$i" -Q "select ?x {?x ?y ?z} limit 1"  0.22s user 0.09s system 47% cpu 0.638 total
[http://agrovoc.uniroma2.it:3030/agrovoc/sparql]
Traceback (most recent call last):
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/urllib/request.py", line 1346, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/http/client.py", line 1285, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/http/client.py", line 1331, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/http/client.py", line 1280, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/http/client.py", line 1040, in _send_output
    self.send(msg)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/http/client.py", line 980, in send
    self.connect()
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/http/client.py", line 946, in connect
    self.sock = self._create_connection(
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/socket.py", line 844, in create_connection
    raise err
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/socket.py", line 832, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 60] Operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/eggplants/.pyenv/versions/3.9.9/bin/rqw", line 33, in <module>
    sys.exit(load_entry_point('SPARQLWrapper', 'console_scripts', 'rqw')())
  File "/Users/eggplants/prog/sparqlwrapper/SPARQLWrapper/main.py", line 111, in main
    results = sparql.query().convert()
  File "/Users/eggplants/prog/sparqlwrapper/SPARQLWrapper/Wrapper.py", line 789, in query
    return QueryResult(self._query())
  File "/Users/eggplants/prog/sparqlwrapper/SPARQLWrapper/Wrapper.py", line 755, in _query
    response = urlopener(request)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/urllib/request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/urllib/request.py", line 517, in open
    response = self._open(req, data)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/urllib/request.py", line 534, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/urllib/request.py", line 1375, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/urllib/request.py", line 1349, in do_open
    raise URLError(err)
urllib.error.URLError:
rqw -e "$i" -Q "select ?x {?x ?y ?z} limit 1"  0.22s user 0.09s system 0% cpu 1:15.32 total
[http://zbw.eu/beta/sparql/stw/query]
{
    "head": {
        "vars": [
            "x"
        ]
    },
    "results": {
        "bindings": [
            {
                "x": {
                    "type": "uri",
                    "value": "http://www.w3.org/2004/02/skos/core"
                }
            }
        ]
    }
}
rqw -e "$i" -Q "select ?x {?x ?y ?z} limit 1"  0.21s user 0.09s system 33% cpu 0.905 total
[http://rs.ontotext.com/repositories/ff-news]
Traceback (most recent call last):
  File "/Users/eggplants/prog/sparqlwrapper/SPARQLWrapper/Wrapper.py", line 755, in _query
    response = urlopener(request)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/urllib/request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/urllib/request.py", line 523, in open
    response = meth(req, response)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/urllib/request.py", line 632, in http_response
    response = self.parent.error(
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/urllib/request.py", line 561, in error
    return self._call_chain(*args)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/urllib/request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 500:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/eggplants/.pyenv/versions/3.9.9/bin/rqw", line 33, in <module>
    sys.exit(load_entry_point('SPARQLWrapper', 'console_scripts', 'rqw')())
  File "/Users/eggplants/prog/sparqlwrapper/SPARQLWrapper/main.py", line 111, in main
    results = sparql.query().convert()
  File "/Users/eggplants/prog/sparqlwrapper/SPARQLWrapper/Wrapper.py", line 789, in query
    return QueryResult(self._query())
  File "/Users/eggplants/prog/sparqlwrapper/SPARQLWrapper/Wrapper.py", line 767, in _query
    raise EndPointInternalError(e.read())
SPARQLWrapper.SPARQLExceptions.EndPointInternalError: EndPointInternalError: endpoint returned code 500 and response. Response: b'Software license validation has failed!'
rqw -e "$i" -Q "select ?x {?x ?y ?z} limit 1"  0.20s user 0.09s system 27% cpu 1.042 total
[https://lov.linkeddata.es/dataset/lov/sparql/]
{
    "head": {
        "vars": [
            "x"
        ]
    },
    "results": {
        "bindings": [
            {
                "x": {
                    "type": "uri",
                    "value": "http://www.w3.org/2002/07/owl#someValuesFrom"
                }
            }
        ]
    }
}
rqw -e "$i" -Q "select ?x {?x ?y ?z} limit 1"  0.19s user 0.08s system 16% cpu 1.633 total
[http://vocabs.ands.org.au/repository/api/sparql/csiro_international-chronostratigraphic-chart_2018-revised-corrected]
{
    "head": {
        "vars": [
            "x"
        ]
    },
    "results": {
        "bindings": [
            {
                "x": {
                    "type": "uri",
                    "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
                }
            }
        ]
    }
}
rqw -e "$i" -Q "select ?x {?x ?y ?z} limit 1"  0.17s user 0.07s system 37% cpu 0.640 total
[https://lindas.admin.ch/query]
Traceback (most recent call last):
  File "/Users/eggplants/.pyenv/versions/3.9.9/bin/rqw", line 33, in <module>
    sys.exit(load_entry_point('SPARQLWrapper', 'console_scripts', 'rqw')())
  File "/Users/eggplants/prog/sparqlwrapper/SPARQLWrapper/main.py", line 114, in main
    print(json.dumps(results, indent=4))
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/json/__init__.py", line 234, in dumps
    return cls(
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/Users/eggplants/.pyenv/versions/3.9.9/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type bytes is not JSON serializable
rqw -e "$i" -Q "select ?x {?x ?y ?z} limit 1"  0.18s user 0.07s system 20% cpu 1.211 total
[http://dbpedia.org/sparql]
{
    "head": {
        "link": [],
        "vars": [
            "x"
        ]
    },
    "results": {
        "distinct": false,
        "ordered": true,
        "bindings": [
            {
                "x": {
                    "type": "uri",
                    "value": "http://www.openlinksw.com/virtrdf-data-formats#default-iid"
                }
            }
        ]
    }
}
rqw -e "$i" -Q "select ?x {?x ?y ?z} limit 1"  0.20s user 0.07s system 0% cpu 1:09.02 total
[http://dbpedia-live.openlinksw.com/sparql]
{
    "head": {
        "link": [],
        "vars": [
            "x"
        ]
    },
    "results": {
        "distinct": false,
        "ordered": true,
        "bindings": [
            {
                "x": {
                    "type": "uri",
                    "value": "http://www.openlinksw.com/virtrdf-data-formats#default-iid"
                }
            }
        ]
    }
}
rqw -e "$i" -Q "select ?x {?x ?y ?z} limit 1"  0.19s user 0.07s system 14% cpu 1.834 total
```

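
One way to keep dead endpoints from costing over a minute each, as in the log above, might be to probe them with an explicit timeout. The following is only a minimal sketch and not code from this repository: `probe`, `time_endpoints`, and the 10-second default are hypothetical names and values, while `setQuery`, `setReturnFormat`, and `setTimeout` are existing SPARQLWrapper methods.

```python
import time
from typing import Callable, Dict, Iterable, Tuple

QUERY = "select ?x {?x ?y ?z} limit 1"  # same probe query as in the shell loop


def probe(endpoint: str, timeout: int = 10) -> dict:
    """Run the probe query once, giving up after `timeout` seconds (assumed value)."""
    # Imported lazily so the timing helper below can be used without the library.
    from SPARQLWrapper import JSON, SPARQLWrapper

    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(QUERY)
    sparql.setReturnFormat(JSON)
    sparql.setTimeout(timeout)
    return sparql.query().convert()


def time_endpoints(
    endpoints: Iterable[str],
    run: Callable[[str], object] = probe,
) -> Dict[str, Tuple[str, float]]:
    """Return {endpoint: (status, seconds)}; any exception counts as a failure."""
    report = {}
    for endpoint in endpoints:
        start = time.perf_counter()
        try:
            run(endpoint)
            status = "ok"
        except Exception as exc:  # dead or misbehaving endpoint
            status = f"failed: {type(exc).__name__}"
        report[endpoint] = (status, time.perf_counter() - start)
    return report
```

Calling `time_endpoints([...])` with the endpoint URLs from the test files would give a quick per-endpoint status and duration without waiting for the operating system's connection timeout.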
aucampia commented 2 years ago

I think the right approach here may be to work with docker containers for testing instead of remote endpoints as much as possible, but I agree, they are too slow.

aucampia commented 2 years ago

I think this is still an issue. For me, at least, the tests take a couple of minutes. They should be much quicker.

aucampia commented 2 years ago

Re-opening this, as it is still too slow for comfort. One option may be to use https://github.com/kiwicom/pytest-recording
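
As a rough illustration of the pytest-recording idea, a recorded test could look like the sketch below. It assumes the pytest-recording plugin is installed; the test name and the choice of the Wikidata endpoint are hypothetical, and `@pytest.mark.vcr` is the marker the plugin provides. This is not the test layout the repository actually uses.

```python
import pytest

ENDPOINT = "https://query.wikidata.org/sparql"  # one of the endpoints listed earlier


@pytest.mark.vcr  # with pytest-recording, HTTP traffic is recorded to / replayed from a cassette
def test_select_one_triple():
    # Skip cleanly when SPARQLWrapper is not installed in the environment.
    sw = pytest.importorskip("SPARQLWrapper")

    sparql = sw.SPARQLWrapper(ENDPOINT)
    sparql.setQuery("select ?x {?x ?y ?z} limit 1")
    sparql.setReturnFormat(sw.JSON)
    results = sparql.query().convert()

    # The query has `limit 1`, so exactly one binding is expected.
    assert len(results["results"]["bindings"]) == 1
```

Running `pytest --record-mode=once` would hit the endpoint the first time and store the response; later runs replay the stored response, so the remote endpoint's speed no longer matters.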

eggplants commented 2 years ago

http://sparql.agroportal.lirmm.fr/sparql/ is unstable. If possible, we should switch to another, more stable 4store endpoint.

eggplants commented 2 years ago

```shellsession
$ for i in test/*.py; do
    [[ "$i" =~ 4store ]] && continue
    echo "[$i]"
    time (python -m unittest -q "$i" &>/dev/null)
  done &> log
$ cat log
[test/__init__.py]
( python3 -m unittest -q "$i" &> /dev/null; )  0.07s user 0.05s system 86% cpu 0.143 total
[test/agrovoc-allegrograph_on_hold.py]
( python3 -m unittest -q "$i" &> /dev/null; )  0.78s user 0.12s system 1% cpu 56.535 total
[test/allegrograph__v4_14_1__mmi__test.py]
( python3 -m unittest -q "$i" &> /dev/null; )  1.27s user 0.18s system 3% cpu 42.402 total
[test/blazegraph__wikidata__test.py]
( python3 -m unittest -q "$i" &> /dev/null; )  7.36s user 0.26s system 2% cpu 5:57.60 total
[test/cli_test.py]
( python3 -m unittest -q "$i" &> /dev/null; )  0.30s user 0.08s system 23% cpu 1.627 total
[test/fuseki2__v3_6_0__agrovoc__BROKEN.py]
( python3 -m unittest -q "$i" &> /dev/null; )  1.37s user 0.17s system 1% cpu 1:50.61 total
[test/fuseki2__v3_8_0__stw__test.py]
( python3 -m unittest -q "$i" &> /dev/null; )  0.43s user 0.12s system 0% cpu 5:22.25 total
[test/graphdbEnterprise__v8_9_0__rs__BROKEN.py]
( python3 -m unittest -q "$i" &> /dev/null; )  0.41s user 0.11s system 1% cpu 48.960 total
[test/lov-fuseki_on_hold.py]
( python3 -m unittest -q "$i" &> /dev/null; )  1.71s user 0.17s system 1% cpu 2:06.42 total
[test/rdf4j__geosciml__test.py]
( python3 -m unittest -q "$i" &> /dev/null; )  0.47s user 0.11s system 1% cpu 30.167 total
[test/stardog__lindas__test.py]
( python3 -m unittest -q "$i" &> /dev/null; )  0.92s user 0.13s system 1% cpu 1:08.37 total
[test/virtuoso__v7_20_3230__dbpedia__test.py]
( python3 -m unittest -q "$i" &> /dev/null; )  5.68s user 0.21s system 0% cpu 23:08.75 total
[test/virtuoso__v8_03_3313__dbpedia__test.py]
( python3 -m unittest -q "$i" &> /dev/null; )  5.75s user 0.23s system 0% cpu 20:39.26 total
[test/wrapper_test.py]
( python3 -m unittest -q "$i" &> /dev/null; )  0.19s user 0.07s system 87% cpu 0.298 total
```

nicholascar commented 2 years ago

@eggplants please do change to other more stable stores! Although, there must be a stable AgroPortal SPARQL store out there surely? AgroPortal's a large vocab user after all.

eggplants commented 2 years ago

Does anyone know of any public endpoints that are using Virtuoso version 08? (except http://live.dbpedia.org/sparql)

nicholascar commented 2 years ago

No, sorry! All of the SPARQL stores I implement (about 10 public) use Jena. I'll ask around though

eggplants commented 2 years ago

With #184 merged, test time has been reduced to less than 15*4 minutes. Is this issue mostly resolved now?

aucampia commented 2 years ago

I would like it even faster, but I guess we can close this and open another issue. Ideally we should use test containers where we can, and something like pytest-recording where we can't. There is no rush for this, though. Thank you very much for what you did.

eggplants commented 2 years ago

I am going to convert the tests from unittest to pytest. It looks like we can use @pytest.mark.parametrize to share tests across endpoints.
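
A minimal sketch of what sharing one test across endpoints with `@pytest.mark.parametrize` might look like is shown here. The endpoint URLs come from the grep output earlier in this thread; the short ids and the test name are hypothetical, and this is not the conversion that was actually made.

```python
import pytest

# URLs taken from the existing test files; the keys are hypothetical labels
# that pytest shows as test ids.
ENDPOINTS = {
    "wikidata": "https://query.wikidata.org/sparql",
    "mmi": "https://mmisw.org/sparql",
    "stw": "http://zbw.eu/beta/sparql/stw/query",
}


@pytest.mark.parametrize("endpoint", list(ENDPOINTS.values()), ids=list(ENDPOINTS))
def test_select_returns_one_binding(endpoint):
    # Skip cleanly when SPARQLWrapper is not installed in the environment.
    sw = pytest.importorskip("SPARQLWrapper")

    sparql = sw.SPARQLWrapper(endpoint)
    sparql.setQuery("select ?x {?x ?y ?z} limit 1")
    sparql.setReturnFormat(sw.JSON)
    results = sparql.query().convert()

    # The same assertion runs once per endpoint instead of once per test file.
    assert len(results["results"]["bindings"]) == 1
```

With this shape, adding or dropping an endpoint is a one-line change to `ENDPOINTS`, and a single endpoint can be selected with `pytest -k wikidata`.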

aucampia commented 2 years ago

> I am going to convert the tests from unittest to pytest. It looks like we can use @pytest.mark.parametrize to share tests across endpoints.

Sounds great. I will close this issue, though; we can handle new improvements as they come. Ideally, I think the tests should run in less than 3 minutes.