epoz / shmarql

SPARQL endpoint explorer
The Unlicense

Odd time out #25

Open ch-sander opened 1 week ago

ch-sander commented 1 week ago

Maybe this is a problem with Oxigraph rather than shmarql? @Tpt

Endpoint: https://dataria.org/sparql

Query

SELECT DISTINCT (COUNT(DISTINCT ?object_8) AS ?object_8_count) ?place_4 ?place_4_label WHERE {
  ?place_1 (<http://www.graceful17.org/ontology/falls_within>+) ?place_4.
  ?place_4 <http://www.graceful17.org/ontology/called> ?place_4_label;
    <http://www.graceful17.org/ontology/has_main_type> <http://www.graceful17.org/resources/type_77>.
  ?place_1 ^<http://www.graceful17.org/ontology/primary_place> ?institution_6.
  ?institution_6 <http://www.graceful17.org/ontology/holds_immaterial_object> ?object_8.
  ?object_8 <http://www.graceful17.org/ontology/called> ?object_8_label.
  ?object_8  <http://www.graceful17.org/ontology/has_main_type> <http://www.graceful17.org/resources/type_460>.
}
GROUP BY ?place_4 ?place_4_label
ORDER BY DESC (?object_8_count)
LIMIT 10000

Result

504 Gateway Time-out
nginx/1.22.0 (Ubuntu)

Solution

Removing the pattern <http://www.graceful17.org/ontology/has_main_type> <http://www.graceful17.org/resources/type_460>, or making it OPTIONAL, returns a result in 0.3 seconds. It also works when that pattern is moved into a subquery:

  {
    SELECT DISTINCT ?object_8 WHERE {
      ?object_8 <http://www.graceful17.org/ontology/has_main_type> <http://www.graceful17.org/resources/type_460>.
    }
  }

I can DESCRIBE both a ?object_8 and <http://www.graceful17.org/resources/type_460> -- the predicate <http://www.graceful17.org/ontology/has_main_type> exists!

It might be something on my end -- just wanted to check if I need to dig deeper in my data and config or if it could be out of my (immediate) control...

ch-sander commented 1 week ago

Before the timeout the CPU is at 100% (memory is fine), so the server is still working hard to compute the results.

Tpt commented 1 week ago

The timeout itself is likely because of the reverse-proxy you are using. There is no timeout in Oxigraph server.

As for why the query execution is so slow: Oxigraph is likely picking a bad join ordering, which makes the computation time explode. Subqueries are indeed a way to game the join reordering system.
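To make the join-ordering point concrete, here is a toy sketch in Python. It is not Oxigraph's evaluator, and the data sizes are invented for illustration (1,000 objects, 20 holding institutions per object, 10 objects with the selective type). It only shows how the size of the intermediate result depends on which pattern is joined first, even though the final answer is the same.

```python
# Toy illustration only: this is not how Oxigraph stores or evaluates data.
from collections import defaultdict


def join(left, right, key):
    """Hash-join two lists of variable bindings (dicts) on a shared variable."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Hypothetical bindings for three triple patterns sharing the variable "obj".
labels = [{"obj": i, "label": f"object {i}"} for i in range(1000)]
holders = [{"obj": i, "inst": j} for i in range(1000) for j in range(20)]
typed = [{"obj": i} for i in range(10)]  # the selective type pattern

# Bad order: join the two large patterns first, apply the type pattern last.
bad_step = join(holders, labels, "obj")
bad_result = join(bad_step, typed, "obj")

# Good order: start from the selective pattern, then expand.
good_step = join(typed, holders, "obj")
good_result = join(good_step, labels, "obj")

print(len(bad_step), len(good_step))      # intermediate result sizes
print(len(bad_result), len(good_result))  # final result sizes are equal
```

In this toy setup the bad order materializes 20,000 intermediate rows and the good order 200. With a transitive property path and several more patterns, as in the query above, the gap can grow much larger. Wrapping the selective pattern in a subquery is one way to push the engine toward evaluating it on its own.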

ch-sander commented 1 week ago

Thanks, @Tpt. Yes, the timeout comes from the proxy, but the real issue is that the query runs for more than 60 s (even with the proxy timeout raised to 10 minutes).

I can optimize hand-written queries, but that is problematic with the visual query builder Sparnatural, because it constructs the SPARQL for the user...
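One possible workaround, sketched here under the assumption that the generated query string can be intercepted before it is sent to the endpoint, is to post-process it and wrap the type pattern in a subquery automatically. The helper name `isolate_type_pattern` is hypothetical, and the regex only handles the simple case where the pattern sits on its own line as `?var <predicate> <iri>.` (as in the last pattern of the query above). It does not touch `;` continuations and is not a SPARQL parser.

```python
import re


def isolate_type_pattern(query: str, type_predicate: str) -> str:
    """Wrap each standalone '?var <type_predicate> <iri>.' line in a subquery.

    Hypothetical helper: a text-level rewrite, not a real SPARQL parser.
    """
    pattern = re.compile(
        r"^(?P<indent>[ \t]*)(?P<var>\?\w+)\s+<"
        + re.escape(type_predicate)
        + r">\s+(?P<obj><[^<>\s]+>)\s*\.[ \t]*$",
        re.MULTILINE,
    )

    def wrap(match: re.Match) -> str:
        indent, var, obj = match["indent"], match["var"], match["obj"]
        return (
            f"{indent}{{ SELECT DISTINCT {var} WHERE {{ "
            f"{var} <{type_predicate}> {obj}. }} }}"
        )

    return pattern.sub(wrap, query)


if __name__ == "__main__":
    generated = (
        "SELECT ?object_8 WHERE {\n"
        "  ?institution_6 <http://www.graceful17.org/ontology/holds_immaterial_object> ?object_8.\n"
        "  ?object_8  <http://www.graceful17.org/ontology/has_main_type> "
        "<http://www.graceful17.org/resources/type_460>.\n"
        "}"
    )
    print(isolate_type_pattern(
        generated, "http://www.graceful17.org/ontology/has_main_type"))
```

Whether this rewrite actually helps depends on the engine's planner. It simply mirrors the manual subquery that returned quickly in the test above.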

probably optimizing "bad join ordering" on oxigraph's end is not an easy fix?

Tpt commented 1 week ago

> probably optimizing "bad join ordering" on oxigraph's end is not an easy fix?

It's indeed not an easy fix. It would still be great to put some work into this area, because the current reordering algorithm is very bad. However, it's an endless topic: join reordering is a very hard problem that you can put dozens of years of work into.