Swirrl / drafter

A clojure service and a client to it for exposing data management operations to PMD
Other
0 stars 0 forks source link

Drafter 2 orders of magnitude slower for simple queries vs raw stardog #541

Closed RickMoynihan closed 3 years ago

RickMoynihan commented 3 years ago

e.g. see these timings running a select limit 1 query, against drafter from my muttnik REPL. NOTE similar perf occurs whether the query is cached or uncached.

Compared to against the raw endpoint, which returns in about 10ms for the same query.

dev> (time (count (grafter-2.rdf4j.repository/query (grafter-2.rdf4j.repository/->connection (drafter-client.triplestore/m2m-triplestore (:drafter-client/client system) 1)) "select ?s where { ?s ?p ?o } limit 1")))
"Elapsed time: 325.800383 msecs"
;; => 1
dev> (time (count (grafter-2.rdf4j.repository/query (grafter-2.rdf4j.repository/->connection (drafter-client.triplestore/m2m-triplestore (:drafter-client/client system) 1)) "select ?s where { ?s ?p ?o } limit 1")))
"Elapsed time: 276.139155 msecs"
;; => 1
dev> (time (count (grafter-2.rdf4j.repository/query (grafter-2.rdf4j.repository/->connection (drafter-client.triplestore/m2m-triplestore (:drafter-client/client system) 1)) "select ?s where { ?s ?p ?o } limit 1")))
"Elapsed time: 650.077309 msecs"
;; => 1
dev> (time (count (grafter-2.rdf4j.repository/query (grafter-2.rdf4j.repository/->connection (drafter-client.triplestore/m2m-triplestore (:drafter-client/client system) 1)) "select ?s where { ?s ?p ?o } limit 1")))
"Elapsed time: 264.427397 msecs"
;; => 1
dev> (time (count (grafter-2.rdf4j.repository/query (grafter-2.rdf4j.repository/->connection (drafter-client.triplestore/m2m-triplestore (:drafter-client/client system) 1)) "select ?s where { ?s ?p ?o } limit 1")))
"Elapsed time: 204.679964 msecs"
;; => 1
dev> (time (count (grafter-2.rdf4j.repository/query (grafter-2.rdf4j.repository/->connection (drafter-client.triplestore/m2m-triplestore (:drafter-client/client system) 1)) "select ?s where { ?s ?p ?o } limit 1")))
"Elapsed time: 297.815884 msecs"
;; => 1
dev> 
callum-oakley commented 3 years ago

The majority of this time appears to be establishing the connection, so as long as we're re-using connections this is perhaps not as bad as it initially seems (do we establish a new connection for every request in muttnik atm? easy change to switch to a connection pool if we don't)

dev=> (time (def conn (grafter-2.rdf4j.repository/->connection (drafter-client.triplestore/m2m-triplestore (:drafter-client/client system) 1))))
"Elapsed time: 393.692738 msecs"
#'dev/conn
dev=> (time (count (grafter-2.rdf4j.repository/query conn "select ?s where { ?s ?p ?o } limit 1")))
"Elapsed time: 12.728207 msecs"
1
dev=> (time (count (grafter-2.rdf4j.repository/query conn "select ?s where { ?s ?p ?o } limit 1")))
"Elapsed time: 9.744618 msecs"
1
dev=> (time (count (grafter-2.rdf4j.repository/query conn "select ?s where { ?s ?p ?o } limit 1")))
"Elapsed time: 9.940947 msecs"
1
dev=> (time (count (grafter-2.rdf4j.repository/query conn "select ?s where { ?s ?p ?o } limit 1")))
"Elapsed time: 11.426175 msecs"
1
callum-oakley commented 3 years ago

That does of course leave the question of why it takes so long to establish a connection...

RickMoynihan commented 3 years ago

ah yes good point, I should have spotted that, we noticed this a long time ago - it's precisely why we do typically reuse connections in muttnik.

callum-oakley commented 3 years ago

For posterity if anyone stumbles upon this issue again in the future, the slow bit is getting the token:

(defn m2m-triplestore [client job-timeout]
  (println "token:")
  (let [token (:access_token (time (auth/get-client-id-token client)))
        context draftset/live]
    (println "rest:")
    (time (-> client
        (client/with-job-timeout job-timeout)
        (repository token context)
        (triplestore)))))

(drafter-client.triplestore/m2m-triplestore (:drafter-client/client system) 1)
; token:
; "Elapsed time: 469.579769 msecs"
; rest:
; "Elapsed time: 1.216918 msecs"

... which makes perfect sense, since there's a request to auth0 in there.