Swirrl / drafter

A clojure service and a client to it for exposing data management operations to PMD
Other
0 stars 0 forks source link

Don't move incomplete files to the cache #261

Closed danmidwood closed 6 years ago

danmidwood commented 6 years ago

Issue in investigation...

We see various problems reading state graph query results from the cache. It looks like we might be inserting incomplete cache entries, which then cause an exception at the point of reading.

This is logical, because we move the file from tmp to the cache on stream close, and we don't have any validation in there to ensure that the cache entry is complete.

In a case that we encounter a problem and call stream.close without having processed the entire stream then we would still continue to insert the cache entry, even though we have not fully populated the file.

We can also experience this undesired behaviour in non-exceptional cases, such as any time we close the stream before fully consuming all of the results. I'm not sure if we do do this, but it's something worth considering in the overall fix.

The exception we see reading from the cache:

2018-05-28 15:21:15,570 DEBUG              stasher dan@swirrl.com              req-0ff91e84 :: Found entry in cache for :tuple query
2018-05-28 15:21:15,571 ERROR               errors                    req-0ff91e84 :: There was an unknown error.  Returning 500
clojure.lang.ExceptionInfo: Error waiting on results {:prepared-query #object[drafter.stasher.proxy$org.eclipse.rdf4j.repository.sparql.query.SPARQLTupleQuery$ff19274a 0x3761fe05 "SELECT * WHERE { GRAPH <http://publishmydata.com/graphs/drafter/drafts> { ?ds <http://purl.org/dc/terms/created> ?created .?ds <http://purl.org/dc/terms/modified> ?modified .?ds <http://purl.org/dc/terms/creator> ?creator .OPTIONAL { ?ds <http://www.w3.org/2000/01/rdf-schema#comment> ?description . }OPTIONAL { ?ds <http://publishmydata.com/def/drafter/hasOwner> ?owner . }OPTIONAL { ?ds <http://www.w3.org/2000/01/rdf-schema#label> ?title . }OPTIONAL { ?ds <http://publishmydata.com/def/drafter/submittedBy> ?submitter. }OPTIONAL {  ?ds <http://publishmydata.com/def/drafter/hasSubmission> ?submission .  ?submission <http://publishmydata.com/def/drafter/claimUser> ?claimuser .}OPTIONAL {  ?ds <http://publishmydata.com/def/drafter/hasSubmission> ?submission .  ?submission <http://publishmydata.com/def/drafter/claimRole> ?role .}{  SELECT DISTINCT ?ds WHERE {  ?ds <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://publishmydata.com/def/drafter/DraftSet> .  { ?ds <http://publishmydata.com/def/drafter/hasOwner> <mailto:dan@swirrl.com> . }  }} }}"]}
    at clojure.core$ex_info.invokeStatic(core.clj:4739)
    at clojure.core$ex_info.invoke(core.clj:4739)
    at grafter.rdf4j.repository$sesame_results__GT_seq$pull_query__8496.invoke(repository.clj:404)
    at grafter.rdf4j.repository$sesame_results__GT_seq.invokeStatic(repository.clj:405)
    at grafter.rdf4j.repository$sesame_results__GT_seq.invoke(repository.clj:386)
    at grafter.rdf4j.repository$fn__8523.invokeStatic(repository.clj:419)
    at grafter.rdf4j.repository$fn__8523.invoke(repository.clj:412)
    at grafter.rdf4j.repository$fn__8510$G__8505__8515.invoke(repository.clj:407)
    at grafter.rdf4j.repository$fn__8557.invokeStatic(repository.clj:490)
    at grafter.rdf4j.repository$fn__8557.invoke(repository.clj:479)
    at grafter.rdf.protocols$fn__7372$G__7367__7381.invoke(protocols.clj:63)
    at grafter.rdf4j.repository$query.invokeStatic(repository.clj:619)
    at grafter.rdf4j.repository$query.doInvoke(repository.clj:580)
    at clojure.lang.RestFn.invoke(RestFn.java:425)
    at clojure.lang.AFn.applyToHelper(AFn.java:156)
    at clojure.lang.RestFn.applyTo(RestFn.java:132)
    at clojure.core$apply.invokeStatic(core.clj:659)
    at clojure.core$apply.invoke(core.clj:652)
    at medley.core$mapply.invokeStatic(core.clj:152)
    at medley.core$mapply.doInvoke(core.clj:145)
    at clojure.lang.RestFn.invoke(RestFn.java:464)
    at drafter.rdf.sparql$eager_query.invokeStatic(sparql.clj:21)
    at drafter.rdf.sparql$eager_query.invoke(sparql.clj:10)
    at drafter.rdf.sparql$eager_query.invokeStatic(sparql.clj:18)
    at drafter.rdf.sparql$eager_query.invoke(sparql.clj:10)
    at drafter.backend.draftset.operations$get_all_draftsets_properties_by.invokeStatic(operations.clj:260)
    at drafter.backend.draftset.operations$get_all_draftsets_properties_by.invoke(operations.clj:258)
    at drafter.backend.draftset.operations$get_all_draftsets_by.invokeStatic(operations.clj:267)
    at drafter.backend.draftset.operations$get_all_draftsets_by.invoke(operations.clj:266)
    at drafter.feature.draftset.list$get_draftsets_owned_by.invokeStatic(list.clj:59)
    at drafter.feature.draftset.list$get_draftsets_owned_by.invoke(list.clj:58)
    at drafter.feature.draftset.list$get_draftsets_handler$fn__17549.invoke(list.clj:71)
    at drafter.middleware$optional_enum_param$fn__10162$invoke_inner__10164.invoke(middleware.clj:119)
    at drafter.middleware$optional_enum_param$fn__10162.invoke(middleware.clj:123)
    at drafter.middleware$require_authenticated$fn__10103.invoke(middleware.clj:66)
    at buddy.auth.middleware$wrap_authentication$fn__6632.invoke(middleware.clj:59)
    at buddy.auth.middleware$wrap_authorization$fn__6639.invoke(middleware.clj:110)
    at drafter.routes.draftsets_api$make_route$fn__17717.invoke(draftsets_api.clj:55)
    at compojure.core$make_route$fn__9574.invoke(core.clj:130)
    at compojure.core$wrap_route_middleware$fn__9570.invoke(core.clj:118)
    at compojure.core$if_route$fn__9540.invoke(core.clj:41)
    at compojure.core$if_method$fn__9532.invoke(core.clj:27)
    at compojure.core$routing$fn__9581.invoke(core.clj:144)
    at clojure.core$some.invokeStatic(core.clj:2693)
    at clojure.core$some.invoke(core.clj:2684)
    at compojure.core$routing.invokeStatic(core.clj:144)
    at compojure.core$routing.doInvoke(core.clj:141)
    at clojure.lang.RestFn.applyTo(RestFn.java:139)
    at clojure.core$apply.invokeStatic(core.clj:659)
    at clojure.core$apply.invoke(core.clj:652)
    at compojure.core$routes$fn__9585.invoke(core.clj:149)
    at compojure.core$routing$fn__9581.invoke(core.clj:144)
    at clojure.core$some.invokeStatic(core.clj:2693)
    at clojure.core$some.invoke(core.clj:2684)
    at compojure.core$routing.invokeStatic(core.clj:144)
    at compojure.core$routing.doInvoke(core.clj:141)
    at clojure.lang.RestFn.invoke(RestFn.java:423)
    at drafter.routes.draftsets_api$draftset_api_routes$fn__17997.invoke(draftsets_api.clj:264)
    at compojure.core$if_context$fn__9605.invoke(core.clj:211)
    at compojure.core$routing$fn__9581.invoke(core.clj:144)
    at clojure.core$some.invokeStatic(core.clj:2693)
    at clojure.core$some.invoke(core.clj:2684)
    at compojure.core$routing.invokeStatic(core.clj:144)
    at compojure.core$routing.doInvoke(core.clj:141)
    at clojure.lang.RestFn.applyTo(RestFn.java:139)
    at clojure.core$apply.invokeStatic(core.clj:659)
    at clojure.core$apply.invoke(core.clj:652)
    at compojure.core$routes$fn__9585.invoke(core.clj:149)
    at ring.middleware.resource$wrap_resource$fn__15248.invoke(resource.clj:37)
    at ring.middleware.verbs$wrap_verbs$fn__17306.invoke(verbs.clj:27)
    at swirrl_server.errors$wrap_encode_errors$fn__8930.invoke(errors.clj:45)
    at drafter.middleware$wrap_total_requests_counter$fn__10204.invoke(middleware.clj:220)
    at swirrl_server.middleware.log_request$log_request$fn__17340.invoke(log_request.clj:41)
    at noir.util.middleware$wrap_request_map$fn__17246.invoke(middleware.clj:39)
    at ring.middleware.keyword_params$wrap_keyword_params$fn__15070.invoke(keyword_params.clj:36)
    at ring.middleware.multipart_params$wrap_multipart_params$fn__15228.invoke(multipart_params.clj:172)
    at ring.middleware.params$wrap_params$fn__13327.invoke(params.clj:67)
    at ring.middleware.absolute_redirects$wrap_absolute_redirects$fn__15310.invoke(absolute_redirects.clj:36)
    at ring.middleware.content_type$wrap_content_type$fn__15292.invoke(content_type.clj:34)
    at ring.middleware.not_modified$wrap_not_modified$fn__15281.invoke(not_modified.clj:53)
    at hiccup.middleware$wrap_base_url$fn__15368.invoke(middleware.clj:12)
    at ring.middleware.format_params$wrap_format_params$fn__16378.invoke(format_params.clj:118)
    at ring.middleware.format_params$wrap_format_params$fn__16378.invoke(format_params.clj:119)
    at ring.middleware.format_response$wrap_format_response$fn__17139.invoke(format_response.clj:194)
    at noir.validation$wrap_noir_validation$fn__15416.invoke(validation.clj:155)
    at noir.cookies$noir_cookies$fn__15461.invoke(cookies.clj:72)
    at ring.middleware.cookies$wrap_cookies$fn__14903.invoke(cookies.clj:175)
    at noir.session$noir_flash$fn__15536.invoke(session.clj:158)
    at ring.middleware.flash$wrap_flash$fn__14042.invoke(flash.clj:39)
    at noir.session$noir_session$fn__15520.invoke(session.clj:109)
    at ring.middleware.session$wrap_session$fn__14994.invoke(session.clj:108)
    at ring.middleware.resource$wrap_resource$fn__15248.invoke(resource.clj:37)
    at ring.middleware.file_info$wrap_file_info$fn__17293.invoke(file_info.clj:73)
    at ring.adapter.jetty$proxy_handler$fn__12229.invoke(jetty.clj:25)
    at ring.adapter.jetty.proxy$org.eclipse.jetty.server.handler.AbstractHandler$ff19274a.handle(Unknown Source)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:499)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:258)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:748)
Caused by: clojure.lang.ExceptionInfo: Error reading results {:prepared-query #object[drafter.stasher.proxy$org.eclipse.rdf4j.repository.sparql.query.SPARQLTupleQuery$ff19274a 0x3761fe05 "SELECT * WHERE { GRAPH <http://publishmydata.com/graphs/drafter/drafts> { ?ds <http://purl.org/dc/terms/created> ?created .?ds <http://purl.org/dc/terms/modified> ?modified .?ds <http://purl.org/dc/terms/creator> ?creator .OPTIONAL { ?ds <http://www.w3.org/2000/01/rdf-schema#comment> ?description . }OPTIONAL { ?ds <http://publishmydata.com/def/drafter/hasOwner> ?owner . }OPTIONAL { ?ds <http://www.w3.org/2000/01/rdf-schema#label> ?title . }OPTIONAL { ?ds <http://publishmydata.com/def/drafter/submittedBy> ?submitter. }OPTIONAL {  ?ds <http://publishmydata.com/def/drafter/hasSubmission> ?submission .  ?submission <http://publishmydata.com/def/drafter/claimUser> ?claimuser .}OPTIONAL {  ?ds <http://publishmydata.com/def/drafter/hasSubmission> ?submission .  ?submission <http://publishmydata.com/def/drafter/claimRole> ?role .}{  SELECT DISTINCT ?ds WHERE {  ?ds <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://publishmydata.com/def/drafter/DraftSet> .  { ?ds <http://publishmydata.com/def/drafter/hasOwner> <mailto:dan@swirrl.com> . }  }} }}"]}
    at clojure.core$ex_info.invokeStatic(core.clj:4739)
    at clojure.core$ex_info.invoke(core.clj:4739)
    at grafter.rdf4j.repository$sesame_results__GT_seq$pull_query__8496$fn__8497.invoke(repository.clj:398)
    at grafter.rdf4j.repository$sesame_results__GT_seq$pull_query__8496.invoke(repository.clj:394)
    ... 100 more
Caused by: java.lang.NullPointerException
    at grafter.rdf4j.repository$query_bindings__GT_map$fn__8458.invoke(repository.clj:299)
    at clojure.core$map$fn__5587.invoke(core.clj:2745)
    at clojure.lang.LazySeq.sval(LazySeq.java:40)
    at clojure.lang.LazySeq.seq(LazySeq.java:49)
    at clojure.lang.Cons.next(Cons.java:39)
    at clojure.lang.RT.boundedLength(RT.java:1785)
    at clojure.lang.RestFn.applyTo(RestFn.java:130)
    at clojure.core$apply.invokeStatic(core.clj:657)
    at clojure.core$mapcat.invokeStatic(core.clj:2775)
    at clojure.core$mapcat.doInvoke(core.clj:2775)
    at clojure.lang.RestFn.invoke(RestFn.java:423)
    at grafter.rdf4j.repository$query_bindings__GT_map.invokeStatic(repository.clj:298)
    at grafter.rdf4j.repository$query_bindings__GT_map.invoke(repository.clj:295)
    at grafter.rdf4j.repository$sesame_results__GT_seq$pull_query__8496$fn__8497.invoke(repository.clj:395)
    ... 101 more
2018-05-28 15:21:15,573 INFO           log-request                    req-0ff91e84 :: RESPONSE  500 finished.  It took 8ms to execute

Which goes into this code block and throw the exception on the converter-f line

(if (.hasNext results)
    (let [current-result (try
                                          (converter-f (.next results))
                                          (catch Exception e
                                              (.close results)
                                              (throw (ex-info "Error reading results" {:prepared-query prepared-query} e))))]
        (lazy-cat
            [current-result]
            (pull-query)))
  (.close results))
RickMoynihan commented 6 years ago

Interesting, this is on the "pull" interface; but does it happen on the push interface? I seem to recall there being code that handled things there differently.

Generally speaking there is a fundamental problem here, which is that many of the RDF serialization formats don't let you know when you have the complete document. i.e. if when parsing n-triples, n-quads, or turtle (and probably some other formats) the underlying stream is closed early a client can't tell whether it has the complete data or if the connection was just closed early, which just happened to happen at a valid end point, e.g.

<http://foo> <http://foo> <http://foo> .
<http://bar> <http://bar> <http://bar> . |CONNECTION DIES|

The client can't tell that there should have been more triples.

Some formats are more robust against this, i.e. JSON/XML based ones will have a closing tag which should only be sent when the server has actually finished. Though we can't use XML for constructs because it can't actually represent all valid RDF graphs.

We might be able to resolve some of this issue by configuring these whitelists for content neg. However the issue may remain somewhat inside the pull interface because it knows nothing of the underlying format. IIRC the pull interface should make a

We can also experience this undesired behaviour in non-exceptional cases, such as any time we close the stream before fully consuming all of the results. I'm not sure if we do do this, but it's something worth considering in the overall fix.

FYI Drafter should always eagerly consume the query results it makes internally, as evidenced by the stack frame at drafter.rdf.sparql$eager_query.invokeStatic(sparql.clj:21), so it should be safe to make this assumption.

Regardless of the above I think this is a problem, but there is no good solution available whilst we use the RDF4j pull interface to evaluate. Instead I think we should look at changing the grafter function:

https://github.com/swirrl/grafter/blob/rdf4j-temp/src/grafter/rdf4j/repository.clj#L386-L405

To instead call (.evaluate prepared-query handler); by passing a custom handler here we should then get a little finer granularity on the underlying events. Obviously if we were to do this we'd also need to preserve the lazy-seq return value interface, so we'd have to marshal the events back via a channel. We do this elsewhere in grafter with this pipe function:

https://github.com/swirrl/grafter/blob/rdf4j-temp/src/grafter/rdf4j/io.clj#L388-L400

I'd suggest we have a chat tomorrow about what exactly we want to do here, as it's a little involved.

danmidwood commented 6 years ago

There shouldn't be any issue on the push side, in that situation we only move the file to the cache after the pusher explicitly says that it is done (endRDF or endQueryResult).

We'll pick up the rest tomorrow when you're back in the office

RickMoynihan commented 6 years ago

Yeah, and that's basically what I'm suggesting is that I think the best solution is to somehow make these pull things use push underneath.

danmidwood commented 6 years ago

This NPE turned out to have a different cause, which is differences in result sets and bound column names between reading from StarDog and reading from the cache. Full explanation in #264.

danmidwood commented 6 years ago

I'm closing this now #264 has been fixed and pushed