apache / jena

Apache Jena
https://jena.apache.org/
Apache License 2.0
1.12k stars 652 forks source link

GH-2821: Support for Timeouts on Updates #2822

Closed Aklakan closed 1 week ago

Aklakan commented 2 weeks ago

GitHub issue resolved #2821

Pull request Description:


By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.


See the Apache Jena "Contributing" guide.

afs commented 2 weeks ago

This PR is a good step forward.

This PR is really two things:

The first is as little as move one line fix :-)

Having timeouts on update is good to have and the way it is done in this PR, timing out the WHERE clause of a modify DELETE-INSERT-WHERE, seems the best way (I'd played around with update timeout ... but stalled on that) because it protects against long-running WHERE.

There are various operations for updates and a general timeout on an update isn't a one change in one place. Update can be several operations all in one request so there could an overall timeout (caveat the issue of various operations).

The PR does impact on applications. This is not to say I think that it is bad (or good) idea but to draw out a consequence.

If I've read the PR with it comes down to this one line: pass the context to the WHERE evaluation. If a global timeout for queries is set, then that will be picked up by the update execution where it wasn't 5.2 and earlier.

For Fuseki, I would guess that timeout is usually per-endpoint; it is possible to set it database or server wide as well.

For an application using Jena as a library, an application wide timeout might be present and that would impact updates with this PR.

Aklakan commented 2 weeks ago

If a global timeout for queries is set, then that will be picked up by the update execution where it wasn't 5.2 and earlier.

cxt.set(ARQ.queryTimeout, cxt.get(ARQ.updateTimeout))

I'd say that the basic updateTimeout setting should affect the overall timeout of an update request - not per sub-update - so the remaining time would have to be passed on to the query exec. Hm, in this case the context would have to be mutated or copied.

The suggestions should be backward compatible with the existing behavior (but may require a change for computing the effective timeout in the query exec).

Aklakan commented 2 weeks ago

I have consolidated the timeout logic (initial and overall) into the Timeouts class. There is now ARQ.updateTimeout. The UpdateEngineWorker tracks its own elapsed processing time and updates its context with the remaining query time out before starting the underlying query execution.

The update timeouts are now only used to update ARQ.queryTimeout in the context. If no update timeout is set then the UpdateEngineWorker will unset the query timeout which should be consistent with the previous behavior.

Conversely, the update timeouts do not affect the sinks: In principle a sink could also refuse to accept changes once the timeout is reached - but if that's really needed it could be a future update.

Aklakan commented 1 week ago

I updated this PR's summary in the first post. The main changes are the introduction of ARQ.updateTimeout and the additional cancel check based on Thread.interrupted() in QueryIteratorBase.requestingCancel().

It's ready for another review.

Aklakan commented 1 week ago

I removed the two unused imports.

afs commented 1 week ago

Thank you!