danielytics commented 9 years ago

55 brought up the issue of handling errors asynchronously.

Ideally, I would like if all errors (we need to investigate what errors can occur) should be put on an error channel. The error data should include:

an error type (keyword?)
the query that caused it (if available)
the connection
the token and/or results channel of the query (if available)
if the error was caused by an exception (in the send/recv loops) then the exception should also be included.

To be able to "recover" from the error, two things need to exist:

A mechanism for resubmitting the query with the same token or results channel (that is, a way to transparently retry the query)
A mechanism for cancelling the query (that is, a way to notify the caller of the query that results will never arrive - could be as simple as closing the result channel for the token)

It should be clearly documented what happens to the connections state (token, result channel) if the error is silently ignored.

danielytics commented 9 years ago

Given the above recovery modes, auto-reconnection logic could look something like this:

(go-loop []
  (when-let [error (<! error-channel)]
    (condp = (:type error)
      :disconnected (do (r/reconnect! (:connection error) ;; Ok, this would need to be implemented!
                        (r/resubmit-query! error))
     ; other errors here

The new functions here are:

(fn reconnect! [connection] ...) - should re-establish a connection, mutating the original connection so that its transparent to code that uses (run query connection)
(fn resubmit-query! m) - where m is a map with keys connection, token (or results channel?) and query and it will rerun the query on the connection in such a way (eg same token) that the same results channel receives the results

That's just my first thoughts on it. There are probably better ways still :)

danielcompton commented 9 years ago

I need to think about this a bit more, but my first thoughts are:

The token needs to be unique to RethinkDB for each query. I'm not sure what would happen if you resubmitted a new query with the same token as an old one, particularly one that had thrown an error, but I suspect it falls squarely into the bounds of undefined behaviour.
You could rerun the query, passing in the previous query's result-channel as the new channel that you're using. That would preclude closing the result channel if an error happened though, or at least would require a parameter for it. You also need to be careful that there aren't still values from the previous query left in the result-channel.

Totally on board with the idea of putting errors on the channel though.

danielcompton commented 9 years ago

Another approach is to put errors onto the same channel and the user is responsible for handling them all in the same codepath: https://github.com/mpenet/alia/blob/master/src/qbits/alia.clj#L399-L437.

danielcompton commented 9 years ago

In the spirit of solving all problems with one more layer of indirection, you could have multiple unique RethinkDB query tokens routed to the same core.async channel by sharing a query-id. Not sure if this is a great approach though.

I have the feeling that we should be able to provide good primitives for running queries that can be composed into a higher level "persistent query" which can gracefully recover from errors. This would probably be preferable over building reconnection into the lower level protocols or reusing channels or tokens.

danielytics commented 9 years ago

I referred to token, but on the client side, the token is only used to look up the channel. Can use the channel directly (which has the added benefit that we can remove it from the connections pending list and re-add if resubmitted).

danielytics commented 9 years ago

I don't think closing the results-channel (manually if you choose to do so, in the error handler) is a problem. If there are items still on it, iirc they are still accessible until the channel is emptied. Of course, resending AND closing isn't valid. I'm not sure if we could detect this? I don't think we should ever be automatically closing the channel.

I'm not sure what the solution is for partial results. I don't think there's much we can do about any results already on the channel and am not sure what the right thing to do is if a query is rerun when it already received partial results. Unless we do a lot of magic to not resubmit the same results, but that would be difficult and error prone. Something to think on for a while, I guess.

I don't like the approach of putting errors on the same channel as results. I'd have to check every result manually then. With a single error channel at least my error handling code is in one place. With an error channel you can always manually set up sending errors to the results channel if the error contains the channel - you just copy it on. Therefore a single error channel is a lower level thing that can be used to build higher level things. Having said that, if you or @apa512 prefer to send errors on the results channel, I'd be ok with it.

If the user can supply the results channel (rather than clj-rethinkdb creating one for each query) then you could share a result channel between queries. I'm not sure what the use case is though (and you can always mix channels anyway).

I agree re: primitives. If we can come up with a set of low level constructs on top of which to build auto-retry and whatever else, that would be preferred.

apa512 / clj-rethinkdb

Error handling #66

55 brought up the issue of handling errors asynchronously.