bchavez / RethinkDb.Driver

:headphones: A NoSQL C#/.NET RethinkDB database driver with 100% ReQL API coverage.
http://rethinkdb.com/api/java
Other
383 stars 134 forks source link

Changefeed cursor throws NotSupportedException when a table is unhealthy #138

Closed wjrogers closed 5 years ago

wjrogers commented 5 years ago

Scenario:

The driver throws:

System.NotSupportedException: Cursor cannot extend the response. The response was not a 
SUCCESS_PARTIAL or SUCCESS_SEQUENCE.
   at RethinkDb.Driver.Net.Cursor`1.ExtendBuffer(Response response)
   at RethinkDb.Driver.Net.Cursor`1.<MoveNextAsync>d__24.MoveNext()

Having trouble reproducing the exact circumstances in which the driver threw this exception, but I believe it was the combination of changefeed on proxy server + main data node rebooting that lead to the (unexpected?) server response.

Will you consider throwing a different exception here? It was a transient failure, and I find myself writing a special error-handling case because I normally would allow a NotSupportedException to crash my process.

Client: RethinkDb.Driver 2.3.23, .NET Framework 4.7.1, Windows Server 2016 Server: RethinkDB 2.3.6 (official Docker image)

bchavez commented 5 years ago

Hi Will,

Thank you for the report. I appreciate it.

Changing the exception type that is thrown would be a breaking change. I'm okay with breaking changes so long as we document the breaking change very clearly in release notes and whatever step we take forward is more correct than our last step. I agree using NotSupportedException was probably a bit sloppy and lazy on my part.

However, before we take any step forward, I need to know what the response was that caused the exception. There could be a chance that there is a more detailed exception that can be thrown if the response is passed through the ErrorBuilder. But to determine viability, I need the response that caused the driver to throw.

If you don't have the exact response, we have a couple of options:

Let me know.

Thanks,
Brian

:car: :blue_car: "Let the good times roll..."

bchavez commented 5 years ago

Hey Will,

I was able to reproduce the NotSupportedException in midst of an in-flight async cursor MoveNextAsync call by rebooting one of the cluster nodes (Linux A-C) behind a RethinkDB proxy (Linux D) as you've described. I tested the behavior using my Linux test cluster.

screen_2775

The protocol trace when the event occurs:

TRACE JSON Recv: Token: 1, JSON: {"t":18,"e":4100000,"r":["Changefeed aborted (unavailable)."],"b":[]}
TRACE JSON Send: Token: 1, JSON: [2]

And as suspected, {"t":18 is a RUNTIME_ERROR response that we can parse with ErrorBuilder. And, "e":4100000 is an OP_FAILED that should throw a ReqlOpFailedError exception when this transient error occurs.

Alright, hang tight. Now that I'm confident with what is happening, I'll have the driver changes out soon.

:fire: :volcano: "We're building it up, to break it back down..."

bchavez commented 5 years ago

Hi Will,

The driver changes are now live in v2.3.24: https://www.nuget.org/packages/RethinkDb.Driver/2.3.24

The exception type you want to look for when a host is rebooted behind a proxy is ReqlOpFailedError exception.

Let me know how it goes.

Thanks, Brian

:walking: :sunglasses: "So don't delay... act now supplies are running out..."

wjrogers commented 5 years ago

Thank you for the quick fix!

bchavez commented 5 years ago

Hi Will,

You're very welcome. It makes me happy I was able to help you resolve the issue. Couldn't have done it without your hint about restarting nodes behind a proxy. :smiley: :+1: And thanks for using the RethinkDB driver!