bchavez / RethinkDb.Driver

:headphones: A NoSQL C#/.NET RethinkDB database driver with 100% ReQL API coverage.
http://rethinkdb.com/api/java

Issue with Change Feeds stopping. #135

Closed S4RD7R closed 6 years ago

S4RD7R commented 6 years ago

Since I can't seem to get into the RethinkDB Slack channel, and I'm not sure whether this is a bug or just a problem on my side, I thought I would post it here.

I have a Linux VM set up in Azure with RethinkDB 2.3.6 running on it. It seems to be running fine, as I've been working with it for a while now.

I am creating a connection like this

conn = RethinkDB.R.Connection().Hostname("*****")
      .Port(RethinkDBConstants.DefaultPort)
      .Timeout(60)
      .Connect();

I'm then getting a Changes feed like this.

return RethinkDB.R.Db(Data.Helpers.DATABASE_NAME)
   .Table(Data.Helpers.WEBSITE_TABLE_NAME)
   .Changes()[new { include_states = true, include_initial = true, include_types = true }]
   .RunChanges<JObject>(Data.Helpers.getConnection());

Once I have the changes feed I'm consuming it like this.

            observable = changes.ToObservable();

            observable.SubscribeOn(NewThreadScheduler.Default)
                .Subscribe(
                    x => OnNext(x, ref onNext),
                    err => OnError(err, ref onError),
                    () => OnCompleted(ref onCompleted)
                );

As far as I'm aware all this is following the documentation and it seems to work well.

My issues start appearing after a few minutes of inactivity. At that point the feeds stop receiving any further changes. My main connection also appears to stop working and needs a reconnect, but reconnecting doesn't fix the feeds. I just want the feeds to keep watching for as long as the app is up.

Is this an issue, or am I just doing something silly somewhere?

I'm expecting that once I set up the above I can leave it to tick over. Is that correct?

Thanks for your help.

bchavez commented 6 years ago

Hi John,

Thank you for writing a detailed issue report. It helps tremendously when trying to debug issues like this.

As for the Slack channel, I created a gitter.im channel for the time being as a backup until the main RethinkDB Slack channel opens up again. You can join at https://gitter.im/bchavez/RethinkDb.Driver and I'll update the README.md after this post.

At first glance, I don't immediately see any problems with your code or your approach. The symptoms feel more like an underlying network issue between the client driver and the server. Full Disclosure: I don't have much experience with Azure environments. Maybe @cecilphillip can provide more insight; he's a CDA for Microsoft and works with Azure. I know we have people running apps using this driver on Azure successfully; so it could be an Azure setup thing.

Here are my initial thoughts on debugging your particular issue:

  1. Are there any load balancers between the client machine and the server? If so, try removing them. If you can't remove them maybe rummage through the settings on the load balancer to persist connections for an infinite amount of time.
  2. Do you have any debug log traces on the client side? If so, please post them. If not, set up detailed logging with the driver on the client. I'd love to see the output. We should see server disconnect messages inside the client log if this is in fact a network issue.
  3. Try to keep track of the timespan between starting your app and the first disconnect. Do this test multiple times. Is the timespan between app start and disconnect the same every time? Or do disconnects happen randomly? This should give us some clues as to whether this is a deterministic timeout somewhere or something more random.
  4. Also, I'd like to run through the previous debugging (step 3) scenario without a change feed query. Removing change feeds from the equation, just as a test:
    • Open a connection.
    • Let it sit idle for a time (when you feel somewhat confident it might have failed).
    • Run a simple r.now() time query. Does it work?
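
Steps 3 and 4 can be scripted so that the idle time before the first failure is measured the same way on every run. The following is a minimal sketch, not part of the driver: `IdleProbe` and `RunAfterIdle` are hypothetical names, and the probe is passed in as a delegate so the helper itself has no dependency on RethinkDb.Driver.

```csharp
using System;
using System.Threading;

// Hypothetical helper for the idle-connection test; not part of RethinkDb.Driver.
public static class IdleProbe
{
    // Blocks for `idle`, then runs `probe` exactly once.
    // Returns (true, "") if the probe completed, or (false, message)
    // if the probe threw an exception.
    public static (bool Ok, string Error) RunAfterIdle(TimeSpan idle, Action probe)
    {
        Thread.Sleep(idle);
        try
        {
            probe();
            return (true, "");
        }
        catch (Exception ex)
        {
            return (false, ex.Message);
        }
    }
}
```

With the driver, the probe would be something like `() => RethinkDB.R.Now().Run<DateTimeOffset>(conn)` on a connection that was opened before the wait. Repeating the call with increasing idle times (for example 1, 2, 4 and 8 minutes) should show whether the failure appears at a fixed threshold or at random.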

Some additional questions:

Lastly, regarding Change Feeds and connection failures: When a connection fails, as a best practice, you should shut down and close the change feed query and re-run your .RunChanges<JObject> query for a new change feed object once the connection has been re-established with the server. Basically, you need to re-create the change feed in the event of a failed connection.

The reason for this, IIRC, is that the server-side change feed (in reality, a database cursor) is tied to the underlying TCP connection on the RethinkDB server. So, in the event a connection is dropped with an open change feed, the server-side database cursor would be closed and garbage collected. When the client re-establishes a connection at a later time, the server has no recollection of the original change feed because the connection is new. Even if you tried to draw more elements from the change feed cursor on the client side, the "query id" used to request more change items would not exist on the new (re-established) connection.
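
The re-create-on-failure pattern described above might look like the sketch below. It is an illustration under assumptions, not the driver's API: `FeedRunner` and `ConsumeWithRestart` are hypothetical names, and the feed is modelled as a plain `IEnumerable<T>` returned by a caller-supplied factory, so the code compiles without RethinkDb.Driver.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper; not part of RethinkDb.Driver.
public static class FeedRunner
{
    // Consumes a feed until it ends or fails. On failure, waits `retryDelay`
    // and asks `openFeed` for a brand-new feed instead of reusing the old one.
    // `openFeed` must create a NEW feed on every call (and reconnect if needed).
    // Returns the number of restarts if the feed ends normally; rethrows the
    // last error once `maxRestarts` restarts have been used up.
    // Note: exceptions thrown by `onChange` are treated like feed failures.
    public static int ConsumeWithRestart<T>(
        Func<IEnumerable<T>> openFeed,
        Action<T> onChange,
        int maxRestarts,
        TimeSpan retryDelay)
    {
        int restarts = 0;
        while (true)
        {
            try
            {
                foreach (var change in openFeed())
                {
                    onChange(change);
                }
                return restarts; // feed completed without an error
            }
            catch (Exception)
            {
                if (restarts >= maxRestarts) throw;
                restarts++;
                Thread.Sleep(retryDelay);
            }
        }
    }
}
```

With the driver, `openFeed` would be responsible for reconnecting and then re-running the `.RunChanges<JObject>(...)` query from the original post, so that every restart gets a fresh server-side cursor. With `include_initial = true`, a re-created feed is expected to replay the initial values, so `onChange` should tolerate seeing documents it has already processed.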

Yikes. Hope that makes sense. :sunglasses: Feel free to let us know what you find. I should be on gitter.im chat for a few hours if you have more questions.

I hope this helped!

Thanks, Brian

:dizzy: :boom: Chaos Chaos - Do You Feel It?

S4RD7R commented 6 years ago

Hi Brian,

Thanks for the reply and all the tips to investigate further. I did a bit of logging and set the system up locally. From what I can tell the Feeds are working fine locally and don't seem to drop. The logging didn't show anything odd either. So my conclusion is the Azure VM as you suspected.

After spending ages digging around, I found a couple of comments about KeepAlive settings with Azure and Linux. I think I may need to set these on the VM but haven't managed to work out how to do that in Azure.

net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 8

bchavez commented 6 years ago

Hi @MMJM ,

I'm sorry you're having trouble getting things working. I know it can be very frustrating sometimes. I wish there was a way I could help but Azure issues are a little out of my league.

I hope someone from the community can help you. Maybe post on Twitter about it? Also, maybe ping someone at Microsoft Azure Support on Twitter? Does Azure have some kind of support mechanism you could use to ask questions?

If it's any consolation, I know people are running apps successfully on Azure so it's not totally impossible.

Keep me posted if you find a solution and I'll add it to the Gotchas section for others to see.

If you have any other questions, feel free to ask. I'll keep the issue open for a few days.

Thanks, Brian

:chocolate_bar: :cookie: :lollipop: Ronald Jenkees - Stay Crunchy

S4RD7R commented 6 years ago

Hey, no problem. Thanks for your pointers though; I think they helped me.

bchavez commented 6 years ago

Hey John,

I hope you were able to solve your issue. I'm going to close the issue now. Feel free to join the Gitter.IM channel if you'd like to chat more.

Thanks, Brian

:evergreen_tree: :crystal_ball: PINES - Fate

S4RD7R commented 6 years ago

Hi Brian,

It isn't solved yet, but I now know it is down to the Idle Timeout settings. I don't yet know how to sort it out, but moving the Azure VM setting between 4 and 30 minutes does affect my results. So at least I know what I'm looking for. Thanks for the follow-up.