Closed lap1817 closed 6 years ago
specs that explain the expected behavior would be more clear than the long PR description. i would like spec cases that show what happens when a connection is totally lost and when it's briefly interrupted. the specs would set up a ZK object that returns the expected values/raise the expected exceptions, and then show that nerve behaves correctly under those conditions
@igor47 , rspec tests are added. PTAL
@igor47 @jolynch: please take a look. rspec tests have been updated to test the new behavior of zookeeper reporter
@igor47 @jolynch : can you take a look at this one. I have added new rspec tests to cover this change
LGTM just make sure to address the typo and check the two other comments
@igor47, I accidentally deleted my repo. Here is the re-created PR: https://github.com/airbnb/nerve/pull/99
I have addressed your comments on the tests. It also also some new change, which allows the connection errors to bubble up when it happened too many
This PR is to follow the similar idea in the PR on Synapse (https://github.com/airbnb/synapse/pull/250).
The current logic in Nerve is that: if an error is raised from the reporter (e.g. zookeeper reporter) when ping?, report_up, report_down, the watcher thread will stop the reporter (so it tries to remove the znode). Then the Nerve main thread will reap the watcher and relaunch it.
This caused unnecessary Nerve status flipping while in short network disconnection. The ping? function on reporter is most frequently called by the watcher thread. E.g.,
Flipping nerve status triggers more Synapse read, which make the network saturation more likely to happen.
The fix is to let the reporter to swallow the error if it is connection error. The reporter is mostly controlled by the watcher thread to report up/down status. Thus:
Reviewers: @alexism @jolynch @darnaut