StackStorm / hubot-stackstorm

Hubot plugin for integration with StackStorm event-driven infrastructure automation platform.
Apache License 2.0
49 stars 39 forks source link

Log errors and exit on authentication issues on Hubot plugin initialization #168

Closed jinpingh closed 5 years ago

jinpingh commented 5 years ago

Fixes https://github.com/StackStorm/st2chatops/issues/124

  1. Capture all authentication related exceptions and exit from st2chatops.
  2. Create new callback function for robot.error to capture uncaught error from hubot.
  3. Add unit tests for authentication error.

Closes stackstorm/st2tests#170

blag commented 5 years ago

If we talk about the failure which could be potentially recovered by connection retries/or retries with backoff, - we can consider using that instead.

We have this behavior between other microservices of ST2, so it would be nice to have the same behavior in st2chatops. But code-wise, it would be frustrating to keep the semantics of "retry with backoff" in st2chatops up-to-date with how it works in the rest of ST2.

But if nothing helped, - always exit with error message and non-zero exit code.

This is the easiest thing for us to do. At this point, I think this behavior is a good compromise between trying to behave like the other ST2 services and increased complexity of this codebase/PR.

blag commented 5 years ago

Should close #77 if st2chatops gets automatically restarted, which I think it will in a kubernetes environment.

arm4b commented 5 years ago

Thanks @jinpingh and @blag for jumping in! I'll test the packages from this PR in K8s/Docker environment once again to verify it and will get back to you.

blag commented 5 years ago

@armab

we significantly changed the logic here

Yep.

and overengineered it

I disagree. The part of this PR that is fixing the issue is the proper way to implement this change according to hubot's documentation on error handling. The code churn that has happened as a result is necessary to ensure that places where connection issues could happen will all cause hubot to shutdown. Additionally, we only wanted to have one way to gracefully shutdown instead of spaghetti code shutting it down in multiple places.

removing existing self-healing mechanisms

Yep. In a k8s environment, gracefully exiting and relying on k8s or the process manager to restart the process from scratch is a self-healing mechanism. ChatOps is not part of the rest of ST2 microservices, and it has to re-authenticate when it starts up.

to re-load and recover hubot with ST2_COMMANDS_RELOAD_INTERVAL setting.

As long as the connection to StackStorm is fine, this behavior is unchanged: st2chatops will continue to reload commands on a configurable interval. If it cannot connect to StackStorm, the process will die, hopefully be respawned, and the new instance will connect again.

The other option, which is what we originally had, was to make the shutdown/respawn/reconnect behavior configurable via an environment variable, but you didn't like that solution either, and that's why we refactored it to this behavior.

I really meant to exit app on unrecoverable failures (when reload stopped to work and hubot was just hanging in noop) by adding some kind of handler for it.

Yes, that's precisely what this PR implements, in a modern, proper manner, following Hubot documentation, including tests, and a minimum amount of refactoring to ensure that errors are not swallowed and that they do propagate up to the shutdown-on-error logic.

arm4b commented 5 years ago

If it cannot connect to StackStorm, the process will die, hopefully be respawned, and the new instance will connect again.

Previously, if hubot couldn't connect to st2 the process would continue to re-connect in a loop. That's the correct behavior as networking can go up/down. This PR changes the original behavior. By adding logErrorAndExit in places where I see them could lead to cases when it's wasn't ideal to log & exit, eliminating existing re-load capabilities that happens on ST2_COMMANDS_RELOAD_INTERVAL.

blag commented 5 years ago

@armab Sorry, I wasn't clear on what you wanted earlier. I have fixed the code where you have pointed it out. Please re-review.

This code needs to handle reconnecting better, but that is outside the scope of this PR.

jinpingh commented 5 years ago

Did run test case posted on StackStorm/st2chatops#124 (comment). With yesterday fix, for invalid ST2_API_URL , ST2_STREAM_URL, ST2_AUTH_TOKEN and ST2_API_KEY , st2chatops logs out error message and keep waiting. And error message for ST2_STREAM_URL, ST2_AUTH_TOKEN andST2_API_KEY` are the same. Is this the result that we wanted?

arm4b commented 5 years ago

@jinpingh To understand exactly what do you mean, can you please paste the commands to reproduce? I'm also looking at the test cases in this PR which helped everyone to be on the same page.

Also, "keep waiting" turned out to have diffs. In reported https://github.com/StackStorm/st2chatops/issues/124 the original issue was hanging our process in unrecoverable and waiting state. If reload loop happens (every 2 mins), - it's a different state.

jinpingh commented 5 years ago

@jinpingh To understand exactly what do you mean, can you please paste the commands to reproduce? I'm also looking at the test cases in this PR which helped everyone to be on the same page.

Also, "keep waiting" turned out to have diffs. In reported StackStorm/st2chatops#124 the original issue was hanging our process in unrecoverable and waiting state. If reload loop happens (every 2 mins), - it's a different state.

jinpingh commented 5 years ago

@jinpingh To understand exactly what do you mean, can you please paste the commands to reproduce? I'm also looking at the test cases in this PR which helped everyone to be on the same page.

@armab The way to run testcases is to modify /opt/stackstorm/chatops/st2chatops.env file for StackStorm settings, then run /opt/stackstorm/chatops/bin/hubot

I think at least st2chatops should be exit for invalid ST2 Configurations. If ST2Stream up and down by other reason, then st2chatops should try the best effort to recovery not just quit.

blag commented 5 years ago

@armab Sorry to keep pulling you back to this, but I figured out how to test this all, and I think it's finally in a mergeable state!

blag commented 5 years ago

Coverage from current master:

------------------------|----------|----------|----------|----------|-------------------|
File                    |  % Stmts | % Branch |  % Funcs |  % Lines | Uncovered Line #s |
------------------------|----------|----------|----------|----------|-------------------|
All files               |    82.73 |    79.59 |    83.76 |    82.24 |                   |
 lib                    |    96.88 |    96.68 |    97.75 |    96.75 |                   |
  command_factory.js    |      100 |      100 |      100 |      100 |                   |
  format_command.js     |      100 |      100 |      100 |      100 |                   |
  format_data.js        |      100 |      100 |      100 |      100 |                   |
  post_data.js          |      100 |      100 |      100 |      100 |                   |
  slack-messages.js     |    88.89 |    81.58 |    91.67 |    87.64 |... 34,35,37,38,43 |
  slack_monkey_patch.js |    28.57 |       80 |       50 |    28.57 |    17,18,19,28,29 |
  utils.js              |      100 |      100 |      100 |      100 |                   |
 scripts                |    41.81 |    37.11 |    39.29 |    41.81 |                   |
  stackstorm.js         |    41.81 |    37.11 |    39.29 |    41.81 |... 04,405,406,431 |
------------------------|----------|----------|----------|----------|-------------------|

Coverage from this branch:

------------------------|----------|----------|----------|----------|-------------------|
File                    |  % Stmts | % Branch |  % Funcs |  % Lines | Uncovered Line #s |
------------------------|----------|----------|----------|----------|-------------------|
All files               |     87.9 |    86.34 |    86.67 |    87.56 |                   |
 lib                    |       99 |    99.57 |    98.86 |    98.96 |                   |
  command_factory.js    |      100 |      100 |      100 |      100 |                   |
  format_command.js     |      100 |      100 |      100 |      100 |                   |
  format_data.js        |      100 |      100 |      100 |      100 |                   |
  post_data.js          |      100 |      100 |      100 |      100 |                   |
  slack-messages.js     |      100 |      100 |      100 |      100 |                   |
  slack_monkey_patch.js |    28.57 |       80 |       50 |    28.57 |    17,18,19,40,41 |
  utils.js              |      100 |      100 |      100 |      100 |                   |
 scripts                |    59.28 |    58.56 |    53.13 |    59.28 |                   |
  stackstorm.js         |    59.28 |    58.56 |    53.13 |    59.28 |... 96,425,461,462 |
------------------------|----------|----------|----------|----------|-------------------|

👍