StackStorm / hubot-stackstorm

Hubot plugin for integration with StackStorm event-driven infrastructure automation platform.
Apache License 2.0
49 stars 39 forks source link

Exit hubot on unhandled promise rejections and other unrecoverable errors #172

Closed arm4b closed 5 years ago

arm4b commented 5 years ago

Addresses some of the cases StackStorm/st2chatops#124 when hubot was hanging on error with no possible recovery.

In the above error seems the important part is reading an error message & research about it:

[Thu Apr 04 2019 17:03:58 GMT+0000 (Coordinated Universal Time)] ERROR Failed to authenticate: getaddrinfo ENOTFOUND non-existent non-existent:9100
(node:24) UnhandledPromiseRejectionWarning: Error: getaddrinfo ENOTFOUND non-existent non-existent:9100
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:56:26)

(node:24) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:24) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

when promise reporting an error usually happens in some unrecoverable app state.

Adding a global unhandledRejection handler that will catch it and exit the bot on error instead of waiting in noop state will solve this and similar future problems when any external library could be spot in unrecoverable errors.

Resources

Related #168

blag commented 5 years ago

NAK. Hubot already has a way to handle this: https://hubot.github.com/docs/scripting/#error-handling

I especially don't like this:

setTimeout(process.exit.bind(process, 1), 1000)

Don't schedule process.exit to be called, just call it.

This is handled properly in #168. Specifically:

The logErrorAndExit callback function Registering logErrorAndExit as a callback with hubot

We also call that function directly when we detect situations that will trigger errors, eg: ST2_AUTH_USERNAME, ST2_AUTH_PASSWORD, and/or ST2_AUTH_URL aren't specified as a group.

Additionally, #168 refactors a lot of the code to be more concise, follow updated conventions, and comes with tests specifically for some of the situation that could throw errors.

Besides #168, the only other work that might need to be done regarding the "perpetual waiting" issue is to create integration tests (eg: with BATS, similar to st2tests/chatops/test_hubot.bats) that trigger specific error conditions that cannot be tested with the current tests in hubot-stackstorm.

Closing in favor of #168, but if you disagree feel free to reopen and discuss.

arm4b commented 5 years ago

@blag With https://hubot.github.com/docs/scripting/#error-handling you can handle uncaught exceptions on hubot side which would make sense in other situation or even be more cleaner solution in PR like #168. You can look at https://github.com/slackapi/hubot-slack/pull/112/files why having setTimeout might be helpful to allow other tasks happening in background to finish.

Handling unhandledRejection is a different story and was exactly the case of issue happening in https://github.com/StackStorm/st2chatops/issues/124. I could able to confirm the fix with the code applied. Please take a look at the original error message and https://nodejs.org/api/process.html#process_event_unhandledrejection. This is not about catching an exception with robot.error which I researched too, thanks to pointers in other PR.

With the approach you guys are following in #169, - it removes some logic around the reloads that original hubot-stackstorm already had in place for self-healing capabilities. It may affect much more than we think.

Instead of always exit on every possible error, hubot to exit on any 100% unrecoverable issue (like this one) and our job is to catch different auth errors in a more granular way rather then logErrorAndExit on any possible corner case, killing the original reload loop.

blag commented 5 years ago

I took another look at this, and I think it is complementary to #168. Let Hubot catch errors, and let this catch unhandled promise rejections.

I still don't like how it exits using setTimeout(), since that seems like too much of a brittle hack. The "wait one second and then hope everything is fine when we pull the plug" method isn't as precise as I think we can and should be. Instead, we should attempt to gracefully shutdown everything that might be generating events before we pull the plug. So instead of calling setTimeout(process.exit.bind(process, 1), 1000), you should simply be able to call stop({shutdown: true}) from #168.

@armab What do you think about that?

Sorry for not trusting you. I'm still learning advanced Javascript, but I should have given this more thought before closing it.

arm4b commented 5 years ago

@blag Sounds good. If we stick with https://github.com/StackStorm/hubot-stackstorm/pull/168#discussion_r291263761 approach, I would better just throw original error from there that will be caught by robot.error handler you added in #168 and that'll exit hubot in a more graceful way.

This would avoid bad practices of calling https://nodejs.org/api/process.html#process_process_exit_code and loosing any unfinished async calls.

arm4b commented 5 years ago

Just included uncaught exceptions handler in this PR which complements handing unhandledRejection events. I've also removed process.exit and setTimeout per previous feedback. Note that it's not full solution for all the issues we spot before, however fixes some of them related to unhandled/uncaught errors.

@blag Please take a look.

blag commented 5 years ago

One small change to make but looks good to me.

arm4b commented 5 years ago

@blag Thanks for review!