Netflix / dispatch

All of the ad-hoc things you're doing to manage incidents today, done for you, and much more!
Apache License 2.0

enhancement(cli/consume): improve thread exception handling in cli consumer #5118

Closed wssheldon closed 3 weeks ago

wssheldon commented 3 weeks ago

Problem

Consumer threads were not monitored for a zombie state and were not re-initialized when they died. There was no general exception handling around the consumer, so exceptions appear to have been swallowed.

Solution

  1. Implements exception handling to prevent thread crashes due to unexpected errors.
  2. Continuously monitors all threads and automatically restarts any that have died.
  3. Improves logging for better visibility into the process's status and any errors encountered.
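
The three points above can be illustrated with a minimal sketch. This is not the code from this PR: the helper names `run_with_exception_handling`, `start_worker`, and `supervise`, the `poll_interval` parameter, and the restart counter are all hypothetical and exist only for illustration. It assumes each consumer is a plain callable running in its own `threading.Thread`.

```python
import logging
import threading
from typing import Callable, Dict

log = logging.getLogger(__name__)


def run_with_exception_handling(name: str, target: Callable[[], None]) -> None:
    """Run a consumer and log any exception instead of losing it silently."""
    try:
        target()
    except Exception as e:
        # exc_info=True attaches the full traceback to the log record.
        log.error("Exception in thread for plugin %s: %s", name, e, exc_info=True)


def start_worker(name: str, target: Callable[[], None]) -> threading.Thread:
    """Start a daemon thread that runs the consumer inside the wrapper above."""
    thread = threading.Thread(
        target=run_with_exception_handling,
        args=(name, target),
        name=name,
        daemon=True,
    )
    thread.start()
    return thread


def supervise(
    targets: Dict[str, Callable[[], None]],
    stop: threading.Event,
    poll_interval: float = 1.0,  # hypothetical default, not taken from the PR
) -> Dict[str, int]:
    """Start one thread per consumer and restart dead ones until `stop` is set.

    Returns the number of restarts per consumer (illustrative bookkeeping).
    """
    threads = {name: start_worker(name, fn) for name, fn in targets.items()}
    restarts = {name: 0 for name in targets}
    while not stop.is_set():
        for name, thread in list(threads.items()):
            # Check liveness first, then the stop flag, so a consumer that
            # requested shutdown just before exiting is not restarted.
            if not thread.is_alive() and not stop.is_set():
                log.warning("Thread for plugin %s has died; restarting", name)
                threads[name] = start_worker(name, targets[name])
                restarts[name] += 1
        stop.wait(poll_interval)
    return restarts
```

The main thread would call `supervise({...}, stop)` with one entry per consumer plugin and set `stop` on shutdown. Polling `is_alive()` is only one way to detect dead threads. A production version would likely also need backoff between restarts so that a persistent failure, such as the invalid security token shown under Error State, does not produce a tight restart loop.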

Error State

Traceback (most recent call last):
...

botocore.exceptions.ClientError: An error occurred (InvalidClientTokenId) when calling the GetQueueUrl operation: The security token included in the request is invalid
DEBUG:https://sqs.us-west-2.amazonaws.com:443 "POST / HTTP/1.1" 403 310:/Users/wshel/.pyenv/versions/3.11.2/envs/dispatch/lib/python3.11/site-packages/urllib3/connectionpool.py:_make_request:547
ERROR:Exception in thread for plugin signal-consumer: An error occurred (InvalidClientTokenId) when calling the GetQueueUrl operation: The security token included in the request is invalid:/Users/wshel/Projects/dispatch/src/dispatch/cli.py:_run_consume_with_exception_handling:827