Netflix / mantis

A platform that makes it easy for developers to build realtime, cost-effective, operations-focused applications
Apache License 2.0

DynamoDBMasterMonitor: Don't publish MASTER_NULL on Dynamo failure #696

Closed. crioux-stripe closed this 4 months ago

crioux-stripe commented 4 months ago

Context

We encountered an issue in which DynamoDBMasterMonitor emitted a MASTER_NULL value when it failed to read or decode the lock from DynamoDB. This caused minor but unnecessary churn, because our agents believed the master had changed when in fact it had not. Transient DynamoDB failures, or throttling when querying DynamoDB, can make this worse.
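
To illustrate the intended behavior, here is a minimal sketch, not the actual DynamoDBMasterMonitor code. The class `MasterMonitorSketch`, its `poll()` method, the `Supplier`-based lock reader, and the string representation of the master are all hypothetical names chosen for this example. The sketch assumes that a failed read or decode surfaces as a `RuntimeException`, and it keeps the last known master instead of publishing MASTER_NULL in that case.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical stand-in for a master monitor; NOT the real Mantis class.
final class MasterMonitorSketch {
    // Sentinel for "no master known yet"; the name mirrors MASTER_NULL from this PR.
    static final String MASTER_NULL = "MASTER_NULL";

    // Assumption: the reader throws a RuntimeException when the lock
    // cannot be read or decoded (for example on throttling).
    private final Supplier<String> lockReader;
    private final Consumer<String> publisher;
    private String lastKnownMaster = MASTER_NULL;

    MasterMonitorSketch(Supplier<String> lockReader, Consumer<String> publisher) {
        this.lockReader = lockReader;
        this.publisher = publisher;
    }

    // One polling cycle: publish only when a successful read shows a new master.
    void poll() {
        final String current;
        try {
            current = lockReader.get();
        } catch (RuntimeException e) {
            // Read or decode failed: keep the last known master and publish nothing,
            // so subscribers do not see a spurious master change.
            return;
        }
        if (!current.equals(lastKnownMaster)) {
            lastKnownMaster = current;
            publisher.accept(current);
        }
    }

    String lastKnownMaster() {
        return lastKnownMaster;
    }
}
```

With this shape, a transient failure between two successful reads of the same master produces no additional publication, which is the churn the change aims to avoid.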

Checklist

github-actions[bot] commented 4 months ago

Test Results

535 tests ±0: 529 passed ±0, 6 skipped ±0, 0 failed ±0
139 suites ±0, 139 files ±0
Duration: 7m 51s (+8s)

Results for commit b3bba8c6. ± Comparison against base commit a61cf1dd.

This comment has been updated with latest results.

crioux-stripe commented 4 months ago

Just realized I have a commented-out test here. Working on a fix.

crioux-stripe commented 4 months ago

Fixed.