Closed by WadeBarnes 1 year ago
Based on the call with Dustin today, there is still work to be done on this ticket. Also, please document the findings that have been shared so far.
On the call with Dustin on Friday (Dec 9th, 2022), we found out that the downtime alert is not a good way to determine when a pod is down, because that alert produces results over 100% of the entire timeline instead of just 3 minutes.
Dustin is going to help us set up a new alert that fires when any pod is down for more than a minute.
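As a rough illustration of the "down for more than a minute" condition described above, here is a minimal sketch in plain Python. It is not the Sysdig alert definition and does not use any Sysdig API. The pod names, timestamps, and the `pods_down_longer_than` helper are hypothetical, and it assumes we know when each pod last reported data.

```python
from datetime import datetime, timedelta


def pods_down_longer_than(last_seen, now, threshold=timedelta(minutes=1)):
    """Return the sorted names of pods whose last sample is older than `threshold`.

    `last_seen` maps a pod name to the time of its most recent sample.
    A pod is only reported once its gap strictly exceeds the threshold.
    """
    return sorted(pod for pod, seen in last_seen.items() if now - seen > threshold)


# Hypothetical sample data: pod-a reported 20 seconds ago, pod-b 90 seconds ago.
now = datetime(2022, 12, 9, 12, 0, 0)
last_seen = {
    "pod-a": now - timedelta(seconds=20),
    "pod-b": now - timedelta(seconds=90),
}

print(pods_down_longer_than(last_seen, now))  # ['pod-b']
```

Raising `threshold` above the typical length of the data gaps would keep those gaps from being reported, at the cost of slower detection of real outages.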
All the unwanted Sysdig alerts are now disabled and new alerts have been created in their place. We no longer receive the overwhelming majority of the downtime alerts we used to receive on our dts-sysdig-alerts channel in Rocket.Chat.
The overwhelming majority of the downtime alerts we receive on our dts-sysdig-alerts channel in Rocket.Chat are due to a minute or so of missing data for the monitored containers. Please investigate.
1) We need to understand and document why this occurs. I believe there is an explanation we've received previously for why this occurs.
2) Understand and document whether there is anything that can be done to resolve the issue. For example: