Open jimleroyer opened 1 year ago
As an ops lead of GCNotify, I want to know when there are DNS resolution issues, So that I can be proactive on failing components of the stack.
We want to know of any DNS issues we might run into early, as these tend to propagate higher in the stack.
An alarm that detects a certain threshold of unresolved DNS requests.
Increased observability and proactive steps.
This is an action item following the incident 2023-04-07-could-not-translate-read-only-db-proxy-host-name.
Will require investigation: how do (can?) we do this?
Can possibly scrape the k8s (or other?) logs to find DNS problems?
Description
As an ops lead of GCNotify, I want to know when there are DNS resolution issues, So that I can be proactive on failing components of the stack.
WHY are we building?
We want to know of any DNS issues we might run into early, as these tend to propagate higher in the stack.
WHAT are we building?
An alarm that detects a certain threshold of unresolved DNS requests.
VALUE created by our solution
Increased observability and proactive steps.
Acceptance Criteria
QA Steps
Info
This is an action item following the incident 2023-04-07-could-not-translate-read-only-db-proxy-host-name.