We have enabled uptime checks in the Google Cloud console. We should consider:
Whether to send notifications to PagerDuty.
How to modulate the notifications, given that the larger hubs may take around 10 minutes to come back online after a redeployment. We could pause them during CI, set outage windows longer than the expected CI outages, or possibly choose an endpoint that stays up even while the hubs are still coming back online.
@ryanlovett Thanks for creating this user story! I will add this for discussion during our next sprint planning meeting to hash out our implementation plans.
Summary
Notifications can be toggled from the CLI by enabling or disabling the alerting policy, e.g.:
gcloud alpha monitoring policies update --enabled POLICY_NUMBER
gcloud alpha monitoring policies update --no-enabled POLICY_NUMBER
If we disable notifications during CI, we'd need to make sure they are always re-enabled afterwards, whatever the outcome of the deployment. Perhaps something of the form below in our CI's configuration at jobs.deploy.steps:
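Whatever form the CI configuration takes, the same guarantee can be sketched in plain bash with an EXIT trap, which runs when the enclosing subshell exits, whether the deploy succeeded, failed, or was cut short under `set -e`. This is a minimal sketch under stated assumptions, not our actual setup: `run_with_alerts_paused` and the `GCLOUD` override variable are hypothetical names, gcloud is assumed to be authenticated in the CI environment, and the hubploy command line is passed through as arguments rather than spelled out.

```shell
#!/usr/bin/env bash
# Sketch: pause an alerting policy for the duration of a deploy and rely on
# an EXIT trap to re-enable it. Function and variable names are hypothetical.

# gcloud binary to call; overridable (e.g. with a stub) via $GCLOUD.
GCLOUD="${GCLOUD:-gcloud}"

# Usage: run_with_alerts_paused POLICY_NUMBER DEPLOY_COMMAND [ARGS...]
run_with_alerts_paused() {
  local policy="$1"
  shift
  (
    # The EXIT trap runs when this subshell exits, whether the deploy
    # succeeded, failed, or was cut short by set -e. The subshell's exit
    # status stays that of the command that ended it.
    trap '"$GCLOUD" alpha monitoring policies update --enabled "$policy"' EXIT
    # Pause alerting; if this fails, exit without deploying.
    "$GCLOUD" alpha monitoring policies update --no-enabled "$policy" || exit
    # Run the deploy command (e.g. the existing hubploy call).
    "$@"
  )
}
```

The trap also fires if disabling the policy fails; in that case it simply re-enables a policy that is already enabled, which is harmless.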
Alternatively, we could use shell return values and boolean logic to sandwich the gcloud calls around the calls to hubploy.
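The return-value sandwich can be sketched as follows. Again this is a hedged bash sketch with hypothetical names (`set_policy_enabled`, `deploy_sandwiched`, `GCLOUD`), assuming an authenticated gcloud and using POLICY_NUMBER only as a placeholder. The deploy's exit status is captured with `|| rc=$?`, which also keeps `set -e` from aborting before the policy is re-enabled, and it is returned only after the re-enable call has run.

```shell
#!/usr/bin/env bash
# Sketch: disable the policy, run the deploy, re-enable the policy, then
# report the deploy's exit status. Names are hypothetical.

# gcloud binary to call; overridable (e.g. with a stub) via $GCLOUD.
GCLOUD="${GCLOUD:-gcloud}"

# $1: policy number; $2: "--enabled" or "--no-enabled"
set_policy_enabled() {
  "$GCLOUD" alpha monitoring policies update "$2" "$1"
}

# Usage: deploy_sandwiched POLICY_NUMBER DEPLOY_COMMAND [ARGS...]
deploy_sandwiched() {
  local policy="$1"
  shift
  # Disable alerting; if that fails, skip the deploy and return the error.
  set_policy_enabled "$policy" --no-enabled || return
  # Run the deploy; "|| rc=$?" records a failure without tripping set -e.
  local rc=0
  "$@" || rc=$?
  # Always re-enable. If only this step fails, fail anyway so that a
  # policy left disabled does not go unnoticed.
  if ! set_policy_enabled "$policy" --enabled && [ "$rc" -eq 0 ]; then
    rc=1
  fi
  return "$rc"
}
```

In CI the function would be called with the policy number followed by the existing hubploy command line, unchanged. Compared with the trap version, this keeps everything in the current shell, at the cost of having to handle each exit path explicitly.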
User Stories
We'd like to be notified when some of our services are down. This would replace other notifications, such as those from UptimeRobot.
Acceptance criteria
Important information
Tasks to complete