jhouse-solvd opened this issue 2 years ago
Existing on-call rotation doc, which links to incident categorization, incident command info, and the incident response playbook:
https://github.com/department-of-veterans-affairs/va.gov-team-sensitive/tree/master/OnCall
One of the first things we need to do is understand what already exists.
We can start from the documentation linked above to build an understanding of the existing processes and information.
It may be worth workshopping incident categorization and prioritization using actual incidents that have occurred over the past couple of years.
@jhouse-solvd Which of the following should be covered by the Incident Plan? https://docs.google.com/spreadsheets/d/1Fn2lD419WE3sTZJtN2Ensrjqaz0jH3WvLaBtn812Wjo
A couple of questions:
Problem Statement
The incident response process is unclear. It is unclear when to treat an issue as an incident. It is also unclear how to classify, prioritize, and respond to incidents when they occur.
Background / Context
How might we...
... update the incident response process so that it's clear for end-users?
... update the incident response process so that it's clear for SRE personnel?
... update the incident response process with guidance for classification and prioritization?
... update the incident response process with a communication protocol and responsibilities?
Hypothesis or Bet
This will make it easier for VFS teams to declare an incident. This will make it easier for SRE (and platform teams) to respond to incidents.
We will know we're done when... ("Definition of Done")
When there is an updated incident response process in place.
Known Blockers/Dependencies
List any blockers or dependencies for this work to be completed
Projected Launch Date
TBD
Launch Checklist
Is this service / tool / feature...
... tested?
... documented?
... measurable?
When you're ready to launch...
Required Artifacts
Documentation
PRODUCT_NAME: directory name used for your product documentation
Testing
Measurement