Create and document incident management process

It would be useful to have a checklist that people can follow to determine roles during the early stages of an incident, along with the first steps and priorities each role should consider. For example:

Incident manager
- Designate workstreams and owners
- Agree on shared documentation, communication, and coordination strategies, and advertise them
- Plan external communication: which users are impacted, the communication timeline, etc.

DRI (directly responsible individual)
- Primary owner of incident investigation and follow-up
- Focus on mitigation, not root cause
- Determine impact
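
To make the checklist easy to paste into an incident channel or doc, it could also be kept in a small machine-readable form. The following is only a rough sketch: the `Role` class, `CHECKLIST`, and `render_checklist` are hypothetical names, not existing tooling in this repo, and the steps simply mirror the example roles above.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """One incident role and the first steps its owner should consider."""
    name: str
    first_steps: list[str] = field(default_factory=list)


# Hypothetical checklist content, mirroring the example roles in this issue.
CHECKLIST = [
    Role("Incident manager", [
        "Designate workstreams and owners",
        "Agree on shared documentation, communication, and coordination strategies, and advertise them",
        "Plan external communication: impacted users and timeline",
    ]),
    Role("DRI (directly responsible individual)", [
        "Own the incident investigation and follow-up",
        "Focus on mitigation, not root cause",
        "Determine impact",
    ]),
]


def render_checklist(roles, assignments):
    """Render the roles as a task list, marking roles nobody has taken yet."""
    lines = []
    for role in roles:
        owner = assignments.get(role.name, "UNASSIGNED")
        lines.append(f"{role.name}: {owner}")
        lines.extend(f"- [ ] {step}" for step in role.first_steps)
    return "\n".join(lines)


if __name__ == "__main__":
    # Example: only the incident manager role has been claimed so far.
    print(render_checklist(CHECKLIST, {"Incident manager": "alice"}))
```

Flagging unassigned roles explicitly is a deliberate choice here, since the early stages of an incident are exactly when ownership tends to be unclear.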

Additionally, we need better centralized documentation on how to escalate to our various dependencies, e.g. internal common engineering services (pipeline agents, container registries, Azure services), GitHub, package management hosts, etc.

Azure / azure-sdk-tools

Create and document incident management process #3474