department-of-veterans-affairs / abd-vro

To get Veterans benefits in minutes, VRO software uses health evidence data to help fast track disability claims.

Consolidate all on-call data and instructions in one location #3519

Open meganhicks opened 1 month ago

meganhicks commented 1 month ago

Because the VRO team is responsible for on-call duties, it is crucial to consolidate all relevant information in one place. This will make onboarding new teammates to on-call as seamless and straightforward as possible.

Previous AC:

  1. Review both the internal and external wikis and consolidate all on-call related information into a single page.
  2. Organize the content in a logical manner based on on-call priorities.
  3. Once completed, post a link to the consolidated page and share it with the team in the engineering channel.

New AC:

Review the list posted by @lisac below and finish the effort.

lisac commented 1 month ago

As part of AC 1, here's an inventory of where we talk about on-call.

There is a single-page overview titled On Call Responsibilities. Let's call that the on-call overview wiki page (OCOWP).

There are additional pages that describe expectations of the on-call engineer. Many of those are linked from the OCOWP. Some are not. The Slack workflows are not linked from the OCOWP, although some of the Slack workflows are referenced from wiki pages that ARE linked from the OCOWP.

| id | page | linked from OCOWP? |
|----|------|--------------------|
| 1 | wiki: On Call Responsibilities (aka the OCOWP) | n/a |
| 2 | wiki: Incident response | Yes |
| 3 | wiki: VRO Deployment Policy | Yes |
| 4 | wiki: Dependabot --> on-call responsibility | Yes |
| 5 | wiki: Metrics (e.g., capturing MTTR) | No |
| 6 | wiki: SecRel Getting Started | No (note: this is in the private repo) |
| 7 | wiki: Post-Incident reviews | No (touched on in Incident Response: Step 6; note: this is in the private repo) |
| 8 | Slack workflow: Incident Report | No (linked from Incident Response: Catalyst) |
| 9 | Slack workflow: Partner Team Production Deployment | No |
| 10 | Slack workflow: Opt-Out Production Deployment | No |
| 11 | Recurring GH issue for On-call, e.g., #3384, #3439, #3499 | No |

lisac commented 1 month ago

Did not complete. Removing my assignment and moving ~back to Backlog~ to Sprint Ready.

bianca-rivera commented 1 month ago

Note: pair with @bianca-rivera when picking up this ticket to make sure the documentation for the Incident Response and Deployment workflows is updated or linked accordingly.

gabezurita commented 1 week ago

@meganhicks and @bianca-rivera, this still seems worth doing as we'll be on-call for a few more sprints. Here's the On-call overview doc for us to use: https://github.com/department-of-veterans-affairs/abd-vro/wiki/VRO-On%E2%80%90Call-Overview