department-of-veterans-affairs / va.gov-cms

Editor-centered management for Veteran-centered content.
https://prod.cms.va.gov
GNU General Public License v2.0

Enable a notification to warn users that login may be affected by vets-api latency event #10772

Closed jilladams closed 2 years ago

jilladams commented 2 years ago

Description

https://dsva.slack.com/archives/C52CL1PKQ/p1663699080748009

We are currently experiencing login issues caused by vets-api latency, and we believe PW owns the modal. We need a mechanism to notify users of the outage, e.g. a "we are experiencing issues with sign in" message behind a flipper, or something to that effect, that we can trigger in response to the outage.

Alert language is here: https://design.va.gov/content-style-guide/error-messages/system#unscheduled-downtime-notifications. This is an available alert, but we don't know how to trigger it.

Acceptance Criteria

CMS Team

Please check the team(s) that will do this work.

jilladams commented 2 years ago

From the Slack thread: the status check (the backend_statuses endpoint, which returns the status of services) is already integrated. So when a service reports that it's down, we believe the downtime notification would already appear without any additional work. However, that endpoint also appears to depend on the Lighthouse API. The backend_statuses response is slow, and when it does return, it reports that all the login services are active.

So: the sniff to understand if something is down is a) not responding reliably right now (not awesome) and b) when it does respond, reporting that things are fine.

jilladams commented 2 years ago

A feature flag: won't work. What we explored:

We could put some sort of notice behind a flipper to work around this, one that says we're having issues. We can tie it to the backend_statuses logic: if backend_statuses says a service is down OR this flipper is enabled, show this alert. That way we would reuse the existing alert language and presentation.

  1. Update the logic for displaying the alert based on backend_statuses, to also display if a flipper is enabled - ~1 hour
  2. Write a new flipper in vets-api - 1-2 hours
  3. Push, merge & deploy the flipper in vets-api - we don't know how long this takes
  4. Push, merge & deploy the logic
  5. Test in staging w/ flipper enabled
  6. Enable in production

This would mean 1 flag specifically for the ID.me service. If we want to do this sort of thing for other services, we would need to create additional flippers.
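
For reference, the "service is down OR flipper is enabled" check described above could look roughly like the sketch below. This is only an illustration of the idea: the `ServiceStatus` shape, the `shouldShowSignInAlert` function, and the `"idme"` service key are hypothetical names, not the actual vets-api response schema or existing front-end code.

```typescript
// Hypothetical shape of one entry in a backend_statuses-style response.
// Field names are assumptions for illustration, not the real vets-api schema.
interface ServiceStatus {
  service: string;
  status: "active" | "down";
}

// Returns true when the sign-in downtime alert should be displayed.
// `statuses` is null when the status endpoint timed out or failed.
function shouldShowSignInAlert(
  statuses: ServiceStatus[] | null,
  serviceName: string,
  overrideFlagEnabled: boolean,
): boolean {
  // Manual override: show the alert regardless of what the endpoint reports.
  if (overrideFlagEnabled) return true;
  // No response means nothing was reported down, so no alert from this path.
  if (statuses === null) return false;
  // Otherwise rely on what the status endpoint reports for this service.
  return statuses.some(
    (s) => s.service === serviceName && s.status === "down",
  );
}

// The situation in this issue: the endpoint says everything is active,
// so only the override flag can turn the alert on.
const reported: ServiceStatus[] = [{ service: "idme", status: "active" }];
const withoutFlag = shouldShowSignInAlert(reported, "idme", false);
const withFlag = shouldShowSignInAlert(reported, "idme", true);
```

Because the flag is checked first, the alert does not depend on backend_statuses responding at all. It does still depend on the flag value itself being readable, which is the catch noted next.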

Except: flippers aren't reliably working right now, also due to the vets-api latency issue. 🙃

jilladams commented 2 years ago

Opted to go the route of publishing a Full width alert.

Related threads:
https://dsva.slack.com/archives/C52CL1PKQ/p1663699080748009
https://dsva.slack.com/archives/CBU0KDSB1/p1663606551806909
https://dsva.slack.com/archives/CDHBKAL9W/p1663705417634809