department-of-veterans-affairs / va.gov-cms

Editor-centered management for Veteran-centered content.
https://prod.cms.va.gov
GNU General Public License v2.0

Monitors should be of /facilities_api/v#/va and /v1/facilities/va #18179

Open eselkin opened 3 months ago

eselkin commented 3 months ago

User Story or Problem Statement

We have monitoring and alerting for /v1/facilities/va and ~but that is a VAOS endpoint, not ours~ /facilities_api/v#/va and /facilities_api/v#/ccp, for # = 1 and # = 2 (i.e. v1 and v2).

We should also add rack_attack rate limiting to the vets-api facilities_api/v2 endpoints, similar to what we have on our v1 endpoints.
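For anyone less familiar with what a throttle rule does, here is a minimal sketch of fixed-window request counting, one common way to rate limit. It is a language-neutral illustration written in Python, not vets-api code and not rack_attack's API (the real change would be Ruby config in vets-api). The class name, the key format, the limit of 3, and the 60-second period are all hypothetical.

```python
from collections import defaultdict


class FixedWindowThrottle:
    """Allow at most `limit` requests per `period` seconds for each key."""

    def __init__(self, limit, period):
        self.limit = limit
        self.period = period
        # (key, window index) -> number of requests seen in that window
        self.counts = defaultdict(int)

    def allow(self, key, now):
        # Requests in the same `period`-second window share one counter.
        window = int(now // self.period)
        self.counts[(key, window)] += 1
        return self.counts[(key, window)] <= self.limit


def throttle_key(path, client_ip):
    """Return a throttle key for facilities_api/v2 requests, else None."""
    if path.startswith("/facilities_api/v2/"):
        return f"facilities_api/v2:{client_ip}"  # hypothetical key format
    return None  # other paths are not throttled by this rule


# Hypothetical limit: 3 requests per 60 seconds per client IP.
throttle = FixedWindowThrottle(limit=3, period=60)
key = throttle_key("/facilities_api/v2/va", "203.0.113.7")
results = [throttle.allow(key, now=t) for t in (0, 1, 2, 3, 61)]
print(results)  # [True, True, True, False, True]
```

The fourth request falls in the same window as the first three and is rejected; the fifth lands in the next window and is allowed again.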

Monitor tagging docs: https://depo-platform-documentation.scrollhelp.site/developer-docs/monitor-tagging-standards

Our itportfolio is digital-experience
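As a rough sketch of the monitor set this ticket implies, the snippet below expands the v# endpoints for versions 1 and 2 and attaches the itportfolio tag to each. The endpoint paths and the digital-experience value come from this ticket; the `key:value` tag shape, the monitor names, and the dict layout are assumptions for illustration only. Any other tags the tagging standards require are not shown, and this is not a Datadog API payload.

```python
VERSIONS = (1, 2)  # "# = 1 and # = 2" in the ticket
ENDPOINT_PATTERNS = ("/facilities_api/v{n}/va", "/facilities_api/v{n}/ccp")
EXTRA_ENDPOINTS = ("/v1/facilities/va",)

# Tag value from the ticket; the key:value shape is an assumption.
BASE_TAGS = ["itportfolio:digital-experience"]


def monitor_specs():
    """Build one hypothetical monitor spec per endpoint we want covered."""
    endpoints = [p.format(n=n) for n in VERSIONS for p in ENDPOINT_PATTERNS]
    endpoints += list(EXTRA_ENDPOINTS)
    return [
        {"name": f"Facilities: {ep}", "endpoint": ep, "tags": list(BASE_TAGS)}
        for ep in endpoints
    ]


for spec in monitor_specs():
    print(spec["endpoint"], spec["tags"])
```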

Description or Additional Context

We used to assume that /v1/facilities/va and /facilities_api/v1/va were the same endpoint, but they just happen to return the same data from Lighthouse.

We do own both, as the facilities team.

Related tickets

Original Acceptance Criteria before we understood ownership

New ACs now that we own /v1/facilities/va

eselkin commented 3 months ago

The 500s should probably not trigger an alert unless the volume is really ridiculous, because we always get a 500 on Vet Center CAPs (they are not in the LH system).
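One way to express "only alert when it's ridiculous" is to alert on the 5xx rate rather than on any single 500, with a floor on traffic so a quiet period can't trip it. This is a sketch only: the function name and both thresholds (25% error rate, 50 requests minimum) are hypothetical, not values from our Datadog monitors.

```python
def should_alert_on_5xx(total_requests, server_errors,
                        max_error_rate=0.25, min_requests=50):
    """Return True only when the 5xx rate is well above the expected baseline."""
    if total_requests < min_requests:
        return False  # too little traffic to judge a rate
    return server_errors / total_requests > max_error_rate


# A steady trickle of expected CAP-related 500s stays quiet...
print(should_alert_on_5xx(total_requests=1000, server_errors=40))
# ...while a large spike trips the alert.
print(should_alert_on_5xx(total_requests=1000, server_errors=400))
```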

jilladams commented 2 months ago

So... what had happened was:

  1. We were confused about the usage / ownership of /v#/facilities/va, so we started removing / changing Facilities monitors of that endpoint to instead monitor the endpoint we knew we owned (/facilities_api/v#/va). This included removing monitoring for /v1/facilities/*
  2. In the meantime: last week we figured out that we actually DO own /v#/facilities/va (https://dsva.slack.com/archives/C0FQSS30V/p1718747443195809), AND
  3. That the mobile app uses it. Which means it really needs its own monitoring, and we should own those monitors.

Meaning: we need to revise this ticket to cover that situation, and revise what we did / didn't do in Datadog. I've taken a pass at updating the ACs in the ticket body, adding a 2nd batch that reflects ownership as we now understand it.

cc @Agile6MSkinner @mmiddaugh @eselkin

jilladams commented 2 months ago

Also, as far as the blockers go: we discussed the CAP-related issues in this thread: https://dsva.slack.com/archives/C05UCL10WH4/p1717608517344489

Tl;dr: Vet Centers make API requests related to CAPs, but LH doesn't have CAPs, so it returns a 500 error. #9727 is the work for LH to be able to receive CAP data. We'd like to stop making the FE call for CAPs until LH can return them and/or make them work, reflected in #15656.

But in the meantime, we can adjust monitoring alarm limits, which Eli already did: https://dsva.slack.com/archives/C05UCL10WH4/p1717613241041999?thread_ts=1717608517.344489&cid=C05UCL10WH4

So I do not think this ticket is actually blocked by CAP-related things in real life anymore. Removing that status.

jilladams commented 1 month ago

Also noting: we removed a monitor in https://github.com/department-of-veterans-affairs/va.gov-cms/issues/17791 that may have pertained to the Legacy API client, and that we may want to reinstate when we pick up this ticket.