department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html

ITF Failure Dashboard cleanup #78308

Open tblackwe opened 6 months ago

tblackwe commented 6 months ago

When the ITF service is down or responds with an error, the Veteran is unable to proceed with creating a 526 claim.

To determine how frequently this issue occurs, please review the existing ITF dashboards to ensure they are accurately capturing errors.

The dashboards cover two versions: v0 is the vets-api controller, and v2 is the Lighthouse API.

AC:

sortizsh commented 6 months ago

Thomas, can you validate that this cleanup is still needed?

freeheeling commented 5 months ago

Dashboards for reference:

Questions:

  1. Why so many 404s?
  2. Why don't the widgets have data to display?

Thomas to confirm expectations.

emilytheis commented 5 months ago
  1. 404s are a normal/expected response for a GET that fails. However, if POSTs are failing, we should investigate that.

va-albers commented 5 months ago

@freeheeling regarding "Why don't the widgets have data to display?": I would start by checking whether the service name for the controller being queried needs to be changed or removed, since a stale service name could have resulted in empty fields. For instance, I removed the service:vets-api parts of the queries on Intent To File - Overview, and all the widgets are showing data now.
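A simplified model of why a stale service filter can leave a widget empty. This is a sketch, not Datadog's implementation: it assumes a widget query keeps only data points that carry every tag in its filter, and that the Intent To File data is no longer tagged `service:vets-api`. The tag values and helper names below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataPoint:
    # A hypothetical metric sample carrying Datadog-style "key:value" tags.
    value: float
    tags: set = field(default_factory=set)


def matches(point: DataPoint, tag_filter: list) -> bool:
    # Assumed AND semantics: the point must carry every tag in the filter.
    return all(tag in point.tags for tag in tag_filter)


def widget_series(points: list, tag_filter: list) -> list:
    # Values a widget would plot after applying the tag filter.
    return [p.value for p in points if matches(p, tag_filter)]


# Hypothetical data: the service tag no longer says "vets-api".
points = [
    DataPoint(1.0, {"service:hypothetical-new-service", "controller:intent_to_file"}),
    DataPoint(2.0, {"service:hypothetical-new-service", "controller:intent_to_file"}),
]

print(widget_series(points, ["service:vets-api", "controller:intent_to_file"]))  # []
print(widget_series(points, ["controller:intent_to_file"]))  # [1.0, 2.0]
```

Dropping the stale `service:` term, as done on the Intent To File - Overview dashboard, lets the remaining filter match again; updating the term to the current service name would have the same effect while keeping the query narrow.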

va-albers commented 5 months ago

(@freeheeling FYI I updated the 4 dashboards above removing the service:vets-api reference on some widgets - all widgets should show up now)

tblackwe commented 5 months ago

Scheduling a meeting with Mark Chae to discuss.

freeheeling commented 5 months ago

Thanks, @va-albers. Appears I don't have Dashboards Write permission, but I'm also not clear on where to modify the service name being queried.

freeheeling commented 5 months ago

Responding to Emily's comment from 5/7: on the Benefits - ITF Success/Timeout/Error rates dashboard, there do appear to be 404 status codes from POST requests. However, in the related logs (at least the few I inspected), the requests are all GETs, so perhaps the dashboard widget is labeled incorrectly. I'm still a Datadog novice with limited permissions, fumbling around without much guidance.

(Screenshots attached: Screenshot 2024-05-09 at 4.17.40 PM.png, Screenshot 2024-05-09 at 4.20.37 PM.png)

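One way to check whether a widget labeled as POST 404s actually reflects POST traffic is to export the matching logs and tally 404s per HTTP method. A minimal sketch, assuming the export is a list of flat dicts with `http.method` and `http.status_code` keys; the field names and sample entries are assumptions, not taken from the actual ITF logs.

```python
from collections import Counter


def count_404s_by_method(entries):
    """Tally 404 responses per HTTP method.

    Assumes each entry is a flat dict with "http.method" and
    "http.status_code" keys (hypothetical; adjust to the real export).
    """
    counts = Counter()
    for entry in entries:
        # Compare as strings so both 404 and "404" are counted.
        if str(entry.get("http.status_code")) == "404":
            counts[entry.get("http.method", "UNKNOWN")] += 1
    return counts


# Hypothetical sample export.
sample = [
    {"http.method": "GET", "http.status_code": 404},
    {"http.method": "GET", "http.status_code": "404"},
    {"http.method": "POST", "http.status_code": 200},
    {"http.method": "GET", "http.status_code": 200},
]

print(count_404s_by_method(sample))  # Counter({'GET': 2})
```

If the real export shows 404s only under GET while the widget reports them as POST, that would support the theory that the widget's method filter or label is wrong rather than POSTs actually failing.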
va-albers commented 5 months ago

> Thanks, @va-albers. Appears I don't have Dashboards Write permission, but I'm also not clear on where to modify the service name being queried.

I will check this

lisacapaccioli commented 3 months ago

@va-albers It's been a while since this ticket has been viewed and I am not sure what to do with it. Is this something you can help with? (cc @tblackwe)