department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html

Investigate PG Query Duration Spikes for VBA job #95678

Open LindseySaari opened 1 day ago

LindseySaari commented 1 day ago

Description:

As part of the actionable alerts investigation, our Datadog monitor flagged query duration spikes in Postgres during the 2:30-4:00 AM ET window (insert link to slack conversation here). The Appeals team owns the related job. We found Postgres statement timeouts in the logs, but after investigating theories around locking (autovacuum, DB cleanup tasks, etc.), locking does not appear to be the cause. RDS logs also don’t reveal any issues, and the jobs run fine throughout the day. Adjustments were made to job intervals (link to PRs). We need to work closely with the Appeals team and keep monitoring the situation.

Acceptance Criteria:

LindseySaari commented 9 hours ago

The PG Query duration spiked yet again last night. After some investigation around job run times, here are some findings.

VBA job runtime notes

- VBADocuments::UploadScanner - Every 3 minutes
- VBADocuments::UploadRemover - Every 5 minutes

Other jobs that run in that window

- EVSS::DeleteOldClaims – 2:00 AM
- DeleteOldPiiLogsJob – 2:20 AM
- VBADocuments::UploadScanner – Every 3 minutes
- VBADocuments::UploadRemover – Every 5 minutes
- DecisionReview::FailureNotificationEmailJob – 1:05 AM
- Form526StatusPollingJob – 3:00 AM
- DeleteOldTransactionsJob – 3:00 AM
- Representatives::QueueUpdates – 3:00 AM

Jobs with Error Logs

Investigation Notes

Looking at DeleteOldPiiLogsJob, I wonder if it is at play. I ran a .count on that table just now and it returned 687,000 records. There is also an index on the created_at column, so the deletion could be taking longer because it has to update that index as well. Maybe the deletion should be batched? A large delete could cause locking on that table, but that wouldn't explain the impact on the VBA jobs.
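
For reference, here is a minimal sketch of what a batched deletion could look like, assuming the job deletes rows through an ActiveRecord relation. The helper name `delete_in_batches`, the batch size, and the optional pause are illustrative assumptions, not what DeleteOldPiiLogsJob currently does. `in_batches` and `delete_all` are standard ActiveRecord methods.

```ruby
# Hypothetical helper (not from the codebase): delete every row in `scope`
# in small batches, so each DELETE statement and its index maintenance
# stays short instead of running as one large statement.
#
# `scope` is assumed to be an ActiveRecord relation (anything that responds
# to `in_batches(of:)` and yields batches that respond to `delete_all`).
def delete_in_batches(scope, batch_size: 1_000, pause: 0)
  deleted = 0

  scope.in_batches(of: batch_size) do |batch|
    # One DELETE per batch; delete_all returns the number of rows removed.
    deleted += batch.delete_all

    # Optional breather between batches to give other queries room.
    sleep(pause) if pause.positive?
  end

  deleted
end

# Possible usage from a job or console. SomeLogModel and the cutoff are
# placeholders for whatever model and retention period the job really uses:
#
#   delete_in_batches(SomeLogModel.where('created_at < ?', cutoff))
```

The trade-off is more round trips in exchange for shorter statements, which should make it easier to see in Datadog whether this job lines up with the spikes.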