department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html
282 stars 200 forks source link

PgHero cleanup #80648

Closed rmtolmach closed 2 months ago

rmtolmach commented 5 months ago

Issue Description

PgHero (http://pghero-prod.vfs.va.gov/) is reporting some slow queries and, occasionally a long-running query. image

Some of the slowest and/or most frequently called queries should be addressed. We will want to report significant findings to the teams that own the related code.

Tasks

Success Metrics

The info in PgHero has been analyzed and acted upon.

Acceptance Criteria

Validation

Assignee to add steps to this section. List the actions that need to be taken to confirm this issue is complete. Include any necessary links or context. State the expected outcome.

RachalCassity commented 4 months ago
Screenshot 2024-05-02 at 10.57.46 AM.png
jennb33 commented 3 months ago

This might involve the VFS teams as well!

rmtolmach commented 3 months ago

Invalid Indexes

These indexes exist, but can’t be used. You should recreate them.

  1. index_flipper_features_on_key added in https://github.com/department-of-veterans-affairs/vets-api/pull/15656
  2. index_flipper_gates_on_feature_key_and_key_and_value also added in https://github.com/department-of-veterans-affairs/vets-api/pull/15656
  3. index_ivc_champva_forms_on_form_uuid added in https://github.com/department-of-veterans-affairs/vets-api/pull/16721
rmtolmach commented 3 months ago

Duplicate Indexes

These indexes exist, but aren’t needed. Remove them for faster writes.

  1. On accreditations
    index_accreditations_on_accredited_individual_id (accredited_individual_id) is covered by index_accreditations_on_indi_and_org_ids (accredited_individual_id, accredited_organization_id)

  2. On async_transactions
    index_async_transactions_on_transaction_id (transaction_id) is covered by index_async_transactions_on_transaction_id_and_source (transaction_id, source)

  3. On va_notify_in_progress_reminders_sent index_va_notify_in_progress_reminders_sent_on_user_account_id (user_account_id) is covered by index_in_progress_reminders_sent_user_account_form_id (user_account_id, form_id)

These are all composite indexes. Since the composite index can serve the same queries as the single-column index, the single-column index is redundant and can be dropped. First, we can check some usage stats to see how often the individual index is used. If it's low, or unused, we can drop it.

rmtolmach commented 3 months ago

Suggested Indexes

Add indexes to speed up queries.

  1. CREATE INDEX CONCURRENTLY ON in_progress_forms (form_id, created_at)
  2. CREATE INDEX CONCURRENTLY ON saved_claims (type, id)

We could generate a new migration for these two (or have the owning VFS team do it). Based on the stats in PgHero, we wouldn't save that much time, but maybe it's still worth it?

edit: I refreshed the pghero page and this warning was gone. There were no suggested indexes.

rmtolmach commented 3 months ago

Slow Queries

Slow queries take 20 ms or more on average and have been called at least 100 times.

There are 8 of these currently. The biggest offender is a call to the vba_documents_upload_submissions table which takes an average of 3 seconds! This call takes place in the vba_documents module in upload_remover.rb.

rmtolmach commented 2 months ago

High Number of Connections

1124 connections Use connection pooling for better performance. PgBouncer is a solid option.

Recently, we went from 6 workers and 3 threads per worker to 6 workers and 8 threads per worker (so 48 available per pod). This could have been a cause for the high number of connections, but based on the fact that this number is similar to the number pasted in the screenshot of the description, I'm guessing the change had no impact and this is just an issue we've had for a long time.

❓ How many free database connections do we have? The number of connections depends on the RDS size. Currently, our RDS can handle 2000, so it's fine.

rmtolmach commented 2 months ago

Long Running Queries

rmtolmach commented 2 months ago

End of week update:

jennb33 commented 2 months ago

This ticket has been duped for Sprint 5 as 87373, at 5 story points, in case the work is not completed in Sprint 4.

rjohnson2011 commented 2 months ago

Merged this PR which resolves high number of connections error threshold issue Rebecca noted above -> https://github.com/department-of-veterans-affairs/devops/pull/14501

jennb33 commented 2 months ago

Closing this ticket for Sprint 4, any additional work can be completed in Sprint 5, when @rmtolmach is back in office. TY @rjohnson2011 !

rjohnson2011 commented 2 months ago

Merged and closed this PR which is bringing down the slow query by .8ms ->https://github.com/department-of-veterans-affairs/vets-api/pull/17377

Logs/Details in the PR