HHS / simpler-grants-gov

https://simpler.grants.gov
Other
41 stars 11 forks source link

Spike: Investigate High Database Utilization and Query Speed #1833

Open acouch opened 5 months ago

acouch commented 5 months ago

Summary

The development database currently shows a high CPU utilization, despite the #1805 . The query speed for the updated Search API is also suboptimal.

This task is to investigate and recommend better settings in AWS and take a look at the query speed and determine if there is a scaling or caching improvement that would be an easy fix, that is not related to the queries themselves. In otherwords improving the queries or adding indexes is outside of the scope of this task.

Acceptance criteria

bretthrosenblatt commented 5 months ago

There's not much I can look at with regards to database health in a dev environment. Stats like waits, buffer hits etc. are only meaningful under load. If you only have 2-3 sessions running and are seeing performance issues it's either query or scaling related. You have a serverless environment, so if the db is idle the first query would be slow, but since you're loading opportunity data on a schedule the db will never be idle so that's not a likely cause. Kai raised the ACU (cores) max to 16, which is big enough for the data you're using. You can scale much larger to see the effect, but query issues are usually just query issues. If you want high-performing queries I'd recommend using stored procedures, eliminating all joins, counts, distincts and subqueries. This issue will be significantly magnified in a prod environment where thousands of queries are in contention for the same resources.

acouch commented 5 months ago

@bretthrosenblatt examples of load could include a load test we setup, or running some of the transformation steps that @chouinar is working on.