GSA-TTS / FAC

GSA's Federal Audit Clearinghouse
Other
19 stars 5 forks source link

Optimize tables for search #3478

Closed jadudm closed 6 months ago

jadudm commented 7 months ago

Our search is very non-performant.

This is because multiple reasons:

  1. all searches of interest require joins across multiple (large) tables
  2. the dissemination tables were never designed (for search).
### Tasks
- [x] Add one or more materialized views to support a performant search interface
- [x] Create Django command to manage the creation/deletion/regeneration of the materialized view
- [x] Add an unmanaged Django model(s) to match the materialized view(s)
- [x] Generate summary report from the materialized view instead of re-running the query against separate tables
- [ ] Verify that search against the materialized view returns the correct data
- [x] Add a GH Action to call `fac materialized_views --refresh` on a periodic schedule
- [x] Update `.profile`

Proposed: MATERIALIZED VIEW

We had discussed various strategies to cache/re-format tables that would be more performant.

Postgres has a notion of materialized views. A materialized view is... a view, but it is populated with data, and not dynamic at query-time. Think of it as a table, expressed by a query, that gets populated. As a result, it can be used for searches without incurring additional overhead.

Materialized views have a cost to create (in time and space), but once created, behave like other tables. They do not stay up to date with their "linked" tables, however. As a result, they must be periodically refreshed.

Solution sketch

https://github.com/GSA-TTS/FAC/tree/jadudm/materialized-views

This adds

This will drop (if exists) and create (if not exists) a materialized view called dissemination_combined that combines general, federalaward, and finding rows. It only combines data where there are findings. So, this table is only of use if someone is searching for audits that have findings.

use

fac materialized_views --drop

fac materialized_views --create

fac materialized_views --refresh

The third command could be triggered via GH Action periodically (daily? 2x daily? 4x daily?) to update the view.

danswick commented 7 months ago

For now, @sambodeme's goal will be a branch where we can run a particularly taxing query against an unmanaged materialized view. @sambodeme, could you drop the query in this thread?

jadudm commented 7 months ago

Also, please feel free to rewrite/modify/etc. this ticket as needed. It... I don't know how I should have documented the spike/thinking differently.

sambodeme commented 7 months ago

Here is the query params: Date range: 10/1/2022-10/31/2022 ALN: 20 Findings: Any

jadudm commented 7 months ago

From a question to the dev channel, the idea that we can use LEFT JOIN to construct the MATERIALIZED VIEW is very productive.

danswick commented 6 months ago

Collecting some related issues and PRs here for future navigators:

danswick commented 6 months ago

There are some outstanding follow-up tasks here, but this ticket is otherwise complete.