datamade / bga-payroll

💰 How much do your public officials make?
4 stars 4 forks source link

Add materialized view with total_pay #508

Closed hancush closed 3 years ago

hancush commented 3 years ago

Description

In #437, we discussed indexing the calculation of total pay, as well as storing the top 5 salaries for each employer in a materialized view.

Postgres would not used the calculated index, and the top 5 salaries materialized view did not work for departments.

Instead, I blended the approaches, creating a materialized view with total pay for all employer salaries, and adding an index on total pay for quick sorting.

The resulting query plan for pulling the top salaries in Cook County looks like this, i.e., pretty fast:

bga_payroll_relaunch=# explain analyze select * from payroll_employer_highest_salaries where employer_id = 50321 or employer_parent_id = 50321 order by total_pay desc limit 5;
                                                                             QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..7.83 rows=5 width=23) (actual time=0.053..0.088 rows=5 loops=1)
   ->  Index Scan Backward using total_pay on payroll_employer_highest_salaries  (cost=0.43..64183.99 rows=43379 width=23) (actual time=0.051..0.080 rows=5 loops=1)
         Filter: ((employer_id = 50321) OR (employer_parent_id = 50321))
         Rows Removed by Filter: 61
 Planning time: 0.104 ms
 Execution time: 0.112 ms
(6 rows)

I've revised the top salaries query, as well as each percentile query, to use this materialized view rather than calculating total pay on the fly. This seems to have improve query time on the staging site!