Closed jwchumley closed 6 years ago
@LindsayYoung and I noticed some performance issues in pg_stat_activity
with raw/efiling endpoints.
Lindsay is pulling the AWS logs so we can analyze which queries are taking the longest. A couple things to consider are: increasing the API/CMS timeout above 1 minute (this has some significant potential drawbacks. We always want CMS timeouts to be shorter than API timouts), partitioning the views, archiving filings over a certain age (the data is available in the processed endpoints), adding/modifying indexes (which may slow inserts), and using PGBadger and explain/analyze in Postgres to identify other opportunities for optimization. We should also follow up on the last round of analysis on slow queries. We are having a lag of more than 15 minutes
We got another feedback message about this. It's going to be a problem if we don't improve performance pretty soon. @PaulClark2 Agree? The raw data won't even display.
After analyzing found out that primary cause of the slowness and time outs due to regular views being used instead of materialized views or tables. I have tested two possible scenarios below:
Use real_efile.sa7 table instead of real_efile_sa7 view. We have this vies because at the beginning all the data tables were in public schema and we can drop this view after program change. After that created indexes on committee_id, and state. Performance some what improved and no time outs. But still performance is not an exceptable level.
Replace efiling_amendment_chain_vw with a materialized view. This improves performance a lot. I think this is the way to go, unless @vrajmohan or @ccostino comes up with a better solution.
Drawback of the #2 is that we need to refresh this materialized view daily or hourly. This is real efile (raw) data. If @jwchumley or @PaulClark2 can answer how current the real efile (raw) data has to be then we can create a redis/celery-beat schedule to refresh this materialized view.
Thanks Rohan,
If the refresh is needed in order to make the filings appear than hourly is not often enough. Expectation on efiling availability is real time. How long does refresh take? Is it something that could happen every five minutes?
Jeff Chumley
Electronic Filing, Web and Disclosure Branch Manager
Federal Election Commission
From: Rohan Jay notifications@github.com Sent: Sunday, April 15, 2018 5:17 PM To: fecgov/openFEC Cc: Jeff Chumley; Mention Subject: Re: [fecgov/openFEC] Analyze slow raw/efiling queries (#2916)
After analyzing found out that primary cause of the slowness and time outs due to regular views being used instead of materialized views or tables. I have tested two possible scenarios below:
Use real_efile.sa7 table instead of real_efile_sa7 view. We have this vies because at the beginning all the data tables were in public schema and we can drop this view after program change. After that created indexes on committee_id, and state. Performance some what improved and no time outs. But still performance is not an exceptable level.
Replace efiling_amendment_chain_vw with a materialized view. This improves performance a lot. I think this is the way to go, unless @vrajmohanhttps://github.com/vrajmohan or @ccostinohttps://github.com/ccostino comes up with a better solution.
Drawback of the #2https://github.com/fecgov/openFEC/issues/2 is that we need to refresh this materialized view daily or hourly. This is real efile (raw) data. If @jwchumleyhttps://github.com/jwchumley or @PaulClark2https://github.com/PaulClark2 can answer how current the real efile (raw) data has to be then we can create a redis/celery-beat schedule to refresh this materialized view.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/fecgov/openFEC/issues/2916#issuecomment-381438761, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AMCA7x9bKXKI5nh50oeeHPKgAYPn3EJrks5to7kHgaJpZM4SCx5J.
Here is the query in question: SELECT real_efile_sa7.line_num AS real_efile_sa7_line_num, real_efile_sa7.tran_id AS real_efile_sa7_tran_id, real_efile_sa7.imageno AS real_efile_sa7_imageno, real_efile_sa7.entity AS real_efile_sa7_entity, real_efile_sa7.amend AS real_efile_sa7_amend, real_efile_sa7.br_tran_id AS real_efile_sa7_br_tran_id, real_efile_sa7.br_sname AS real_efile_sa7_br_sname, real_efile_sa7.create_dt AS real_efile_sa7_create_dt, real_efile_sa7.repid AS real_efile_sa7_repid, real_efile_sa7.rel_lineno AS real_efile_sa7_rel_lineno, real_efile_sa7.comid AS real_efile_sa7_comid, real_efile_sa7.prefix AS real_efile_sa7_prefix, real_efile_sa7.fname AS real_efile_sa7_fname, real_efile_sa7.mname AS real_efile_sa7_mname, real_efile_sa7.name AS real_efile_sa7_name, real_efile_sa7.suffix AS real_efile_sa7_suffix, real_efile_sa7.city AS real_efile_sa7_city, real_efile_sa7.state AS real_efile_sa7_state, real_efile_sa7.zip AS real_efile_sa7_zip, real_efile_sa7.indemp AS real_efile_sa7_indemp, real_efile_sa7.indocc AS real_efile_sa7_indocc, real_efile_sa7.ytd AS real_efile_sa7_ytd, real_efile_sa7.amount AS real_efile_sa7_amount, real_efile_sa7.date_con AS real_efile_sa7_date_con, real_efile_sa7.other_comid AS real_efile_sa7_other_comid, real_efile_sa7.donor_comname AS real_efile_sa7_donor_comname, real_efile_sa7.other_str1 AS real_efile_sa7_other_str1, real_efile_sa7.other_str2 AS real_efile_sa7_other_str2, real_efile_sa7.other_city AS real_efile_sa7_other_city, real_efile_sa7.other_state AS real_efile_sa7_other_state, real_efile_sa7.other_zip AS real_efile_sa7_other_zip, real_efile_sa7.memo_code AS real_efile_sa7_memo_code, real_efile_sa7.memo_text AS real_efile_sa7_memo_text, real_efile_sa7.contributor_name_text AS real_efile_sa7_contributor_name_text, real_efile_sa7.contributor_employer_text AS real_efile_sa7_contributor_employer_text, real_efile_sa7.contributor_occupation_text AS real_efile_sa7_contributor_occupation_text, real_efile_sa7.pgo AS real_efile_sa7_pgo, ofec_committee_history_mv_1.idx AS ofec_committee_history_mv_1_idx, ofec_committee_history_mv_1.name AS ofec_committee_history_mv_1_name, ofec_committee_history_mv_1.committee_id AS ofec_committee_history_mv_1_committee_id, ofec_committee_history_mv_1.cycles AS ofec_committee_history_mv_1_cycles, ofec_committee_history_mv_1.treasurer_name AS ofec_committee_history_mv_1_treasurer_name, ofec_committee_history_mv_1.treasurer_text AS ofec_committee_history_mv_1_treasurer_text, ofec_committee_history_mv_1.committee_type AS ofec_committee_history_mv_1_committee_type, ofec_committee_history_mv_1.committee_type_full AS ofec_committee_history_mv_1_committee_type_full, ofec_committee_history_mv_1.designation AS ofec_committee_history_mv_1_designation, ofec_committee_history_mv_1.designation_full AS ofec_committee_history_mv_1_designation_full, ofec_committee_history_mv_1.organization_type AS ofec_committee_history_mv_1_organization_type, ofec_committee_history_mv_1.organization_type_full AS ofec_committee_history_mv_1_organization_type_full, ofec_committee_history_mv_1.party AS ofec_committee_history_mv_1_party, ofec_committee_history_mv_1.party_full AS ofec_committee_history_mv_1_party_full, ofec_committee_history_mv_1.state AS ofec_committee_history_mv_1_state, ofec_committee_history_mv_1.street_1 AS ofec_committee_history_mv_1_street_1, ofec_committee_history_mv_1.street_2 AS ofec_committee_history_mv_1_street_2, ofec_committee_history_mv_1.city AS ofec_committee_history_mv_1_city, ofec_committee_history_mv_1.state_full AS ofec_committee_history_mv_1_state_full, ofec_committee_history_mv_1.zip AS ofec_committee_history_mv_1_zip, ofec_committee_history_mv_1.candidate_ids AS ofec_committee_history_mv_1_candidate_ids, ofec_committee_history_mv_1.cycle AS ofec_committee_history_mv_1_cycle, reps_1.repid AS reps_1_repid, reps_1.form AS reps_1_form, reps_1.comid AS reps_1_comid, reps_1.com_name AS reps_1_com_name, reps_1.timestamp AS reps_1_timestamp, reps_1.create_dt AS reps_1_create_dt, reps_1.from_date AS reps_1_from_date, reps_1.through_date AS reps_1_through_date, reps_1.starting AS reps_1_starting, reps_1.ending AS reps_1_ending, reps_1.rptcode AS reps_1_rptcode, reps_1.previd AS reps_1_previd, reps_1.rptnum AS reps_1_rptnum, efiling_amendment_chain_vw_1.repid AS efiling_amendment_chain_vw_1_repid, efiling_amendment_chain_vw_1.previd AS efiling_amendment_chain_vw_1_previd, efiling_amendment_chain_vw_1.amendment_chain AS efiling_amendment_chain_vw_1_amendment_chain, efiling_amendment_chain_vw_1.longest_chain AS efiling_amendment_chain_vw_1_longest_chain, efiling_amendment_chain_vw_1.most_recent_filing AS efiling_amendment_chain_vw_1_most_recent_filing, efiling_amendment_chain_vw_1.depth AS efiling_amendment_chain_vw_1_depth, efiling_amendment_chain_vw_1.last AS efiling_amendment_chain_vw_1_last, reps_1.superceded AS reps_1_superceded FROM real_efile_sa7 LEFT OUTER JOIN ofec_committee_history_mv AS ofec_committee_history_mv_1 ON real_efile_sa7.comid = ofec_committee_history_mv_1.committee_id AND EXTRACT(year FROM real_efile_sa7.create_dt) + CAST(EXTRACT(year FROM real_efile_sa7.create_dt) AS INTEGER) % :param_1 = ofec_committee_history_mv_1.cycle LEFT OUTER JOIN real_efile.reps AS reps_1 ON real_efile_sa7.repid = reps_1.repid LEFT OUTER JOIN efiling_amendment_chain_vw AS efiling_amendment_chain_vw_1 ON efiling_amendment_chain_vw_1.repid = reps_1.repid WHERE real_efile_sa7.comid IN (:comid_1) AND real_efile_sa7.state IN (:state_1)
This from the feedback repo:
Search for expenses and receipts for Laura for Kansas C00631002 Returns all raw data, not limited to campaign, doesnt do this when searching opposition candidate
It is timing out and erroring on contirbutor_state=KS filter.