Thanks for entering this issue, @dorothyyeager! After speaking with @PaulClark2, we need organization type and designation in addition to committee type to filter by SSFs and Leadership PACs. We'd need to add these to Schedule A in order to filter by them, which is a big lift.
For now, I think we should remove those options from the receipts/individual contributions datatables, since they don't work.
I'll go ahead and copy this to the CMS to remove the filters as a short-term workaround.
This includes joint fundraising committees; for the statistical releases, we exclude them.
Separate Segregated Funds (SSFs):
Corporate: committee types N, Q or U, committee designations B, J or U, organization type C
Labor: committee types N, Q or U, committee designations B, J or U, organization type L
Trade: committee types N, Q or U, committee designations B, J or U, organization type T
Membership: committee types N, Q or U, committee designations B, J or U, organization type M, connected organization name NOT NULL and not equal to NONE
Cooperative: committee types N, Q or U, committee designations B, J or U, organization type V
Corporations without stock: committee types N, Q or U, committee designations B, J or U, organization type W
Nonconnected Committees:
Independent expenditure-only political committees (Super PACs): committee type O, committee designations B, J or U
Committees with non-contribution accounts (Hybrid PACs): committee types V or W, committee designations B, J or U
Leadership PACs: committee types N or Q, committee designation D
Other nonconnected PACs: committee types N, Q or U, committee designations B, J or U, and organization type IS NULL OR (organization type M AND connected organization name IS NULL) OR (organization type M AND connected organization name = 'NONE')
This SQL is similar to what I use for the statistical releases; it includes joint fundraising committees.
SELECT
(CASE WHEN cm.org_tp = 'C' THEN '1-Corporate'
WHEN cm.org_tp = 'L' THEN '2-Labor'
WHEN cm.org_tp = 'T' THEN '3-Trade'
WHEN ((cm.org_tp IS NULL) OR (cm.org_tp = 'M' AND cm.connected_org_nm IS NULL) OR (cm.org_tp = 'M' AND cm.connected_org_nm = 'NONE')) THEN '7-NonConnected'
WHEN cm.org_tp = 'M' AND cm.connected_org_nm IS NOT NULL AND cm.connected_org_nm <> 'NONE' THEN '4-Membership'
WHEN cm.org_tp = 'V' THEN '5-Cooperative'
WHEN cm.org_tp = 'W' THEN '6-Corporation without stock' END) organization_type,
cm.cmte_id,
cm.cmte_nm,
cm.cmte_tp,
cm.cmte_dsgn,
cm.org_tp
FROM cmte_valid_fec_yr cm
WHERE cm.fec_election_yr = 2020
AND cm.cmte_tp in ('N', 'Q', 'U')
AND cm.cmte_dsgn not in ('D');
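For completeness, here is a sketch that extends the same CASE logic to the nonconnected categories listed above (Super PACs, Hybrid PACs, Leadership PACs) instead of filtering them out. This is only an illustration of the type/designation rules in this thread, not the statistical-release SQL, and the category labels are mine:

-- Sketch only: classify committees into the SSF and nonconnected categories above.
SELECT
 (CASE WHEN cm.cmte_tp = 'O' THEN 'Super PAC'
       WHEN cm.cmte_tp IN ('V', 'W') THEN 'Hybrid PAC'
       WHEN cm.cmte_tp IN ('N', 'Q') AND cm.cmte_dsgn = 'D' THEN 'Leadership PAC'
       WHEN cm.org_tp = 'C' THEN 'Corporate SSF'
       WHEN cm.org_tp = 'L' THEN 'Labor SSF'
       WHEN cm.org_tp = 'T' THEN 'Trade SSF'
       WHEN cm.org_tp = 'M' AND cm.connected_org_nm IS NOT NULL AND cm.connected_org_nm <> 'NONE' THEN 'Membership SSF'
       WHEN cm.org_tp = 'V' THEN 'Cooperative SSF'
       WHEN cm.org_tp = 'W' THEN 'Corporation without stock SSF'
       ELSE 'Other nonconnected' END) category,
 cm.cmte_id,
 cm.cmte_nm
FROM cmte_valid_fec_yr cm
WHERE cm.fec_election_yr = 2020
AND cm.cmte_tp in ('N', 'Q', 'U', 'O', 'V', 'W')
AND cm.cmte_dsgn in ('B', 'D', 'J', 'U');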
The CMS issue is already here: https://github.com/fecgov/fec-cms/issues/2913
We should consider asking the database team whether they'd rather do this at the same time as the DB updates for https://github.com/fecgov/openFEC/issues/3798
org_tp and cmte_dsgn are in cmte_valid_fec_yr
The org_tp and cmte_dsgn columns have been added to the fec_fitem_sched_a and fec_fitem_sched_b tables. It is better to keep these concise column names.
@fecjjeng sounds good! I think the WIP PR has those column names. https://github.com/fecgov/openFEC/pull/3837/files
The following is a summary of the steps that need to be done for this ticket. A more detailed document named ADD_UPDATE_ORG_TP_COLUMNS.txt is in the shared drive.
Please note: this is a general list of the work needed to add columns to the sched_a/sched_b tables. The actual code changes will vary depending on the nature of the columns added.
--
-- Prepare migration script. The migration script needs to include:
--
If the added column needs to be used as a filter, then add CREATE INDEX statements. If these are B-tree indexes, then include the sorting fields (disb_dt/sub_id and disb_amt/sub_id) to make compound indexes (see the sketch below).
Also update the following functions as appropriate (they can be used to create indexes for the corresponding fec_sched_a_xxxx_yyyy/fec_sched_b_xxxx_yyyy tables in the future): FUNCTION disclosure.finalize_itemized_schedule_a_tables, FUNCTION disclosure.finalize_itemized_schedule_b_tables
NOTE: These columns will be added to all 3 cloud environments ahead of time, since the migration file will be submitted and merged AFTER the actual tasks (add column, backfill data, create indexes) have been done. Therefore it needs to be written in a specific way so it will not error out IF the columns and indexes already exist.
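As an illustration of the two notes above, here is a sketch of what the migration statements could look like. The partition, column types, and index names are assumptions (following the idx_schedb naming pattern used elsewhere in this thread), not the actual migration:

-- Sketch only (names/types are assumptions): add the new columns idempotently so the
-- migration will not error out if the columns already exist.
ALTER TABLE disclosure.fec_fitem_sched_b_2019_2020
    ADD COLUMN IF NOT EXISTS org_tp character varying(1),
    ADD COLUMN IF NOT EXISTS cmte_dsgn character varying(1);

-- Sketch only: compound B-tree indexes pairing the new filter column with the
-- existing sort keys (disb_dt/sub_id and disb_amt/sub_id).
CREATE INDEX IF NOT EXISTS idx_schedb_2019_2020_org_tp_disb_dt_sub_id
    ON disclosure.fec_fitem_sched_b_2019_2020 (org_tp, disb_dt, sub_id);
CREATE INDEX IF NOT EXISTS idx_schedb_2019_2020_org_tp_disb_amt_sub_id
    ON disclosure.fec_fitem_sched_b_2019_2020 (org_tp, disb_amt, sub_id);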
--
-- Update Oracle packages, procedures, triggers, tables, materialized views, and views in the intermediate database that calculate and control the Java transfer program.
--
-- Update Java programs and shell scripts.
--
-- Add the columns to the table, start the daily transfer program, and make sure all Oracle programs, Java programs, and shell scripts execute successfully.
--
-- Prepare scripts to backfill the data (see the sketch below).
--
-- *** -- on Postgresql Database
-- In this ticket, we also backfill the data for several tsvector columns whose content changed in issue #3798 (3 columns in sched_a and 2 in sched_b).
-- Temporary update triggers were added to each sched_a and sched_b partition table, so during the update activity of this ticket the update triggers will fire and refresh the tsvector columns.
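A minimal sketch of the per-partition backfill described above, assuming org_tp/cmte_dsgn are looked up in cmte_valid_fec_yr by cmte_id and election year (the partition name and join key are assumptions):

-- Sketch only: backfill one partition from cmte_valid_fec_yr; the temporary update
-- triggers will refresh the tsvector columns as a side effect of this UPDATE.
UPDATE disclosure.fec_fitem_sched_a_2019_2020 sa
SET org_tp    = cm.org_tp,
    cmte_dsgn = cm.cmte_dsgn
FROM cmte_valid_fec_yr cm
WHERE cm.cmte_id = sa.cmte_id
  AND cm.fec_election_yr = 2020;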
--
-- ON AWS console
-- Temporarily upgrade the class of our Postgresql database to a more powerful machine.
-- NOTE: need to upgrade both master and replica. Otherwise a slow replica will not be able to catch up; this will slow down the whole process, and the lag time will be too big to recover.
-- Parameters WAL_MAX_SIZE, WAL_MIN_SIZE and WAL_KEEP_SEGMENTS will need to be increased as well.
-- Also need to anticipate increased storage size. An UPDATE operation in Postgresql increases the size a lot, and the indexes need space too.
--
-- *** -- on Postgresql Database
-- get "BEFORE UPDATE" size select pg_size_pretty(pg_database_size('fec')); -- *** -- on Linux server -- Execute the shell script to do the update work.(may not be able to finish all job at one time, need to break into smaller tasks) -- NOTE: need to adjust the execution time to avoid nightly refresh time for all databases -- NOTE: need to adjust the execution time to avoid peak query time for PRD database
-- *** -- on Postgresql Database
-- get "AFTER UPDATE" size select pg_size_pretty(pg_database_size('fec'));
-- **** -- validate the update
-- **** There are a few rows whose cmte_id does not have corresponding data in cmte_valid_fec_yr. These are all in earlier cycles; data from 2007_2008 onward is complete with cmte_dsgn/org_tp data.
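Those rows can be located with a query along these lines (a sketch only; the partition name and join key are assumptions):

-- Sketch only: itemized rows in an early-cycle partition whose cmte_id has no
-- matching committee record for that election year.
SELECT sa.sub_id, sa.cmte_id
FROM disclosure.fec_fitem_sched_a_2005_2006 sa
LEFT JOIN cmte_valid_fec_yr cm
       ON cm.cmte_id = sa.cmte_id
      AND cm.fec_election_yr = 2006
WHERE cm.cmte_id IS NULL;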
For those few rows, the tsvector columns are not updated. Use the following statement to update them.
-- **** -- Run create indexes script
-- **** -- check if all indexes are created:
select tablename, count(*) from pg_indexes where tablename like 'fec_fitem_sched_a%' and (indexname like 'idx_scheda%_org_tp%' or indexname like 'idx_scheda%_cmte_dsgn%') group by tablename order by tablename;
select tablename, indexname from pg_indexes where tablename like 'fec_fitem_sched_a%' and (indexname like 'idx_scheda%_org_tp%' or indexname like 'idx_scheda%_cmte_dsgn%') order by tablename, indexname;
-- **** -- clean up the temporary UPDATE TRIGGERS that were used to update the tsvector columns from another issue
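For the trigger cleanup, a sketch of the per-partition statement (the trigger name here is hypothetical; the real names come from the earlier #3798 work):

-- Sketch only: drop the temporary tsvector-update trigger from one partition.
DROP TRIGGER IF EXISTS tri_fec_fitem_sched_a_2019_2020_tsvector_update
    ON disclosure.fec_fitem_sched_a_2019_2020;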
--
-- ON AWS console
-- downgrade the class back to the regular class we use.
-- NOTE: need to change the class of both master and replica. Otherwise a slow replica will not be able to catch up; this will slow down the whole process, and the lag time will be too big to recover.
-- Parameters WAL_MAX_SIZE, WAL_MIN_SIZE and WAL_KEEP_SEGMENTS will need to be set back to their original values.
-- Review the storage size increase, make it official via Terraform.
--
The DEV/STG/PRD databases have been backfilled and the indexes created. The following is a summary of the time consumed and space added by this task. NOTE: the time indicated is the actual running time, using the most powerful class of AWS database we can use, and does NOT include the lag time left in between to avoid peak time and nightly refresh time, so it is a minimum. The local database is usually 2 to 3 times faster than the cloud databases in the regular classes.
Local backup database: Time: 57 hours
DEV: Time: 22 hours 48 minutes; Space increased: 442 GB
STG: Time: 17 hours 26 minutes; Space increased: 458 GB
PRD: Time: 19 hours 31 minutes; Space increased: 315 GB
Unblocked #3841 and moved it to the ready column.
A CMS ticket, https://github.com/fecgov/fec-cms/issues/3054, has been opened to add the filter back on.
User story
I was trying to search all contributions and disbursements made to corporate SSFs in 2019-2020 and set the filters accordingly. The results appeared to yield a list of contributions made to all types of committees. The first results in this screenshot, for example, show contributions to a candidate and to a labor organization SSF.
To do
Add org_tp (call it org_type) and cmte_dsgn (call it committee_designation) columns to the Schedule A and B tables.
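If it helps to picture the rename, the concise DB names could be surfaced under the requested labels with simple aliases, e.g. (a sketch only; the actual mapping happens in the openFEC API layer, and the partition name here is an assumption):

-- Sketch only: expose the concise column names under the requested API-facing names.
SELECT sa.org_tp AS org_type, sa.cmte_dsgn AS committee_designation, sa.*
FROM disclosure.fec_fitem_sched_a_2019_2020 sa
LIMIT 10;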