fecgov / FEC

A general discussion forum for FEC.gov. This is the best place to submit general feedback.
https://www.fec.gov
Other
228 stars 55 forks source link

Differences between financial summary on web interface and aggregates of bulk data #9916

Open helgridly opened 4 years ago

helgridly commented 4 years ago

Hi folks,

I've loaded the bulk data for the 2015-2016 period into a local database using the schemas provided and am trying to align the numbers in the bulk data with the corresponding aggregate numbers in the financial summary on the web interface. I'm finding some minor discrepancies and would like to know where I'm going wrong.

Please feel free to just link me to the code for these aggregations, if it's available -- that'd probably help answer other questions I might have in the future too :)

As an example, I'm using committee C00547349 and the 2016 cycle, available on the web interface here.

1. Operating Expenditures

Using the bulk data labelled "operating expenditures" (oppexp16.zip):

select sum(transaction_amt)
from operating_expends
where cmte_id = 'C00547349'
and memo_cd IS NULL

gives me 39,155,429.75, whereas the web interface says total operating expenditures lists 39,178,395.10.

2. Independent Expenditures

Using the bulk data labelled "Contributions from committees to candidates & independent expenditures" (pas216.zip):

select distinct rpt_tp
from com_to_cand_ind_expend
where cmte_id = 'C00547349'
and memo_cd IS NULL

gives me 9,784,721.00 vs the web interface's 9,784,794.95 (so close!).

3. Other Disbursements

The web interface lists "Other disbursements". How do I find these in the bulk data?

4. Committees transferring funds to themselves

In the "Any transaction from one committee to another" data (oth16.zip), there are plenty of records where cmte_id and other_id are identical. Can you give me a sense of what these transactions might represent?

Thanks so much! I really appreciate your work!

PaulClark2 commented 4 years ago

@helgridly

  1. Operating Expenditures The amount reported on the Detailed Summary Page can be larger than the sum of the itemized operating expenditures disclosed on Schedule B. Committees are not required to itemize payments to a person/vendor that aggregate to $200 or less in a calendar year, but they are required to include those expenditures in the amount disclosed on the Detailed Summary Page.

  2. Independent Expenditures The amount reported on the Detailed Summary Page can be larger than the sum of the itemized independent expenditures disclosed on Schedule E. Committees are not required to itemize payments to a person/vendor that aggregate to $200 or less in a calendar year, but they are required to include those expenditures in the amount disclosed on the Detailed Summary Page.

  3. Other Disbursements Currently we don't have an "other disbursements" bulk data file.

  4. Committees transferring funds to themselves This is most likely filer error where the filing committee has disclosed their own name in the payee section instead of the receiving committee's name.

helgridly commented 4 years ago

Thanks -- this is very helpful. I can slurp the other disbursements from the API instead.

One followup question: is there a way to calculate the aggregate "other disbursements" amount for some committee from other bulk data? Maybe TTL_DISB on the PAC and party summary file minus some other values or aggregates? If not I can get it from the API too, just wanted to check.

PaulClark2 commented 4 years ago

You're welcome. Is this what you are looking for?

https://www.fec.gov/files/bulk-downloads/2016/committee_summary_2016.csv -See Column AB (OTHER_DISB) https://www.fec.gov/campaign-finance-data/committee-summary-file-description/

Find this file on this page: https://www.fec.gov/data/browse-data/?tab=committees

helgridly commented 4 years ago

Oh, interesting - I didn't know those were there too. That looks very similar to the PAC and party summary file bulk data, with a few minor differences in columns. Are those CSVs a better source of truth than the bulk data?

PaulClark2 commented 4 years ago

The committee summary file includes candidate committees (committee designation "P" and "A") while the PAC and party summary file (webk) does not. WEBK does include candidate joint fundraising committees (committee designation "J"). The committee summary file includes more data fields. We began generating the committee summary file during the 2007-08 cycle. Both files are generated nightly.

At the time we decided to keep generating WEBK for then current users of that file. It might be time to consider retiring WEBK.

Here's a field comparison of the two files: committee summary PAC and committee summary (webk)
Link_Image
CMTE_ID CMTE_ID
CMTE_NM CMTE_NM
CMTE_TP CMTE_TP
CMTE_DSGN CMTE_DSGN
CMTE_FILING_FREQ CMTE_FILING_FREQ
CMTE_ST1
CMTE_ST2
CMTE_CITY
CMTE_ST
CMTE_ZIP
TRES_NM
CAND_ID
FEC_ELECTION_YR
INDV_CONTB INDV_CONTRIB
PTY_CMTE_CONTB
OTH_CMTE_CONTB OTHER_POL_CMTE_CONTRIB
TTL_CONTB
TRANF_FROM_OTHER_AUTH_CMTE TRANS_FROM_AFF
OFFSETS_TO_OP_EXP
OTHER_RECEIPTS
TTL_RECEIPTS TTL_RECEIPTS
TRANF_TO_OTHER_AUTH_CMTE TRANF_TO_AFF
OTH_LOAN_REPYMTS
INDV_REF INDV_REFUNDS
POL_PTY_CMTE_REF OTHER_POL_CMTE_REFUNDS
TTL_CONTB_REF
OTHER_DISB
TTL_DISB TTL_DISB
NET_CONTB
NET_OP_EXP
COH_BOP COH_BOP
CVG_START_DT
COH_COP COH_COP
CVG_END_DT CVG_END_DT
DEBTS_OWED_BY_CMTE DEBTS_OWED_BY
DEBTS_OWED_TO_CMTE
INDV_ITEM_CONTB
INDV_UNITEM_CONTB
OTH_LOANS
TRANF_FROM_NONFED_ACCT
TRANF_FROM_NONFED_LEVIN
TTL_NONFED_TRANF NONFED_TRANS_RECEIVED
LOAN_REPYMTS_RECEIVED
OFFSETS_TO_FNDRSG
OFFSETS_TO_LEGAL_ACCTG
FED_CAND_CONTB_REF
TTL_FED_RECEIPTS
SHARED_FED_OP_EXP
SHARED_NONFED_OP_EXP NONFED_SHARE_EXP
OTHER_FED_OP_EXP
TTL_OP_EXP
FED_CAND_CMTE_CONTB CONTRIB_TO_OTHER_CMTE
INDT_EXP IND_EXP
COORD_EXP_BY_PTY_CMTE PTY_COORD_EXP
LOANS_MADE
SHARED_FED_ACTVY_FED_SHR
SHARED_FED_ACTVY_NONFED
NON_ALLOC_FED_ELECT_ACTVY
TTL_FED_ELECT_ACTVY
TTL_FED_DISB
CAND_CNTB CAND_CONTRIB
CAND_LOAN CAND_LOANS
TTL_LOANS TTL_LOANS_RECEIVED
OP_EXP
CAND_LOAN_REPYMNT CAND_LOAN_REPAY
TTL_LOAN_REPYMTS LOAN_REPAY
OTH_CMTE_REF
TTL_OFFSETS_TO_OP_EXP
EXEMPT_LEGAL_ACCTG_DISB
FNDRSG_DISB
ITEM_REF_REB_RET
SUBTTL_REF_REB_RET
UNITEM_REF_REB_RET
ITEM_OTHER_REF_REB_RET
UNITEM_OTHER_REF_REB_RET
SUBTTL_OTHER_REF_REB_RET
ITEM_OTHER_INCOME
UNITEM_OTHER_INCOME
EXP_PRIOR_YRS_SUBJECT_LIM
EXP_SUBJECT_LIMITS
FED_FUNDS
ITEM_CONVN_EXP_DISB
ITEM_OTHER_DISB
SUBTTL_CONVN_EXP_DISB
TTL_EXP_SUBJECT_LIMITS
UNITEM_CONVN_EXP_DISB
UNITEM_OTHER_DISB
TTL_COMMUNICATION_COST
COH_BOY
COH_COY
ORG_TP
helgridly commented 4 years ago

Paul - this is super helpful. I really appreciate the deep dive here. It sounds like downloading the CSV files (where available) will provide me with more complete data than the bulk data files. Would you say that's generally correct?

Ultimately all I'm trying to do here is populate a relational database so I can join across receipts, committees, and expenditures. Is there any chance of getting a snapshot of your database?

PaulClark2 commented 4 years ago

Hussein - I haven't forgotten about your request. I'm not sure if it's possible. I'll talk with our DBA team about it tomorrow.

helgridly commented 4 years ago

Thanks. I know it's a long shot, but I figured I'd ask :)

helgridly commented 4 years ago

Hi Paul - I'm about to start writing scripts to slurp all of schedule_a and schedule_b from the API (for the 2016 cycle). Let me know if a database snapshot is possible (or read-only credentials, or something) if you want to save me time and your server some effort :)

PaulClark2 commented 4 years ago

Hi, Hussein - We should be able to provide you with 2016 SA and SB data. I'm CCing someone (@rjayasekera ) from our database team. He might have some questions.

PaulClark2 commented 4 years ago

Hi again, Hussein. We can get the files to you by Tuesday (7/7) afternoon. Much of the team is currently involved in PI planning and this is a short week because of the holiday.

helgridly commented 4 years ago

This is amazing! Thank you!

dzhang-fec commented 4 years ago

@josephkuan @rohan @paul good morning. could you please help to review the followings: @josephkuan please copy to the FTP sites from 029's /u07/dzhang/. It has join "ofec_committee_history_mv to get 61 additional columns' API query data which is same with our sqls returns from API sql tracking. And it has the full data of fec_fitem_sched_a/b_2015_2016 as usual. Same rows account but with more data of API return. There is a README.txt on how to use it. -rw-r--r-- 1 oracle oracle 4477615794 Jul 2 09:06 sa_2016.csv.gz (4Gb) -rw-r--r-- 1 oracle oracle 1109410451 Jul 2 09:08 sb_2016.csv.gz (1Gb) -rw-r--r-- 1 oracle oracle 555 Jul 2 09:25 s_2016README.txt

PaulClark2 commented 4 years ago

Hi, @helgridly . The DBA was able to get the data files together this morning. There are three files. Their names and location are below. Please let us know if you have any questions.

https://www.fec.gov/files/bulk-downloads/index.html?prefix=bulk-downloads/sch-a-dump/

2020-07-02T13:55:17.000Z 584 Bytes s_2016README.txt 2020-07-02T13:45:26.000Z 4.48 GB sa_2016.csv.gz 2020-07-02T13:53:07.000Z 1.11 GB sb_2016.csv.gz

helgridly commented 4 years ago

This is great - thank you so much. Do you have the datatypes for the columns? I can probably figure them out but if you have a CREATE TABLE statement handy that'd save me some time.

PaulClark2 commented 4 years ago

Hussein, the read me file (s_2016README.txt) now includes the create table scripts.

helgridly commented 4 years ago

This is awesome -- thank you. I managed to get it loaded in and I'm happily digging away now :)

I'm aware I'm asking a ton of questions. I am trying to learn via the guides to spare you as many as I can. If it would be better for me to go ask them at the Google Group I'd be happy to do so.

That said -- I am digging through the Schedule A data and seeing some unexpected transaction types in the receipt_tp column. I know the transaction types are documented here, but can you tell me more about how a transaction gets its type assigned? It doesn't look like they're provided by the filter, so I assume it's done by the FEC on receipt; is there documentation on that process? (I'd also be happy looking at code, if that's easier.) I'm having trouble understanding some of the assignments I'm seeing and would like to better understand whether they're misclassifications or me missing some nuance.

PaulClark2 commented 4 years ago

Hussein, the attached pfd describes how we make categorization decisions for transactions disclosed on schedules A and B of Forms 3, 3P and 3X.

FEC transaction categorization.pdf

helgridly commented 4 years ago

Hi, I'm back. Sorry :)

So I've been poking through the 2016 dump you gave me and generating a list of questions. One huge thing just jumped out at me. The vast majority of records for Schedule A and Schedule B have their transaction types set to NULL (I'm checking the columns receipt_tp and disb_tp respectively). Did they just never get coded? Should I attempt to code them myself based on the line number, per the transaction categorization pdf you provided, or will that just lead me to madness? Or should I just use the line number and skip the transaction type entirely?

edit:\ I got an answer via the mailing list - transaction codes have been deprecated for some time now. I should be using line numbers and disbursement category codes instead. Feel free to ignore this question :)

PaulClark2 commented 4 years ago

Hi, Hussein.

Many of the transactions are not coded based on our coding procedures. For example, transactions reported on Line 11c (contributions from other committees/PACs) of Form 3 are generally not coded. These transactions are instead coded on the PAC reports, Line 23 (Contributions to Federal Candidates/Committees and Other Political Committees) of Form 3X. Another example, operating expenditures are not coded. Our coding scheme is more focused on receipts because most of our registrations focus on receipts.

I’ll go though this more systematically and post here. This will take me a few days.