Open NickCrews opened 2 years ago
Looking at one of the filings in your table, 1386952, I don't see anything wrong with the data. Filing 1386952 has not been amended, so most_recent_file_number, file_number, previous_file_number and amendment_chain all contain the same report ID (file_number), 1386952.
most_recent_file_number | file_number | previous_file_number | amendment_chain | is_amended | most_recent | amendment_version | amendment_indicator | committee_id | form_type | report_year | report_type |
---|---|---|---|---|---|---|---|---|---|---|---|
1386952 | 1386952 | 1386952 | {1386952} | f | t | 0 | N | C00737213 | F3 | 2020 | 12P |
Looking at a report that has been amended might help, too. Note, for this example, I changed your query parameters to include all versions of filings by removing the "Current Version" filter. The example below shows a report that was amended twice.
most_recent_file_number | file_number | previous_file_number | amendment_chain | is_amended | most_recent | amendment_version | amendment_indicator | committee_id | form_type | report_year | report_type |
---|---|---|---|---|---|---|---|---|---|---|---|
1151343 | 1118027 | 1118027 | {1118027} | t | f | 0 | N | C00554709 | F3 | 2016 | 12G |
1151343 | 1131084 | 1118027 | {1118027,1131084} | t | f | 1 | A | C00554709 | F3 | 2016 | 12G |
1151343 | 1151343 | 1131084 | {1118027,1131084,1151343} | f | t | 2 | A | C00554709 | F3 | 2016 | 12G |
Most_recent_file_number is the same for all three entries because only one filing is the most recent.
New (original report) (amendment_version = 0): most_recent_file_number = file_number of amendment 2; file_number, previous_file_number and amendment_chain all contain the same report ID (the original report's file_number). The is_amended flag is true, and the most_recent flag is false.
Amendment 1 (amendment_version = 1): most_recent_file_number = file_number of amendment 2; file_number = file_number of this report (amendment 1); previous_file_number = file_number of the original report; amendment_chain is an array containing the file_numbers of the original and amendment 1. The is_amended flag is true, and the most_recent flag is false.
Amendment 2 (amendment_version = 2): most_recent_file_number and file_number = file_number of this report (amendment 2); previous_file_number = file_number of amendment 1; amendment_chain is an array containing the file_numbers of the original, amendment 1 and amendment 2. The is_amended flag is false, and the most_recent flag is true.
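To make these relationships concrete, here is a minimal Python sketch that checks one row against the rules as I read them from the example above. The `check_row` helper and the rules themselves are assumptions inferred from the three rows shown (for instance, that amendment_version equals the chain length minus one), not an official FEC definition.

```python
# Rows copied from the amended-report example above (only the columns used here).
rows = [
    {"most_recent_file_number": 1151343, "file_number": 1118027,
     "previous_file_number": 1118027, "amendment_chain": [1118027],
     "is_amended": True, "most_recent": False, "amendment_version": 0},
    {"most_recent_file_number": 1151343, "file_number": 1131084,
     "previous_file_number": 1118027, "amendment_chain": [1118027, 1131084],
     "is_amended": True, "most_recent": False, "amendment_version": 1},
    {"most_recent_file_number": 1151343, "file_number": 1151343,
     "previous_file_number": 1131084,
     "amendment_chain": [1118027, 1131084, 1151343],
     "is_amended": False, "most_recent": True, "amendment_version": 2},
]


def check_row(row):
    """Return a list of rule violations for one row (empty if consistent).

    Hypothetical helper: the rules are inferred from the example table.
    """
    problems = []
    chain = row["amendment_chain"]
    # The chain should end with the filing the row describes.
    if chain[-1] != row["file_number"]:
        problems.append("amendment_chain does not end with file_number")
    # Assumed: version 0 is the original, so version = chain length - 1.
    if len(chain) - 1 != row["amendment_version"]:
        problems.append("amendment_version does not match chain length")
    # An original report points to itself as its previous filing.
    expected_prev = chain[-2] if len(chain) > 1 else row["file_number"]
    if row["previous_file_number"] != expected_prev:
        problems.append("previous_file_number disagrees with amendment_chain")
    # Only the latest filing in the chain is most_recent and not amended.
    is_latest = row["file_number"] == row["most_recent_file_number"]
    if row["most_recent"] != is_latest:
        problems.append("most_recent flag is inconsistent")
    if row["is_amended"] == is_latest:
        problems.append("is_amended flag is inconsistent")
    return problems


for row in rows:
    print(row["file_number"], check_row(row))
```

All three example rows pass these checks, which matches the description above: only the last filing in the chain has most_recent set and is_amended cleared.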
Hi! I'm a data engineer with a possible bug. First though, hats off to you all, the whole FEC ecosystem has been surprisingly easy to interact with! So thanks for your work :)
I've been looking at the raw filings summaries that I downloaded from https://www.fec.gov/data/filings/?data_type=processed&min_receipt_date=01%2F01%2F2010&most_recent=true&form_type=F3&form_type=F3P. Hopefully that link should be reproducible, at least for this purpose.
If I do
Then I see the following table. Note how in the most_recent_file_number column, there are a few filings that appear twice. I wouldn't expect this to happen, I would expect there would be only one. Second, if you look at the is_amended column, I would expect that to always be False, since these are the most current versions of the reports. But, one is True. Third, if you look at amendment_chain vs previous_file_number, they don't always agree.
Am I misunderstanding the schema/meanings of this table? Or is this a data integrity problem? If it's a problem, I thought I'd point it out in case you wanted to add some QA to catch these sorts of problems.
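For anyone who wants to run the first two checks over a downloaded table, a rough Python sketch might look like the following. The function names are hypothetical, and the sample rows are the four filings quoted elsewhere in this thread (trimmed to the relevant columns), not the full download.

```python
from collections import Counter


def duplicated_most_recent(rows):
    """Return most_recent_file_number values that occur in more than one row."""
    counts = Counter(row["most_recent_file_number"] for row in rows)
    return sorted(n for n, c in counts.items() if c > 1)


def current_rows_flagged_amended(rows):
    """Return file_numbers of rows marked most_recent that also have is_amended set."""
    return [r["file_number"] for r in rows if r["most_recent"] and r["is_amended"]]


# The four rows quoted in this thread, reduced to the columns needed here.
all_versions = [
    {"most_recent_file_number": 1386952, "file_number": 1386952,
     "is_amended": False, "most_recent": True},
    {"most_recent_file_number": 1151343, "file_number": 1118027,
     "is_amended": True, "most_recent": False},
    {"most_recent_file_number": 1151343, "file_number": 1131084,
     "is_amended": True, "most_recent": False},
    {"most_recent_file_number": 1151343, "file_number": 1151343,
     "is_amended": False, "most_recent": True},
]
current_only = [r for r in all_versions if r["most_recent"]]

print(duplicated_most_recent(all_versions))        # [1151343]
print(duplicated_most_recent(current_only))        # []
print(current_rows_flagged_amended(current_only))  # []
```

With every version included, a repeated most_recent_file_number is expected, because each version of a report points at the same latest filing. After filtering to most_recent rows, any remaining duplicate or any is_amended value of True would be worth a closer look.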