code4sac / sacramento-campaign-finance

Dataset and dashboard of money in local politics
https://sacramento-campaign-cash.netlify.app/

Double check the aggregated data against sources to make sure our math is correct #18

Closed jeremiak closed 1 year ago

jeremiak commented 1 year ago

Both Sacramento City and Sacramento County have portals for downloading campaign finance data by year. We download the data from those portals for every year back to, and including, 2014.

We aggregate the contributions per contributor to each legislator (using all of the committees associated with each official) so that we can show the total amount received for the entire time period.
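A minimal sketch of that aggregation, for readers who want to reproduce it. The record fields (`committee_id`, `contributor`, `amount`) and the official-to-committee mapping are assumptions made for illustration; the project's real mapping lives in config.js and its schema is not shown in this thread.

```python
from collections import defaultdict

# Hypothetical mapping of each official to all of their committee IDs.
COMMITTEES_BY_OFFICIAL = {
    "Official A": ["1001", "1002"],  # an official with two committees
    "Official B": ["2001"],
}


def aggregate_by_contributor(records, committees_by_official):
    """Sum contributions per contributor for each official, across all committees."""
    # Invert the mapping so each committee ID points at its official.
    committee_to_official = {
        committee_id: official
        for official, committee_ids in committees_by_official.items()
        for committee_id in committee_ids
    }
    totals = defaultdict(lambda: defaultdict(float))
    for record in records:
        official = committee_to_official.get(record["committee_id"])
        if official is None:
            continue  # contribution to a committee we do not track
        totals[official][record["contributor"]] += record["amount"]
    return {official: dict(by_contributor) for official, by_contributor in totals.items()}


# Illustrative records only, not real data.
records = [
    {"committee_id": "1001", "contributor": "Jane Doe", "amount": 100.0},
    {"committee_id": "1002", "contributor": "Jane Doe", "amount": 50.0},
    {"committee_id": "2001", "contributor": "Jane Doe", "amount": 25.0},
    {"committee_id": "9999", "contributor": "Jane Doe", "amount": 10.0},
]
result = aggregate_by_contributor(records, COMMITTEES_BY_OFFICIAL)
```

Grouping by official rather than by committee is the point here: an official with several committees still ends up with one total per contributor.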

We've never double checked that our aggregated data is calculated correctly, so we'd like somebody to check our aggregated totals.

jeremiak commented 1 year ago

We don't need to download the data again because we're pretty confident in all of the /data/schedule-*.json files, so we can use those to double check our aggregation.

rileyschenck commented 1 year ago

Hate to be the bearer of bad news, but every single campaign total from the aggregate JSON object used for the website is off when compared with the totals of the combined JSON files. I added the Jupyter notebook file to the main repository.

jeremiak commented 1 year ago

No worries, we don't "love" that kind of bad news @rileyschenck but better to catch it now than after we have to tell a reporter we got it wrong. I'll take a look at the Jupyter file tonight after work or over the weekend.

Thanks again!

jeremiak commented 1 year ago

Ok, I found out two things in debugging the data:

  1. The /src/lib/data.json file wasn't including Schedule C contributions, while the notebook was including them because the JSON files existed. Now both Schedule A and Schedule C data are used.
  2. The filerName column was unreliable because of a bad aggregation: I wasn't accounting for legislators who had more than one committee.

I've addressed both of these in #23 and have gotten the notebook to match between the schedule-*.json files and data.json. So I think that means our aggregation math is correct.
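The comparison described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's actual code: the `official` and `amount` fields are hypothetical, and loading the real /data/schedule-*.json and data.json files is left out because their schema is not shown in this thread.

```python
from collections import defaultdict


def totals_from_schedules(schedule_records):
    """Sum raw schedule records (Schedule A and C combined) per official."""
    totals = defaultdict(float)
    for record in schedule_records:
        totals[record["official"]] += record["amount"]
    return dict(totals)


def find_discrepancies(aggregate_totals, schedule_totals, tolerance=0.005):
    """Return {official: (aggregate, recomputed)} wherever the two totals differ."""
    mismatches = {}
    for official in sorted(set(aggregate_totals) | set(schedule_totals)):
        aggregate = aggregate_totals.get(official, 0.0)
        recomputed = schedule_totals.get(official, 0.0)
        # Allow for sub-cent floating point noise.
        if abs(aggregate - recomputed) > tolerance:
            mismatches[official] = (aggregate, recomputed)
    return mismatches


# Illustrative data only: the aggregate was built from Schedule A alone.
schedule_a = [{"official": "Official A", "amount": 100.0}]
schedule_c = [{"official": "Official A", "amount": 25.0}]
aggregate = {"Official A": 100.0}

only_a = find_discrepancies(aggregate, totals_from_schedules(schedule_a))
a_and_c = find_discrepancies(aggregate, totals_from_schedules(schedule_a + schedule_c))
```

With these inputs, the aggregate agrees with Schedule A alone but not with Schedule A and C combined, which is the shape of the first bug listed above.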

rileyschenck commented 1 year ago

Almost correct! haha just ran the notebook with the new data.json and I'm still seeing a few discrepancies. Also, sorry, I forgot that I had downloaded all those JSON files locally and changed the names/file paths, which caused that error for you.

image

jeremiak commented 1 year ago

Ok, if those are the only committees/candidates that differ, I think the aggregation is still accurate and the difference is just an artifact of which name the committee was using.

Both Sue Frost and Rich Desmond have used a single committee and just changed the name each cycle, 1380596 and 1419486 respectively. That's why we only have one committee for each in config.js.

This is further confirmed by the fact that if you add up all the years for each candidate the totals match across the columns:

Rich - $505,207.56
Sue - $401,828.43
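To illustrate why a renamed committee shows up as a discrepancy when grouping by name but not when grouping by ID, here is a small sketch. The committee ID, names, and amounts below are made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict


def sum_by(rows, key):
    """Sum the `amount` field of each row, grouped by the given key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)


# One committee (same hypothetical ID) that changed its name between cycles.
rows = [
    {"committee_id": "C-1", "committee_name": "Example for Supervisor 2016", "amount": 1000.0},
    {"committee_id": "C-1", "committee_name": "Example for Supervisor 2020", "amount": 2500.0},
]

by_name = sum_by(rows, "committee_name")  # split across two names
by_id = sum_by(rows, "committee_id")      # one combined total
```

Grouping by name yields two partial totals, while grouping by ID yields one. The partial totals add up to the ID-based total, which is the same kind of check as adding up all the years for each candidate.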

Thoughts @rileyschenck?

jeremiak commented 1 year ago

Closing, the data matches.