caciviclab / disclosure-backend-static

Campaign finance data powering Open Disclosure California
https://caciviclab.org/odca-jekyll/
15 stars 13 forks source link

join the committees table on Ballot_Measure_Election to election_name #351

Open ChenglimEar opened 5 months ago

ChenglimEar commented 5 months ago

since committees change names to reflect what they are supporting or opposing for an election

This problem was found under the following circumstance:

There are some differences in the build output that are due to DISTINCT in SQL picking out different rows when there are multiple committees for the same Filer_ID. As an example, take a look at the difference for build/_data/candidates/oakland/2014-11-04/bryan-parker.json. The value for Filer_NamL is different and it comes from this query in the independent_candidate_expenditures view:

SELECT DISTINCT ON ("Filer_ID") "Filer_ID", "Filer_NamL" FROM committees WHERE "Filer_ID" = '983545'; The following query shows that there are two rows for the same Filer_ID and so the DISTINCT portion of the query picks a different one before and after the csvkit upgrade:

disclosure-backend=# SELECT "Filer_ID", "Filer_NamL" FROM committees WHERE "Filer_ID" = '983545'; Filer_ID | Filer_NamL
----------+--------------------------------------------------- 983545 | OAKPAC, Oakland Metropolitan Chamber of Commerce 983545 | OAK PAC, Oakland Metropolitan Chamber of Commerce (2 rows) If we force it to always pick the same one (like always pick the most recent one), it will help reduce the noise so that we can verify that there aren't any other differences that are more serious.

We want to try to do better than pick the most recent one this time around.