TFC-ListProject / build-the-list

build the most exhaustive list of campaigns down to the district level
0 stars 1 forks source link

sanity check for DML #34

Open dbolson opened 7 years ago

dbolson commented 7 years ago

Trying to get a sanity check on this. I want to make sure all the won columns in district_election_results are set properly. For example,

SELECT * FROM district_election_results WHERE district_id = 1885 AND election_id = 131;

returns

 candidate_id | district_id | election_id | votes |       percentage        | withdrew | won |          created_at
--------------+-------------+-------------+-------+-------------------------+----------+-----+-------------------------------
         1579 |        1885 |         131 | 16300 | 45.98025387870239774300 | f        | ∅   | 2017-06-03 17:28:54.617013-07
         1591 |        1885 |         131 | 19150 | 54.01974612129760225700 | f        | ∅   | 2017-06-03 17:28:54.617013-07

These results accurately show that Tim Kaine won over George Allen in 2012, but the won column is not set.

The following query seems to give me all the results with the highest vote totals, some of which have won set and some that don't. I plan to update the ones not set to be set to true. Does this query look reasonable? Also, I don't care about presidential elections since they don't win districts.

SELECT der.*
  FROM (
    SELECT
      candidate_id,
      district_id,
      election_id,
      won,
      votes,
      MAX(votes) over (partition by district_id) AS max_votes
    FROM district_election_results
  ) der
JOIN elections e ON e.id = der.election_id
JOIN election_types et ON et.id = e.election_type_id
WHERE
  votes = max_votes
  AND et.name != 'us president'
;
mdeland commented 7 years ago

I think that was a US senate election right? so whether or not they won that state district is similar to whether or not a president did.

It might make sense instead to look at a query like

SELECT der.*
  FROM (
    SELECT
      candidate_id,
      district_id,
      election_id,
      won,
      votes,
      MAX(votes) over (partition by district_id) AS max_votes
    FROM district_election_results
  ) der
JOIN elections e ON e.id = der.election_id
JOIN election_types et ON et.id = e.election_type_id
WHERE
  votes = max_votes
  AND et.name like 'state%house'
tarheel commented 7 years ago

Hey, so this was a deliberate design decision that Matt and I briefly discussed on Slack the other night: I'm only setting the won column in district_election_results when the district_type matches the election_type, since (as you mentioned above) in other cases it would be ambiguous whether "won" referred to who got more votes in that particular district or who won the overall election. Here's the logic: https://github.com/TFC-ListProject/build-the-list/blob/master/resources/election_data_post_process.sql#L28

If you think we should still set won in these cases, I'm open to it; I just wasn't sure which logic to use or how we would use it.

dbolson commented 7 years ago

We were hoping to show a district's "winner" when the user hovers over a district after entering a state, year, and election type. I guess if a user wanted to see the presidential "winner" it would mean per district.

Although if we're saying we're only predicting certain types of elections, we would only show winners of those types.

@davitykale What were you thinking for showing national elections?