alex / nyt-2020-election-scraper

https://alex.github.io/nyt-2020-election-scraper/battleground-state-changes.html
MIT License

Error in hurdle calculate #364

Closed eebasso closed 3 years ago

eebasso commented 3 years ago

Line 333 in print-battleground-state-changes:

hurdle = (vote_diff + (votes_remaining * (candidate1_votes + candidate2_votes)) / votes) / (2 * votes_remaining) if votes_remaining > 0 else 0

This is the wrong formula for the two-party hurdle because it takes into account the third-party vote, which is irrelevant. This makes comparing the hurdle to the two-party batch percentage an apples-to-oranges comparison. You should replace votes with the two-party vote, candidate1_votes + candidate2_votes. The formula then simplifies to

hurdle = (vote_diff/votes_remaining + 1)/2 if votes_remaining > 0 else 0

For example, this is relevant for the current hurdle numbers for Trump in Arizona. With 110,925 votes remaining and a 20,102 Biden margin, Trump needs 59.1% of the remaining two-party batch breakdown. Yet the current output says Trump's hurdle is just 58.3%. This is due to the error of including third-party votes. Trump needs 58.3% of all the votes, but needs a higher share of the two-party vote.
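To make the Arizona numbers concrete, here is a minimal sketch of both formulas. The function names are hypothetical, and the 98.5% two-party share is an assumption chosen for illustration (the thread does not state the actual share); only the 110,925 remaining votes and the 20,102 margin come from the comment above.

```python
def simplified_hurdle(vote_diff, votes_remaining):
    """Simplified formula proposed in this comment.

    It treats every remaining vote as a two-party vote.
    """
    if votes_remaining <= 0:
        return 0.0
    return (vote_diff / votes_remaining + 1) / 2


def current_hurdle(vote_diff, votes_remaining, two_party_share):
    """Formula quoted from line 333, with the counted two-party share
    (candidate1_votes + candidate2_votes) / votes passed in directly."""
    if votes_remaining <= 0:
        return 0.0
    return (vote_diff + votes_remaining * two_party_share) / (2 * votes_remaining)


# Arizona figures from the comment; 0.985 is an assumed two-party share.
print(f"{simplified_hurdle(20_102, 110_925):.1%}")      # 59.1%
print(f"{current_hurdle(20_102, 110_925, 0.985):.1%}")  # 58.3%
```

With these inputs the two formulas differ by about 0.8 percentage points, which is the gap described above.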

Thank you so much for creating this.

fractionalhare commented 3 years ago

Hey there! The original hurdle calculation actually used the formula you described. You can see the pull request where we changed it to the current iteration in the files for PR #200. The related discussion took place in issue #194.

There isn't really a perfect way to tackle this when there are three candidates splitting the vote, only two of whom are really relevant. But the current scheme more accurately reflects what most people expect, according to the feedback we've gotten thus far.

eebasso commented 3 years ago

It's a matter of taste then, but I think you should switch back to the original hurdle formula. As is, the hurdle is misleading because it makes Arizona look closer than it actually is. It lowers the hurdle for Trump, and many people, including myself, compare that hurdle to the two-party batch breakdown to see if Trump is above or below that threshold.

Perhaps you could have an extra column that gives the two-party % hurdle and keep the current column as is. I and many others would greatly appreciate that because it then becomes an apples-to-apples comparison with the two-party Batch Breakdown. It becomes easier to gauge the horse race.

The wording on the Hurdle is also confusing because it states "Note that third party candidates are not included in the batch breakdown. This is intentional." Don't you think this clarification should be on the Batch Breakdown instead? The Hurdle column clearly DOES include the third party votes.

eebasso commented 3 years ago

Thinking about this more, I can see that trying to compute the hurdle ratio is impossible with a third party involved. Therefore one needs an approximation. I haven't worked out all the math yet, but my guess is that the current formula is a better approximation when the third-party vote is small. Is that right?

eebasso commented 3 years ago

Aha! I found the correct formula. Both my old formula and the new formula were wrong. It should be as follows

hurdle = (vote_diff * votes / ((candidate1_votes + candidate2_votes) * votes_remaining) + 1) / 2 if votes_remaining > 0 else 0
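One way to read this formula is as a sketch under a single assumption: the remaining votes split between the two major candidates and third parties in the same proportion as the votes counted so far. The function name, the zero-count guard, and the example counts below are hypothetical; the parameter names follow the thread.

```python
def two_party_hurdle(vote_diff, votes_remaining,
                     candidate1_votes, candidate2_votes, votes):
    """Share of the remaining two-party vote the trailing candidate needs.

    Assumes the remaining votes have the same two-party share as the
    votes counted so far (an assumption, not something the data guarantees).
    """
    two_party_votes = candidate1_votes + candidate2_votes
    if votes_remaining <= 0 or votes <= 0 or two_party_votes <= 0:
        return 0.0
    # Estimated number of remaining votes that go to the two major candidates.
    two_party_remaining = votes_remaining * two_party_votes / votes
    return (vote_diff / two_party_remaining + 1) / 2


# Hypothetical counts: 1,000 votes counted, 500 vs. 480, 100 votes remaining.
# About 98 remaining votes are two-party; the trailing candidate needs 59 of them.
print(two_party_hurdle(20, 100, 500, 480, 1000))
```

In the hypothetical example the trailing candidate needs 59 of the estimated 98 remaining two-party votes to draw level, which is 59/98 of the two-party vote rather than 59/100 of all remaining votes.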

titanous commented 3 years ago

Please take a look at #367, I think this is along similar lines to what you are suggesting.

eebasso commented 3 years ago

I wrote up a document that explains the differences between the two formulas. Correct Hurdle Formula.pdf

eebasso commented 3 years ago

I shouldn't say that the current formula is wrong. My apologies. It gives the percentage that the trailing candidate needs to receive out of the total vote remaining. However, my proposed formula gives the hurdle percentage of the remaining two-party vote, which seems much more relevant to compare to the Batch Trend column. The Batch Trend is the running average of the Batch Breakdown percentages, which are only between the two parties.

It's the difference between calculating DT / (DT + DB + DL) and DT / (DT + DB) where T and B are the two party candidates and L is the third party and DT, DB, DL are the gains each has.
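The distinction between the two ratios can be sketched with a single made-up batch. The function names and the batch counts are hypothetical, and the sketch assumes a non-empty batch so that neither denominator is zero.

```python
def share_of_all_votes(dt, db, dl):
    """DT / (DT + DB + DL): trailing candidate's share of every vote in the batch."""
    return dt / (dt + db + dl)


def share_of_two_party_votes(dt, db):
    """DT / (DT + DB): trailing candidate's share of the two-party vote only."""
    return dt / (dt + db)


# Hypothetical batch: 590 votes for T, 395 for B, 15 for the third party L.
dt, db, dl = 590, 395, 15
all_share = share_of_all_votes(dt, db, dl)
two_party_share = share_of_two_party_votes(dt, db)
print(f"{all_share:.1%} of all votes, {two_party_share:.1%} of the two-party vote")

# The two ratios differ exactly by the batch's two-party fraction.
assert abs(all_share - two_party_share * (dt + db) / (dt + db + dl)) < 1e-12
```

The same batch therefore yields a lower number when third-party votes are in the denominator, so a hurdle and a batch percentage are only comparable when both use the same denominator.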

titanous commented 3 years ago

@eebasso Can you take a look at #367?

fractionalhare commented 3 years ago

@eebasso This is a really great writeup (thanks for pulling out LaTeX for us)! We're asking you to look at #367 because we're fairly confident one of our devs landed on a very similar equation to what you've proposed here.

eebasso commented 3 years ago

Thank you, fractionalhare. I commented on #367. My confusion is about why the current formula was chosen over the one I proposed. I would think that DT/(DT+DB) is a better ratio to compare against the Batch Trend percentage than DT/(DT+DB+DL). I think that the Batch Trend percentage is showing the latter and not the former.