fivethirtyeight / data

Data and code behind the articles and graphics at FiveThirtyEight
https://data.fivethirtyeight.com/
Creative Commons Attribution 4.0 International

Minor data cleaning on pollster-ratings data #252

asyncify closed this issue 1 year ago

asyncify commented 4 years ago

This was in the raw-polls.csv files, e.g.:
https://github.com/fivethirtyeight/data/blob/master/pollster-ratings/2017/raw-polls.csv
https://github.com/fivethirtyeight/data/blob/master/pollster-ratings/2018/raw-polls.csv
https://github.com/fivethirtyeight/data/blob/master/pollster-ratings/2019/raw-polls.csv

I found 3 minor data flaws that I think affect the derived results and the rating for Trafalgar Group.

I concatenated the files into one, so I don't know the original row numbers.
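For anyone repeating this check, a minimal sketch of concatenating the yearly files while keeping track of where each row came from. It assumes the files parse with Python's standard `csv` module and that no field spans multiple lines; the added `source_file` and `source_row` columns are hypothetical names, and the demo rows carry only the values quoted in this report.

```python
import csv
import io


def concat_with_provenance(sources):
    """Concatenate CSV sources, tagging each row with its origin.

    `sources` maps a label (e.g. a file path) to an open text file object.
    `source_file` and `source_row` are hypothetical column names added here.
    """
    rows = []
    for label, handle in sources.items():
        reader = csv.DictReader(handle)
        for row in reader:
            row = dict(row)
            row["source_file"] = label
            # line_num counts lines read so far, header included, so it is
            # the 1-based line number of this row in the original file.
            row["source_row"] = reader.line_num
            rows.append(row)
    return rows


# In-memory stand-ins for the real files; poll_id is an assumed column name.
demo = {
    "2017/raw-polls.csv": io.StringIO(
        "poll_id,cand1_pct,cand2_pct\n79816,50.22,47.46\n"
    ),
    "2018/raw-polls.csv": io.StringIO(
        "poll_id,cand1_pct,cand2_pct\n91209,48.4,46.1\n91210,48.4,46.1\n"
    ),
}
combined = concat_with_provenance(demo)
```

With real files, pass `{path: open(path, newline="")}` entries instead of the `io.StringIO` objects.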

Poll 79816: 2017 GA-6 House. Data error that reverses the result and the 'rightcall' column.
cand1_pct=50.22 should be 48.59
cand2_pct=47.46 should be 50.46
All derived fields (including rightcall) would need to change as well.
See: https://drive.google.com/file/d/0B4lhKxf9pMitSUE2X2ItLWhoYVU/view

Poll 91209: 2018 FL Governor. Same issue.
cand1_pct=48.4 should be 46.6
cand2_pct=46.1 should be 50.0
See: https://drive.google.com/file/d/1ExmsCdRQYGvT7jRxcQx5NZlr7Gq40nWD/view

Poll 91210: 2018 FL Senate. Same issue.
cand1_pct=48.4 should be 47.3
cand2_pct=46.1 should be 49.0
See: https://drive.google.com/file/d/1ExmsCdRQYGvT7jRxcQx5NZlr7Gq40nWD/view
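The claim that these changes reverse the result can be checked by comparing the poll leader under the listed and the proposed values. This is a minimal sketch using only the numbers quoted above; the `leader` helper is hypothetical and is not how the repository computes `rightcall`, which also depends on the actual election outcome.

```python
def leader(cand1_pct, cand2_pct):
    """Return which candidate leads the poll, or 'tie'."""
    if cand1_pct > cand2_pct:
        return "cand1"
    if cand2_pct > cand1_pct:
        return "cand2"
    return "tie"


# poll_id -> (listed values, proposed values), as quoted in this report
reported = {
    79816: ((50.22, 47.46), (48.59, 50.46)),
    91209: ((48.4, 46.1), (46.6, 50.0)),
    91210: ((48.4, 46.1), (47.3, 49.0)),
}

# True where the proposed values put a different candidate in the lead
flips = {
    poll_id: leader(*listed) != leader(*proposed)
    for poll_id, (listed, proposed) in reported.items()
}
```

In all three polls the leader changes from cand1 to cand2 under the proposed values, which is why any field derived from the poll leader would change too.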

radcliffem commented 1 year ago

Polls 91209 and 91210 are correct. They are from a survey conducted 10/29-30, whereas the cited Google Doc is from a survey conducted 11/4-5.

Poll 79816 is also correct. It is from a survey dated 6/10-13; the cited numbers are from a survey dated 6/17-18.

All 6 of these questions appear in the data for pollster ratings.
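To avoid this kind of mix-up, a source document can be matched to a row by race and field dates before the percentages are compared. The sketch below is only illustrative: the rows are built from the figures quoted in this thread, and the field names `race` and `field_dates` are hypothetical rather than the raw-polls.csv schema.

```python
from collections import defaultdict

# Illustrative rows, using only figures quoted in this thread.
rows = [
    {"race": "2017 GA-6 House", "field_dates": "6/10-13", "cand1_pct": 50.22, "cand2_pct": 47.46},
    {"race": "2017 GA-6 House", "field_dates": "6/17-18", "cand1_pct": 48.59, "cand2_pct": 50.46},
    {"race": "2018 FL Governor", "field_dates": "10/29-30", "cand1_pct": 48.4, "cand2_pct": 46.1},
    {"race": "2018 FL Governor", "field_dates": "11/4-5", "cand1_pct": 46.6, "cand2_pct": 50.0},
    {"race": "2018 FL Senate", "field_dates": "10/29-30", "cand1_pct": 48.4, "cand2_pct": 46.1},
    {"race": "2018 FL Senate", "field_dates": "11/4-5", "cand1_pct": 47.3, "cand2_pct": 49.0},
]


def surveys_by_race(rows):
    """Group field dates by race to show when a pollster surveyed it more than once."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["race"]].append(row["field_dates"])
    return dict(grouped)


def find_survey(rows, race, cand1_pct, cand2_pct):
    """Return the field dates of the row matching these percentages, else None."""
    for row in rows:
        if (row["race"], row["cand1_pct"], row["cand2_pct"]) == (race, cand1_pct, cand2_pct):
            return row["field_dates"]
    return None


by_race = surveys_by_race(rows)
matched = find_survey(rows, "2017 GA-6 House", 48.59, 50.46)
```

Here `matched` resolves to the 6/17-18 survey, not the 6/10-13 survey recorded as poll 79816, so the numbers in the cited document belong to a different row.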