Closed bill10 closed 7 years ago
Precinct id is unique to a county. So for a given election we need to combine county and precinct fields to get a unique ID.
I believe that county+precinct is unique across time, but I don't know for certain.
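To illustrate why the precinct id alone is not a safe key, here is a minimal sketch using Python's built-in sqlite3 module. The table name `p`, its columns, and the sample rows are all made up for illustration; they are not taken from the real database.

```python
import sqlite3

# Hypothetical sample rows: (county, precinct, election_date).
# These values are invented for illustration only.
SAMPLE_ROWS = [
    ("WAKE", "01-01", "2016-11-08"),
    ("DURHAM", "01-01", "2016-11-08"),
    ("WAKE", "01-02", "2016-11-08"),
]


def counties_per_precinct_id(rows):
    """Return {precinct id: number of distinct counties that use it}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE p (county TEXT, precinct TEXT, election_date TEXT)")
    conn.executemany("INSERT INTO p VALUES (?, ?, ?)", rows)
    result = dict(
        conn.execute(
            "SELECT precinct, COUNT(DISTINCT county) FROM p GROUP BY precinct"
        ).fetchall()
    )
    conn.close()
    return result


# Precinct ids that appear in more than one county need the county
# as part of the key to be unambiguous.
shared = {p: n for p, n in counties_per_precinct_id(SAMPLE_ROWS).items() if n > 1}
print(shared)
```

Any precinct id reported here is ambiguous on its own, which is why the pair (county, precinct) would be used as the key.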
Although I expect county+precinct to be unique across time, I do not believe that every county+precinct ID will appear in every election. Some potential reasons that a county+precinct ID may not appear for a particular election:
It should be, but it is not, because the precinct column is messy: it contains entries such as provisional, etc.
The SQL below will show some examples.
select contest_name, county, precinct, election_date, candidate, count(precinct) as cprecinct
from contest_precinct
group by contest_name, county, precinct, election_date, candidate
order by cprecinct desc
limit 1000;
Can we simply remove from the database these "pseudo-precincts"?
After looking more into this issue, I am starting to think that having, e.g., ABSENTEE as a precinct makes sense in the precinct result table, because ABSENTEE votes are not precinct-specific (right?). All we know is the total number of ABSENTEE votes; we don't know specifically how many ABSENTEE votes come from which precinct. So it is not clear what number to fill in the ABSENTEE column for each precinct.
In its current form, when we aggregate precinct results into county-level results, the number of ABSENTEE votes is correctly counted.
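As a rough sketch of that aggregation, assuming ABSENTEE is stored as its own row in `contest_precinct` and that there is a `votes` column (the `votes` column name and the sample numbers are my assumptions, not the real schema):

```python
import sqlite3

# Hypothetical rows: (county, precinct, candidate, votes).
# ABSENTEE is stored as a pseudo-precinct row, as discussed above.
ROWS = [
    ("WAKE", "01-01", "A", 100),
    ("WAKE", "01-02", "A", 50),
    ("WAKE", "ABSENTEE", "A", 30),
    ("WAKE", "01-01", "B", 80),
    ("WAKE", "ABSENTEE", "B", 20),
]


def county_totals(rows):
    """Sum votes per (county, candidate), pseudo-precinct rows included."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contest_precinct "
        "(county TEXT, precinct TEXT, candidate TEXT, votes INTEGER)"
    )
    conn.executemany("INSERT INTO contest_precinct VALUES (?, ?, ?, ?)", rows)
    totals = conn.execute(
        "SELECT county, candidate, SUM(votes) "
        "FROM contest_precinct "
        "GROUP BY county, candidate "
        "ORDER BY county, candidate"
    ).fetchall()
    conn.close()
    return totals


print(county_totals(ROWS))
```

Because the ABSENTEE rows are ordinary rows in the table, a plain SUM grouped by county picks them up exactly once.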
Does this make sense? If so, there seems to be nothing we can improve, other than to note for ourselves and future users that ABSENTEE should not be mistaken for a precinct in any usage. I will make that clear in the schema readme.
We also need to make sure there is no double counting in the precinct results. This could be done easily by checking that the numbers in the ABSENTEE, etc. columns are 0 for real precincts.
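A minimal sketch of such a check, assuming a wide table with one column per vote type. The table name `results`, the columns `election_day`, `absentee`, and `provisional`, and the sample rows are all hypothetical; the list of pseudo-precinct names would need to be extended to match the real data.

```python
import sqlite3

# Names treated as pseudo-precincts (assumed list, extend as needed).
PSEUDO_PRECINCTS = ("ABSENTEE", "PROVISIONAL")

# Hypothetical rows: (county, precinct, election_day, absentee, provisional).
ROWS = [
    ("WAKE", "01-01", 100, 0, 0),
    ("WAKE", "01-02", 50, 5, 0),    # real precinct with absentee votes: suspicious
    ("WAKE", "ABSENTEE", 0, 30, 0),
]


def suspicious_rows(rows):
    """Return (county, precinct) for real precincts with nonzero pseudo-precinct counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results (county TEXT, precinct TEXT, "
        "election_day INTEGER, absentee INTEGER, provisional INTEGER)"
    )
    conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?, ?)", rows)
    placeholders = ",".join("?" for _ in PSEUDO_PRECINCTS)
    found = conn.execute(
        "SELECT county, precinct FROM results "
        f"WHERE UPPER(precinct) NOT IN ({placeholders}) "
        "AND (absentee <> 0 OR provisional <> 0) "
        "ORDER BY county, precinct",
        PSEUDO_PRECINCTS,
    ).fetchall()
    conn.close()
    return found


print(suspicious_rows(ROWS))
```

An empty result would suggest that absentee and provisional votes are only recorded on the pseudo-precinct rows, so summing across rows does not count them twice.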
BTW, is any student looking for work and willing to help me clean the data?
Since this issue is closely related to #4, by the law of parsimony I am going to close this one. Please comment in #4.
This issue was moved to NCVotes/results-ingestor#3
If yes, use it as the key for precincts. If not, we need to combine it with county and/or time.