NCVotes / ncvoter

Moved from reesenewslab github. Now just a home for issues without a home. All the action's at https://github.com/NCVotes/ncvoter/issues

Is precinct id unique across counties across time #3

Closed bill10 closed 7 years ago

bill10 commented 7 years ago

If yes, use it as the key for precincts. If not, we need to combine it with county and/or time.

rtburg commented 7 years ago

Precinct id is unique to a county. So for a given election we need to combine county and precinct fields to get a unique ID.
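A minimal sketch of that composite key, using Python's built-in sqlite3 module. The table name contest_precinct and the county, precinct, and election_date columns are taken from the query posted further down in this thread; the votes column and all sample rows are made up for illustration and are not from the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contest_precinct ("
    "county TEXT, precinct TEXT, election_date TEXT, votes INTEGER)"
)
# Made-up rows: precinct id "01" exists in two different counties.
conn.executemany(
    "INSERT INTO contest_precinct VALUES (?, ?, ?, ?)",
    [
        ("ORANGE", "01", "2014-11-04", 120),
        ("DURHAM", "01", "2014-11-04", 95),
        ("ORANGE", "02", "2014-11-04", 80),
    ],
)

# Precinct id alone collides across counties ...
by_precinct = conn.execute(
    "SELECT COUNT(DISTINCT precinct) FROM contest_precinct"
).fetchone()[0]

# ... while the (county, precinct) pair distinguishes every row here.
by_pair = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT county, precinct FROM contest_precinct)"
).fetchone()[0]

print(by_precinct, by_pair)  # 2 3
```

With these three sample rows there are only two distinct precinct ids but three distinct (county, precinct) pairs, which is why the pair is the safer key within a single election.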

I believe that county+precinct is unique across time, but I don't know for certain.

Although I expect county+precinct to be unique across time, I do not believe that every county+precinct_id will appear in every election. Some potential reasons that a county+precinct_id may not appear for a particular election:

bill10 commented 7 years ago

It should be, but it is not, because the precinct column is messy: it also contains non-precinct values such as provisional.

The SQL below shows some examples.

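-- Count occurrences of each (contest, county, precinct, election date, candidate)
-- combination; a count above 1 means the combination is not unique.
select contest_name, county, precinct, election_date, candidate, count(precinct) as cprecinct
from contest_precinct
group by contest_name, county, precinct, election_date, candidate
order by cprecinct desc
limit 1000;

rtburg commented 7 years ago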

Can we simply remove these "pseudo-precincts" from the database?

bill10 commented 7 years ago

After looking more into this issue, I am starting to think that having, e.g., ABSENTEE as a precinct makes sense in the precinct result table, because ABSENTEE votes are not precinct-specific (right?). All we know is the total number of ABSENTEE votes; we don't know specifically how many ABSENTEE votes come from which precinct. So it is not clear what number to fill in the ABSENTEE column for each precinct.

In its current form, when we aggregate precinct results into county-level results, the number of ABSENTEE votes is correctly counted.
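A rough sketch of why the county totals come out right, again with Python's sqlite3. It assumes ABSENTEE is stored as its own row in contest_precinct; the votes column, the sample numbers, and the list of pseudo-precinct labels are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contest_precinct ("
    "county TEXT, precinct TEXT, candidate TEXT, votes INTEGER)"
)
# Made-up rows: two real precincts plus one ABSENTEE pseudo-precinct.
conn.executemany(
    "INSERT INTO contest_precinct VALUES (?, ?, ?, ?)",
    [
        ("ORANGE", "01", "A", 100),
        ("ORANGE", "02", "A", 50),
        ("ORANGE", "ABSENTEE", "A", 30),
    ],
)

# Summing every row, pseudo-precincts included, gives the county total.
county_total = conn.execute(
    "SELECT SUM(votes) FROM contest_precinct "
    "WHERE county = 'ORANGE' AND candidate = 'A'"
).fetchone()[0]

# Dropping the pseudo-precinct rows would undercount the county.
real_only = conn.execute(
    "SELECT SUM(votes) FROM contest_precinct "
    "WHERE county = 'ORANGE' AND candidate = 'A' "
    "AND precinct NOT IN ('ABSENTEE', 'PROVISIONAL')"
).fetchone()[0]

print(county_total, real_only)  # 180 150
```

In this toy data, removing the ABSENTEE row loses 30 votes from the county total, which is the argument for keeping such rows and flagging them rather than deleting them.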

Does this make sense? If so, there seems to be nothing we can improve, except to note for ourselves and future users that ABSENTEE should not be mistaken for a precinct in any usage. I will make that clear in the schema readme.

And we need to make sure there is no double counting in the precinct results. This could be done easily by checking whether the numbers in the ABSENTEE, etc. columns are 0 for real precincts.
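One possible form of that check, sketched with Python's sqlite3. The per-row election_day and absentee columns and the pseudo-precinct labels are assumptions about the schema, not confirmed by the data; adjust the names to the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contest_precinct ("
    "county TEXT, precinct TEXT, election_day INTEGER, absentee INTEGER)"
)
# Made-up rows: precinct "02" carries absentee votes it should not have.
conn.executemany(
    "INSERT INTO contest_precinct VALUES (?, ?, ?, ?)",
    [
        ("ORANGE", "01", 100, 0),
        ("ORANGE", "02", 50, 5),
        ("ORANGE", "ABSENTEE", 0, 30),
    ],
)

# Hypothetical labels used for pseudo-precincts.
PSEUDO = ("ABSENTEE", "PROVISIONAL")
placeholders = ",".join("?" for _ in PSEUDO)

# Real precincts with a nonzero absentee count are double-counting suspects.
suspects = conn.execute(
    "SELECT county, precinct, absentee FROM contest_precinct "
    f"WHERE precinct NOT IN ({placeholders}) AND absentee <> 0",
    PSEUDO,
).fetchall()

print(suspects)  # [('ORANGE', '02', 5)]
```

An empty result would suggest the absentee numbers live only in the pseudo-precinct rows; any rows returned need a manual look.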

BTW, is any student looking for work and willing to help me clean the data?

bill10 commented 7 years ago

Since this issue is closely related to #4, by the law of parsimony I am going to close this one. Please comment in #4.

rtburg commented 7 years ago

This issue was moved to NCVotes/results-ingestor#3