Colorado Governor and US Senate district is `000`, not `STATEWIDE`

NickCrews commented 8 months ago

The rest of the states use the value STATEWIDE to encode a statewide election.

sbaltzmit commented 8 months ago

Thanks for the feedback! We haven't yet standardized, across states, how the 2022 precinct data represents an office that is not district-based. Note that it should be standardized across offices/candidates/precincts within a state, but not yet across the states.

In previous years we used the designator STATEWIDE for statewide offices, and null for non-statewide non-district-based offices, but I've come to believe that this convention is more confusing than helpful. For the 2022 data my intention is to make it always null for all non-district-based offices in every state. I agree that 0 is a somewhat odd way to designate the district not existing; I think it is just a quirk of how we cleaned Colorado in particular this year, and likely a data conversion error from inadvertently coercing an empty string into an integer.

Our cross-state standardization effort, which will culminate in migrating the full data to the Harvard Dataverse, is on my schedule for roughly April 1 through May 31, so by the end of May hopefully all non-district-based offices will have an empty district field.

My personal read at the moment is that this issue in Colorado has very little risk of really screwing anything up: it is the same for every row within the office (right?), it is in an office where the district field should just be ignored, and it is not misleadingly the real value of an actual numbered district. So I am going to make a note to put it at the front of the queue when I start checking the standardization across states next month, rather than bumping anything else down the priorities list to take on district field standardization right now. But if you disagree, let me know; I'm happy to consider bumping it up in the order of priorities if there's a real risk it is messing up anyone's analysis. And if I don't follow up in this issue by the end of May, please feel free to ping me.
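For anyone who needs a stopgap before the cross-state standardization lands, here is a minimal sketch of the intended convention (district is null for non-district-based offices). It is not the project's cleaning code. The function name, the set of offices, the list of placeholder values, and the sample rows are all assumptions made up for illustration; only `STATEWIDE`, `000`, and `0` come from this thread.

```python
# Hypothetical set of offices that are not elected by district; adjust to your data.
NON_DISTRICT_OFFICES = {"GOVERNOR", "US SENATE"}

# Values discussed in this thread (plus the empty string) that mean "no district".
PLACEHOLDERS = {"", "0", "000", "STATEWIDE"}


def normalize_district(office, district):
    """Return None for a non-district-based office with a placeholder district.

    Any other value is returned unchanged, so real numbered districts survive.
    """
    office_key = office.strip().upper()
    value = "" if district is None else str(district).strip().upper()
    if office_key in NON_DISTRICT_OFFICES and value in PLACEHOLDERS:
        return None
    return district


# Made-up sample rows, not taken from the dataset.
rows = [
    {"state": "COLORADO", "office": "GOVERNOR", "district": "000"},
    {"state": "COLORADO", "office": "US SENATE", "district": "STATEWIDE"},
    {"state": "COLORADO", "office": "STATE HOUSE", "district": "012"},
]
cleaned = [
    {**row, "district": normalize_district(row["office"], row["district"])}
    for row in rows
]
```

The sketch only rewrites the district when both the office and the value look like placeholders, so a numbered district is never silently dropped.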

NickCrews commented 8 months ago

First, thanks for all the work on this dataset. There is a huge need for this sort of thing.

> In previous years we used the designator STATEWIDE for statewide offices, and null for non-statewide non-district-based offices, but I've come to believe that this convention is more confusing than helpful

I think this makes sense, and I think it is what I would expect. What was the confusing part? For example, for GOVERNOR races, did people expect NULL but actually get STATEWIDE? I would like to interpret "district" as "the geography/constituency that this person represents", and I think that implementation is consistent with that definition?

> my intention is to make it always null for all non-district-based offices in every state

How can we encode missing data? E.g., https://github.com/MEDSL/2022-elections-official/issues/7. If we can make it so there is NO missing data, then this sounds good to me. I don't know how feasible that is.

> it is the same for every row within the office (right?)

Yes, this is true. However, what I am trying to do is "for every state senate and state house race, compare that race to the top-of-ticket race". This requires me to find the "top-of-ticket" race for every precinct/district. So I need the statewide races to be encoded more consistently.
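To make the use case concrete, here is a rough sketch of the pairing step, written under stated assumptions rather than against the real schema. The field names (`precinct`, `office`, `district`), the office labels, the priority order for "top of ticket", and the sample rows are all hypothetical. The sketch identifies statewide races by office name, which is the kind of workaround needed while the district field is encoded differently from state to state.

```python
from collections import defaultdict

# Hypothetical priority order for "top of ticket"; not defined by the dataset.
TOP_OF_TICKET = ["GOVERNOR", "US SENATE"]
# Hypothetical labels for the legislative races being compared.
LEGISLATIVE = {"STATE SENATE", "STATE HOUSE"}


def pair_with_top_of_ticket(rows):
    """Map each (precinct, office, district) legislative race to the
    highest-priority top-of-ticket office on the ballot in that precinct."""
    offices_by_precinct = defaultdict(set)
    for row in rows:
        offices_by_precinct[row["precinct"]].add(row["office"])

    pairs = {}
    for row in rows:
        if row["office"] not in LEGISLATIVE:
            continue
        present = offices_by_precinct[row["precinct"]]
        # First office in priority order that appears in this precinct, else None.
        top = next((o for o in TOP_OF_TICKET if o in present), None)
        pairs[(row["precinct"], row["office"], row["district"])] = top
    return pairs


# Made-up rows showing the two district encodings discussed in this issue.
rows = [
    {"precinct": "P1", "office": "GOVERNOR", "district": "000"},
    {"precinct": "P1", "office": "STATE HOUSE", "district": "012"},
    {"precinct": "P2", "office": "US SENATE", "district": "STATEWIDE"},
    {"precinct": "P2", "office": "STATE SENATE", "district": "003"},
]
pairs = pair_with_top_of_ticket(rows)
```

Matching on office name sidesteps the district field entirely, but it relies on a hand-maintained list of statewide offices, which is exactly what a consistent encoding would make unnecessary.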
