MEDSL / 2022-elections-official

Official returns for the 2022 Midterm Elections

some precincts contain results for multiple districts #18

Closed NickCrews closed 7 months ago

NickCrews commented 7 months ago

For each precinct, look at all the unique districts that appear in it. Let's only look at STATE HOUSE offices, because each precinct should be voting in exactly one state house district. Maybe this is also a problem for other offices, but that's out of scope for now.

(I can provide more complete/different code snippets if you want)

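# Assumes ibis; t22 is the 2022 precinct-level returns table, loaded elsewhere.
from ibis import _

t22.filter(
    _.office == "STATE HOUSE",
).group_by(
    "state_po",
    # Replace missing county names so they still form a group.
    _.county_name.fillna("NULL").name("county"),
    "precinct",
    "office",
).agg(
    # Distinct districts reported within each precinct.
    districts=_.district.collect().unique(),
).filter(
    # Keeps precincts with exactly two districts (use >= 2 to catch more).
    _.districts.length() == 2,
)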
┏━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ state_po ┃ county    ┃ precinct                 ┃ office      ┃ districts      ┃
┡━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string   │ string    │ string                   │ string      │ array<string>  │
├──────────┼───────────┼──────────────────────────┼─────────────┼────────────────┤
│ TX       │ HAYS      │ 2090450B_5513            │ STATE HOUSE │ ['073', '045'] │
│ AL       │ BLOUNT    │ PROVISIONAL              │ STATE HOUSE │ ['011', '034'] │
│ AL       │ ETOWAH    │ NE ETOWAH COMM_ CTR_     │ STATE HOUSE │ ['029', '028'] │
│ AL       │ CLARKE    │ COFFEVILLE HIGH          │ STATE HOUSE │ ['068', '065'] │
│ AL       │ LIMESTONE │ ARDMORE SR_ CTR_         │ STATE HOUSE │ ['005', '006'] │
│ AL       │ LIMESTONE │ ATHENS SR_ CITIZEN CTR_  │ STATE HOUSE │ ['002', '005'] │
│ AL       │ BALDWIN   │ NEW LIFE ASSEMBLY OF GOD │ STATE HOUSE │ ['096', '064'] │
│ AL       │ MADISON   │ W HUNTSVILLE CH CHRIST   │ STATE HOUSE │ ['053', '019'] │
│ AL       │ TALLADEGA │ LIMBAUGH COMM CTR        │ STATE HOUSE │ ['032', '033'] │
│ AL       │ TALLADEGA │ TALLADEGA CENTRAL HS GYM │ STATE HOUSE │ ['033', '032'] │
│ …        │ …         │ …                        │ …           │ …              │
└──────────┴───────────┴──────────────────────────┴─────────────┴────────────────┘

Some precincts are PROVISIONAL, and so are probably expected to have multiple districts. But others look like regular precincts. If we zoom into one precinct, note that districts 032 and 033 are both present.

t22.filter(
    _.state_po == "AL",
    _.county_name == "TALLADEGA",
    _.precinct == "LIMBAUGH COMM CTR",
    _.office == "STATE HOUSE",
)
┏━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ year  ┃ date       ┃ state_po ┃ office      ┃ district ┃ magnitude ┃ special ┃ stage  ┃ county_name ┃ precinct          ┃ writein ┃ candidate           ┃ party_detailed ┃ party_simplified ┃ mode   ┃ votes ┃
┡━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ date       │ string   │ string      │ string   │ int64     │ boolean │ string │ string      │ string            │ boolean │ string              │ string         │ string           │ string │ int64 │
├───────┼────────────┼──────────┼─────────────┼──────────┼───────────┼─────────┼────────┼─────────────┼───────────────────┼─────────┼─────────────────────┼────────────────┼──────────────────┼────────┼───────┤
│  2022 │ 2022-11-08 │ AL       │ STATE HOUSE │ 033      │         1 │ False   │ GEN    │ TALLADEGA   │ LIMBAUGH COMM CTR │ False   │ FRED CRUM SR        │ DEMOCRAT       │ DEMOCRAT         │ TOTAL  │   409 │
│  2022 │ 2022-11-08 │ AL       │ STATE HOUSE │ 033      │         1 │ False   │ GEN    │ TALLADEGA   │ LIMBAUGH COMM CTR │ False   │ UNDER VOTES         │                │                  │ TOTAL  │    11 │
│  2022 │ 2022-11-08 │ AL       │ STATE HOUSE │ 033      │         1 │ False   │ GEN    │ TALLADEGA   │ LIMBAUGH COMM CTR │ False   │ BEN ROBBINS         │ REPUBLICAN     │ REPUBLICAN       │ TOTAL  │  1064 │
│  2022 │ 2022-11-08 │ AL       │ STATE HOUSE │ 032      │         1 │ False   │ GEN    │ TALLADEGA   │ LIMBAUGH COMM CTR │ False   │ BARBARA BIGSBY BOYD │ DEMOCRAT       │ DEMOCRAT         │ TOTAL  │    72 │
│  2022 │ 2022-11-08 │ AL       │ STATE HOUSE │ 032      │         1 │ False   │ GEN    │ TALLADEGA   │ LIMBAUGH COMM CTR │ False   │ EVAN B JACKSON      │ REPUBLICAN     │ REPUBLICAN       │ TOTAL  │    21 │
│  2022 │ 2022-11-08 │ AL       │ STATE HOUSE │ 033      │         1 │ False   │ GEN    │ TALLADEGA   │ LIMBAUGH COMM CTR │ True    │ WRITE-IN            │ NULL           │ NULL             │ TOTAL  │     1 │
└───────┴────────────┴──────────┴─────────────┴──────────┴───────────┴─────────┴────────┴─────────────┴───────────────────┴─────────┴─────────────────────┴────────────────┴──────────────────┴────────┴───────┘
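To see how this precinct's rows split between the two districts, the six rows above can be tallied per district. This is only a minimal plain-Python sketch: the `rows` list is hand-copied from the output above, and `votes_by_district` is a name made up here for illustration, not something in the repo.

```python
from collections import defaultdict

# (district, candidate, votes) copied by hand from the LIMBAUGH COMM CTR rows above.
rows = [
    ("033", "FRED CRUM SR", 409),
    ("033", "UNDER VOTES", 11),
    ("033", "BEN ROBBINS", 1064),
    ("032", "BARBARA BIGSBY BOYD", 72),
    ("032", "EVAN B JACKSON", 21),
    ("033", "WRITE-IN", 1),
]


def votes_by_district(rows):
    """Sum the votes column for each district (hypothetical helper)."""
    totals = defaultdict(int)
    for district, _candidate, votes in rows:
        totals[district] += votes
    return dict(totals)


totals = votes_by_district(rows)
print(totals)
```

Counting under votes and write-ins, that is 1485 for district 033 and 93 for district 032, so most of this precinct's reported votes sit in one district with a much smaller share in the other.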

Is this actually a problem, or is this expected?

sbaltzmit commented 7 months ago

Those precincts appear to be actual polling places, and in the raw data, one state house race will commonly have votes from multiple polling places (including the overlapping one you've identified). So in this case I believe the data are telling us that, in Talladega County, Alabama, a polling place will often draw from a region that spans multiple state house districts. It isn't a cleaning problem because it's in the raw data, and I see no reason to think it's a problem with the raw data.
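One rough way to check this reading against the ten rows shown in the first table is to group the multi-district precincts by county and by the pair of districts they straddle. The sketch below is plain Python over hand-copied rows. `PLACEHOLDER_PRECINCTS` and `district_pairs_by_county` are names assumed here for illustration, and treating PROVISIONAL as a reporting bucket rather than a polling place is an assumption as well.

```python
from collections import defaultdict

# (state_po, county, precinct, districts) copied by hand from the first table.
records = [
    ("TX", "HAYS", "2090450B_5513", ["073", "045"]),
    ("AL", "BLOUNT", "PROVISIONAL", ["011", "034"]),
    ("AL", "ETOWAH", "NE ETOWAH COMM_ CTR_", ["029", "028"]),
    ("AL", "CLARKE", "COFFEVILLE HIGH", ["068", "065"]),
    ("AL", "LIMESTONE", "ARDMORE SR_ CTR_", ["005", "006"]),
    ("AL", "LIMESTONE", "ATHENS SR_ CITIZEN CTR_", ["002", "005"]),
    ("AL", "BALDWIN", "NEW LIFE ASSEMBLY OF GOD", ["096", "064"]),
    ("AL", "MADISON", "W HUNTSVILLE CH CHRIST", ["053", "019"]),
    ("AL", "TALLADEGA", "LIMBAUGH COMM CTR", ["032", "033"]),
    ("AL", "TALLADEGA", "TALLADEGA CENTRAL HS GYM", ["033", "032"]),
]

# Assumption: labels that are reporting buckets, not physical polling places.
PLACEHOLDER_PRECINCTS = {"PROVISIONAL"}


def district_pairs_by_county(records):
    """Map (state, county) -> {sorted district tuple: [precinct names]}."""
    pairs = defaultdict(lambda: defaultdict(list))
    for state, county, precinct, districts in records:
        if precinct in PLACEHOLDER_PRECINCTS:
            continue  # skip buckets that are expected to mix districts
        # Sort so ['033', '032'] and ['032', '033'] count as the same pair.
        pairs[(state, county)][tuple(sorted(districts))].append(precinct)
    return {key: dict(value) for key, value in pairs.items()}


pairs = district_pairs_by_county(records)
for (state, county), by_pair in sorted(pairs.items()):
    for pair, precincts in sorted(by_pair.items()):
        print(state, county, pair, precincts)
```

In these rows, both Talladega polling places straddle the same pair of districts (032 and 033), which fits a district boundary running through the area each one serves. Ten rows are only a sample of the full result, so this is a sanity check rather than evidence on its own.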