alarm-redist / fifty-states

Redistricting analysis for all 50 U.S. states
https://alarm-redist.github.io/fifty-states/

2010 Arizona Congressional Districts #152

Closed mzhao80 closed 1 year ago

mzhao80 commented 1 year ago

Redistricting requirements

In Arizona, districts must:

  1. be contiguous
  2. have equal populations
  3. be geographically compact
  4. preserve county and municipality boundaries as much as possible
  5. favor competitive districts to the extent practicable

Algorithmic Constraints

We enforce a maximum population deviation of 0.5%. We add a county/municipality constraint, as described below. We add a hinge Gibbs constraint targeting two majority-HVAP districts, one within Maricopa County and one outside of it (as exist in the enacted plan). However, not all plans are guaranteed to have two majority-HVAP districts.
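As a rough illustration of the 0.5% population tolerance (a sketch only, not the code used for this analysis), the check below measures each district's relative deviation from the ideal population, taken here to be the mean district population. The function names and the toy populations are hypothetical.

```python
def max_population_deviation(district_pops):
    """Largest relative deviation of any district from the ideal population.

    The ideal population is assumed to be the mean district population.
    """
    target = sum(district_pops) / len(district_pops)
    return max(abs(p - target) / target for p in district_pops)


def within_tolerance(district_pops, tol=0.005):
    """True if every district is within `tol` (0.5% by default) of the ideal."""
    return max_population_deviation(district_pops) <= tol


# Hypothetical populations for a 3-district toy map (not Arizona data).
toy_plan = [100_200, 99_900, 99_900]
print(max_population_deviation(toy_plan))
print(within_tolerance(toy_plan))
```

A plan whose worst district is 1% off the ideal, such as `[101_000, 99_500, 99_500]`, would fail this check.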

Data Sources

Data for Arizona comes from the ALARM Project's Redistricting Data Files.

Pre-processing Notes

No manual pre-processing decisions were necessary.

Simulation Notes

We sample 32,000 districting plans for Arizona across four independent runs of the SMC algorithm, and then thin the sample down to 5,000 plans. To satisfy the Voting Rights Act constraint, we run the simulation in two steps.

1. Simulate three districts outside of Maricopa County

We target a Hispanic-majority district outside of Maricopa County (HVAP 50-55%). We avoid splitting municipalities in this region.

2. Simulate six more districts in the remainder of the map

We target one Hispanic-majority district in Maricopa County (HVAP 50-55%).

To balance county and municipality splits, we create pseudocounties for use in the county constraint. Outside Maricopa County and Pima County, each county is its own pseudocounty. Maricopa County and Pima County are each larger than a congressional district in population, so within them each municipality is its own pseudocounty instead. Overall, this approach leads to far fewer county and municipality splits than using either a county or county/municipality constraint.
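The pseudocounty rule can be sketched as a simple labeling function. This is an illustration under stated assumptions, not the project's code: the function name, the `county:muni` label format, and the choice to fall back to the county label for areas with no municipality are all hypothetical.

```python
# Counties larger than one congressional district, per the description above.
SPLIT_COUNTIES = {"Maricopa", "Pima"}


def pseudocounty(county, muni):
    """Return the pseudocounty label for one precinct.

    Inside Maricopa or Pima, each municipality becomes its own unit.
    Everywhere else, the county itself is the unit. Precincts with no
    municipality (muni is None) fall back to the county label; that
    fallback is an assumption made for this sketch.
    """
    if county in SPLIT_COUNTIES and muni is not None:
        return f"{county}:{muni}"
    return county


print(pseudocounty("Maricopa", "Phoenix"))  # municipality-level unit
print(pseudocounty("Yuma", "Yuma"))         # county-level unit
print(pseudocounty("Pima", None))           # fallback to county
```

Passing these labels to a single county-style constraint lets one constraint discourage county splits in smaller counties and municipality splits in the two large ones.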

Validation

validation_20221227_1452

SMC: 5,000 sampled plans of 9 districts on 2,224 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.5
`est_label_mult`=1 • `pop_temper`=0.05

Plan diversity 80% range: 0.55 to 0.84

R-hat values for summary statistics:
   pop_overlap      total_vap       plan_dev      comp_edge    comp_polsby      pop_white 
      1.007131       1.011104       1.006465       1.004757       1.019307       1.019665 
     pop_black       pop_hisp       pop_aian      pop_asian       pop_nhpi      pop_other 
      1.039325       1.021981       1.021785       1.027378       1.034513       1.035271 
       pop_two      vap_white      vap_black       vap_hisp       vap_aian      vap_asian 
      1.015950       1.031346       1.038971       1.023286       1.021316       1.028698 
      vap_nhpi      vap_other        vap_two pre_16_rep_tru pre_16_dem_cli pre_20_dem_bid 
      1.032422       1.033549       1.006353       1.028330       1.008934       1.017614 
pre_20_rep_tru uss_16_rep_mcc uss_16_dem_kir uss_18_rep_mcs uss_18_dem_sin uss_20_dem_kel 
      1.010727       1.032049       1.009605       1.022904       1.008203       1.017983 
uss_20_rep_mcs gov_18_rep_duc gov_18_dem_gar atg_18_rep_brn atg_18_dem_con sos_18_rep_gay 
      1.011414       1.014537       1.003847       1.015785       1.006302       1.026719 
sos_18_dem_hob         adv_16         adv_18         adv_20         arv_16         arv_18 
      1.007230       1.003324       1.006924       1.017580       1.021364       1.021690 
        arv_20  county_splits    muni_splits            ndv            nrv        ndshare 
      1.010465       1.001663       1.014508       1.010047       1.021033       1.012117 
         e_dvs         pr_dem          e_dem          pbias           egap 
      1.016499       1.006711       1.007349       1.006813       1.011905 

Sampling diagnostics for SMC run 1 of 4 (8,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     4,914 (61.4%)     10.2%        0.45 3,916 ( 77%)      4 
Split 2     4,303 (53.8%)     19.6%        0.88 4,088 ( 81%)      3 
Split 3     3,904 (48.8%)     23.8%        0.99 3,871 ( 77%)      2 
Split 4     4,190 (52.4%)     18.2%        1.01 3,710 ( 73%)      2 
Split 5     3,432 (42.9%)      6.7%        0.94 3,308 ( 65%)      2 
Resample    1,109 (13.9%)       NA%        2.82 3,152 ( 62%)     NA 

Sampling diagnostics for SMC run 2 of 4 (8,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     4,930 (61.6%)      5.8%        0.45 3,892 ( 77%)      7 
Split 2     4,341 (54.3%)     15.1%        0.90 4,123 ( 82%)      4 
Split 3     4,255 (53.2%)     13.3%        0.98 3,875 ( 77%)      4 
Split 4     4,446 (55.6%)     13.2%        0.96 3,701 ( 73%)      3 
Split 5     3,785 (47.3%)      7.2%        0.94 3,334 ( 66%)      2 
Resample    1,310 (16.4%)       NA%        2.75 3,290 ( 65%)     NA 

Sampling diagnostics for SMC run 3 of 4 (8,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     4,938 (61.7%)     13.0%        0.45 3,932 ( 78%)      3 
Split 2     4,347 (54.3%)     14.6%        0.87 4,087 ( 81%)      4 
Split 3     4,196 (52.5%)     17.5%        1.01 3,897 ( 77%)      3 
Split 4     4,453 (55.7%)     18.8%        0.97 3,723 ( 74%)      2 
Split 5     3,832 (47.9%)      6.8%        0.91 3,431 ( 68%)      2 
Resample    1,277 (16.0%)       NA%        2.89 3,357 ( 66%)     NA 

Sampling diagnostics for SMC run 4 of 4 (8,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     4,964 (62.1%)     13.0%        0.44 3,945 ( 78%)      3 
Split 2     4,291 (53.6%)     15.0%        0.89 4,166 ( 82%)      4 
Split 3     3,984 (49.8%)     17.3%        1.01 3,918 ( 77%)      3 
Split 4     4,229 (52.9%)     18.4%        0.95 3,713 ( 73%)      2 
Split 5     3,556 (44.5%)      6.7%        0.91 3,378 ( 67%)      2 
Resample    1,194 (14.9%)       NA%        3.00 3,090 ( 61%)     NA 

•  Watch out for low effective samples, very low acceptance rates (less than 1%), large std. devs. of
the log weights (more than 3 or so), and low numbers of unique plans. R-hat values for summary
statistics should be between 1 and 1.05.
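For readers unfamiliar with the "Eff. samples" column, a common way to turn importance weights into an effective sample size is the Kish formula, (Σw)² / Σw². The sketch below applies it to log weights; whether the summary above uses exactly this estimator is an assumption, and the function name is hypothetical.

```python
import math


def effective_sample_size(log_weights):
    """Kish effective sample size computed from unnormalized log weights."""
    # Subtract the maximum before exponentiating for numerical stability;
    # the ratio below is unchanged by rescaling all weights.
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    return sum(w) ** 2 / sum(x * x for x in w)


# Equal weights: every sample counts fully.
print(effective_sample_size([0.0] * 8000))

# One dominant weight: the effective sample collapses toward 1.
print(effective_sample_size([0.0, -1000.0, -1000.0]))
```

This is why a large standard deviation of the log weights goes hand in hand with a low effective sample at the resampling step.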

Checklist

@CoryMcCartan

mzhao80 commented 1 year ago

Validation plots for additional constraints:

Hispanic majority-minority districts: Aimed for two (the number under the enacted plan). Only some of the simulated plans achieve two; we are finding it quite difficult to get any further. [Figure: az_hvap]

Partisanship by HVAP: [Figure: az_partisan]

Competitiveness: Secondary constraint, considered only after all others. [Figure: az_competitive]

CoryMcCartan commented 1 year ago

Analysis looks great.

Tagging @christopherkenny to see about VRA performance—cf. #91 for the 2020 comparison, which does look a tad better

christopherkenny commented 1 year ago

This might be sufficient, as there seems to be some shift between 7 and 8. Can we get a count here?

mzhao80 commented 1 year ago

New Run

Redistricting requirements

In Arizona, districts must:

  1. be contiguous
  2. have equal populations
  3. be geographically compact
  4. preserve county and municipality boundaries as much as possible
  5. favor competitive districts to the extent practicable

Algorithmic Constraints

We enforce a maximum population deviation of 0.5%. We add a county/municipality constraint, as described below. We add a hinge Gibbs constraint targeting two majority-HVAP districts, one within Maricopa County and one outside of it (as exist in the enacted plan). However, not all plans are guaranteed to have two majority-HVAP districts.
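To show why a hinge Gibbs constraint encourages but does not guarantee a majority-HVAP district, here is a minimal sketch. It assumes a linear hinge, a target share of 50%, and a strength of 1; the exact functional form and strength used in this analysis are not stated in this thread, and the function names are hypothetical.

```python
import math


def hinge_penalty(share, target=0.5, strength=1.0):
    """Zero at or above the target share, growing linearly below it (assumed form)."""
    return strength * max(0.0, target - share)


def plan_penalty(hvap_shares, target=0.5, strength=1.0):
    """Penalize a plan by how far its highest-HVAP district falls short of the target."""
    return hinge_penalty(max(hvap_shares), target, strength)


def gibbs_weight(penalty):
    """Gibbs form: a plan's relative weight is proportional to exp(-penalty)."""
    return math.exp(-penalty)


meets_target = [0.52, 0.30, 0.20]  # hypothetical HVAP shares by district
falls_short = [0.45, 0.30, 0.20]
print(gibbs_weight(plan_penalty(meets_target)))  # no penalty, full weight
print(gibbs_weight(plan_penalty(falls_short)))   # downweighted, but not zero
```

Because the weight of a plan that falls short is reduced rather than set to zero, such plans can still appear in the sample.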

Data Sources

Data for Arizona comes from the ALARM Project's Redistricting Data Files.

Pre-processing Notes

No manual pre-processing decisions were necessary.

Simulation Notes

We sample 32,000 districting plans for Arizona across four independent runs of the SMC algorithm, and then thin the sample down to 5,000 plans. To satisfy the Voting Rights Act constraint, we run the simulation in two steps.

1. Simulate three districts outside of Maricopa County

We target a Hispanic-majority district outside of Maricopa County (HVAP 50-55%). We avoid splitting municipalities in this region.

2. Simulate six more districts in the remainder of the map

We target one Hispanic-majority district in Maricopa County (HVAP 50-55%), and only keep plans where the district with the second-highest HVAP exceeds 30% (including the districts outside Maricopa County).
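The second-highest-HVAP filter amounts to sorting each plan's district HVAP shares and checking the second value. The sketch below uses hypothetical plan names and shares, not output from this run.

```python
def second_highest(shares):
    """Second-largest value among a plan's district HVAP shares."""
    return sorted(shares, reverse=True)[1]


def keep_plan(hvap_shares, threshold=0.30):
    """Keep a plan only if its second-highest HVAP share exceeds the threshold."""
    return second_highest(hvap_shares) > threshold


# Hypothetical HVAP shares by district, covering all districts in each plan.
plans = {
    "plan_a": [0.52, 0.34, 0.20, 0.15],
    "plan_b": [0.55, 0.28, 0.22, 0.18],
}
kept = [name for name, shares in plans.items() if keep_plan(shares)]
print(kept)
```

Here `plan_b` is dropped because its second-highest share, 28%, does not exceed 30%.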

To balance county and municipality splits, we create pseudocounties for use in the county constraint. Outside Maricopa County and Pima County, each county is its own pseudocounty. Maricopa County and Pima County are each larger than a congressional district in population, so within them each municipality is its own pseudocounty instead. Overall, this approach leads to far fewer county and municipality splits than using either a county or county/municipality constraint.

Validation

validation_20230106_2238

SMC: 5,000 sampled plans of 9 districts on 2,224 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.5
`est_label_mult`=1 • `pop_temper`=0.05

Plan diversity 80% range: 0.58 to 0.86

R-hat values for summary statistics:
   second_hisp    pop_overlap      total_vap       plan_dev      comp_edge    comp_polsby      pop_white      pop_black 
      1.006106       1.004245       1.005938       1.009820       1.007458       1.008915       1.011136       1.026487 
      pop_hisp       pop_aian      pop_asian       pop_nhpi      pop_other        pop_two      vap_white      vap_black 
      1.034838       1.003699       1.008666       1.015902       1.014525       1.004198       1.011367       1.031177 
      vap_hisp       vap_aian      vap_asian       vap_nhpi      vap_other        vap_two pre_16_rep_tru pre_16_dem_cli 
      1.036375       1.003935       1.008228       1.013280       1.010103       1.009179       1.008768       1.009582 
pre_20_dem_bid pre_20_rep_tru uss_16_rep_mcc uss_16_dem_kir uss_18_rep_mcs uss_18_dem_sin uss_20_dem_kel uss_20_rep_mcs 
      1.003445       1.005762       1.009399       1.017345       1.009997       1.006318       1.003671       1.009825 
gov_18_rep_duc gov_18_dem_gar atg_18_rep_brn atg_18_dem_con sos_18_rep_gay sos_18_dem_hob         adv_16         adv_18 
      1.010000       1.011501       1.011676       1.009417       1.012213       1.007467       1.014971       1.008400 
        adv_20         arv_16         arv_18         arv_20  county_splits    muni_splits            ndv            nrv 
      1.003630       1.009062       1.012291       1.006772       1.003734       1.004238       1.008126       1.008690 
       ndshare          e_dvs         pr_dem          e_dem          pbias           egap 
      1.002197       1.002333       1.001297       1.007189       1.015744       1.005142 

Sampling diagnostics for SMC run 1 of 4 (8,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     5,498 (68.7%)     21.4%        0.46 3,734 ( 74%)      1 
Split 2     5,045 (63.1%)     35.0%        0.82 4,416 ( 87%)      1 
Split 3     4,702 (58.8%)     17.0%        0.91 4,194 ( 83%)      3 
Split 4     4,907 (61.3%)     13.3%        0.90 4,032 ( 80%)      3 
Split 5     4,253 (53.2%)      4.5%        0.85 3,650 ( 72%)      3 
Resample    1,656 (20.7%)       NA%        2.17 3,477 ( 69%)     NA 

Sampling diagnostics for SMC run 2 of 4 (8,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     5,496 (68.7%)     21.3%        0.45 3,763 ( 74%)      1 
Split 2     4,910 (61.4%)     35.7%        0.82 4,389 ( 87%)      1 
Split 3     4,881 (61.0%)     17.0%        0.92 4,151 ( 82%)      3 
Split 4     5,174 (64.7%)     18.5%        0.86 4,097 ( 81%)      2 
Split 5     4,315 (53.9%)      6.5%        0.83 3,699 ( 73%)      2 
Resample    1,518 (19.0%)       NA%        2.14 3,629 ( 72%)     NA 

Sampling diagnostics for SMC run 3 of 4 (8,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     5,560 (69.5%)     21.1%        0.46 3,707 ( 73%)      1 
Split 2     5,134 (64.2%)     18.9%        0.79 4,333 ( 86%)      3 
Split 3     4,928 (61.6%)     23.5%        0.89 4,191 ( 83%)      2 
Split 4     4,634 (57.9%)     10.1%        0.88 4,075 ( 81%)      4 
Split 5     4,180 (52.3%)      4.8%        0.92 3,665 ( 72%)      3 
Resample    1,238 (15.5%)       NA%        2.17 3,416 ( 68%)     NA 

Sampling diagnostics for SMC run 4 of 4 (8,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     5,523 (69.0%)     11.8%        0.45 3,731 ( 74%)      3 
Split 2     4,979 (62.2%)     26.3%        0.81 4,335 ( 86%)      2 
Split 3     4,760 (59.5%)     17.3%        0.94 4,174 ( 83%)      3 
Split 4     4,977 (62.2%)     13.3%        0.89 4,048 ( 80%)      3 
Split 5     3,770 (47.1%)      6.5%        0.86 3,681 ( 73%)      2 
Resample    1,057 (13.2%)       NA%        2.14 3,283 ( 65%)     NA 

•  Watch out for low effective samples, very low acceptance rates (less than 1%), large std. devs. of the log weights (more than 3 or so), and low numbers of unique plans. R-hat values for summary statistics should be between 1 and 1.05.
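The R-hat guideline can be applied mechanically. The sketch below flags any statistic outside the 1 to 1.05 range, using a handful of values copied from the table above; the function name is hypothetical.

```python
def flag_rhat(rhats, lower=1.0, upper=1.05):
    """Return the statistics whose R-hat falls outside [lower, upper]."""
    return {name: value for name, value in rhats.items()
            if not (lower <= value <= upper)}


# A few of the R-hat values reported for this run.
reported = {
    "second_hisp": 1.006106,
    "pop_hisp": 1.034838,
    "vap_hisp": 1.036375,
    "vap_black": 1.031177,
}
print(flag_rhat(reported))
```

An empty result means none of the checked statistics exceeds the guideline; a value such as 1.08 would be returned for closer inspection.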

Checklist

@CoryMcCartan @christopherkenny

mzhao80 commented 1 year ago

Validation plots for additional constraints:

Hispanic majority-minority districts: Aimed for two (the number under the enacted plan). Only some of the simulated plans achieve two. [Figure: az_hisp]

Two-party vote by HVAP: [Figure: az_party] The districts with the highest HVAP clearly perform electorally. The districts with the second-highest HVAP all had greater than 30% HVAP (by construction) and all had a greater than 30% Democratic share of the two-party vote. In 74.9% of them, the Democratic share of the vote exceeded the Republican share.

Competitiveness: Secondary constraint, considered only after all others. [Figure: az_comp]

CoryMcCartan commented 1 year ago

Looks better! @christopherkenny what do you think?

christopherkenny commented 1 year ago

Looks good to me!