Re-run 2020 Arizona Congressional Districts

CoryMcCartan commented 2 years ago

Redistricting requirements

In Arizona, districts must, under the state constitution:

be contiguous
have equal populations
be geographically compact
preserve county and municipality boundaries as much as possible
favor competitive districts to the extent practicable

Interpretation of requirements

We enforce a maximum population deviation of 0.5%. We add a county/municipality constraint, as described below. We add a VRA constraint targeting two majority-HVAP districts which are also substantially majority-minority. Not every plan is guaranteed to have two majority-HVAP districts, however.
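
As an illustration of the two numeric checks above, here is a minimal Python sketch (not the redist R code used in this analysis). It assumes deviation is measured relative to the mean district population, and that "majority-HVAP" means Hispanic VAP above 50% of total VAP. The function names and example numbers are hypothetical; `vap_hisp` and `total_vap` echo the column names in the validation output.

```python
def max_population_deviation(district_pops):
    """Largest relative deviation of any district from the mean district population."""
    target = sum(district_pops) / len(district_pops)
    return max(abs(p - target) / target for p in district_pops)


def count_majority_hvap(vap_hisp, total_vap):
    """Number of districts whose Hispanic VAP share exceeds 50% (assumed definition)."""
    return sum(1 for h, t in zip(vap_hisp, total_vap) if h / t > 0.5)


# Hypothetical district totals, for illustration only.
example_pops = [795_000, 798_000, 792_500, 796_000, 793_500]
example_vap_hisp = [330_000, 290_000, 120_000, 90_000, 310_000]
example_total_vap = [600_000, 610_000, 605_000, 598_000, 602_000]

deviation_ok = max_population_deviation(example_pops) <= 0.005
n_majority_hvap = count_majority_hvap(example_vap_hisp, example_total_vap)
```

A plan passes the population check when `deviation_ok` is true. The count of majority-HVAP districts can then be tabulated across sampled plans to see how often the two-district target is met.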

Data Sources

Data for Arizona comes from the ALARM Project's 2020 Redistricting Data Files.

Pre-processing Notes

No manual pre-processing decisions were necessary.

Simulation Notes

We sample 32,000 districting plans for Arizona across four independent runs of the SMC algorithm, and then thin the sample down to 5,000 plans. To satisfy the Voting Rights Act constraint, we run the simulation in two steps.
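
The arithmetic of the thinning step can be sketched as follows. This is a Python illustration only: it assumes each run is thinned independently to an equal share by uniform random sampling without replacement, which the description above does not confirm. `thin_runs` is a hypothetical helper, not a redist function.

```python
import random


def thin_runs(runs, total_keep, seed=0):
    """Keep an equal number of plans from each run, drawn without replacement."""
    rng = random.Random(seed)
    per_run = total_keep // len(runs)
    return [rng.sample(plans, per_run) for plans in runs]


# Four runs of 8,000 placeholder plans each (32,000 total), identified by (run, index).
runs = [[(r, i) for i in range(8000)] for r in range(4)]
thinned = thin_runs(runs, 5000)
plans_per_run = [len(t) for t in thinned]
```

Under this assumption each run contributes 1,250 plans to the final 5,000.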

1. Simulate three districts outside of Maricopa County

We target a Hispanic-majority district outside of Maricopa County (HVAP 53-58%). However, most realized districts, while electing Democratic candidates, have a lower HVAP. We avoid splitting municipalities in this region.

2. Simulate six more districts in the remainder of the map

We target one Hispanic-majority district in Maricopa County (HVAP 53-58%), and we are able to realize this target value. To balance county and municipality splits, we create pseudocounties for use in the county constraint. Outside Maricopa County and Pima County, each county is its own pseudocounty. Maricopa County and Pima County are each larger than a congressional district in population, so within them each municipality is its own pseudocounty. Overall, this approach leads to far fewer county and municipality splits than using either a county constraint or a combined county/municipality constraint.
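
The pseudocounty rule can be sketched as a labeling function. This is a Python illustration, not the R code in this pull request. The function name, the label format, and the handling of units with no municipality are all assumptions made for the example.

```python
def pseudocounty(county, muni, large_counties=("Maricopa", "Pima")):
    """Return the pseudocounty label for one geographic unit.

    Outside the large counties, the county itself is the pseudocounty.
    Inside them, each municipality is its own pseudocounty.
    """
    if county in large_counties:
        # Assumption: units without a municipality share one remainder label per county.
        return f"{county}:{muni}" if muni else f"{county}:unincorporated"
    return county


labels = [
    pseudocounty("Coconino", "Flagstaff"),
    pseudocounty("Maricopa", "Phoenix"),
    pseudocounty("Maricopa", "Mesa"),
    pseudocounty("Pima", None),
]
```

Passing labels like these to a single county-style constraint penalizes county splits in the smaller counties and municipality splits in the two large ones.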

Validation

SMC: 5,000 sampled plans of 9 districts on 1,538 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.9
`est_label_mult`=1 • `pop_temper`=0.03

Plan diversity 80% range: 0.48 to 0.83

R-hat values for summary statistics:
   pop_overlap      total_vap       plan_dev      comp_edge    comp_polsby       pop_hisp      pop_white      pop_black       pop_aian      pop_asian 
        1.0115         1.0143         1.0039         1.0060         1.0145         1.0089         1.0043         1.0080         1.0100         1.0056 
      pop_nhpi      pop_other        pop_two       vap_hisp      vap_white      vap_black       vap_aian      vap_asian       vap_nhpi      vap_other 
        1.0042         1.0052         1.0026         1.0128         1.0070         1.0087         1.0103         1.0053         1.0028         1.0082 
       vap_two pre_16_rep_tru pre_16_dem_cli uss_16_rep_mcc uss_16_dem_kir uss_18_rep_mcs uss_18_dem_sin gov_18_rep_duc gov_18_dem_gar atg_18_rep_brn 
        1.0044         1.0074         1.0115         1.0084         1.0283         1.0079         1.0085         1.0077         1.0079         1.0077 
atg_18_dem_con sos_18_rep_gay sos_18_dem_hob pre_20_dem_bid pre_20_rep_tru uss_20_dem_kel uss_20_rep_mcs         arv_16         adv_16         arv_18 
        1.0120         1.0078         1.0126         1.0086         1.0075         1.0080         1.0077         1.0083         1.0201         1.0078 
        adv_18         arv_20         adv_20  county_splits    muni_splits            ndv            nrv        ndshare          e_dvs         pr_dem 
        1.0094         1.0076         1.0082         1.0121         1.0095         1.0093         1.0080         1.0145         1.0144         1.0126 
         e_dem          pbias           egap 
        1.0075         1.0152         1.0048 
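
For readers unfamiliar with R-hat, the sketch below computes the classic Gelman-Rubin statistic for one summary statistic across independent runs, and flags values outside the 1 to 1.05 range mentioned in the diagnostic guidance. It is a Python illustration under the assumption of the classic formula; the values above may come from a different variant (for example a rank-normalized or split version). The three example values are copied from the table above.

```python
import math
from statistics import mean, variance


def r_hat(chains):
    """Classic Gelman-Rubin R-hat for equal-length chains of one statistic."""
    n = len(chains[0])
    within = mean([variance(c) for c in chains])        # W: mean within-chain variance
    between = n * variance([mean(c) for c in chains])   # B: scaled variance of chain means
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


def flag_r_hat(values, upper=1.05):
    """Names of statistics whose R-hat exceeds the upper threshold."""
    return [name for name, v in values.items() if v > upper]


# The three largest R-hat values reported in the table above.
largest = {"uss_16_dem_kir": 1.0283, "adv_16": 1.0201, "pbias": 1.0152}
flagged = flag_r_hat(largest)
```

Since even the largest reported value (1.0283) is below 1.05, `flagged` is empty.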

Sampling diagnostics for SMC run 1 of 4 (8,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     4,453 (55.7%)      6.9%        0.31 3,968 ( 78%)      5 
Split 2     3,907 (48.8%)     11.7%        0.40 4,026 ( 80%)      4 
Split 3     3,290 (41.1%)     10.0%        0.44 3,805 ( 75%)      4 
Split 4     2,207 (27.6%)     10.1%        0.48 3,385 ( 67%)      3 
Split 5       924 (11.5%)      4.8%        0.50 2,727 ( 54%)      2 
Resample       658 (8.2%)       NA%        3.02 1,943 ( 38%)     NA 

Sampling diagnostics for SMC run 2 of 4 (8,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k    
Split 1     4,473 (55.9%)      6.8%        0.31 3,920 ( 78%)      5    
Split 2     3,943 (49.3%)     11.6%        0.40 3,963 ( 78%)      4    
Split 3     3,324 (41.5%)     13.5%        0.45 3,811 ( 75%)      3    
Split 4     2,277 (28.5%)     14.9%        0.46 3,398 ( 67%)      2    
Split 5        560 (7.0%)      3.2%        0.48 2,759 ( 55%)      3    
Resample       386 (4.8%)       NA%        3.02 1,894 ( 37%)     NA  * 

Sampling diagnostics for SMC run 3 of 4 (8,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     4,462 (55.8%)      5.7%        0.31 3,971 ( 79%)      6 
Split 2     3,935 (49.2%)     11.6%        0.40 4,015 ( 79%)      4 
Split 3     3,237 (40.5%)      9.7%        0.44 3,794 ( 75%)      4 
Split 4     2,234 (27.9%)     10.2%        0.47 3,304 ( 65%)      3 
Split 5       944 (11.8%)      4.9%        0.48 2,811 ( 56%)      2 
Resample       719 (9.0%)       NA%        3.13 2,124 ( 42%)     NA 

Sampling diagnostics for SMC run 4 of 4 (8,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     4,413 (55.2%)      6.9%        0.31 4,008 ( 79%)      5 
Split 2     3,933 (49.2%)      7.8%        0.40 4,014 ( 79%)      6 
Split 3     3,102 (38.8%)      8.0%        0.44 3,786 ( 75%)      5 
Split 4     2,392 (29.9%)      7.6%        0.48 3,368 ( 67%)      4 
Split 5     1,024 (12.8%)      3.2%        0.48 2,833 ( 56%)      3 
Resample       740 (9.2%)       NA%        3.03 2,140 ( 42%)     NA 
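
The "Eff. samples" column can be read through the standard Kish effective sample size for importance weights. The Python sketch below assumes that formula; it is not confirmed here to be the exact computation behind the tables above. The final line reproduces the percentage shown for Split 1 of run 1.

```python
import math


def effective_sample_size(log_weights):
    """Kish effective sample size, (sum w)^2 / sum(w^2), from log importance weights."""
    shift = max(log_weights)  # subtract the max before exponentiating, for stability
    w = [math.exp(lw - shift) for lw in log_weights]
    return sum(w) ** 2 / sum(x * x for x in w)


equal_weights_ess = effective_sample_size([0.0] * 10)           # all plans weighted equally
degenerate_ess = effective_sample_size([0.0, -1000.0, -1000.0])  # one plan dominates

# Split 1 of run 1: 4,453 effective samples out of 8,000.
split1_pct = round(100 * 4453 / 8000, 1)
```

Equal weights give an effective sample size equal to the number of plans, while a single dominant weight drives it toward 1. This is why a large standard deviation of the log weights goes with a low effective sample count.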

•  Watch out for low effective samples, very low acceptance rates (less than 1%), large std. devs. of the log weights (more than 3 or so), and low
numbers of unique plans. R-hat values for summary statistics should be between 1 and 1.05.
• (*) Bottlenecks found: Consider weakening or removing constraints, or increasing the population tolerance. If the acceptance rate drops quickly in
the final splits, try increasing `pop_temper` by 0.01. To visualize what geographic areas may be causing problems, try running the following code.
Highlighted areas are those that may be causing the bottleneck.
    plot(<map object>, rowMeans(as.matrix(plans) == <bottleneck iteration>))

Notes: The bottleneck exists marginally in only 1 of the 4 runs, and diversity and R-hats all look good.
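
To make the quantity inside the suggested `plot(...)` call concrete: `rowMeans(as.matrix(plans) == k)` gives, for each geographic unit, the share of sampled plans that assign it to district `k`. The Python sketch below mirrors that computation on a toy matrix. It assumes rows are units and columns are plans; the data are hypothetical.

```python
def share_assigned_to(assignments, district):
    """For each unit (row), the fraction of plans (columns) assigning it to `district`."""
    return [sum(1 for d in row if d == district) / len(row) for row in assignments]


# Toy matrix: 3 units by 4 plans, entries are district labels.
toy_assignments = [
    [1, 1, 2, 1],
    [2, 2, 2, 2],
    [3, 1, 3, 3],
]
shares = share_assigned_to(toy_assignments, district=2)
```

Units with a share near 1 are almost always placed in that district, which is what the highlighted areas in the plot would indicate.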

Checklist

[x] I have followed the instructions
[x] I have updated the tracker
[x] All TODO lines from the template code have been removed
[x] I have merged in the master branch and then recalculated summary statistics
[x] I have run enforce_style() to format my code
[x] The documentation copied above is up-to-date
[x] There are no data files in this pull request
[x] None of the file output paths (for the redist_map and redist_plans objects, and summary statistics) have been edited

@christopherkenny

CoryMcCartan commented 2 years ago

Additional constraints:

Competitiveness (supposed to be a secondary constraint)

VRA

The two HVAP-targeted districts clearly separate from the rest of the pack in terms of ability to elect.

christopherkenny commented 2 years ago

Two questions:

The competitiveness constraint looks like it's making things less competitive? Thoughts?
How worried should we be about a resampling bottleneck in 1/4 chains? The other diagnostics look fantastic.

CoryMcCartan commented 2 years ago

districts 5-8 are all closer to 50% on the left plot than the right, ergo more competitive
not worried if everything else looks good (it's an arbitrary threshold)

christopherkenny commented 2 years ago

Oh! I was reversing them. That looks good
Sounds good
