alarm-redist / fifty-states

Redistricting analysis for all 50 U.S. states
https://alarm-redist.github.io/fifty-states/

Re-run 2020 Ohio Congressional Districts #97

Closed CoryMcCartan closed 2 years ago

CoryMcCartan commented 2 years ago

Redistricting requirements

In Ohio, districts must, under Article XIX of the Ohio Constitution:

  1. be contiguous
  2. have equal populations
  3. be geographically compact
  4. not split Cincinnati or Cleveland
  5. minimize splitting of Columbus
  6. split no more than 18 counties once, and no more than 5 counties twice, and no counties three times
  7. additionally preserve county and municipality boundaries where possible
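
The county-split limits in item 6 can be checked mechanically for any one plan. The following is a minimal Python sketch, not part of the analysis code: the function names and the toy plan are hypothetical, and it assumes that a county touching k districts counts as split k - 1 times.

```python
from collections import defaultdict


def county_split_counts(assignments):
    """Count splits per county.

    assignments: iterable of (county, district) pairs, one per precinct.
    Assumption: a county touching k districts is split k - 1 times.
    """
    districts_by_county = defaultdict(set)
    for county, district in assignments:
        districts_by_county[county].add(district)
    return {c: len(d) - 1 for c, d in districts_by_county.items()}


def satisfies_split_limits(assignments, max_once=18, max_twice=5):
    """Check the limits in item 6: at most 18 counties split once,
    at most 5 split twice, and none split three or more times."""
    splits = county_split_counts(assignments)
    once = sum(1 for s in splits.values() if s == 1)
    twice = sum(1 for s in splits.values() if s == 2)
    more = sum(1 for s in splits.values() if s >= 3)
    return once <= max_once and twice <= max_twice and more == 0


# Toy plan (hypothetical): county A is split once, C twice, B not at all.
toy_plan = [("A", 1), ("A", 2), ("B", 2), ("C", 2), ("C", 3), ("C", 4)]
print(county_split_counts(toy_plan))
print(satisfies_split_limits(toy_plan))
```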

Interpretation of requirements

We enforce a maximum population deviation of 0.5%. We employ a variety of anti-split constraints, both in pre-processing and in simulation, as detailed below. Ohio also has one VRA district in Cuyahoga County.
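
To make the 0.5% tolerance concrete, here is a small Python sketch of one common way to measure it: the largest relative gap between any district's population and the ideal (mean) district population. The function name and the toy populations are hypothetical, and the exact deviation formula is an assumption rather than something stated above.

```python
def max_population_deviation(district_pops):
    """Largest relative deviation from the ideal district population.

    Assumption: deviation is |pop - target| / target, where target is
    the mean district population.
    """
    target = sum(district_pops) / len(district_pops)
    return max(abs(p - target) / target for p in district_pops)


# Toy populations (hypothetical), with an ideal district size of 1,000.
toy_pops = [1000, 1004, 996, 1000]
dev = max_population_deviation(toy_pops)
within_tolerance = dev <= 0.005  # 0.5% maximum deviation
print(dev, within_tolerance)
```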

Data Sources

Data for Ohio comes from the ALARM Project's 2020 Redistricting Data Files. Ohio has many precincts which are not geographically contiguous, especially in and around Franklin County (Columbus). We do not attempt to split or otherwise correct these precincts, which may lead some simulated districts to be geographically noncontiguous, despite being contiguous according to the precinct adjacency graph.
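
The distinction between graph contiguity and geographic contiguity can be illustrated with a breadth-first search over the precinct adjacency graph. This is a hedged Python sketch with a hypothetical toy graph, not the project's code. A district passes when all of its precincts are reachable from one another through shared edges, but a precinct made of several disconnected pieces is still a single node, so the check cannot see that kind of gap.

```python
from collections import deque


def is_contiguous(units, adjacency):
    """Return True if `units` form one connected component of the graph.

    units: iterable of precinct ids assigned to one district.
    adjacency: dict mapping each precinct id to its neighbouring ids.
    """
    units = set(units)
    if not units:
        return True
    start = next(iter(units))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v in units and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == units


# Toy adjacency graph (hypothetical): p1 - p2 - p3, with p4 isolated.
toy_adjacency = {"p1": ["p2"], "p2": ["p1", "p3"], "p3": ["p2"], "p4": []}
print(is_contiguous(["p1", "p2", "p3"], toy_adjacency))
print(is_contiguous(["p1", "p3"], toy_adjacency))
```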

Pre-processing Notes

We merge the precincts in all counties which are not split by the enacted plan. We also merge the precincts within Cincinnati and within Cleveland, so that neither city can be split.
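
As a rough illustration of this merging step, the Python sketch below collapses all precincts of an unsplit county into one unit and sums their populations. The record layout (`id`, `county`, `pop`) and the function name are hypothetical. The actual pre-processing operates on geographic data and would also need to merge geometries and rebuild adjacency, which this sketch omits.

```python
def merge_unsplit_counties(precincts, unsplit_counties):
    """Collapse precincts of unsplit counties into one unit per county.

    precincts: list of dicts with hypothetical keys 'id', 'county', 'pop'.
    unsplit_counties: set of county names kept whole by the enacted plan.
    """
    merged = {}
    out = []
    for p in precincts:
        if p["county"] in unsplit_counties:
            unit = merged.get(p["county"])
            if unit is None:
                # The first precinct seen for this county creates the unit.
                unit = {"id": p["county"], "county": p["county"], "pop": 0}
                merged[p["county"]] = unit
                out.append(unit)
            unit["pop"] += p["pop"]
        else:
            out.append(dict(p))  # precincts of split counties pass through
    return out


# Toy data (hypothetical): county A is unsplit, county B is split.
toy_precincts = [
    {"id": "a1", "county": "A", "pop": 10},
    {"id": "a2", "county": "A", "pop": 5},
    {"id": "b1", "county": "B", "pop": 7},
]
toy_units = merge_unsplit_counties(toy_precincts, {"A"})
print(toy_units)
```

Merging reduces the number of units the sampler has to work with and makes it impossible for a simulated plan to split a merged county.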

Simulation Notes

We sample 40,000 districting plans for Ohio across two runs of the SMC algorithm, then filter down to 5,000 total plans. We begin by sampling plans in Cuyahoga County to generate a VRA district with BVAP of at least 40%, then sample the remaining districts. We apply a Gibbs constraint to discourage multiple splits (a penalty of 100.0 for 3 splits and 3.0 for 2 splits) and a second Gibbs constraint to discourage splitting Columbus (a penalty of 0.5 per splitting district). We use population tempering of 0.01 to encourage efficiency.
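
The penalties above can be read as terms in an energy function. The Python sketch below adds them up for one plan and converts the total into an unnormalized weight of exp(-energy), which is the usual form of a Gibbs constraint. Treating the penalties as additive, and the three count arguments themselves, are assumptions made for illustration; the function names are hypothetical and do not reproduce the sampler's internals.

```python
import math


def gibbs_energy(n_split_twice, n_split_three, n_columbus_districts):
    """Total penalty for one plan, using the strengths stated above.

    Assumptions: penalties add across counties; the arguments are the
    number of units split twice, the number split three times, and the
    number of districts that split Columbus.
    """
    return (3.0 * n_split_twice
            + 100.0 * n_split_three
            + 0.5 * n_columbus_districts)


def gibbs_weight(energy):
    """Unnormalized weight of a plan under the constraint."""
    return math.exp(-energy)


# Toy plan (hypothetical): two double splits, no triple splits,
# and three districts splitting Columbus.
energy = gibbs_energy(n_split_twice=2, n_split_three=0, n_columbus_districts=3)
print(energy, gibbs_weight(energy))
```

Under this reading, a single triple split multiplies a plan's weight by exp(-100), which effectively rules it out, while a double split is only discouraged.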

Validation

image

Extra validation: image

SMC: 5,000 sampled plans of 15 districts on 8,937 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.95
`est_label_mult`=1 • `pop_temper`=0.04

Plan diversity 80% range: 0.50 to 0.69

R-hat values for summary statistics:
   pop_overlap      total_vap       plan_dev      comp_edge    comp_polsby       pop_hisp      pop_white      pop_black 
       1.01386        1.04624        1.10217        1.02453        1.01502        1.02831        1.05033        1.03351 
      pop_aian      pop_asian       pop_nhpi      pop_other        pop_two       vap_hisp      vap_white      vap_black 
       1.08934        1.00760        1.00196        1.05412        1.01050        1.03566        1.07376        1.03281 
      vap_aian      vap_asian       vap_nhpi      vap_other        vap_two pre_16_rep_tru pre_16_dem_cli uss_16_rep_por 
       1.08288        1.00059        1.00261        1.01985        1.02981        1.04805        1.08137        1.05697 
uss_16_dem_str uss_18_rep_ren uss_18_dem_bro gov_18_rep_dew gov_18_dem_cor atg_18_rep_yos atg_18_dem_det sos_18_rep_lar 
       1.08008        1.05659        1.06499        1.05495        1.09719        1.05576        1.09826        1.05207 
sos_18_dem_cly pre_20_rep_tru pre_20_dem_bid         arv_16         adv_16         arv_18         adv_18         arv_20 
       1.09493        1.05571        1.02517        1.05657        1.09918        1.05377        1.09352        1.05571 
        adv_20  county_splits    muni_splits            ndv            nrv        ndshare          e_dvs         pr_dem 
       1.02517        1.03256        1.00399        1.09501        1.05440        1.03986        1.03665        1.01368 
         e_dem          pbias           egap       splits_1       splits_2 
       0.99989        1.01206        1.00020        1.00347        1.00802 
✖ WARNING: SMC runs have not converged.

Sampling diagnostics for SMC run 1 of 2 (30,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd   Max. unique Est. k    
Split 1    22,039 (73.5%)     18.9%        0.76 18,868 ( 99%)     10    
Split 2    19,496 (65.0%)     20.5%        0.84 16,604 ( 88%)      7    
Split 3    17,009 (56.7%)     16.5%        0.90 16,242 ( 86%)      7    
Split 4    13,845 (46.1%)     13.9%        0.95 15,744 ( 83%)      7    
Split 5     9,806 (32.7%)     12.1%        0.99 15,299 ( 81%)      7    
Split 6     8,156 (27.2%)     17.5%        1.05 14,709 ( 78%)      4    
Split 7     6,705 (22.3%)     19.2%        1.10 14,144 ( 75%)      3    
Split 8     7,126 (23.8%)     17.2%        1.10 13,494 ( 71%)      3    
Split 9     6,051 (20.2%)     20.3%        1.13 13,128 ( 69%)      2    
Split 10    4,940 (16.5%)     19.1%        1.10 12,263 ( 65%)      2    
Split 11    6,043 (20.1%)     17.7%        1.12 11,082 ( 58%)      2    
Split 12    5,300 (17.7%)     17.8%        1.00  9,761 ( 51%)      2    
Split 13     1,782 (5.9%)      5.3%        0.85 10,534 ( 56%)      2    
Resample     1,429 (4.8%)       NA%        6.10  9,429 ( 50%)     NA  * 

Sampling diagnostics for SMC run 2 of 2 (30,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd   Max. unique Est. k 
Split 1    22,038 (73.5%)     23.2%        0.75 18,875 (100%)      8 
Split 2    19,563 (65.2%)     27.3%        0.84 16,596 ( 88%)      5 
Split 3    15,832 (52.8%)     22.4%        0.90 16,240 ( 86%)      5 
Split 4    12,884 (42.9%)     27.9%        0.96 15,589 ( 82%)      3 
Split 5    12,038 (40.1%)     29.8%        1.01 15,195 ( 80%)      2 
Split 6    10,495 (35.0%)     17.0%        1.05 14,760 ( 78%)      4 
Split 7     9,023 (30.1%)     18.7%        1.08 14,336 ( 76%)      3 
Split 8     7,889 (26.3%)     21.5%        1.11 13,819 ( 73%)      2 
Split 9     3,538 (11.8%)     16.0%        1.13 13,169 ( 69%)      3 
Split 10    4,355 (14.5%)     18.3%        1.08 11,888 ( 63%)      2 
Split 11    4,089 (13.6%)     16.7%        1.09 11,044 ( 58%)      2 
Split 12    4,152 (13.8%)     13.9%        0.97 10,081 ( 53%)      2 
Split 13     1,879 (6.3%)      4.4%        0.94  9,883 ( 52%)      2 
Resample     2,010 (6.7%)       NA%        6.91  8,700 ( 46%)     NA 

•  Watch out for low effective samples, very low acceptance rates (less than 1%), large std. devs. of the log weights
(more than 3 or so), and low numbers of unique plans. R-hat values for summary statistics should be between 1 and 1.05.
• SMC convergence: Increase the number of samples. If you are experiencing low plan diversity or bottlenecks as well,
address those issues first.
• (*) Bottlenecks found: Consider weakening or removing constraints, or increasing the population tolerance. If the
acceptance rate drops quickly in the final splits, try increasing `pop_temper` by 0.01. To visualize what geographic
areas may be causing problems, try running the following code. Highlighted areas are those that may be causing the
bottleneck.

NOTE: The high partisan R-hats are unique to District 1, which is wedged inside Hamilton County. R-hats are well below 1.05 for the other spot-checked districts. Given R-hats ≤ 1.05 for population totals, we are not concerned about a value of ~1.1 for plan_dev.
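
For spot checks like this, it helps to list which statistics exceed the 1.05 guideline quoted in the diagnostic output. The Python sketch below does that for a handful of values copied from the first R-hat table above. The dictionary holds only a subset of that table and the function name is hypothetical.

```python
RHAT_THRESHOLD = 1.05  # guideline from the diagnostic message

# A subset of the R-hat values reported in the first table above.
rhats = {
    "pop_overlap": 1.01386,
    "total_vap": 1.04624,
    "plan_dev": 1.10217,
    "gov_18_dem_cor": 1.09719,
    "county_splits": 1.03256,
    "e_dem": 0.99989,
}


def flag_high_rhats(values, threshold=RHAT_THRESHOLD):
    """Return (name, value) pairs above the threshold, largest first."""
    flagged = [(name, v) for name, v in values.items() if v > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


flagged = flag_high_rhats(rhats)
print(flagged)
```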

Checklist

@christopherkenny @kosukeimai — check re: case.

CoryMcCartan commented 2 years ago

Example R-hats from District 3 (D-leaning, Columbus):

R-hat values for summary statistics:
   pop_overlap      total_vap       plan_dev      comp_edge    comp_polsby       pop_hisp      pop_white      pop_black 
       1.01386        1.00213        1.10217        1.02453        1.00054        1.00316        1.01741        1.02974 
      pop_aian      pop_asian       pop_nhpi      pop_other        pop_two       vap_hisp      vap_white      vap_black 
       1.01408        1.01068        1.00771        1.00066        1.02087        1.00206        1.02455        1.03000 
      vap_aian      vap_asian       vap_nhpi      vap_other        vap_two pre_16_rep_tru pre_16_dem_cli uss_16_rep_por 
       1.03385        1.00608        1.00719        1.00296        1.00027        0.99981        1.00151        1.00098 
uss_16_dem_str uss_18_rep_ren uss_18_dem_bro gov_18_rep_dew gov_18_dem_cor atg_18_rep_yos atg_18_dem_det sos_18_rep_lar 
       1.00170        1.00054        1.00479        1.00037        1.00360        1.00045        1.00280        1.00044 
sos_18_dem_cly pre_20_rep_tru pre_20_dem_bid         arv_16         adv_16         arv_18         adv_18         arv_20 
       1.00305        0.99991        1.00248        1.00042        1.00091        1.00044        1.00341        0.99991 
        adv_20  county_splits    muni_splits            ndv            nrv        ndshare          e_dvs         pr_dem 
       1.00248        1.03256        1.00399        1.00232        1.00011        0.99983        0.99983        1.00004 
         e_dem          pbias           egap       splits_1       splits_2 
       0.99989        1.01206        1.00020        1.00347        1.00802 

and District 12 (R-leaning, eastern Ohio)

R-hat values for summary statistics:
   pop_overlap      total_vap       plan_dev      comp_edge    comp_polsby       pop_hisp      pop_white      pop_black 
       1.01386        1.01071        1.10217        1.02453        1.01911        1.02902        1.01505        1.00975 
      pop_aian      pop_asian       pop_nhpi      pop_other        pop_two       vap_hisp      vap_white      vap_black 
       1.02541        1.02104        1.01608        1.01986        1.00405        1.03015        1.01558        1.01054 
      vap_aian      vap_asian       vap_nhpi      vap_other        vap_two pre_16_rep_tru pre_16_dem_cli uss_16_rep_por 
       1.03664        1.02317        1.00964        1.01961        1.00356        1.00362        1.00496        1.01657 
uss_16_dem_str uss_18_rep_ren uss_18_dem_bro gov_18_rep_dew gov_18_dem_cor atg_18_rep_yos atg_18_dem_det sos_18_rep_lar 
       1.00161        1.02106        1.04049        1.01048        1.00423        1.01601        1.05720        1.01362 
sos_18_dem_cly pre_20_rep_tru pre_20_dem_bid         arv_16         adv_16         arv_18         adv_18         arv_20 
       1.04484        1.01143        1.01765        1.00087        1.00087        1.01548        1.03615        1.01143 
        adv_20  county_splits    muni_splits            ndv            nrv        ndshare          e_dvs         pr_dem 
       1.01765        1.03256        1.00399        1.00785        1.00382        1.03176        1.01758        1.00345 
         e_dem          pbias           egap       splits_1       splits_2 
       0.99989        1.01206        1.00020        1.00347        1.00802 

christopherkenny commented 2 years ago

Looks good given the additional R-hats.