Re-run 2020 Hawaii Congressional Districts

christopherkenny commented 2 years ago

Redistricting requirements

In Hawaii, under HRS Title 1 S25, districts must:

be contiguous unless crossing islands (25-2(b)(2))
be geographically compact (25-2(b)(3))
preserve tract boundaries as much as possible (25-2(b)(4))
not unduly favor any people or party (25-2(b)(6))
avoid mixing substantially different socioeconomic regions (25-2(b)(6))

Interpretation of requirements

We enforce a maximum population deviation of 0.5%. We use Census tracts in accordance with (25-2(b)(4)). We use municipalities to attempt to follow (25-2(b)(6)) in the absence of regional knowledge.
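
For readers unfamiliar with the deviation bound, here is a minimal sketch of how a 0.5% check could be computed. It assumes deviation is measured as the largest relative distance from the ideal (mean) district population. The function name and the populations are hypothetical, not Hawaii's actual figures, and this is not the repository's R code.

```python
def max_population_deviation(district_pops):
    """Largest relative deviation from the ideal (mean) district population."""
    target = sum(district_pops) / len(district_pops)
    return max(abs(p - target) / target for p in district_pops)


# Hypothetical populations for a two-district plan (illustrative only).
pops = [727_000, 729_000]
dev = max_population_deviation(pops)

# Assumed tolerance taken from the text above: 0.5% = 0.005.
within_tolerance = dev <= 0.005
print(f"deviation = {dev:.4%}, within tolerance: {within_tolerance}")
```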

Data Sources

Data for Hawaii comes from the ALARM Project's 2020 Redistricting Data Files.

Pre-processing Notes

Islands are connected in the adjacency graph, but this is not used for simulation purposes.

Simulation Notes

We sample 5,000 districting plans for Hawaii across 2 independent runs of the SMC algorithm. We use partial SMC to draw one district in the contiguous portion of Honolulu and assign the remainder to district 2. We use municipalities (or the county name if a tract is not assigned to a municipality) for the algorithmic constraint.
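
The two bookkeeping steps described above can be sketched roughly as follows: falling back to the county name when a tract has no municipality, and assigning every tract outside the drawn district to district 2. The tract records, field names, and helper are all hypothetical and only illustrate the idea. The actual analysis is done in R.

```python
def constraint_label(muni, county):
    """Use the municipality name when present, otherwise the county name."""
    return muni if muni else county


# Hypothetical tract records (illustrative only, not real data).
tracts = [
    {"tract": "A", "muni": "Urban Honolulu", "county": "Honolulu"},
    {"tract": "B", "muni": None, "county": "Honolulu"},
    {"tract": "C", "muni": "Hilo", "county": "Hawaii"},
]

# Label used for the municipality-based constraint.
labels = {t["tract"]: constraint_label(t["muni"], t["county"]) for t in tracts}

# Suppose the sampler drew district 1 as this set of tracts;
# everything else is assigned to district 2.
drawn_district_1 = {"A"}
plan = {t["tract"]: 1 if t["tract"] in drawn_district_1 else 2 for t in tracts}

print(labels)
print(plan)
```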

Validation

validation_20220622_0148

SMC: 5,000 sampled plans of 2 districts on 461 units
`adapt_k_thresh`=NULL • `seq_alpha`=NULL
`est_label_mult`=NULL • `pop_temper`=NULL

Plan diversity 80% range: 0.22 to 0.67

R-hat values for summary statistics:
     total_vap       plan_dev      comp_edge    comp_polsby       pop_hisp      pop_white 
     1.0008174      1.0004894      0.9999982      1.0000380      1.0004853      1.0001852 
     pop_black       pop_aian      pop_asian       pop_nhpi      pop_other        pop_two 
     0.9999709      1.0001105      1.0000817      0.9998751      0.9998873      0.9999483 
      vap_hisp      vap_white      vap_black       vap_aian      vap_asian       vap_nhpi 
     1.0003370      1.0001420      0.9999537      1.0007178      1.0000410      0.9998735 
     vap_other        vap_two pre_16_dem_cli pre_16_rep_tru uss_16_dem_sch uss_16_rep_car 
     0.9999685      0.9999112      0.9998145      0.9998550      0.9998144      0.9998300 
uss_18_dem_hir uss_18_rep_cur gov_18_dem_ige gov_18_rep_tup pre_20_dem_bid pre_20_rep_tru 
     0.9998430      0.9999376      0.9998176      0.9999903      0.9998383      0.9998275 
        arv_16         adv_16         arv_18         adv_18         arv_20         adv_20 
     0.9998025      0.9998176      0.9999064      0.9998311      0.9998275      0.9998383 
   muni_splits            ndv            nrv        ndshare          e_dvs           egap 
     1.0000324      0.9998241      0.9998728      0.9998870      0.9998660      1.0000560 

•  Watch out for low effective samples, very low acceptance rates (less than 1%), large
std. devs. of the log weights (more than 3 or so), and low numbers of unique plans. R-hat
values for summary statistics should be between 1 and 1.05.
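
As a rough illustration of what the R-hat guidance is checking, here is a sketch of the classic Gelman-Rubin statistic for equal-length chains. This is an assumption for illustration: the variant that `summary()` reports may differ (for example, a split or rank-normalized version). The chain values are made up.

```python
from math import sqrt
from statistics import mean, variance


def r_hat(chains):
    """Classic Gelman-Rubin R-hat for equal-length chains of one statistic."""
    n = len(chains[0])
    within = mean([variance(c) for c in chains])        # mean within-chain variance
    between = n * variance([mean(c) for c in chains])   # between-chain variance
    pooled = (n - 1) / n * within + between / n
    return sqrt(pooled / within)


# Made-up draws: two chains that agree, and two that do not.
agreeing = [[1, 2, 3, 4], [1, 2, 3, 4]]
disagreeing = [[1, 2, 3, 4], [11, 12, 13, 14]]

print(r_hat(agreeing))     # slightly below 1 is possible with this formula
print(r_hat(disagreeing))  # far above the 1.05 threshold
```

With this formula, chains that agree closely can give values slightly below 1, which is consistent with several entries in the table above.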

Checklist

[x] I have followed the instructions
[x] I have updated the tracker
[x] All TODO lines from the template code have been removed
[x] I have merged in the master branch and then recalculated summary statistics
[x] I have run enforce_style() to format my code
[x] The documentation copied above is up-to-date
[x] There are no data files in this pull request
[x] None of the file output paths (for the redist_map and redist_plans objects, and summary statistics) have been edited

@CoryMcCartan

Note: the chain has to be assigned manually in 03_*.R before summary(plans) can be used.

christopherkenny commented 2 years ago

summary(plans_honolulu), for reference:

SMC: 5,000 sampled plans of 2 districts on 329 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.5
`est_label_mult`=1 • `pop_temper`=0

Plan diversity 80% range: 0.31 to 0.59

Sampling diagnostics for SMC run 1 of 2 (2,500 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     2,468 (98.7%)      6.9%        0.23 1,603 (101%)      6 
Resample    2,377 (95.1%)       NA%        0.23 1,536 ( 97%)     NA 

Sampling diagnostics for SMC run 2 of 2 (2,500 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     2,466 (98.7%)      7.1%        0.23 1,573 (100%)      6 
Resample    2,372 (94.9%)       NA%        0.23 1,564 ( 99%)     NA

christopherkenny commented 2 years ago

Carrying over diagnostics makes summary(plans) give:

SMC: 5,000 sampled plans of 2 districts on 461 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.5
`est_label_mult`=1 • `pop_temper`=0

Plan diversity 80% range: 0.25 to 0.66

R-hat values for summary statistics:
     total_vap       plan_dev      comp_edge    comp_polsby       pop_hisp      pop_white      pop_black       pop_aian      pop_asian 
     1.0002868      1.0007373      1.0001933      0.9998783      0.9999952      1.0022343      1.0010607      1.0011158      1.0002533 
      pop_nhpi      pop_other        pop_two       vap_hisp      vap_white      vap_black       vap_aian      vap_asian       vap_nhpi 
     1.0001349      1.0012130      1.0000471      1.0002944      1.0016297      1.0015171      1.0012796      1.0001531      1.0006517 
     vap_other        vap_two pre_16_dem_cli pre_16_rep_tru uss_16_dem_sch uss_16_rep_car uss_18_dem_hir uss_18_rep_cur gov_18_dem_ige 
     1.0011878      1.0002517      0.9998244      0.9998939      0.9999832      1.0001158      0.9999012      1.0001883      0.9999943 
gov_18_rep_tup pre_20_dem_bid pre_20_rep_tru         arv_16         adv_16         arv_18         adv_18         arv_20         adv_20 
     1.0002951      0.9998419      0.9999805      0.9999493      0.9999208      1.0002241      0.9999463      0.9999805      0.9998419 
   muni_splits            ndv            nrv        ndshare          e_dvs           egap 
     0.9998809      0.9998904      1.0000989      0.9998427      0.9998674      1.0000434 

Sampling diagnostics for SMC run 1 of 2 (2,500 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     2,468 (98.7%)      6.8%        0.23 1,559 ( 99%)      6 
Resample    2,380 (95.2%)       NA%        0.23 1,540 ( 97%)     NA 

Sampling diagnostics for SMC run 2 of 2 (2,500 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     2,468 (98.7%)      6.8%        0.23 1,594 (101%)      6 
Resample    2,378 (95.1%)       NA%        0.23 1,554 ( 98%)     NA 

•  Watch out for low effective samples, very low acceptance rates (less than 1%), large std. devs. of the log weights (more than 3 or so), and low
numbers of unique plans. R-hat values for summary statistics should be between 1 and 1.05.
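
For context on the "Eff. samples" and "Log wgt. sd" columns, here is a sketch of the standard importance-sampling definitions: effective sample size as (sum of weights)^2 / (sum of squared weights), and the sample standard deviation of the log weights. Whether the package computes these columns exactly this way is an assumption, and the weights below are made up.

```python
from math import exp
from statistics import stdev


def effective_samples(log_weights):
    """Kish effective sample size: (sum w)^2 / sum(w^2), from log weights."""
    m = max(log_weights)  # subtract the max for numerical stability
    w = [exp(lw - m) for lw in log_weights]
    return sum(w) ** 2 / sum(x * x for x in w)


# Made-up log weights: an even set and a badly skewed set.
even = [0.0, 0.0, 0.0, 0.0]
skewed = [0.0, -10.0, -10.0, -10.0]

print(effective_samples(even))    # equals the number of samples
print(effective_samples(skewed))  # close to 1: one plan dominates
print(stdev(skewed))              # log-weight sd above the "3 or so" guideline
```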
