2010 Pennsylvania Congressional Districts

taransamarth commented 1 year ago

Redistricting requirements

In Pennsylvania, districts must generally:

- be contiguous
- have equal populations
- be geographically compact
- preserve county and municipality boundaries as much as possible

We use a (pseudo-)county constraint to preserve boundaries as much as possible.

Algorithmic Constraints

We enforce a maximum population deviation of 0.5%.
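
As a minimal sketch of what a 0.5% maximum population deviation means, the check below computes each district's relative deviation from the ideal (equal-share) population and compares the largest one to the tolerance. The function name and the toy populations are hypothetical and chosen only for illustration; this is not the project's code.

```python
def max_population_deviation(district_pops):
    """Largest relative deviation of any district from the ideal population."""
    # Ideal population: total population split evenly across districts.
    target = sum(district_pops) / len(district_pops)
    return max(abs(p - target) / target for p in district_pops)


# Hypothetical toy plan with three districts (not Pennsylvania data).
toy_plan = [100_200, 99_900, 99_900]
TOLERANCE = 0.005  # the 0.5% maximum deviation stated above

deviation = max_population_deviation(toy_plan)
within_tolerance = deviation <= TOLERANCE
print(f"max deviation = {deviation:.4%}, within tolerance: {within_tolerance}")
```

A plan passes only if every district, including the most over- or under-populated one, stays within the tolerance of the ideal population.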

Data Sources

Data for Pennsylvania comes from the ALARM Project's 2010 Redistricting Data Files.

Pre-processing Notes

No manual pre-processing decisions were necessary.

Simulation Notes

We sample 20,000 districting plans for Pennsylvania across two independent runs of the SMC algorithm (10,000 plans per run), then thin the sample down to 5,000 plans. Pseudo-counties for the county constraint are generated for Allegheny, Montgomery, and Philadelphia counties, as each has more residents than the target population of a single district.
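
The rule for deciding which counties need pseudo-counties can be sketched as a comparison against the target district population. The snippet below is an illustration only: the function name is hypothetical, and the populations are rounded approximations supplied as assumptions, not values taken from the ALARM data files. The district count of 18 comes from the validation output.

```python
def counties_needing_pseudo_units(county_pops, total_pop, n_districts):
    """Return counties whose population exceeds one district's target population."""
    target = total_pop / n_districts
    return sorted(name for name, pop in county_pops.items() if pop > target)


# Rounded, illustrative 2010 populations (assumed values; check against source data).
county_pops = {
    "Allegheny": 1_223_000,
    "Bucks": 625_000,
    "Montgomery": 800_000,
    "Philadelphia": 1_526_000,
}
STATE_POP = 12_702_000  # assumed, rounded statewide total
N_DISTRICTS = 18        # from the validation output

oversized = counties_needing_pseudo_units(county_pops, STATE_POP, N_DISTRICTS)
print(oversized)
```

Under these assumed figures the target is roughly 706,000 residents, so a county such as Bucks falls below it and keeps its ordinary county boundary in the constraint.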

No special techniques were needed to produce the sample.

Validation (20,000 plans)

SMC: 20,000 sampled plans of 18 districts on 9,256 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.5
`est_label_mult`=1 • `pop_temper`=0

Plan diversity 80% range: 0.66 to 0.84

R-hat values for summary statistics:
   pop_overlap      total_vap       plan_dev      comp_edge    comp_polsby      pop_white      pop_black 
      1.021009       1.017837       1.013367       1.021607       1.001710       1.016175       1.015101 
      pop_hisp       pop_aian      pop_asian       pop_nhpi      pop_other        pop_two      vap_white 
      1.007263       1.015778       1.017111       1.011203       1.028106       1.016832       1.006714 
     vap_black       vap_hisp       vap_aian      vap_asian       vap_nhpi      vap_other        vap_two 
      1.014541       1.007130       1.023924       1.020345       1.012188       1.018049       1.019070 
pre_16_dem_cli pre_16_rep_tru pre_20_dem_bid pre_20_rep_tru uss_16_dem_mcg uss_16_rep_too uss_18_dem_cas 
      1.033125       1.032476       1.014061       1.034240       1.033965       1.029695       1.021461 
uss_18_rep_bar gov_18_dem_wol gov_18_rep_wag atg_16_dem_sha atg_16_rep_raf atg_20_dem_sha atg_20_rep_hei 
      1.029627       1.020546       1.031483       1.035131       1.030263       1.022408       1.033408 
        adv_16         adv_18         adv_20         arv_16         arv_18         arv_20  county_splits 
      1.034455       1.021596       1.017767       1.034523       1.030536       1.031698       1.003324 
   muni_splits            ndv            nrv        ndshare          e_dvs         pr_dem          e_dem 
      1.005742       1.025171       1.033710       1.032816       1.032815       0.999958       1.010924 
         pbias           egap 
      1.001807       1.013579 
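
For readers unfamiliar with R-hat, the sketch below computes the classic Gelman-Rubin statistic for one summary statistic across independent runs. It is a simplified illustration: the function name and the toy values are hypothetical, and the estimator used to produce the table above may differ (for example, a split or rank-normalized variant).

```python
import math
from statistics import mean, variance


def gelman_rubin_rhat(chains):
    """Classic Gelman-Rubin R-hat for equally long chains of one scalar statistic."""
    n = len(chains[0])
    if len(chains) < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    # W: average within-chain sample variance.
    within = mean(variance(c) for c in chains)
    # B: n times the sample variance of the chain means.
    between = n * variance([mean(c) for c in chains])
    # Pooled estimate of the variance of the statistic.
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


# Hypothetical toy values of one statistic from two runs that agree.
agreeing = [[0.50, 0.52, 0.48, 0.51, 0.49], [0.51, 0.49, 0.50, 0.52, 0.48]]
# Hypothetical runs that clearly disagree.
disagreeing = [[1.0, 2.0, 3.0, 4.0, 5.0], [11.0, 12.0, 13.0, 14.0, 15.0]]

print(gelman_rubin_rhat(agreeing), gelman_rubin_rhat(disagreeing))
```

When runs agree, the between-run term is small and the ratio stays close to 1; with finite samples it can dip slightly below 1, as `pr_dem` does in the table.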

Sampling diagnostics for SMC run 1 of 2 (10,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     9,652 (96.5%)     15.9%        0.38 6,378 (101%)     15 
Split 2     9,562 (95.6%)     20.3%        0.42 6,282 ( 99%)     11 
Split 3     9,541 (95.4%)     29.3%        0.47 6,194 ( 98%)      7 
Split 4     9,459 (94.6%)     31.6%        0.50 6,177 ( 98%)      6 
Split 5     9,375 (93.7%)     34.4%        0.52 6,264 ( 99%)      5 
Split 6     9,303 (93.0%)     32.1%        0.56 6,186 ( 98%)      5 
Split 7     9,227 (92.3%)     35.6%        0.58 6,187 ( 98%)      4 
Split 8     9,177 (91.8%)     28.0%        0.61 6,117 ( 97%)      5 
Split 9     9,111 (91.1%)     26.1%        0.62 6,119 ( 97%)      5 
Split 10    9,108 (91.1%)     15.3%        0.63 6,092 ( 96%)      8 
Split 11    9,094 (90.9%)     21.9%        0.62 6,040 ( 96%)      5 
Split 12    9,007 (90.1%)     30.1%        0.63 6,090 ( 96%)      3 
Split 13    9,046 (90.5%)     21.9%        0.61 6,018 ( 95%)      4 
Split 14    8,792 (87.9%)     24.4%        0.61 5,900 ( 93%)      3 
Split 15    8,731 (87.3%)     16.4%        0.65 5,776 ( 91%)      4 
Split 16    8,717 (87.2%)     15.9%        0.68 5,613 ( 89%)      3 
Split 17    8,840 (88.4%)      4.4%        0.66 5,161 ( 82%)      4 
Resample    5,926 (59.3%)       NA%        0.68 5,526 ( 87%)     NA 

Sampling diagnostics for SMC run 2 of 2 (10,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     9,655 (96.6%)     23.5%        0.38 6,307 (100%)     10 
Split 2     9,561 (95.6%)     31.0%        0.42 6,282 ( 99%)      7 
Split 3     9,548 (95.5%)     37.8%        0.46 6,174 ( 98%)      5 
Split 4     9,475 (94.7%)     50.1%        0.49 6,275 ( 99%)      3 
Split 5     9,395 (94.0%)     56.0%        0.52 6,191 ( 98%)      2 
Split 6     9,299 (93.0%)     37.4%        0.56 6,158 ( 97%)      4 
Split 7     9,240 (92.4%)     22.6%        0.58 6,168 ( 98%)      7 
Split 8     9,149 (91.5%)     28.3%        0.61 6,149 ( 97%)      5 
Split 9     9,077 (90.8%)     37.3%        0.63 6,172 ( 98%)      3 
Split 10    9,031 (90.3%)     34.3%        0.65 6,059 ( 96%)      3 
Split 11    9,049 (90.5%)     39.7%        0.65 6,130 ( 97%)      2 
Split 12    9,076 (90.8%)     19.5%        0.62 6,006 ( 95%)      5 
Split 13    9,034 (90.3%)     26.9%        0.61 6,001 ( 95%)      3 
Split 14    8,827 (88.3%)     24.9%        0.60 5,919 ( 94%)      3 
Split 15    8,768 (87.7%)     20.9%        0.63 5,759 ( 91%)      3 
Split 16    8,707 (87.1%)     10.3%        0.68 5,572 ( 88%)      5 
Split 17    8,869 (88.7%)      4.5%        0.65 5,042 ( 80%)      4 
Resample    5,699 (57.0%)       NA%        0.66 5,498 ( 87%)     NA 
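
The "Eff. samples" column reports how many equally weighted plans the weighted sample is worth. A common way to estimate this is the Kish effective sample size computed from importance weights; the sketch below assumes that formula, and whether the diagnostics above use exactly this estimator is an assumption. The function name and inputs are hypothetical.

```python
import math


def effective_sample_size(log_weights):
    """Kish effective sample size: (sum w)^2 / sum(w^2), from log weights."""
    # Subtract the maximum log weight before exponentiating for numerical stability;
    # the ratio is unchanged by rescaling all weights by a constant.
    shift = max(log_weights)
    weights = [math.exp(lw - shift) for lw in log_weights]
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Equal weights: every sample counts fully.
ess_equal = effective_sample_size([0.0] * 10)
# One dominant weight: the sample is worth about one plan.
ess_skewed = effective_sample_size([0.0, -1000.0, -1000.0])
print(ess_equal, ess_skewed)
```

This also shows why a growing standard deviation of the log weights goes hand in hand with a shrinking effective sample: the more uneven the weights, the fewer plans effectively contribute.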

•  Watch out for low effective samples, very low acceptance rates (less than 1%), large std. devs. of the log weights
(more than 3 or so), and low numbers of unique plans. R-hat values for summary statistics should be between 1 and
1.05.

Validation (thinned 5,000 plans)

SMC: 5,000 sampled plans of 18 districts on 9,256 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.5
`est_label_mult`=1 • `pop_temper`=0

Plan diversity 80% range: 0.67 to 0.84

R-hat values for summary statistics:
   pop_overlap      total_vap       plan_dev      comp_edge    comp_polsby      pop_white      pop_black 
      1.020579       1.018028       1.011372       1.024139       1.000993       1.009577       1.008770 
      pop_hisp       pop_aian      pop_asian       pop_nhpi      pop_other        pop_two      vap_white 
      1.009923       1.009268       1.011630       1.012681       1.031385       1.016923       1.004650 
     vap_black       vap_hisp       vap_aian      vap_asian       vap_nhpi      vap_other        vap_two 
      1.009540       1.009878       1.020659       1.015338       1.015914       1.019375       1.019407 
pre_16_dem_cli pre_16_rep_tru pre_20_dem_bid pre_20_rep_tru uss_16_dem_mcg uss_16_rep_too uss_18_dem_cas 
      1.031994       1.029202       1.014978       1.030866       1.032122       1.025392       1.016049 
uss_18_rep_bar gov_18_dem_wol gov_18_rep_wag atg_16_dem_sha atg_16_rep_raf atg_20_dem_sha atg_20_rep_hei 
      1.024714       1.017056       1.025985       1.033365       1.026334       1.022958       1.026299 
        adv_16         adv_18         adv_20         arv_16         arv_18         arv_20  county_splits 
      1.033128       1.016565       1.020580       1.030643       1.025319       1.027984       1.004561 
   muni_splits            ndv            nrv        ndshare          e_dvs         pr_dem          e_dem 
      1.002368       1.024267       1.029059       1.024440       1.025581       1.000088       1.013872 
         pbias           egap 
      1.002418       1.015699 

Sampling diagnostics for SMC run 1 of 2 (10,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     9,652 (96.5%)     15.9%        0.38 6,378 (101%)     15 
Split 2     9,562 (95.6%)     20.3%        0.42 6,282 ( 99%)     11 
Split 3     9,541 (95.4%)     29.3%        0.47 6,194 ( 98%)      7 
Split 4     9,459 (94.6%)     31.6%        0.50 6,177 ( 98%)      6 
Split 5     9,375 (93.7%)     34.4%        0.52 6,264 ( 99%)      5 
Split 6     9,303 (93.0%)     32.1%        0.56 6,186 ( 98%)      5 
Split 7     9,227 (92.3%)     35.6%        0.58 6,187 ( 98%)      4 
Split 8     9,177 (91.8%)     28.0%        0.61 6,117 ( 97%)      5 
Split 9     9,111 (91.1%)     26.1%        0.62 6,119 ( 97%)      5 
Split 10    9,108 (91.1%)     15.3%        0.63 6,092 ( 96%)      8 
Split 11    9,094 (90.9%)     21.9%        0.62 6,040 ( 96%)      5 
Split 12    9,007 (90.1%)     30.1%        0.63 6,090 ( 96%)      3 
Split 13    9,046 (90.5%)     21.9%        0.61 6,018 ( 95%)      4 
Split 14    8,792 (87.9%)     24.4%        0.61 5,900 ( 93%)      3 
Split 15    8,731 (87.3%)     16.4%        0.65 5,776 ( 91%)      4 
Split 16    8,717 (87.2%)     15.9%        0.68 5,613 ( 89%)      3 
Split 17    8,840 (88.4%)      4.4%        0.66 5,161 ( 82%)      4 
Resample    5,926 (59.3%)       NA%        0.68 5,526 ( 87%)     NA 

Sampling diagnostics for SMC run 2 of 2 (10,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     9,655 (96.6%)     23.5%        0.38 6,307 (100%)     10 
Split 2     9,561 (95.6%)     31.0%        0.42 6,282 ( 99%)      7 
Split 3     9,548 (95.5%)     37.8%        0.46 6,174 ( 98%)      5 
Split 4     9,475 (94.7%)     50.1%        0.49 6,275 ( 99%)      3 
Split 5     9,395 (94.0%)     56.0%        0.52 6,191 ( 98%)      2 
Split 6     9,299 (93.0%)     37.4%        0.56 6,158 ( 97%)      4 
Split 7     9,240 (92.4%)     22.6%        0.58 6,168 ( 98%)      7 
Split 8     9,149 (91.5%)     28.3%        0.61 6,149 ( 97%)      5 
Split 9     9,077 (90.8%)     37.3%        0.63 6,172 ( 98%)      3 
Split 10    9,031 (90.3%)     34.3%        0.65 6,059 ( 96%)      3 
Split 11    9,049 (90.5%)     39.7%        0.65 6,130 ( 97%)      2 
Split 12    9,076 (90.8%)     19.5%        0.62 6,006 ( 95%)      5 
Split 13    9,034 (90.3%)     26.9%        0.61 6,001 ( 95%)      3 
Split 14    8,827 (88.3%)     24.9%        0.60 5,919 ( 94%)      3 
Split 15    8,768 (87.7%)     20.9%        0.63 5,759 ( 91%)      3 
Split 16    8,707 (87.1%)     10.3%        0.68 5,572 ( 88%)      5 
Split 17    8,869 (88.7%)      4.5%        0.65 5,042 ( 80%)      4 
Resample    5,699 (57.0%)       NA%        0.66 5,498 ( 87%)     NA 

•  Watch out for low effective samples, very low acceptance rates (less than 1%), large std. devs. of the log weights
(more than 3 or so), and low numbers of unique plans. R-hat values for summary statistics should be between 1 and
1.05.
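
The numeric thresholds in the warning can be applied mechanically. The sketch below is a hypothetical helper, not part of the pipeline; it covers only the three criteria with stated cutoffs (acceptance rate, log-weight standard deviation, R-hat) and uses extreme values read from the output above: the lowest acceptance rate (4.4%), the largest log-weight standard deviation (0.68), and the largest R-hat across both validation summaries (1.035131).

```python
def flag_diagnostics(min_acc_rate, max_log_wgt_sd, max_rhat):
    """Return a list of warnings for diagnostics that cross the stated thresholds."""
    flags = []
    if min_acc_rate < 0.01:   # "very low acceptance rates (less than 1%)"
        flags.append("acceptance rate below 1%")
    if max_log_wgt_sd > 3:    # "large std. devs. of the log weights (more than 3 or so)"
        flags.append("log weight sd above 3")
    if max_rhat > 1.05:       # "R-hat values ... should be between 1 and 1.05"
        flags.append("R-hat above 1.05")
    return flags


# Extreme values taken from the validation output above.
flags = flag_diagnostics(min_acc_rate=0.044, max_log_wgt_sd=0.68, max_rhat=1.035131)
print(flags or "no thresholds crossed")
```

The effective-sample and unique-plan counts have no numeric cutoff in the warning, so they still need to be judged by eye.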

Checklist

[X] I have followed the instructions
[X] I have updated the tracker
[X] All TODO lines from the template code have been removed
[X] I have merged in the master branch and then recalculated summary statistics
[X] I have run enforce_style() to format my code
[X] The documentation copied above is up-to-date
[X] There are no data files in this pull request
[X] None of the file output paths (for the redist_map and redist_plans objects, and summary statistics) have been edited

@christopherkenny

taransamarth commented 1 year ago

Flagging that the 2010 reference plan used here is the plan that was in force from 2011 to 2018 and was then struck down by the PA high court. Should it be updated to the 2018-21 plan?

kosukeimai commented 1 year ago

I think we can use the 2010 reference plan, since our goal is to see how the plan changed between the last two decades. But it would also be good to include both plans so that we can readily compare them if needed.

christopherkenny commented 1 year ago

The simulations and code look great. One small thing: can you remove the `if(interactive()) ...` in the 3rd file?

I wouldn't worry about adding the other plan. We have other states with similar problems, which we are going to deal with automatically later. We have BAFs for all old plans, which makes that easy to do at the analysis stage.
