mzhao80 commented 1 year ago

2010 Nebraska Congressional Districts

Redistricting requirements

In Nebraska, districts must, under a legislative resolution:

be contiguous
have equal populations (specifically, within 0.5% of equality)
be geographically compact
preserve county and municipality boundaries as much as possible
preserve the cores of prior districts
not be drawn using partisan information

Interpretation of requirements

We enforce a maximum population deviation of 0.5%. We apply a county constraint to limit county splits. We preprocess the map to ensure the cores of prior districts are preserved, as described below.
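
As a rough illustration of the 0.5% rule, the Python sketch below computes the largest relative deviation of any district from the ideal (mean) district population and compares it with the tolerance. The district totals are made up for illustration and the function name is hypothetical. It is an assumption that the `plan_dev` statistic in the validation output follows this same definition.

```python
def max_population_deviation(district_pops):
    """Largest relative deviation of any district from the ideal (mean) population."""
    target = sum(district_pops) / len(district_pops)
    return max(abs(p - target) / target for p in district_pops)


# Hypothetical district totals for a three-district plan (not taken from the PR's data).
pops = [606_000, 609_000, 611_340]

dev = max_population_deviation(pops)
within_tolerance = dev <= 0.005  # 0.5% tolerance from the requirements above
```

A plan whose largest deviation exceeds 0.005 would be rejected under this interpretation.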

Data Sources

Data for Nebraska comes from the ALARM Project's Redistricting Data Files.

Pre-processing Notes

To preserve the cores of prior districts, we merge all precincts which are more than two precincts away from a district border under the 2000 plan.
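
The idea behind this step can be sketched with a breadth-first search over the precinct adjacency graph. This is a minimal illustration in Python, not the code used in the analysis. The function name and the convention that a precinct touching another district sits at depth 1 are assumptions, and the actual redist helper may count steps differently.

```python
from collections import deque


def core_labels(adj, plan, boundary=2):
    """Return, per precinct, its prior district if it lies in that district's
    core, or None if it is within `boundary` steps of a district border."""
    n = len(plan)
    dist = [None] * n
    queue = deque()
    # Depth 1: precincts that touch a precinct assigned to another district.
    for i in range(n):
        if any(plan[j] != plan[i] for j in adj[i]):
            dist[i] = 1
            queue.append(i)
    # Multi-source breadth-first search inward from the border precincts.
    while queue:
        i = queue.popleft()
        for j in adj[i]:
            if dist[j] is None:
                dist[j] = dist[i] + 1
                queue.append(j)
    # Precincts deeper than `boundary` (or with no border at all) form the core.
    return [plan[i] if dist[i] is None or dist[i] > boundary else None
            for i in range(n)]


# Toy example: eight precincts in a row, split between two prior districts.
adj = [[j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)]
plan = [1, 1, 1, 1, 1, 2, 2, 2]
labels = core_labels(adj, plan)
```

Precincts that share a non-`None` label would then be merged into a single unit, while the remaining precincts near the border stay separate so the sampler can still move them.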

Simulation Notes

We sample 5,000 districting plans for Nebraska. Beyond the county constraint applied to the residual counties left over from the cores operation, we apply a Gibbs constraint of strength 1.5 to discourage splitting counties.
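
To make the effect of a Gibbs constraint concrete, the sketch below reweights a plan by `exp(-strength * splits)`, where `splits` is the number of counties divided between districts. This is the generic Gibbs form under assumed names. The exact penalty that redist computes for a splits constraint may differ, and the toy county and plan vectors are hypothetical.

```python
import math


def county_splits(county, district):
    """Count counties whose precincts fall in more than one district."""
    seen = {}
    for c, d in zip(county, district):
        seen.setdefault(c, set()).add(d)
    return sum(1 for districts in seen.values() if len(districts) > 1)


def gibbs_weight(county, district, strength=1.5):
    """Unnormalized weight exp(-strength * splits) for one plan."""
    return math.exp(-strength * county_splits(county, district))


# Toy data: six precincts in three counties (hypothetical).
county = ["A", "A", "B", "B", "C", "C"]
plan_no_split = [1, 1, 2, 2, 3, 3]
plan_one_split = [1, 1, 2, 3, 3, 3]  # county B is divided between districts 2 and 3

ratio = gibbs_weight(county, plan_one_split) / gibbs_weight(county, plan_no_split)
```

Under this form, each additional split county multiplies a plan's weight by `exp(-1.5)`, roughly 0.22, so plans with more splits are sampled less often but are not ruled out.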

Validation

validation_20220925_1753

      draw       district   total_pop       pop_overlap       total_vap         plan_dev        
 cd_2010:    3   1:5001   Min.   :605738   Min.   :0.7852   Min.   :444519   Min.   :0.0001966  
 1      :    3   2:5001   1st Qu.:607537   1st Qu.:0.8490   1st Qu.:449199   1st Qu.:0.0023928  
 2      :    3   3:5001   Median :608803   Median :0.8809   Median :458254   Median :0.0031763  
 3      :    3            Mean   :608780   Mean   :0.8861   Mean   :455707   Mean   :0.0031807  
 4      :    3            3rd Qu.:609994   3rd Qu.:0.9197   3rd Qu.:460055   3rd Qu.:0.0041548  
 5      :    3            Max.   :611822   Max.   :1.0000   Max.   :464016   Max.   :0.0049974  
 (Other):14985                                                                                  
   comp_edge       comp_polsby       pop_white        pop_black        pop_hisp        pop_aian   
 Min.   :0.9785   Min.   :0.1249   Min.   :440919   Min.   : 3814   Min.   :32455   Min.   :2650  
 1st Qu.:0.9842   1st Qu.:0.2551   1st Qu.:454153   1st Qu.: 4643   1st Qu.:45593   1st Qu.:2936  
 Median :0.9858   Median :0.2935   Median :520805   Median :14888   Median :59822   Median :4738  
 Mean   :0.9857   Mean   :0.3450   Mean   :499918   Mean   :26986   Mean   :55802   Mean   :4932  
 3rd Qu.:0.9872   3rd Qu.:0.4318   3rd Qu.:528729   3rd Qu.:61221   3rd Qu.:62690   3rd Qu.:6911  
 Max.   :0.9897   Max.   :0.6163   Max.   :537983   Max.   :64139   Max.   :72732   Max.   :9021  

   pop_asian        pop_nhpi     pop_other         pop_two        vap_white        vap_black    
 Min.   : 3399   Min.   :216   Min.   : 369.0   Min.   : 5137   Min.   :343416   Min.   : 2557  
 1st Qu.: 3964   1st Qu.:253   1st Qu.: 435.0   1st Qu.: 5417   1st Qu.:351148   1st Qu.: 3126  
 Median :12980   Median :335   Median : 648.0   Median :10534   Median :404581   Median :10424  
 Mean   :10640   Mean   :322   Mean   : 705.3   Mean   : 9475   Mean   :388985   Mean   :18327  
 3rd Qu.:15078   3rd Qu.:383   3rd Qu.:1035.0   3rd Qu.:12280   3rd Qu.:412224   3rd Qu.:41264  
 Max.   :15951   Max.   :465   Max.   :1113.0   Max.   :13490   Max.   :420260   Max.   :43525  

    vap_hisp        vap_aian      vap_asian        vap_nhpi       vap_other        vap_two    
 Min.   :19073   Min.   :1936   Min.   : 2464   Min.   :158.0   Min.   :192.0   Min.   :2440  
 1st Qu.:26782   1st Qu.:2078   1st Qu.: 2885   1st Qu.:189.0   1st Qu.:224.0   1st Qu.:2592  
 Median :34811   Median :3092   Median : 9550   Median :235.0   Median :345.0   Median :4391  
 Mean   :32740   Mean   :3212   Mean   : 7806   Mean   :228.7   Mean   :348.7   Mean   :4059  
 3rd Qu.:36788   3rd Qu.:4342   3rd Qu.:10972   3rd Qu.:265.0   3rd Qu.:477.0   3rd Qu.:5154  
 Max.   :42620   Max.   :5495   Max.   :11657   Max.   :318.0   Max.   :516.0   Max.   :5657  

 pre_16_rep_tru   pre_16_dem_cli   pre_20_rep_tru   pre_20_dem_bid   uss_18_rep_fis   uss_18_dem_ray  
 Min.   :126077   Min.   : 50323   Min.   :135994   Min.   : 62771   Min.   :109190   Min.   : 48476  
 1st Qu.:137290   1st Qu.: 53396   1st Qu.:153887   1st Qu.: 65977   1st Qu.:120548   1st Qu.: 52412  
 Median :159103   Median :103274   Median :183435   Median :137731   Median :130553   Median :101709  
 Mean   :165321   Mean   : 94834   Mean   :185609   Mean   :124874   Mean   :134383   Mean   : 89974  
 3rd Qu.:201888   3rd Qu.:127525   3rd Qu.:224140   3rd Qu.:168716   3rd Qu.:154993   3rd Qu.:115152  
 Max.   :207666   Max.   :131572   Max.   :230918   Max.   :177388   Max.   :159582   Max.   :119701  

 uss_20_rep_sas   uss_20_dem_jan   gov_18_rep_ric   gov_18_dem_kri   atg_18_rep_pet   sos_18_rep_evn  
 Min.   :155202   Min.   : 43151   Min.   :111025   Min.   : 53041   Min.   :155879   Min.   :108868  
 1st Qu.:175216   1st Qu.: 44968   1st Qu.:122300   1st Qu.: 56903   1st Qu.:169087   1st Qu.:119890  
 Median :198186   Median : 82858   Median :134116   Median :107491   Median :173444   Median :135605  
 Mean   :194486   Mean   : 75748   Mean   :137278   Mean   : 95402   Mean   :172249   Mean   :135539  
 3rd Qu.:216476   3rd Qu.: 98541   3rd Qu.:158664   3rd Qu.:121343   3rd Qu.:176880   3rd Qu.:154305  
 Max.   :223103   Max.   :102331   Max.   :163464   Max.   :126000   Max.   :184787   Max.   :159000  

 sos_18_dem_dan       adv_16           adv_18           adv_20           arv_16           arv_18      
 Min.   : 47202   Min.   : 50323   Min.   : 49546   Min.   : 52754   Min.   :126077   Min.   :121739  
 1st Qu.: 50335   1st Qu.: 53396   1st Qu.: 53202   1st Qu.: 55412   1st Qu.:137290   1st Qu.:133524  
 Median : 96553   Median :103274   Median :101903   Median :110161   Median :159103   Median :143783  
 Mean   : 88017   Mean   : 94834   Mean   : 91113   Mean   :100219   Mean   :165321   Mean   :144849  
 3rd Qu.:116809   3rd Qu.:127525   3rd Qu.:117764   3rd Qu.:133623   3rd Qu.:201888   3rd Qu.:160667  
 Max.   :120940   Max.   :131572   Max.   :122212   Max.   :139857   Max.   :207666   Max.   :165301  

     arv_20       county_splits   muni_splits         ndv              nrv            ndshare      
 Min.   :145690   Min.   :1.00   Min.   :0.000   Min.   : 50866   Min.   :129222   Min.   :0.2144  
 1st Qu.:164631   1st Qu.:2.00   1st Qu.:1.000   1st Qu.: 53973   1st Qu.:142897   1st Qu.:0.2264  
 Median :190816   Median :3.00   Median :1.000   Median :104948   Median :159473   Median :0.3964  
 Mean   :189953   Mean   :3.11   Mean   :1.475   Mean   : 94757   Mean   :160716   Mean   :0.3659  
 3rd Qu.:220113   3rd Qu.:4.00   3rd Qu.:2.000   3rd Qu.:124669   3rd Qu.:183662   3rd Qu.:0.4734  
 Max.   :226787   Max.   :8.00   Max.   :6.000   Max.   :129641   Max.   :189048   Max.   :0.4887  

     e_dvs            pr_dem           e_dem            pbias              egap         
 Min.   :0.2158   Min.   :0.0000   Min.   :0.1667   Min.   :-0.1667   Min.   :-0.02761  
 1st Qu.:0.2283   1st Qu.:0.0000   1st Qu.:0.5000   1st Qu.:-0.1667   1st Qu.: 0.02573  
 Median :0.4018   Median :0.0000   Median :0.5000   Median :-0.1667   Median : 0.07150  
 Mean   :0.3711   Mean   :0.1778   Mean   :0.5335   Mean   :-0.1667   Mean   : 0.06647  
 3rd Qu.:0.4814   3rd Qu.:0.5000   3rd Qu.:0.6667   3rd Qu.:-0.1667   3rd Qu.: 0.08916  
 Max.   :0.4973   Max.   :0.8333   Max.   :0.8333   Max.   :-0.1667   Max.   : 0.19687

Checklist

[x] I have followed the instructions
[x] I have updated the tracker
[x] All TODO lines from the template code have been removed
[x] I have merged in the master branch and then recalculated summary statistics
[x] I have run enforce_style() to format my code
[x] The documentation copied above is up-to-date
[x] There are no data files in this pull request
[x] None of the file output paths (for the redist_map and redist_plans objects, and summary statistics) have been edited

@CoryMcCartan

CoryMcCartan commented 1 year ago

Thanks for the PR. Something is going wrong with the summary stats at the bottom. Make sure `redist` is loaded when you run them; you should see simulation-specific details rather than a bunch of quantile summaries by column.

Also, we're seeing a lot of county splits relative to the enacted plan. Can you try using the same approach we did for 2020 with combined county/cores, rather than just doing the cores?

https://github.com/alarm-redist/fifty-states/blob/main/analyses/NE_cd_2020/02_setup_NE_cd_2020.R#L14

CoryMcCartan commented 1 year ago

Also I don't see the tracker updated even though you checked that box. Can you update it?

mzhao80 commented 1 year ago

The tracker should be updated now.

validation_20220926_1516

SMC: 5,000 sampled plans of 3 districts on 1,652 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.5
`est_label_mult`=1 • `pop_temper`=0

Plan diversity 80% range: 0.046 to 0.348
x WARNING: Low plan diversity

R-hat values for summary statistics:
   pop_overlap      total_vap       plan_dev      comp_edge    comp_polsby      pop_white      pop_black 
      1.002241       1.018813       1.002465       1.002012       1.003524       1.009250       1.008240 
      pop_hisp       pop_aian      pop_asian       pop_nhpi      pop_other        pop_two      vap_white 
      1.001504       1.018715       1.011499       1.000690       1.013433       1.006200       1.011990 
     vap_black       vap_hisp       vap_aian      vap_asian       vap_nhpi      vap_other        vap_two 
      1.006314       1.001898       1.022521       1.010333       1.000712       1.006381       1.003616 
pre_16_rep_tru pre_16_dem_cli pre_20_rep_tru pre_20_dem_bid uss_18_rep_fis uss_18_dem_ray uss_20_rep_sas 
      1.004838       1.002289       1.005753       1.002669       1.003266       1.001871       1.003180 
uss_20_dem_jan gov_18_rep_ric gov_18_dem_kri atg_18_rep_pet sos_18_rep_evn sos_18_dem_dan         adv_16 
      1.002544       1.003182       1.002072       1.002858       1.003771       1.001827       1.002289 
        adv_18         adv_20         arv_16         arv_18         arv_20  county_splits    muni_splits 
      1.002103       1.002479       1.004838       1.002806       1.004281       1.004378       1.005061 
           ndv            nrv        ndshare          e_dvs          e_dem          pbias           egap 
      1.002243       1.003310       1.003992       1.003951       1.005790       1.000858       1.004215 

Sampling diagnostics for SMC run 1 of 4 (1,250 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     1,074 (85.9%)      3.4%        0.22   779 ( 99%)     10 
Split 2       765 (61.2%)      2.3%        0.56   612 ( 77%)      6 
Resample      338 (27.1%)       NA%        1.63   595 ( 75%)     NA 

Sampling diagnostics for SMC run 2 of 4 (1,250 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     1,066 (85.2%)      4.1%        0.21   789 (100%)      8 
Split 2       754 (60.3%)      3.0%        0.59   587 ( 74%)      5 
Resample      320 (25.6%)       NA%        1.67   594 ( 75%)     NA 

Sampling diagnostics for SMC run 3 of 4 (1,250 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     1,067 (85.4%)      4.2%        0.21   793 (100%)      8 
Split 2       800 (64.0%)      2.8%        0.55   588 ( 74%)      5 
Resample      375 (30.0%)       NA%        1.59   624 ( 79%)     NA 

Sampling diagnostics for SMC run 4 of 4 (1,250 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     1,081 (86.5%)      3.0%        0.21   798 (101%)     11 
Split 2       805 (64.4%)      2.3%        0.53   619 ( 78%)      6 
Resample      379 (30.3%)       NA%        1.60   649 ( 82%)     NA 

•  Watch out for low effective samples, very low acceptance rates (less than 1%), large std. devs. of the log
weights (more than 3 or so), and low numbers of unique plans. R-hat values for summary statistics should be
between 1 and 1.05.
• Low diversity: Check for potential bottlenecks. Increase the number of samples. Examine the diversity plot
with `hist(plans_diversity(plans), breaks=24)`. Consider weakening or removing constraints, or increasing
the population tolerance. If the acceptance rate drops quickly in the final splits, try increasing
`pop_temper` by 0.01.
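
The rules of thumb in the notes above can be checked mechanically. The sketch below is a hypothetical helper, not part of redist. It applies the stated thresholds (acceptance rate below 1%, log-weight standard deviation above 3, R-hat above 1.05) to values copied from run 1 of 4 and to the two largest R-hat values in the table above.

```python
def flag_diagnostics(rows, rhat, min_acc=0.01, max_log_wgt_sd=3.0, max_rhat=1.05):
    """Return warning strings for any diagnostic that crosses a threshold.

    rows: (step name, acceptance rate or None, log-weight sd) tuples.
    rhat: mapping from summary statistic name to its R-hat value.
    """
    warnings = []
    for name, acc_rate, log_wgt_sd in rows:
        if acc_rate is not None and acc_rate < min_acc:
            warnings.append(f"{name}: acceptance rate {acc_rate:.1%} is very low")
        if log_wgt_sd > max_log_wgt_sd:
            warnings.append(f"{name}: log weight sd {log_wgt_sd} is large")
    for stat, value in rhat.items():
        if value > max_rhat:
            warnings.append(f"{stat}: R-hat {value} exceeds {max_rhat}")
    return warnings


# Values transcribed from the output above (run 1 of 4); the resample step has no acceptance rate.
run1 = [("Split 1", 0.034, 0.22), ("Split 2", 0.023, 0.56), ("Resample", None, 1.63)]
rhat = {"total_vap": 1.018813, "vap_aian": 1.022521}

warnings = flag_diagnostics(run1, rhat)
```

None of these thresholds is crossed for this run, which is consistent with the output flagging only low plan diversity.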

NOTE: The cores constraint limits the maximum VI (variation of information) distance between plans. We did 4 independent runs rather than 2 to maximize diversity. @CoryMcCartan

CoryMcCartan commented 1 year ago

Hm, diversity is still lower than we'd like (cf the first draft you had). What happens if you remove the additional custom constraint and just run with the built-in county constraint?

mzhao80 commented 1 year ago

> Hm, diversity is still lower than we'd like (cf the first draft you had). What happens if you remove the additional custom constraint and just run with the built-in county constraint?

We see significant improvement in plan weights and diversity, but more county splits.

validation_20220926_2038

SMC: 5,000 sampled plans of 3 districts on 1,652 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.5
`est_label_mult`=1 • `pop_temper`=0

Plan diversity 80% range: 0.079 to 0.416
x WARNING: Low plan diversity

R-hat values for summary statistics:
   pop_overlap      total_vap       plan_dev      comp_edge    comp_polsby      pop_white      pop_black 
     1.0012230      1.0008035      0.9998143      1.0008056      1.0006819      1.0000933      1.0016689 
      pop_hisp       pop_aian      pop_asian       pop_nhpi      pop_other        pop_two      vap_white 
     1.0001511      1.0014075      1.0018163      1.0000299      1.0005896      1.0017560      1.0008699 
     vap_black       vap_hisp       vap_aian      vap_asian       vap_nhpi      vap_other        vap_two 
     1.0012753      1.0003513      1.0010595      1.0016024      1.0004983      1.0004933      1.0012645 
pre_16_rep_tru pre_16_dem_cli pre_20_rep_tru pre_20_dem_bid uss_18_rep_fis uss_18_dem_ray uss_20_rep_sas 
     0.9999929      1.0014798      0.9998098      1.0015018      0.9997766      1.0008385      0.9999236 
uss_20_dem_jan gov_18_rep_ric gov_18_dem_kri atg_18_rep_pet sos_18_rep_evn sos_18_dem_dan         adv_16 
     1.0016203      0.9997972      1.0012770      1.0000743      0.9997355      1.0011326      1.0014798 
        adv_18         adv_20         arv_16         arv_18         arv_20  county_splits    muni_splits 
     1.0011208      1.0015479      0.9999929      0.9998232      0.9998810      1.0014544      1.0015087 
           ndv            nrv        ndshare          e_dvs          e_dem          pbias           egap 
     1.0012823      0.9997661      1.0015310      1.0015011      1.0006474      0.9998004      1.0008763 

Sampling diagnostics for SMC run 1 of 4 (1,250 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     1,237 (98.9%)      3.2%        0.21   801 (101%)     10 
Split 2     1,216 (97.3%)      2.4%        0.33   657 ( 83%)      6 
Resample    1,105 (88.4%)       NA%        0.32 1,091 (138%)     NA 

Sampling diagnostics for SMC run 2 of 4 (1,250 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     1,236 (98.9%)      4.3%        0.21   784 ( 99%)      8 
Split 2     1,224 (97.9%)      2.9%        0.29   649 ( 82%)      5 
Resample    1,145 (91.6%)       NA%        0.29 1,111 (141%)     NA 

Sampling diagnostics for SMC run 3 of 4 (1,250 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     1,236 (98.9%)      4.2%        0.21   794 (100%)      8 
Split 2     1,225 (98.0%)      2.8%        0.29   638 ( 81%)      5 
Resample    1,152 (92.1%)       NA%        0.28 1,095 (139%)     NA 

Sampling diagnostics for SMC run 4 of 4 (1,250 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     1,236 (98.9%)      3.1%        0.21   803 (102%)     11 
Split 2     1,221 (97.7%)      2.1%        0.31   635 ( 80%)      7 
Resample    1,135 (90.8%)       NA%        0.30 1,098 (139%)     NA 

•  Watch out for low effective samples, very low acceptance rates (less than 1%), large std. devs. of the log
weights (more than 3 or so), and low numbers of unique plans. R-hat values for summary statistics should be
between 1 and 1.05.
• Low diversity: Check for potential bottlenecks. Increase the number of samples. Examine the diversity plot
with `hist(plans_diversity(plans), breaks=24)`. Consider weakening or removing constraints, or increasing
the population tolerance. If the acceptance rate drops quickly in the final splits, try increasing
`pop_temper` by 0.01.

CoryMcCartan commented 1 year ago

Hm, something is going wrong with how the counties/cores are set up. See the old PR and how we were able to limit county splits to 2 or fewer: https://github.com/alarm-redist/fifty-states/pull/96

Have you tried plotting the cores/counties? How does that look?

mzhao80 commented 1 year ago

Implemented fixes.

validation_20221002_2343

SMC: 5,000 sampled plans of 3 districts on 1,652 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.5
`est_label_mult`=1 • `pop_temper`=0

Plan diversity 80% range: 0.11 to 0.45
x WARNING: Low plan diversity

R-hat values for summary statistics:
   pop_overlap      total_vap       plan_dev      comp_edge    comp_polsby      pop_white      pop_black       pop_hisp 
     1.0003055      1.0004240      1.0000593      1.0007905      1.0000287      0.9998696      1.0001040      1.0014082 
      pop_aian      pop_asian       pop_nhpi      pop_other        pop_two      vap_white      vap_black       vap_hisp 
     1.0001982      1.0004675      0.9997999      1.0001641      1.0002904      1.0002738      1.0001107      1.0011856 
      vap_aian      vap_asian       vap_nhpi      vap_other        vap_two pre_16_rep_tru pre_16_dem_cli pre_20_rep_tru 
     0.9999343      1.0004032      0.9999556      0.9997093      1.0000821      1.0012069      1.0002471      1.0007071 
pre_20_dem_bid uss_18_rep_fis uss_18_dem_ray uss_20_rep_sas uss_20_dem_jan gov_18_rep_ric gov_18_dem_kri atg_18_rep_pet 
     1.0010140      1.0006886      1.0005293      1.0006475      1.0005141      1.0002943      1.0003369      0.9999296 
sos_18_rep_evn sos_18_dem_dan         adv_16         adv_18         adv_20         arv_16         arv_18         arv_20 
     1.0004149      1.0003144      1.0002471      1.0003520      1.0008157      1.0012069      1.0003822      1.0005244 
 county_splits    muni_splits            ndv            nrv        ndshare          e_dvs          e_dem          pbias 
     0.9997579      1.0000198      1.0004680      1.0006237      1.0000495      1.0000453      1.0003679      1.0001791 
          egap 
     1.0010406 

Sampling diagnostics for SMC run 1 of 4 (1,250 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     1,236 (98.8%)      3.4%        0.22   795 (101%)     10 
Split 2     1,221 (97.7%)      2.2%        0.30   652 ( 83%)      6 
Resample    1,128 (90.3%)       NA%        0.29 1,101 (139%)     NA 

Sampling diagnostics for SMC run 2 of 4 (1,250 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     1,235 (98.8%)      5.6%        0.22   786 ( 99%)      6 
Split 2     1,212 (97.0%)      3.3%        0.33   682 ( 86%)      4 
Resample    1,078 (86.3%)       NA%        0.32 1,096 (139%)     NA 

Sampling diagnostics for SMC run 3 of 4 (1,250 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     1,236 (98.9%)      4.2%        0.21   810 (103%)      8 
Split 2     1,224 (97.9%)      2.7%        0.29   659 ( 83%)      5 
Resample    1,143 (91.4%)       NA%        0.29 1,097 (139%)     NA 

Sampling diagnostics for SMC run 4 of 4 (1,250 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     1,235 (98.8%)      5.0%        0.22   772 ( 98%)      7 
Split 2     1,220 (97.6%)      3.5%        0.31   672 ( 85%)      4 
Resample    1,120 (89.6%)       NA%        0.30 1,100 (139%)     NA 

•  Watch out for low effective samples, very low acceptance rates (less than 1%), large std. devs. of the log weights (more than 3 or
so), and low numbers of unique plans. R-hat values for summary statistics should be between 1 and 1.05.
• Low diversity: Check for potential bottlenecks. Increase the number of samples. Examine the diversity plot with
`hist(plans_diversity(plans), breaks=24)`. Consider weakening or removing constraints, or increasing the population tolerance. If the acceptance rate drops quickly in the final splits, try increasing `pop_temper` by 0.01.

NOTE: The cores constraint limits the maximum VI (variation of information) distance between plans. We did 4 independent runs rather than 2 to maximize diversity. @CoryMcCartan
