alarm-redist / fifty-states

Redistricting analysis for all 50 U.S. states
https://alarm-redist.github.io/fifty-states/

2010 New York Congressional Districts #143

Closed by taransamarth 1 year ago

taransamarth commented 1 year ago

Redistricting requirements

In New York, districts must, per judicial order:

  1. be contiguous
  2. have equal populations
  3. be geographically compact
  4. preserve political subdivisions, communities of interest, and cores of existing districts
  5. protect incumbents where possible.

When developing the 2010 map, the courts decided to assign zero weight to incumbent protection and minimal weight to core preservation.

Algorithmic Constraints

We enforce a maximum population deviation of 0.5%.
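As a rough illustration of what a 0.5% bound means, the sketch below measures each district's relative distance from the ideal (mean) district population and compares the largest one to the tolerance. The function name and the toy populations are hypothetical and are not New York data; the deviation definition is the usual one, assumed here rather than taken from the analysis code.

```python
def max_population_deviation(district_pops):
    """Largest relative deviation from the ideal (mean) district population."""
    target = sum(district_pops) / len(district_pops)
    return max(abs(p - target) / target for p in district_pops)


# Hypothetical populations for a 3-district toy plan (NOT New York data).
toy_plan = [717_000, 718_500, 716_000]
TOLERANCE = 0.005  # the 0.5% bound stated above

deviation = max_population_deviation(toy_plan)
within_bound = deviation <= TOLERANCE
print(f"max deviation = {deviation:.4%}, within bound: {within_bound}")
```

A plan such as `[700_000, 720_000, 710_000]` would fail the same check, since its largest deviation is about 1.4%.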

Data Sources

Data for New York comes from the ALARM Project's 2010 Redistricting Data Files.

Pre-processing Notes

We use a county constraint to preserve district cores, since districts are generally structured around counties.

Simulation Notes

We sample 60,000 districting plans for New York over two runs of the SMC algorithm (30,000 plans per run), then thin the sample down to 5,000 plans.

No special techniques were needed to produce the sample.
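The notes above do not say how the thinning is done. One plausible approach, sketched here purely as an assumption, is to keep an equal-sized random subset from each run so that both runs stay equally represented. The function `thin_plans` and the integer plan ids are hypothetical.

```python
import random


def thin_plans(plan_ids_by_run, n_keep, seed=0):
    """Keep an equal-sized random subset of plan ids from each run."""
    per_run, remainder = divmod(n_keep, len(plan_ids_by_run))
    if remainder:
        raise ValueError("n_keep must divide evenly across runs")
    rng = random.Random(seed)  # fixed seed so the thinning is reproducible
    kept = []
    for ids in plan_ids_by_run:
        # Sample without replacement, then sort to preserve the original order.
        kept.extend(sorted(rng.sample(ids, per_run)))
    return kept


# Two runs of 30,000 plans each, identified by hypothetical integer ids.
runs = [list(range(0, 30_000)), list(range(30_000, 60_000))]
thinned = thin_plans(runs, n_keep=5_000)
print(len(thinned))
```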

Validation (60,000 plans)

[Image: validation plots]
SMC: 60,000 sampled plans of 27 districts on 14,926 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.95
`est_label_mult`=1 • `pop_temper`=0.001

Plan diversity 80% range: 0.83 to 0.96

R-hat values for summary statistics:
   pop_overlap      total_vap       plan_dev      comp_edge    comp_polsby      pop_white      pop_black       pop_hisp 
      1.031845       1.003389       1.003010       1.083060       1.055510       1.004315       1.018299       1.006696 
      pop_aian      pop_asian       pop_nhpi      pop_other        pop_two      vap_white      vap_black       vap_hisp 
      1.019953       1.025119       1.017345       1.006901       1.011516       1.001548       1.018805       1.007273 
      vap_aian      vap_asian       vap_nhpi      vap_other        vap_two pre_16_dem_cli pre_16_rep_tru pre_20_dem_bid 
      1.016376       1.019094       1.015643       1.049204       1.004235       1.002468       1.014020       1.001548 
pre_20_rep_tru uss_16_dem_sch uss_16_rep_lon uss_18_dem_gil uss_18_rep_far gov_18_dem_cuo gov_18_rep_mol atg_18_dem_jam 
      1.014776       1.001419       1.002254       1.001511       1.017552       1.001486       1.018721       1.001785 
atg_18_rep_wof         adv_16         adv_18         adv_20         arv_16         arv_18         arv_20  county_splits 
      1.017574       1.002069       1.001658       1.001548       1.018188       1.018137       1.014776       1.020984 
   muni_splits            ndv            nrv        ndshare          e_dvs         pr_dem          e_dem          pbias 
      1.005079       1.001883       1.020245       1.009872       1.008945       1.029212       1.045219       1.009887 
          egap 
      1.043226 
✖ WARNING: SMC runs have not converged.
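To make the warning concrete, the sketch below computes a classic Gelman-Rubin R-hat for one summary statistic across several runs, then flags the reported values that exceed 1.05. This is a simplified textbook estimator used for illustration; the estimator behind the output above may differ (for example, by rank-normalizing the draws). The synthetic chains are made up, and the `reported` values are a subset copied from the table above.

```python
import random
import statistics


def gelman_rubin_rhat(chains):
    """Classic (non-split) Gelman-Rubin R-hat for equal-length chains."""
    n = len(chains[0])
    if any(len(c) != n for c in chains):
        raise ValueError("all chains must have the same length")
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average within-chain variance; B: between-chain variance scaled by n.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5


rng = random.Random(1)
# Synthetic draws: two runs that agree, and two runs whose means differ.
agree = [[rng.gauss(0, 1) for _ in range(2000)] for _ in range(2)]
differ = [[rng.gauss(mu, 1) for _ in range(2000)] for mu in (0.0, 1.0)]
rhat_agree = gelman_rubin_rhat(agree)
rhat_differ = gelman_rubin_rhat(differ)

# A subset of the R-hat values reported above for the 60,000 plans.
reported = {
    "pop_overlap": 1.031845,
    "comp_edge": 1.083060,
    "comp_polsby": 1.055510,
    "e_dem": 1.045219,
    "egap": 1.043226,
}
not_converged = sorted(name for name, r in reported.items() if r > 1.05)
print(not_converged)
```

Among the reported values, only the two compactness statistics cross the 1.05 threshold, which is what triggers the warning.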

Sampling diagnostics for SMC run 1 of 2 (30,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd   Max. unique Est. k 
Split 1    26,464 (88.2%)     19.4%        0.38 18,921 (100%)     15 
Split 2    24,460 (81.5%)     31.2%        0.45 18,227 ( 96%)      9 
Split 3    15,564 (51.9%)     38.3%        0.51 17,990 ( 95%)      7 
Split 4    14,989 (50.0%)     47.5%        0.52 17,468 ( 92%)      5 
Split 5    12,510 (41.7%)     32.3%        0.54 17,312 ( 91%)      8 
Split 6    12,192 (40.6%)     34.9%        0.55 17,144 ( 90%)      7 
Split 7     9,693 (32.3%)     24.5%        0.57 17,019 ( 90%)     10 
Split 8    13,180 (43.9%)     37.1%        0.57 16,872 ( 89%)      6 
Split 9     9,191 (30.6%)     46.7%        0.60 16,973 ( 90%)      4 
Split 10   14,754 (49.2%)     34.1%        0.58 16,598 ( 88%)      6 
Split 11   15,189 (50.6%)     43.1%        0.60 16,987 ( 90%)      4 
Split 12   13,864 (46.2%)     48.3%        0.59 16,851 ( 89%)      3 
Split 13   14,620 (48.7%)     38.6%        0.60 16,932 ( 89%)      4 
Split 14    8,690 (29.0%)     31.8%        0.61 16,883 ( 89%)      5 
Split 15   11,896 (39.7%)     35.6%        0.77 15,440 ( 81%)      4 
Split 16    9,144 (30.5%)     21.5%        0.82 15,587 ( 82%)      7 
Split 17   12,192 (40.6%)     27.3%        0.83 15,572 ( 82%)      5 
Split 18   10,853 (36.2%)     21.9%        0.85 15,699 ( 83%)      6 
Split 19    9,864 (32.9%)     28.5%        0.87 15,446 ( 81%)      4 
Split 20   12,241 (40.8%)     32.2%        0.87 15,376 ( 81%)      3 
Split 21    9,621 (32.1%)     20.5%        0.89 15,065 ( 79%)      5 
Split 22    9,502 (31.7%)     22.1%        0.89 14,938 ( 79%)      4 
Split 23   10,473 (34.9%)     23.5%        0.88 14,617 ( 77%)      3 
Split 24    9,093 (30.3%)     24.9%        0.86 14,202 ( 75%)      2 
Split 25    7,654 (25.5%)     20.1%        0.83 13,993 ( 74%)      2 
Split 26   13,059 (43.5%)      7.3%        0.74 12,492 ( 66%)      2 
Resample   11,828 (39.4%)       NA%        0.84 15,356 ( 81%)     NA 

Sampling diagnostics for SMC run 2 of 2 (30,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd   Max. unique Est. k 
Split 1    26,542 (88.5%)     22.4%        0.37 18,965 (100%)     13 
Split 2    16,959 (56.5%)     34.9%        0.45 18,255 ( 96%)      8 
Split 3    17,207 (57.4%)     38.1%        0.51 17,876 ( 94%)      7 
Split 4    12,635 (42.1%)     48.0%        0.53 17,524 ( 92%)      5 
Split 5    10,757 (35.9%)     36.1%        0.54 17,196 ( 91%)      7 
Split 6     4,184 (13.9%)     45.2%        0.57 17,123 ( 90%)      5 
Split 7     5,398 (18.0%)     57.2%        0.57 16,498 ( 87%)      3 
Split 8    11,601 (38.7%)     63.4%        0.56 16,562 ( 87%)      2 
Split 9     9,403 (31.3%)     53.4%        0.56 17,038 ( 90%)      3 
Split 10   13,840 (46.1%)     59.6%        0.56 16,970 ( 89%)      2 
Split 11   11,393 (38.0%)     49.9%        0.57 17,123 ( 90%)      3 
Split 12   13,559 (45.2%)     41.0%        0.57 16,947 ( 89%)      4 
Split 13   16,283 (54.3%)     38.8%        0.58 17,060 ( 90%)      4 
Split 14   11,106 (37.0%)     36.8%        0.59 16,899 ( 89%)      4 
Split 15   14,115 (47.0%)     42.1%        0.77 15,724 ( 83%)      3 
Split 16   12,501 (41.7%)     47.3%        0.81 16,012 ( 84%)      2 
Split 17   14,339 (47.8%)     45.0%        0.82 15,729 ( 83%)      2 
Split 18   13,509 (45.0%)     35.7%        0.85 15,807 ( 83%)      3 
Split 19   12,325 (41.1%)     28.0%        0.87 15,640 ( 82%)      4 
Split 20   12,099 (40.3%)     31.6%        0.89 15,554 ( 82%)      3 
Split 21   11,343 (37.8%)     29.7%        0.88 15,283 ( 81%)      3 
Split 22   11,050 (36.8%)     33.2%        0.88 15,280 ( 81%)      2 
Split 23    9,266 (30.9%)     19.8%        0.86 15,011 ( 79%)      4 
Split 24   10,942 (36.5%)     20.2%        0.83 14,788 ( 78%)      3 
Split 25   13,412 (44.7%)     13.0%        0.79 14,300 ( 75%)      4 
Split 26    9,038 (30.1%)      6.3%        0.78 12,939 ( 68%)      3 
Resample    7,294 (24.3%)       NA%        0.86 15,058 ( 79%)     NA 

•  Watch out for low effective samples, very low acceptance rates (less than 1%), large std. devs. of the log weights (more
than 3 or so), and low numbers of unique plans. R-hat values for summary statistics should be between 1 and 1.05.
• SMC convergence: Increase the number of samples. If you are experiencing low plan diversity or bottlenecks as well, address
those issues first.
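For readers unfamiliar with the "Eff. samples" column, a common definition of effective sample size for weighted samples is the Kish formula, (sum of weights)^2 / (sum of squared weights). The sketch below applies it to log weights; it is an assumption that the diagnostics above use this definition, and the function name is hypothetical.

```python
import math


def effective_sample_size(log_weights):
    """Kish effective sample size computed from unnormalized log weights."""
    # Subtract the maximum before exponentiating for numerical stability;
    # the ratio below is unchanged by this rescaling.
    shift = max(log_weights)
    weights = [math.exp(lw - shift) for lw in log_weights]
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Equal weights: every sample counts fully.
ess_equal = effective_sample_size([0.0, 0.0, 0.0, 0.0])
# One dominant weight: the sample behaves like a single draw.
ess_dominated = effective_sample_size([0.0, -1000.0, -1000.0])
print(ess_equal, ess_dominated)
```

The more spread out the log weights are, the smaller this quantity becomes relative to the nominal sample size, which is why the hints treat a large standard deviation of the log weights as a warning sign.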

Validation (5,000 thinned plans)

[Image: validation plots]
SMC: 5,000 sampled plans of 27 districts on 14,926 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.95
`est_label_mult`=1 • `pop_temper`=0.001

Plan diversity 80% range: 0.82 to 0.97

R-hat values for summary statistics:
   pop_overlap      total_vap       plan_dev      comp_edge    comp_polsby 
      1.031210       1.006190       1.002851       1.092325       1.052427 
     pop_white      pop_black       pop_hisp       pop_aian      pop_asian 
      1.000840       1.005858       1.005205       1.023448       1.025423 
      pop_nhpi      pop_other        pop_two      vap_white      vap_black 
      1.019783       1.005370       1.011846       1.000946       1.002704 
      vap_hisp       vap_aian      vap_asian       vap_nhpi      vap_other 
      1.005737       1.019584       1.017655       1.016283       1.047539 
       vap_two pre_16_dem_cli pre_16_rep_tru pre_20_dem_bid pre_20_rep_tru 
      1.006845       1.001240       1.012895       1.002193       1.013492 
uss_16_dem_sch uss_16_rep_lon uss_18_dem_gil uss_18_rep_far gov_18_dem_cuo 
      1.002365       1.000916       1.002166       1.014744       1.002030 
gov_18_rep_mol atg_18_dem_jam atg_18_rep_wof         adv_16         adv_18 
      1.015341       1.002329       1.014458       1.003117       1.002374 
        adv_20         arv_16         arv_18         arv_20  county_splits 
      1.002193       1.016314       1.014964       1.013492       1.024048 
   muni_splits            ndv            nrv        ndshare          e_dvs 
      1.005240       1.002738       1.017900       1.008176       1.007321 
        pr_dem          e_dem          pbias           egap 
      1.029387       1.043929       1.006710       1.042371 
✖ WARNING: SMC runs have not converged.

Checklist

@CoryMcCartan (note that the SMC runs do not converge for comp_polsby and comp_edge, whose R-hat values stay above 1.05, but I'm at the limit of my computing power and can't keep pushing the number of sims up)