alarm-redist / fifty-states

Redistricting analysis for all 50 U.S. states
https://alarm-redist.github.io/fifty-states/

Re-run 2020 Texas Congressional Districts #124

Closed tylersimko closed 2 years ago

tylersimko commented 2 years ago

Redistricting requirements

In Texas, districts must meet US constitutional requirements, but there are no state-specific statutes.

Interpretation of requirements

We enforce a maximum population deviation of 0.5%.
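
As a rough illustration of what a 0.5% bound means, the sketch below measures each district's relative deviation from the ideal (mean) district population and checks it against the tolerance. The function names and the toy populations are hypothetical and are not part of the analysis code.

```python
def max_population_deviation(district_pops):
    """Largest relative deviation from the ideal (mean) district population."""
    ideal = sum(district_pops) / len(district_pops)
    return max(abs(p - ideal) / ideal for p in district_pops)


def within_tolerance(district_pops, tol=0.005):
    """True if every district is within `tol` (0.5% by default) of the ideal."""
    return max_population_deviation(district_pops) <= tol


# Hypothetical three-district plan (not Texas data); the ideal size is 100,000.
toy_plan = [100_200, 99_900, 99_900]
print(within_tolerance(toy_plan))
```

The same check with `tol=0.0025` corresponds to the tighter tolerance used inside the metropolitan clusters.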

Data Sources

Data for Texas comes from the ALARM Project's 2020 Redistricting Data Files.

Pre-processing Notes

We estimate CVAP populations with the cvap R package. We also pre-process the map to split it into clusters for simulation, which has a slight effect on the types of district plans that will be sampled.

Simulation Notes

We sample 50,000 districting plans for Texas across two independent runs of the SMC algorithm. Due to the size and complexity of Texas, we split the simulations into multiple steps.

1. Clustering procedure

First, we run simulations in three major metropolitan areas: Greater Houston, a combination of Greater San Antonio and Austin, and Dallas-Fort Worth. Each cluster consists of the counties in the corresponding Census Metropolitan Statistical Areas (MSAs).

These simulations run the SMC algorithm within each cluster with a 0.25% population tolerance. Because each cluster will have leftover population, we apply an additional constraint that incentivizes leaving any unassigned areas on the edge of these clusters to avoid discontiguities.

In each cluster, we apply hinge Gibbs constraints of strength 3 to encourage the formation of Hispanic CVAP opportunity districts. In Houston, we also apply a hinge Gibbs constraint of strength 3 to encourage the formation of Black CVAP opportunity districts. These constraints nudge the sampler toward districts with minority shares above 35% and penalize districts with minority shares above 70%.
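
For readers unfamiliar with hinge constraints, here is a minimal sketch of the general idea, assuming a plain linear hinge summed over districts. It is not the redist implementation, whose exact functional form may differ. The function names are hypothetical; the 35% target, 70% cap, and strength of 3 are taken from the description above.

```python
import math


def hinge_penalty(share, target, strength):
    """Penalty that grows as `share` falls below `target`; zero at or above it."""
    return strength * max(0.0, target - share)


def inverse_hinge_penalty(share, cap, strength):
    """Penalty that grows as `share` rises above `cap`; zero at or below it."""
    return strength * max(0.0, share - cap)


def gibbs_weight(shares, target=0.35, cap=0.70, strength=3.0):
    """Unnormalized Gibbs weight exp(-total penalty) for one plan.

    `shares` holds the minority CVAP share of each district in the plan.
    """
    total = sum(
        hinge_penalty(s, target, strength) + inverse_hinge_penalty(s, cap, strength)
        for s in shares
    )
    return math.exp(-total)
```

Under this sketch, a plan whose districts all fall between 35% and 70% receives weight 1, and plans with districts outside that band are down-weighted in proportion to how far outside they fall.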

2. Combination procedure

Then, these partial map simulations are combined to run statewide simulations. We again apply hinge Gibbs constraints to encourage the formation of minority opportunity districts.

Validation

validation_20220623_1107


SMC: 50,000 sampled plans of 38 districts on 9,007 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.95
`est_label_mult`=1 • `pop_temper`=0.03

Plan diversity 80% range: 0.79 to 0.89

R-hat values for summary statistics:
   pop_overlap      total_vap     total_cvap       plan_dev 
      1.009922       1.025592       1.018034       1.007017 
     comp_edge    comp_polsby       pop_hisp      pop_white 
      1.003834       1.005717       1.004403       1.012172 
     pop_black       pop_aian      pop_asian       pop_nhpi 
      1.019157       1.001467       1.005970       1.002071 
     pop_other        pop_two       vap_hisp      vap_white 
      1.002121       1.000102       1.006503       1.012468 
     vap_black       vap_aian      vap_asian       vap_nhpi 
      1.016500       1.000702       1.009907       1.020889 
     vap_other        vap_two     cvap_white     cvap_black 
      1.005935       1.002453       1.002304       1.021917 
     cvap_hisp     cvap_asian      cvap_aian      cvap_nhpi 
      1.008579       1.028057       1.000806       1.004785 
      cvap_two     cvap_other pre_16_rep_tru pre_16_dem_cli 
      1.000146       1.016484       1.008993       1.010982 
uss_18_rep_cru uss_18_dem_oro gov_18_rep_abb gov_18_dem_val 
      1.012814       1.061978       1.015099       1.046836 
atg_18_rep_pax atg_18_dem_nel pre_20_rep_tru pre_20_dem_bid 
      1.013984       1.054138       1.016592       1.040385 
uss_20_rep_cor uss_20_dem_heg         arv_16         adv_16 
      1.019423       1.039547       1.008993       1.010982 
        arv_18         adv_18         arv_20         adv_20 
      1.013342       1.055268       1.016730       1.041282 
 county_splits    muni_splits            ndv            nrv 
      1.025081       1.014260       1.043716       1.015216 
       ndshare          e_dvs          e_dem          pbias 
      1.007986       1.008231       1.019652       1.023839 
          egap 
      1.029078 
✖ WARNING: SMC runs have not converged.
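
To see which statistics are likely behind this warning, the sketch below flags R-hat values above 1.05, the upper bound that the printed guidance recommends. The helper name is hypothetical, and the dictionary holds only a handful of values copied from the table above; that the warning is triggered at exactly this threshold is an assumption.

```python
def flag_unconverged(rhats, threshold=1.05):
    """Return (name, R-hat) pairs above `threshold`, worst first."""
    flagged = {name: value for name, value in rhats.items() if value > threshold}
    return sorted(flagged.items(), key=lambda item: item[1], reverse=True)


# A few R-hat values copied from the summary output above.
rhats = {
    "pop_overlap": 1.009922,
    "uss_18_dem_oro": 1.061978,
    "gov_18_dem_val": 1.046836,
    "atg_18_dem_nel": 1.054138,
    "adv_18": 1.055268,
    "egap": 1.029078,
}

for name, value in flag_unconverged(rhats):
    print(f"{name}: {value:.3f}")
```

In the full table, the only values above 1.05 are `uss_18_dem_oro`, `adv_18`, and `atg_18_dem_nel`, all of them 2018 Democratic vote statistics.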

Sampling diagnostics for SMC run 1 of 2 (25,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd   Max. unique Est. k 
Split 1    13,837 (55.3%)     23.3%        0.55 15,809 (100%)      7 
Split 2    12,745 (51.0%)     28.2%        0.64 13,598 ( 86%)      5 
Split 3    11,201 (44.8%)     18.9%        0.65 13,138 ( 83%)      7 
Split 4    10,533 (42.1%)     11.1%        0.68 13,032 ( 82%)     11 
Split 5     9,340 (37.4%)     15.7%        0.69 12,722 ( 81%)      7 
Split 6     7,665 (30.7%)     16.7%        0.71 12,296 ( 78%)      6 
Split 7     7,968 (31.9%)     21.5%        0.71 11,792 ( 75%)      4 
Split 8     8,271 (33.1%)     23.7%        0.72 11,767 ( 74%)      3 
Split 9     7,850 (31.4%)     26.3%        0.72 11,580 ( 73%)      2 
Split 10    7,846 (31.4%)     17.0%        0.73 11,149 ( 71%)      3 
Split 11    7,281 (29.1%)     11.5%        0.78 10,520 ( 67%)      4 
Split 12    7,058 (28.2%)     10.1%        0.80 10,197 ( 65%)      3 
Split 13    4,406 (17.6%)      3.2%        0.73  9,618 ( 61%)      2 
Resample    2,961 (11.8%)       NA%        1.44  8,915 ( 56%)     NA 

Sampling diagnostics for SMC run 2 of 2 (25,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd   Max. unique Est. k 
Split 1    14,235 (56.9%)     16.6%        0.55 15,810 (100%)     10 
Split 2    12,875 (51.5%)     23.9%        0.64 13,538 ( 86%)      6 
Split 3    11,304 (45.2%)     30.2%        0.65 13,227 ( 84%)      4 
Split 4    11,586 (46.3%)     34.0%        0.67 12,938 ( 82%)      3 
Split 5    10,172 (40.7%)     15.7%        0.67 12,753 ( 81%)      7 
Split 6     8,921 (35.7%)     23.6%        0.70 12,437 ( 79%)      4 
Split 7     8,191 (32.8%)     21.5%        0.71 11,894 ( 75%)      4 
Split 8     7,489 (30.0%)     24.2%        0.73 11,616 ( 74%)      3 
Split 9     7,201 (28.8%)     26.7%        0.72 11,422 ( 72%)      2 
Split 10    7,459 (29.8%)      8.2%        0.73 11,060 ( 70%)      7 
Split 11    7,404 (29.6%)     12.6%        0.78 10,219 ( 65%)      4 
Split 12    7,128 (28.5%)      9.1%        0.74 10,889 ( 69%)      3 
Split 13    4,214 (16.9%)      2.0%        0.73  8,795 ( 56%)      4 
Resample    3,413 (13.7%)       NA%        1.50  9,242 ( 58%)     NA 

•  Watch out for low effective samples, very low acceptance rates (less
than 1%), large std. devs. of the log weights (more than 3 or so), and
low numbers of unique plans. R-hat values for summary statistics should
be between 1 and 1.05.
• SMC convergence: Increase the number of samples. If you are
experiencing low plan diversity or bottlenecks as well, address those
issues first.
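
The "Eff. samples" column can be read through the usual importance-sampling lens. One common definition is the Kish effective sample size, (sum of weights)^2 / (sum of squared weights), sketched below from log weights. Whether redist computes the column with exactly this formula is an assumption, and the function name is hypothetical.

```python
import math


def effective_sample_size(log_weights):
    """Kish effective sample size, computed from log importance weights."""
    m = max(log_weights)
    # Subtract the maximum before exponentiating to avoid overflow;
    # the common scale factor cancels in the ratio below.
    w = [math.exp(lw - m) for lw in log_weights]
    return sum(w) ** 2 / sum(x * x for x in w)
```

Equal weights give an effective sample size equal to the number of samples, while a single dominant weight drives it toward 1. This is why a larger standard deviation of the log weights tends to accompany a lower share of effective samples, as in the "Resample" rows above.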

Checklist

delete this line and all the tags except the reviewers you need @CoryMcCartan @christopherkenny

tylersimko commented 2 years ago

Summary for the 5,000 final sampled plans:

SMC: 5,000 sampled plans of 38 districts on 9,007 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.95
`est_label_mult`=1 • `pop_temper`=0.03

Plan diversity 80% range: 0.79 to 0.90

R-hat values for summary statistics:
   pop_overlap      total_vap     total_cvap       plan_dev 
      1.013201       1.028427       1.018327       1.007102 
     comp_edge    comp_polsby       pop_hisp      pop_white 
      1.001882       1.009034       1.005172       1.016104 
     pop_black       pop_aian      pop_asian       pop_nhpi 
      1.025503       1.002803       1.004963       1.001971 
     pop_other        pop_two       vap_hisp      vap_white 
      1.001015       1.004194       1.007224       1.016347 
     vap_black       vap_aian      vap_asian       vap_nhpi 
      1.021574       1.001649       1.006905       1.018076 
     vap_other        vap_two     cvap_white     cvap_black 
      1.003823       1.003785       1.001303       1.021426 
     cvap_hisp     cvap_asian      cvap_aian      cvap_nhpi 
      1.009156       1.023701       1.001795       1.005509 
      cvap_two     cvap_other pre_16_rep_tru pre_16_dem_cli 
      1.000110       1.014984       1.009103       1.010923 
uss_18_rep_cru uss_18_dem_oro gov_18_rep_abb gov_18_dem_val 
      1.014252       1.051739       1.017069       1.037330 
atg_18_rep_pax atg_18_dem_nel pre_20_rep_tru pre_20_dem_bid 
      1.015387       1.045510       1.018862       1.028955 
uss_20_rep_cor uss_20_dem_heg         arv_16         adv_16 
      1.022019       1.028307       1.009103       1.010923 
        arv_18         adv_18         arv_20         adv_20 
      1.015017       1.053836       1.019445       1.029503 
 county_splits    muni_splits            ndv            nrv 
      1.024588       1.013475       1.039720       1.016543 
       ndshare          e_dvs          e_dem          pbias 
      1.009798       1.010090       1.021942       1.026753 
          egap 
      1.024320 
✖ WARNING: SMC runs have not converged.

Sampling diagnostics for SMC run 1 of 2 (25,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd   Max. unique Est. k 
Split 1    13,837 (55.3%)     23.3%        0.55 15,809 (100%)      7 
Split 2    12,745 (51.0%)     28.2%        0.64 13,598 ( 86%)      5 
Split 3    11,201 (44.8%)     18.9%        0.65 13,138 ( 83%)      7 
Split 4    10,533 (42.1%)     11.1%        0.68 13,032 ( 82%)     11 
Split 5     9,340 (37.4%)     15.7%        0.69 12,722 ( 81%)      7 
Split 6     7,665 (30.7%)     16.7%        0.71 12,296 ( 78%)      6 
Split 7     7,968 (31.9%)     21.5%        0.71 11,792 ( 75%)      4 
Split 8     8,271 (33.1%)     23.7%        0.72 11,767 ( 74%)      3 
Split 9     7,850 (31.4%)     26.3%        0.72 11,580 ( 73%)      2 
Split 10    7,846 (31.4%)     17.0%        0.73 11,149 ( 71%)      3 
Split 11    7,281 (29.1%)     11.5%        0.78 10,520 ( 67%)      4 
Split 12    7,058 (28.2%)     10.1%        0.80 10,197 ( 65%)      3 
Split 13    4,406 (17.6%)      3.2%        0.73  9,618 ( 61%)      2 
Resample    2,961 (11.8%)       NA%        1.44  8,915 ( 56%)     NA 

Sampling diagnostics for SMC run 2 of 2 (25,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd   Max. unique Est. k 
Split 1    14,235 (56.9%)     16.6%        0.55 15,810 (100%)     10 
Split 2    12,875 (51.5%)     23.9%        0.64 13,538 ( 86%)      6 
Split 3    11,304 (45.2%)     30.2%        0.65 13,227 ( 84%)      4 
Split 4    11,586 (46.3%)     34.0%        0.67 12,938 ( 82%)      3 
Split 5    10,172 (40.7%)     15.7%        0.67 12,753 ( 81%)      7 
Split 6     8,921 (35.7%)     23.6%        0.70 12,437 ( 79%)      4 
Split 7     8,191 (32.8%)     21.5%        0.71 11,894 ( 75%)      4 
Split 8     7,489 (30.0%)     24.2%        0.73 11,616 ( 74%)      3 
Split 9     7,201 (28.8%)     26.7%        0.72 11,422 ( 72%)      2 
Split 10    7,459 (29.8%)      8.2%        0.73 11,060 ( 70%)      7 
Split 11    7,404 (29.6%)     12.6%        0.78 10,219 ( 65%)      4 
Split 12    7,128 (28.5%)      9.1%        0.74 10,889 ( 69%)      3 
Split 13    4,214 (16.9%)      2.0%        0.73  8,795 ( 56%)      4 
Resample    3,413 (13.7%)       NA%        1.50  9,242 ( 58%)     NA 

•  Watch out for low effective samples, very low acceptance rates (less
than 1%), large std. devs. of the log weights (more than 3 or so), and
low numbers of unique plans. R-hat values for summary statistics should
be between 1 and 1.05.
• SMC convergence: Increase the number of samples. If you are
experiencing low plan diversity or bottlenecks as well, address those
issues first.

CoryMcCartan commented 2 years ago

Other than this one thing, this looks great to me!

tylersimko commented 2 years ago

Added, @CoryMcCartan, thanks!

CoryMcCartan commented 2 years ago

Thanks. @christopherkenny if this is good to you we can merge!

tylersimko commented 2 years ago

Amazing -- I just double-checked everything is 5k, so @christopherkenny just let me know whenever and I'll run to finalize.

christopherkenny commented 2 years ago

Go for it! Great work @tylersimko!