2010 Texas Congressional Districts

mzwu commented 1 year ago

Redistricting requirements

In Texas, districts must meet US constitutional requirements, but there are no state-specific statutes.

Algorithmic Constraints

We enforce a maximum population deviation of 0.5%.
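The sketch below shows what a 0.5% bound means in practice. It computes each district's deviation from the ideal population (total population divided by the number of districts) and compares the largest deviation to the tolerance. It assumes deviation is measured as |pop − ideal| / ideal, which is a common convention. The helper names and the toy populations are hypothetical and are not Texas data.

```python
def max_pop_deviation(district_pops):
    """Largest relative deviation of any district from the ideal population."""
    ideal = sum(district_pops) / len(district_pops)
    return max(abs(p - ideal) / ideal for p in district_pops)


def within_tolerance(district_pops, tol=0.005):
    """True if every district is within `tol` (0.5% by default) of ideal.

    Assumption: deviation is |pop - ideal| / ideal for each district.
    """
    return max_pop_deviation(district_pops) <= tol


# Hypothetical district populations for illustration only (not Texas data).
pops = [766_000, 768_500, 767_200, 765_900]
print(max_pop_deviation(pops), within_tolerance(pops))
```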

Data Sources

Data for Texas comes from the ALARM Project's 2020 Redistricting Data Files.

Pre-processing Notes

We estimate CVAP populations with the cvap R package. We also pre-process the map to split it into clusters for simulation, which has a slight effect on the types of district plans that will be sampled.

Simulation Notes

We sample 50,000 districting plans for Texas across two independent runs of the SMC algorithm, and then thin the sample down to 5,000 plans. We use a pseudo-county constraint to limit the county and municipality splits. Due to the size and complexity of Texas, we split the simulations into multiple steps.
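The thinning rule is not specified here. For illustration only, the sketch below assumes thinning means keeping a uniform random subset of plans, drawn without replacement. It selects 5,000 of the 50,000 plan indices with a fixed seed so that the subset is reproducible. `thin_indices` is a hypothetical helper and is not a redist function. The real pipeline may instead keep every k-th plan or use some other rule.

```python
import random


def thin_indices(n_total, n_keep, seed=2010):
    """Pick n_keep distinct plan indices uniformly at random from range(n_total).

    Assumption: thinning is a uniform subsample without replacement;
    the actual pipeline may use a different rule (e.g. systematic thinning).
    """
    if n_keep > n_total:
        raise ValueError("cannot keep more plans than were sampled")
    rng = random.Random(seed)  # fixed seed -> reproducible subset
    return sorted(rng.sample(range(n_total), n_keep))


# 50,000 sampled plans (two runs of 25,000 each) thinned to 5,000.
kept = thin_indices(50_000, 5_000)
print(len(kept), kept[:5])
```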

1. Clustering procedure

First, we run simulations in three major metropolitan areas: Greater Houston, a combination of Greater San Antonio and Austin, and Dallas–Fort Worth. Each cluster consists of the counties in the corresponding Census Metropolitan Statistical Area (MSA), with the San Antonio and Austin MSAs merged into a single cluster:

Houston–The Woodlands–Sugar Land: Austin, Brazoria, Chambers, Fort Bend, Galveston, Harris, Liberty, Montgomery, Waller.
Austin–Round Rock–Georgetown: Bastrop, Caldwell, Hays, Travis, Williamson.
San Antonio–New Braunfels: Atascosa, Bandera, Bexar, Comal, Guadalupe, Kendall, Medina, Wilson.
Dallas–Fort Worth–Arlington: Collin, Dallas, Denton, Ellis, Hunt, Kaufman, Rockwall, Johnson, Parker, Tarrant, Wise.
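The clusters are simple to express as data. In the sketch below, the county lists are copied from the MSA list above. San Antonio and Austin are merged into one cluster, which gives three simulation clusters. The sketch also builds a county-to-cluster lookup that rejects any county assigned to two clusters. The dictionary names, the cluster labels, and the "rest of state" fallback are hypothetical and are not taken from the project code.

```python
# County lists copied from the MSA definitions above.
MSAS = {
    "Houston": ["Austin", "Brazoria", "Chambers", "Fort Bend", "Galveston",
                "Harris", "Liberty", "Montgomery", "Waller"],
    "Austin": ["Bastrop", "Caldwell", "Hays", "Travis", "Williamson"],
    "San Antonio": ["Atascosa", "Bandera", "Bexar", "Comal", "Guadalupe",
                    "Kendall", "Medina", "Wilson"],
    "Dallas-Fort Worth": ["Collin", "Dallas", "Denton", "Ellis", "Hunt",
                          "Kaufman", "Rockwall", "Johnson", "Parker",
                          "Tarrant", "Wise"],
}

# Three simulation clusters: San Antonio and Austin are combined into one.
CLUSTERS = {
    "Houston": MSAS["Houston"],
    "San Antonio-Austin": MSAS["San Antonio"] + MSAS["Austin"],
    "Dallas-Fort Worth": MSAS["Dallas-Fort Worth"],
}


def county_to_cluster(clusters):
    """Map each county to its cluster, refusing overlapping clusters."""
    lookup = {}
    for name, counties in clusters.items():
        for county in counties:
            if county in lookup:
                raise ValueError(f"{county} appears in more than one cluster")
            lookup[county] = name
    return lookup


lookup = county_to_cluster(CLUSTERS)
# Counties outside all clusters are left for the statewide step.
print(lookup.get("Travis"), lookup.get("Lubbock", "rest of state"))
```

Note that Austin County belongs to the Houston MSA, not the Austin MSA. Keying the lookup by county name rather than by MSA name keeps the two from being confused.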

These simulations run the SMC algorithm within each cluster with a 0.25% population tolerance. Because each cluster has population left over after its districts are drawn, we apply an additional constraint that encourages leaving any unassigned areas on the edge of the cluster, to avoid discontiguities when the remaining districts are drawn statewide.
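The sketch below shows where the leftover population comes from. It assumes a cluster draws as many full districts as fit at the ideal size. Each district may deviate by up to 0.25%, so the population left for the statewide step falls within a range rather than at a single value. The helper name and the population figures are hypothetical and are not Texas figures.

```python
import math


def cluster_leftover(cluster_pop, ideal_pop, tol=0.0025):
    """Whole districts that fit in a cluster, and the range of leftover population.

    Assumption: the cluster draws floor(cluster_pop / ideal_pop) districts,
    each within `tol` of the ideal population.
    """
    k = math.floor(cluster_pop / ideal_pop)
    # Most left over when every district is as small as allowed.
    most = cluster_pop - k * ideal_pop * (1 - tol)
    # Least left over when every district is as large as allowed (never below 0).
    least = max(0.0, cluster_pop - k * ideal_pop * (1 + tol))
    return k, least, most


# Hypothetical cluster of 7.3 million people with an ideal district size of 800,000.
print(cluster_leftover(7_300_000, 800_000))
```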

In each cluster, we apply hinge Gibbs constraints of strength 3 to encourage the formation of Hispanic CVAP opportunity districts. In the Houston and Dallas–Fort Worth clusters, we also apply a hinge Gibbs constraint of strength 3 to encourage the formation of Black CVAP opportunity districts. These constraints nudge districts toward minority CVAP shares above 35% and penalize districts whose minority share exceeds 70%.
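The exact functional form of the hinge constraints is not given here, so the sketch below uses a simplified, assumed version. A district pays a linear penalty for any shortfall of its minority CVAP share below 35% and for any excess above 70%. The total penalty is multiplied by the strength (3). In a Gibbs-style constraint, a plan's relative weight is roughly scaled by exp(−penalty), so plans with lower penalties are favored. This toy version penalizes every district below 35%. It shows only the direction of the nudge, not the shape of the production constraint. Both function names are hypothetical.

```python
import math


def hinge_penalty(shares, strength=3.0, lower=0.35, upper=0.70):
    """Assumed hinge-style penalty on minority CVAP shares (one per district).

    Districts between `lower` and `upper` cost nothing; shortfalls below
    `lower` and excesses above `upper` are penalized linearly.
    Illustrative only: not redist's exact constraint definition.
    """
    total = 0.0
    for s in shares:
        total += max(0.0, lower - s) + max(0.0, s - upper)
    return strength * total


def gibbs_weight(shares, **kwargs):
    """Relative weight exp(-penalty) for a plan's district shares."""
    return math.exp(-hinge_penalty(shares, **kwargs))


# A plan with one district at 30% and one at 75% is penalized on both sides.
print(hinge_penalty([0.30, 0.75]), gibbs_weight([0.30, 0.75]))
```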

2. Combination procedure

Next, the partial plans from these cluster simulations are combined and used to run statewide simulations. We again apply hinge Gibbs constraints to encourage the formation of minority opportunity districts, using a strength of 3 to further encourage Hispanic CVAP opportunity districts.
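One way to picture the combination step is to merge each cluster's partial assignment into a single statewide vector. District IDs must stay unique across clusters, and units that no cluster assigned must be marked. The sketch below assumes plans are stored as unit-to-district maps and that 0 means "unassigned". Both the representation and the `combine_partial_plans` helper are hypothetical and are not the project's code. The statewide run would then draw the remaining districts on the units labeled 0.

```python
def combine_partial_plans(n_units, partials):
    """Merge per-cluster partial plans into one statewide partial assignment.

    `partials` is a list of dicts mapping unit index -> local district number
    (1, 2, ...). Local numbers are offset so district IDs stay unique across
    clusters. Units not assigned by any cluster get 0 (assumed convention for
    "left for the statewide step"). Hypothetical helper for illustration only.
    """
    assignment = [0] * n_units
    offset = 0
    for partial in partials:
        n_local = max(partial.values(), default=0)
        for unit, district in partial.items():
            if assignment[unit] != 0:
                raise ValueError(f"unit {unit} assigned by two clusters")
            assignment[unit] = district + offset
        offset += n_local
    return assignment


# Toy example: 10 units, two clusters that each drew some districts.
houston = {0: 1, 1: 1, 2: 2}
dfw = {5: 1, 6: 2, 7: 2}
print(combine_partial_plans(10, [houston, dfw]))
```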

Validation

SMC: 50,000 sampled plans of 36 districts on 8,324 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.95
`est_label_mult`=1 • `pop_temper`=0.015

Plan diversity 80% range: 0.76 to 0.89
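Plan diversity summarizes how much the sampled plans differ from one another. As we understand it, redist bases this on the variation of information (VI) between pairs of plans. The sketch below computes a population-weighted VI between two assignments of the same units. The weighting and the lack of any normalization are assumptions, so its values are not on the same scale as the 0.76 to 0.89 range reported above.

```python
import math
from collections import Counter


def variation_of_information(plan_a, plan_b, pop=None):
    """Population-weighted variation of information between two plans.

    plan_a, plan_b: district labels for the same units, in the same order.
    pop: optional unit populations used as weights (defaults to 1 per unit).
    VI = 2 H(A, B) - H(A) - H(B); it is 0 when the plans are identical up to
    relabeling of districts. No normalization is applied (an assumption).
    """
    if pop is None:
        pop = [1] * len(plan_a)
    total = sum(pop)
    joint, marg_a, marg_b = Counter(), Counter(), Counter()
    for a, b, w in zip(plan_a, plan_b, pop):
        joint[(a, b)] += w
        marg_a[a] += w
        marg_b[b] += w

    def entropy(counts):
        return -sum((c / total) * math.log(c / total)
                    for c in counts.values() if c > 0)

    return 2 * entropy(joint) - entropy(marg_a) - entropy(marg_b)


# Same partition with swapped labels, then a genuinely different partition.
print(variation_of_information([1, 1, 2, 2], [2, 2, 1, 1]))
print(variation_of_information([1, 1, 2, 2], [1, 2, 1, 2]))
```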

R-hat values for summary statistics:
   pop_overlap      total_vap     total_cvap       plan_dev      comp_edge    comp_polsby 
      1.019713       1.000765       1.001368       1.063390       1.021500       1.025287 
     pop_white      pop_black       pop_hisp       pop_aian      pop_asian       pop_nhpi 
      1.034317       1.076107       1.030654       1.006023       1.009740       1.008619 
     pop_other        pop_two      vap_white      vap_black       vap_hisp       vap_aian 
      1.033250       1.026214       1.041864       1.074831       1.056603       1.001449 
     vap_asian       vap_nhpi      vap_other        vap_two pre_16_rep_tru pre_16_dem_cli 
      1.021050       1.009738       1.024545       1.003553       1.006802       1.057141 
pre_20_rep_tru pre_20_dem_bid uss_18_rep_cru uss_18_dem_oro uss_20_rep_cor uss_20_dem_heg 
      1.020598       1.014786       1.023798       1.007303       1.020234       1.020215 
gov_18_rep_abb gov_18_dem_val atg_18_rep_pax atg_18_dem_nel         adv_16         adv_18 
      1.023213       1.014576       1.019327       1.015313       1.057141       1.013074 
        adv_20         arv_16         arv_18         arv_20  county_splits    muni_splits 
      1.018537       1.006802       1.022502       1.020548       1.052084       1.010568 
           ndv            nrv        ndshare          e_dvs          e_dem          pbias 
      1.020112       1.018530       1.050662       1.050483       1.039622       1.038730 
          egap 
      1.108590 
✖ WARNING: SMC runs have not converged.
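R-hat compares how much a summary statistic varies between the independent SMC runs with how much it varies within each run. Values near 1 indicate that the runs agree. The sketch below implements the classic Gelman–Rubin potential scale reduction factor for a statistic observed in m runs of equal length. redist may use a refined variant, such as a rank-normalized split R-hat. Treat this sketch as an illustration of the idea, not a reproduction of the values above.

```python
import random
import statistics


def gelman_rubin_rhat(runs):
    """Classic potential scale reduction factor (R-hat).

    runs: list of m sequences, each holding n draws of one summary statistic.
    """
    m = len(runs)
    n = len(runs[0])
    if m < 2 or n < 2 or any(len(r) != n for r in runs):
        raise ValueError("need at least two runs of equal length >= 2")
    means = [statistics.fmean(r) for r in runs]
    grand = statistics.fmean(means)
    # Between-run variance B and mean within-run variance W.
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    w = statistics.fmean([statistics.variance(r) for r in runs])
    var_plus = (n - 1) / n * w + b / n
    return (var_plus / w) ** 0.5


# Two runs drawn from the same distribution should give R-hat close to 1.
rng = random.Random(1)
run_a = [rng.gauss(0, 1) for _ in range(1000)]
run_b = [rng.gauss(0, 1) for _ in range(1000)]
print(round(gelman_rubin_rhat([run_a, run_b]), 3))
```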

Sampling diagnostics for SMC run 1 of 2 (25,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd   Max. unique Est. k 
Split 1    18,016 (72.1%)     20.3%        0.63 15,737 (100%)      7 
Split 2    16,643 (66.6%)     25.3%        0.73 14,481 ( 92%)      5 
Split 3    14,456 (57.8%)     27.7%        0.73 14,152 ( 90%)      4 
Split 4    12,521 (50.1%)     15.3%        0.73 13,752 ( 87%)      7 
Split 5    10,636 (42.5%)     23.0%        0.75 13,305 ( 84%)      4 
Split 6    10,459 (41.8%)     20.8%        0.76 12,830 ( 81%)      4 
Split 7    10,614 (42.5%)     18.7%        0.75 12,586 ( 80%)      4 
Split 8     9,657 (38.6%)     20.4%        0.76 12,346 ( 78%)      3 
Split 9     8,687 (34.7%)     22.0%        0.76 12,070 ( 76%)      2 
Split 10    8,578 (34.3%)     17.9%        0.74 11,263 ( 71%)      2 
Split 11    8,295 (33.2%)     13.9%        0.79 10,628 ( 67%)      3 
Split 12    7,703 (30.8%)      5.2%        0.73 10,793 ( 68%)      3 
Split 13    4,932 (19.7%)      1.4%        0.80  8,379 ( 53%)      4 
Resample    4,441 (17.8%)       NA%        1.40  9,729 ( 62%)     NA 

Sampling diagnostics for SMC run 2 of 2 (25,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd   Max. unique Est. k 
Split 1    18,009 (72.0%)      8.9%        0.63 15,693 ( 99%)     16 
Split 2    16,198 (64.8%)     14.3%        0.72 14,524 ( 92%)      9 
Split 3    14,872 (59.5%)     22.9%        0.73 14,118 ( 89%)      5 
Split 4    13,042 (52.2%)     17.7%        0.74 13,861 ( 88%)      6 
Split 5    11,721 (46.9%)     19.1%        0.73 13,404 ( 85%)      5 
Split 6    10,776 (43.1%)     20.8%        0.75 12,988 ( 82%)      4 
Split 7     9,521 (38.1%)     18.9%        0.76 12,602 ( 80%)      4 
Split 8     9,907 (39.6%)     20.0%        0.75 12,351 ( 78%)      3 
Split 9     8,963 (35.9%)     16.8%        0.74 12,078 ( 76%)      3 
Split 10    7,927 (31.7%)     18.9%        0.77 11,332 ( 72%)      2 
Split 11   10,217 (40.9%)     11.4%        0.74 10,329 ( 65%)      4 
Split 12    8,081 (32.3%)      5.2%        0.72 10,647 ( 67%)      3 
Split 13    3,756 (15.0%)      2.5%        0.75  8,201 ( 52%)      2 
Resample    3,274 (13.1%)       NA%        1.45  8,960 ( 57%)     NA 

• Watch out for low effective samples, very low acceptance rates (less than 1%), large std. devs. of the log weights (more than 3 or so), and low numbers of unique plans. R-hat values for summary statistics should be between 1 and 1.05.
• SMC convergence: Increase the number of samples. If you are experiencing low plan diversity or bottlenecks as well, address those issues first.
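These rules of thumb can be applied mechanically to the numbers reported above. The sketch below copies the R-hat values from the table, plus the final-split and resample rows from both diagnostic tables. It then flags anything outside the stated thresholds: R-hat above 1.05, an acceptance rate below 1%, or a log-weight standard deviation above 3. The notes give no numeric cutoff for effective samples or unique plans, so the sketch does not check those. The thresholds come from the notes above; the helper names are hypothetical.

```python
# R-hat values copied from the summary table above.
RHAT = {
    "pop_overlap": 1.019713, "total_vap": 1.000765, "total_cvap": 1.001368,
    "plan_dev": 1.063390, "comp_edge": 1.021500, "comp_polsby": 1.025287,
    "pop_white": 1.034317, "pop_black": 1.076107, "pop_hisp": 1.030654,
    "pop_aian": 1.006023, "pop_asian": 1.009740, "pop_nhpi": 1.008619,
    "pop_other": 1.033250, "pop_two": 1.026214, "vap_white": 1.041864,
    "vap_black": 1.074831, "vap_hisp": 1.056603, "vap_aian": 1.001449,
    "vap_asian": 1.021050, "vap_nhpi": 1.009738, "vap_other": 1.024545,
    "vap_two": 1.003553, "pre_16_rep_tru": 1.006802, "pre_16_dem_cli": 1.057141,
    "pre_20_rep_tru": 1.020598, "pre_20_dem_bid": 1.014786,
    "uss_18_rep_cru": 1.023798, "uss_18_dem_oro": 1.007303,
    "uss_20_rep_cor": 1.020234, "uss_20_dem_heg": 1.020215,
    "gov_18_rep_abb": 1.023213, "gov_18_dem_val": 1.014576,
    "atg_18_rep_pax": 1.019327, "atg_18_dem_nel": 1.015313,
    "adv_16": 1.057141, "adv_18": 1.013074, "adv_20": 1.018537,
    "arv_16": 1.006802, "arv_18": 1.022502, "arv_20": 1.020548,
    "county_splits": 1.052084, "muni_splits": 1.010568, "ndv": 1.020112,
    "nrv": 1.018530, "ndshare": 1.050662, "e_dvs": 1.050483,
    "e_dem": 1.039622, "pbias": 1.038730, "egap": 1.108590,
}

# (run, step): (acceptance rate, log-weight sd), from the diagnostic tables.
# The resample step reports no acceptance rate ("NA%"), stored as None.
DIAGNOSTICS = {
    ("run 1", "Split 13"): (0.014, 0.80),
    ("run 1", "Resample"): (None, 1.40),
    ("run 2", "Split 13"): (0.025, 0.75),
    ("run 2", "Resample"): (None, 1.45),
}


def flag_rhat(rhat, upper=1.05):
    """Names of summary statistics whose R-hat exceeds `upper`."""
    return sorted(name for name, value in rhat.items() if value > upper)


def flag_diagnostics(diag, min_acc=0.01, max_sd=3.0):
    """Steps with a very low acceptance rate or a large log-weight sd."""
    flagged = []
    for key, (acc, sd) in diag.items():
        if (acc is not None and acc < min_acc) or sd > max_sd:
            flagged.append(key)
    return flagged


print(flag_rhat(RHAT))
print(flag_diagnostics(DIAGNOSTICS))
```

On these numbers, ten statistics exceed 1.05, with `egap` the largest at about 1.11. This is consistent with the convergence warning. None of the diagnostic rows checked trips the acceptance-rate or log-weight thresholds.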

Checklist

[x] I have followed the instructions
[x] I have updated the tracker
[x] All TODO lines from the template code have been removed
[x] I have merged in the master branch and then recalculated summary statistics
[x] I have run enforce_style() to format my code
[x] The documentation copied above is up-to-date
[x] There are no data files in this pull request
[x] None of the file output paths (for the redist_map and redist_plans objects, and summary statistics) have been edited

@CoryMcCartan @christopherkenny @tylersimko

mzwu commented 1 year ago

@christopherkenny @tylersimko Just pushed the edits! Let me know if there's anything else I need to change.

christopherkenny commented 1 year ago

I think I'm good to go on this. The simulations appear to comply with the VRA without introducing the racial gerrymandering issues associated with the few districts that are not replicated. This requires more of a judgment call than in many other states, but I think it strikes a good balance between insufficient and excessive attention to race in replicating the state's decisions in 2010.

alarm-redist / fifty-states