2010 California Congressional Districts

mzwu commented 1 year ago

Redistricting requirements

In California, according to the California Constitution Article XXI, districts must:

be contiguous
have equal populations
be geographically compact
preserve city, county, neighborhood, and community of interest boundaries as much as possible
not favor or discriminate against incumbents, candidates, or parties
comply with the federal Voting Rights Act

Algorithmic Constraints

We enforce a maximum population deviation of 0.5%. We use a pseudo-county constraint to limit the county and municipality splits. We add VRA constraints encouraging Hispanic VAP and Asian VAP majorities in districts.
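The 0.5% limit can be read as a cap on how far any district's population may stray from the ideal population, meaning the total population divided by the number of districts. Below is a minimal Python sketch of that check. It assumes deviation is measured relative to the ideal population. The function names are hypothetical and are not part of redist.

```python
def max_pop_deviation(district_pops):
    """Largest relative deviation of any district from the ideal population."""
    ideal = sum(district_pops) / len(district_pops)
    return max(abs(p - ideal) / ideal for p in district_pops)


def satisfies_pop_constraint(district_pops, tol=0.005):
    """True when every district is within `tol` (0.5% by default) of the ideal."""
    return max_pop_deviation(district_pops) <= tol


# Ideal population here is 100,000; the worst district is off by 400 (0.4%).
print(satisfies_pop_constraint([100_000, 100_400, 99_600]))
```

A plan whose districts deviate by 1% would fail this check.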

Data Sources

Data for California comes from the ALARM Project's 2020 Redistricting Data Files. Data for the 2010 California enacted congressional map comes from All About Redistricting.

Pre-processing Notes

Islands were connected to the nearest point on the mainland within the same county.
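Here is a minimal Python sketch of that pre-processing step. It assumes the island units are already identified and that each unit has a centroid in a projected coordinate system. The sketch uses centroid-to-centroid distance as a stand-in for "nearest point," and all function and argument names are hypothetical.

```python
import math


def connect_islands(adj, centroids, county, islands):
    """Link each island unit to the nearest mainland unit in the same county.

    adj: dict unit -> set of neighboring units (modified in place)
    centroids: dict unit -> (x, y) in a projected coordinate system
    county: dict unit -> county name
    islands: set of units with no land connection to the mainland
    """
    mainland = [u for u in adj if u not in islands]
    for isl in islands:
        candidates = [u for u in mainland if county[u] == county[isl]]
        if not candidates:
            continue  # no same-county mainland unit; leave for manual handling
        nearest = min(candidates, key=lambda u: math.dist(centroids[isl], centroids[u]))
        # Add the edge in both directions so the adjacency graph stays symmetric
        adj[isl].add(nearest)
        adj[nearest].add(isl)
    return adj
```

Restricting candidates to the island's own county means the added edge never creates a new county adjacency.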

Simulation Notes

We sample 25,000 districting plans in each cluster across 2 independent runs of the SMC algorithm. We next sample 50,000 statewide districting plans across 2 independent runs of the SMC algorithm, which draw the remaining districts outside the two clusters. We then thin the sample down to 5,000 plans.

To balance county and municipality splits, we create pseudocounties for use in the county constraint. Pseudocounties are created within Alameda County, Contra Costa County, Fresno County, Kern County, Los Angeles County, Orange County, Riverside County, Sacramento County, San Bernardino County, San Diego County, San Francisco County, San Joaquin County, San Mateo County, Santa Clara County, and Ventura County, each of which is larger than a congressional district in population. Based on initial runs, we used a small population tempering value for each cluster to avoid losing diversity at the final step.
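Below is a minimal sketch of the pseudocounty idea. It makes two assumptions. First, each unit carries a county label and a municipality label. Second, a county is subdivided by municipality exactly when its population exceeds the target district population, as the list of counties above suggests. The function name and label format are hypothetical. The sketch also lumps all unincorporated territory in a large county into one pseudocounty, which may differ from the actual construction.

```python
def pseudocounty_labels(units, county_pop, target_pop):
    """Assign each unit a pseudocounty label (hypothetical scheme).

    units: list of dicts with 'county' and 'muni' keys ('muni' is None when unincorporated)
    county_pop: dict county -> total population
    target_pop: ideal congressional district population
    """
    labels = []
    for u in units:
        county = u["county"]
        if county_pop[county] > target_pop:
            # Large county: split into municipalities plus one unincorporated remainder
            labels.append(f"{county}/{u['muni'] or 'unincorporated'}")
        else:
            # Smaller county: the whole county is a single pseudocounty
            labels.append(county)
    return labels
```

The county constraint would then count splits of these labels instead of raw counties. Because of this, splitting a large county along municipal lines is not penalized the way splitting a small county would be.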

1. Clustering Procedure

First, we run partial SMC in two pieces: the south and the Bay Area. The counties in each cluster are:

South: Los Angeles, San Bernardino, Orange, Riverside, San Diego, and Imperial
Bay: Alameda, Contra Costa, Fresno, Kings, Madera, Merced, Monterey, Sacramento, San Benito, San Francisco, San Joaquin, San Mateo, Santa Clara, Santa Cruz, Solano, Stanislaus, Tulare, and Yolo

We sample in each of these regions with a population deviation of 0.5%. We sample 28 districts in the southern region and 14 districts in the Bay Area. Because each cluster will have leftover population, we apply an additional constraint that incentivizes leaving any unassigned areas on the edge of these clusters to avoid discontiguities. For each cluster, we add VRA constraints encouraging Hispanic VAP and Asian VAP concentrations in districts, in line with the enacted plan.
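One way to read "VRA constraints encouraging Hispanic VAP and Asian VAP concentrations" is as a hinge penalty on each district's group share. The Python sketch below is hypothetical. Each district is compared with its nearest target share, and only shortfalls below that target are penalized. The targets (0 and 0.55) and the strength are illustrative assumptions, not the values or the exact formula used in this analysis.

```python
def group_hinge_penalty(group_shares, targets=(0.0, 0.55), strength=1.0):
    """Hinge penalty encouraging districts near a target group share to reach it.

    group_shares: per-district share of VAP for one group (e.g. Hispanic VAP)
    targets: candidate target shares (hypothetical values); a district with a low
             share is matched to 0 and therefore not penalized
    """
    total = 0.0
    for share in group_shares:
        nearest = min(targets, key=lambda t: abs(t - share))
        total += max(0.0, nearest - share)  # penalize only shortfalls
    return strength * total


# A district at 50% is 5 points short of the 55% target; one at 10% is matched to 0.
print(group_hinge_penalty([0.5, 0.1]))
```

Separate penalties for Hispanic and Asian VAP would be added together. Under a Gibbs-style formulation, a plan's sampling weight would then be scaled by `exp(-penalty)`; treat this as an assumption about how such constraints enter the sampler.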

2. Combination Procedure

Then, these partial map simulations are combined to run statewide simulations. We sample 11 districts in the remainder.
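The district counts add up to California's 53 seats. Each cluster's leftover population is what the statewide step must absorb. The small Python sketch below shows that bookkeeping. `leftover_population` is a hypothetical helper that treats every sampled district as exactly the ideal size and ignores the 0.5% tolerance.

```python
# District counts from the clustering and combination steps described above.
N_SOUTH, N_BAY, N_REMAINDER = 28, 14, 11


def leftover_population(region_pop, n_sampled, state_pop, n_districts):
    """Population a cluster leaves unassigned after drawing n_sampled ideal-size districts."""
    ideal = state_pop / n_districts
    return region_pop - n_sampled * ideal


n_total = N_SOUTH + N_BAY + N_REMAINDER
print(n_total)  # 53
```

The cluster summaries below report 29 and 15 "districts" for runs that sample 28 and 14. This is consistent with counting the unassigned remainder as one extra piece.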

Validation

SMC: 5,000 sampled plans of 53 districts on 8,057 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.5
`est_label_mult`=1 • `pop_temper`=0

Plan diversity 80% range: 0.37 to 0.87
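redist summarizes plan diversity from pairwise distances between sampled plans. Below is a rough Python sketch of one such summary. It assumes an unweighted variation-of-information (VI) distance scaled by log(k), and it reads the "80% range" as the 10th to 90th percentiles. redist's actual measure may weight units by population and normalize differently, so this sketch would not reproduce the numbers above.

```python
import itertools
import math
import statistics
from collections import Counter


def variation_of_information(plan_a, plan_b):
    """VI distance (in nats) between two plans given as district labels per unit."""
    n = len(plan_a)
    joint = Counter(zip(plan_a, plan_b))
    count_a, count_b = Counter(plan_a), Counter(plan_b)
    vi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        # VI = H(A|B) + H(B|A) = -sum p(a,b) [log p(a|b) + log p(b|a)]
        vi -= p_ab * (math.log(n_ab / count_b[b]) + math.log(n_ab / count_a[a]))
    return vi


def diversity_80_range(plans, n_districts):
    """10th and 90th percentiles of pairwise VI distances, scaled by log(n_districts)."""
    dists = [variation_of_information(a, b) / math.log(n_districts)
             for a, b in itertools.combinations(plans, 2)]
    deciles = statistics.quantiles(dists, n=10)
    return deciles[0], deciles[-1]
```

VI ignores district labels, so relabeling the same partition gives a distance of 0.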

R-hat values for summary statistics:
   pop_overlap      total_vap       plan_dev      comp_edge    comp_polsby      pop_white      pop_black       pop_hisp 
     1.0112949      1.0017279      1.0105925      1.0094340      1.0061159      1.0040006      1.0077048      1.0029096 
      pop_aian      pop_asian       pop_nhpi      pop_other        pop_two      vap_white      vap_black       vap_hisp 
     1.0078262      1.0033341      1.0021208      1.0062393      1.0028283      1.0027007      1.0065199      1.0032184 
      vap_aian      vap_asian       vap_nhpi      vap_other        vap_two pre_16_dem_cli pre_16_rep_tru pre_20_dem_bid 
     1.0078660      1.0038106      1.0042237      1.0063968      1.0064388      1.0086911      1.0019121      1.0169725 
pre_20_rep_tru         adv_16         adv_20         arv_16         arv_20  county_splits    muni_splits            ndv 
     1.0023822      1.0086911      1.0169725      1.0019121      1.0023822      1.0048133      1.0015119      1.0083052 
           nrv        ndshare          e_dvs         pr_dem          e_dem          pbias           egap 
     1.0023354      1.0037358      1.0037829      1.0001130      1.0000814      1.0220220      0.9998909 
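R-hat compares the variation between the two independent SMC runs with the variation within each run. Values near 1 indicate that the runs agree. Below is a sketch of the classic Gelman-Rubin statistic for one summary statistic, assuming equal-length runs. redist's reported values may come from a rank-normalized or split variant, so this is illustrative rather than a reproduction.

```python
import statistics


def gelman_rubin_rhat(chains):
    """Classic Gelman-Rubin potential scale reduction factor.

    chains: list of equal-length lists of a summary statistic, one per independent run.
    """
    m = len(chains)
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(chain_means)
    # Between-run variance B and mean within-run (sample) variance W
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in chain_means)
    w = statistics.fmean(statistics.variance(c) for c in chains)
    # Pooled variance estimate; R-hat is its ratio to W, square-rooted
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5
```

With this estimator, the (n - 1)/n factor means two runs that agree almost exactly can yield a value slightly below 1. That is one way a value like the 0.9999 reported for `egap` can arise.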

Sampling diagnostics for SMC run 1 of 2 (25,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd   Max. unique Est. k 
Split 1    22,185 (88.7%)     11.0%        0.43 15,806 (100%)      5 
Split 2    21,778 (87.1%)     12.8%        0.55 15,222 ( 96%)      4 
Split 3    21,593 (86.4%)     11.7%        0.58 15,046 ( 95%)      4 
Split 4    21,262 (85.0%)     14.1%        0.62 14,759 ( 93%)      3 
Split 5    21,090 (84.4%)     13.0%        0.64 14,572 ( 92%)      3 
Split 6    20,654 (82.6%)      7.1%        0.67 14,219 ( 90%)      5 
Split 7    19,667 (78.7%)      7.5%        0.71 13,899 ( 88%)      4 
Split 8    19,095 (76.4%)      8.3%        0.79 13,349 ( 84%)      3 
Split 9    19,656 (78.6%)      4.8%        0.78 12,905 ( 82%)      4 
Split 10   19,173 (76.7%)      2.4%        0.78 11,913 ( 75%)      3 
Resample    9,253 (37.0%)       NA%        0.78 13,005 ( 82%)     NA 

Sampling diagnostics for SMC run 2 of 2 (25,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd   Max. unique Est. k 
Split 1    22,155 (88.6%)     11.2%        0.43 15,788 (100%)      5 
Split 2    21,676 (86.7%)     10.2%        0.55 15,141 ( 96%)      5 
Split 3    21,496 (86.0%)     11.6%        0.58 15,041 ( 95%)      4 
Split 4    21,260 (85.0%)      8.6%        0.62 14,837 ( 94%)      5 
Split 5    21,138 (84.6%)      9.8%        0.63 14,492 ( 92%)      4 
Split 6    20,543 (82.2%)     11.6%        0.68 14,354 ( 91%)      3 
Split 7    19,969 (79.9%)      7.5%        0.73 13,907 ( 88%)      4 
Split 8    19,754 (79.0%)      8.0%        0.75 13,533 ( 86%)      3 
Split 9    19,649 (78.6%)      8.9%        0.76 13,046 ( 83%)      2 
Split 10   19,649 (78.6%)      3.3%        0.76 12,041 ( 76%)      2 
Resample   10,549 (42.2%)       NA%        0.76 13,074 ( 83%)     NA 

•  Watch out for low effective samples, very low acceptance rates (less than 1%), large std. devs. of the log weights (more than 3 or so), and low numbers of unique plans. R-hat values for summary statistics should be between 1 and 1.05.
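The rules of thumb above can be sketched in Python as follows. `kish_ess` computes the Kish effective sample size from log importance weights, and we assume this is close to how the "Eff. samples" column is derived. `diagnostic_warnings` applies the numeric thresholds stated in the footer. Both names are hypothetical. "Low" effective samples and unique plans are not quantified in the text, so they are not thresholded here.

```python
import math


def kish_ess(log_weights):
    """Kish effective sample size from unnormalized log importance weights."""
    mx = max(log_weights)  # shift before exp() to avoid overflow
    w = [math.exp(lw - mx) for lw in log_weights]
    return sum(w) ** 2 / sum(x * x for x in w)


def diagnostic_warnings(acc_rate, log_wgt_sd, rhats,
                        acc_min=0.01, sd_max=3.0, rhat_range=(1.0, 1.05)):
    """Flag the warning signs listed in the footer (thresholds taken from that text)."""
    warnings = []
    if acc_rate < acc_min:
        warnings.append(f"very low acceptance rate: {acc_rate:.1%}")
    if log_wgt_sd > sd_max:
        warnings.append(f"large log-weight sd: {log_wgt_sd:.2f}")
    lo, hi = rhat_range
    for name, r in rhats.items():
        if not lo <= r <= hi:
            warnings.append(f"R-hat for {name} outside [{lo}, {hi}]: {r:.4f}")
    return warnings
```

Feeding in the Resample rows of the two cluster runs, whose log-weight standard deviations range from 9.42 to 10.43, would trigger the log-weight check. The statewide runs' values of 0.76 to 0.78 would not. Applied literally, the R-hat check would also flag `egap` (0.9999), which sits just below 1.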

Checklist

[x] I have followed the instructions
[x] I have updated the tracker
[x] All TODO lines from the template code have been removed
[x] I have merged in the main branch and then recalculated summary statistics
[x] I have run enforce_style() to format my code
[x] The documentation copied above is up-to-date
[x] There are no data files in this pull request
[x] None of the file output paths (for the redist_map and redist_plans objects, and summary statistics) have been edited

Additional Notes

Histograms:

Dot plots:

@CoryMcCartan @christopherkenny

mzwu commented 1 year ago

South region summary:

SMC: 25,000 sampled plans of 29 districts on 4,410 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.95
`est_label_mult`=1 • `pop_temper`=0.03

Plan diversity 80% range: 0.43 to 0.98

Sampling diagnostics for SMC run 1 of 2 (12,500 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     3,590 (28.7%)     22.4%        0.30 7,958 (101%)      8 
Split 2     3,433 (27.5%)     28.5%        0.33 4,841 ( 61%)      6 
Split 3     3,209 (25.7%)     32.2%        0.38 4,846 ( 61%)      5 
Split 4     2,902 (23.2%)     25.7%        0.40 4,833 ( 61%)      6 
Split 5     2,745 (22.0%)     24.8%        0.42 4,865 ( 62%)      6 
Split 6     2,369 (18.9%)     34.4%        0.43 4,671 ( 59%)      4 
Split 7     2,141 (17.1%)     42.0%        0.45 4,757 ( 60%)      3 
Split 8     2,128 (17.0%)     31.9%        0.45 4,843 ( 61%)      4 
Split 9     1,303 (10.4%)     30.5%        0.46 4,791 ( 61%)      4 
Split 10    2,144 (17.2%)     28.8%        0.47 4,790 ( 61%)      4 
Split 11    2,046 (16.4%)     27.4%        0.48 5,002 ( 63%)      4 
Split 12    2,264 (18.1%)     33.5%        0.51 5,024 ( 64%)      3 
Split 13    2,382 (19.1%)     24.6%        0.53 5,196 ( 66%)      4 
Split 14    1,653 (13.2%)     23.1%        0.53 5,319 ( 67%)      4 
Split 15    1,953 (15.6%)     17.6%        0.62 5,078 ( 64%)      5 
Split 16    1,876 (15.0%)     26.5%        0.65 5,176 ( 66%)      3 
Split 17    1,835 (14.7%)     24.7%        0.65 5,135 ( 65%)      3 
Split 18       921 (7.4%)     22.8%        0.67 5,330 ( 67%)      3 
Split 19    1,337 (10.7%)     21.7%        0.70 4,799 ( 61%)      3 
Split 20    1,826 (14.6%)     29.7%        0.70 4,518 ( 57%)      2 
Split 21    1,991 (15.9%)     19.1%        0.69 5,048 ( 64%)      3 
Split 22    2,508 (20.1%)     23.3%        0.67 5,475 ( 69%)      2 
Split 23    2,694 (21.6%)     14.0%        0.72 5,723 ( 72%)      3 
Split 24    2,162 (17.3%)     10.9%        0.73 5,471 ( 69%)      3 
Split 25    2,804 (22.4%)     11.7%        0.73 5,129 ( 65%)      2 
Split 26    2,892 (23.1%)     10.8%        0.69 5,238 ( 66%)      2 
Split 27    3,114 (24.9%)      7.9%        0.72 4,917 ( 62%)      2 
Split 28    2,245 (18.0%)      5.3%        0.76 4,438 ( 56%)      2 
Resample    2,009 (16.1%)       NA%        9.42 4,396 ( 56%)     NA 

Sampling diagnostics for SMC run 2 of 2 (12,500 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     3,573 (28.6%)     19.6%        0.31 7,911 (100%)      9 
Split 2     3,413 (27.3%)     28.3%        0.33 4,785 ( 61%)      6 
Split 3     3,176 (25.4%)     32.7%        0.38 4,805 ( 61%)      5 
Split 4     2,941 (23.5%)     37.8%        0.41 4,789 ( 61%)      4 
Split 5     2,818 (22.5%)     45.4%        0.42 4,763 ( 60%)      3 
Split 6     2,603 (20.8%)     28.2%        0.43 4,789 ( 61%)      5 
Split 7     2,480 (19.8%)     33.3%        0.44 4,784 ( 61%)      4 
Split 8     2,477 (19.8%)     31.5%        0.45 4,884 ( 62%)      4 
Split 9     2,158 (17.3%)     30.3%        0.47 4,827 ( 61%)      4 
Split 10    2,043 (16.3%)     28.9%        0.49 4,896 ( 62%)      4 
Split 11    1,798 (14.4%)     35.8%        0.50 4,956 ( 63%)      3 
Split 12    1,892 (15.1%)     45.5%        0.50 4,945 ( 63%)      2 
Split 13    2,365 (18.9%)     19.8%        0.52 4,982 ( 63%)      5 
Split 14    2,137 (17.1%)     23.6%        0.53 5,226 ( 66%)      4 
Split 15    1,856 (14.8%)     28.6%        0.62 5,160 ( 65%)      3 
Split 16    2,141 (17.1%)     36.9%        0.62 5,115 ( 65%)      2 
Split 17    2,006 (16.0%)     19.1%        0.67 5,183 ( 66%)      4 
Split 18    2,092 (16.7%)     11.9%        0.67 5,136 ( 65%)      6 
Split 19    1,867 (14.9%)     16.3%        0.73 5,313 ( 67%)      4 
Split 20    1,831 (14.6%)     19.9%        0.73 5,112 ( 65%)      3 
Split 21    2,219 (17.8%)     13.5%        0.72 5,310 ( 67%)      4 
Split 22    2,357 (18.9%)     15.7%        0.71 5,332 ( 67%)      3 
Split 23    2,871 (23.0%)     13.1%        0.74 5,257 ( 67%)      3 
Split 24    2,574 (20.6%)      7.5%        0.72 5,473 ( 69%)      4 
Split 25    1,926 (15.4%)      7.4%        0.72 5,314 ( 67%)      3 
Split 26    2,396 (19.2%)      6.7%        0.74 5,022 ( 64%)      3 
Split 27    1,712 (13.7%)     10.5%        0.80 5,004 ( 63%)      2 
Split 28    1,367 (10.9%)      5.2%        0.78 4,647 ( 59%)      3 
Resample       840 (6.7%)       NA%        9.68 3,501 ( 44%)     NA 


Bay area summary:

SMC: 25,000 sampled plans of 15 districts on 2,213 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.5
`est_label_mult`=1 • `pop_temper`=0.05

Plan diversity 80% range: 0.51 to 0.77

Sampling diagnostics for SMC run 1 of 2 (12,500 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1    10,660 (85.3%)     12.8%        0.45 7,876 (100%)      7 
Split 2    10,175 (81.4%)     14.1%        0.61 7,544 ( 95%)      6 
Split 3    10,015 (80.1%)     20.0%        0.67 7,360 ( 93%)      4 
Split 4     9,956 (79.6%)     24.5%        0.70 7,333 ( 93%)      3 
Split 5     9,786 (78.3%)     23.0%        0.73 7,340 ( 93%)      3 
Split 6     9,677 (77.4%)     16.1%        0.75 7,325 ( 93%)      4 
Split 7     9,342 (74.7%)     19.3%        0.80 7,283 ( 92%)      3 
Split 8     9,339 (74.7%)     25.0%        0.82 7,254 ( 92%)      2 
Split 9     9,617 (76.9%)     15.5%        0.80 7,217 ( 91%)      3 
Split 10    8,010 (64.1%)     19.4%        0.82 7,223 ( 91%)      2 
Split 11    7,803 (62.4%)     17.3%        0.86 6,984 ( 88%)      2 
Split 12    7,467 (59.7%)     15.0%        0.92 6,645 ( 84%)      2 
Split 13    6,841 (54.7%)     15.7%        1.26 5,513 ( 70%)      2 
Split 14    4,097 (32.8%)     21.3%        0.89 5,817 ( 74%)      2 
Resample    1,344 (10.7%)       NA%       10.18 3,589 ( 45%)     NA 

Sampling diagnostics for SMC run 2 of 2 (12,500 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1    10,623 (85.0%)     14.9%        0.45 7,849 ( 99%)      6 
Split 2    10,211 (81.7%)     20.9%        0.61 7,446 ( 94%)      4 
Split 3    10,033 (80.3%)     25.8%        0.66 7,387 ( 93%)      3 
Split 4     9,765 (78.1%)     33.8%        0.70 7,357 ( 93%)      2 
Split 5     9,708 (77.7%)     22.9%        0.74 7,288 ( 92%)      3 
Split 6     9,646 (77.2%)     21.2%        0.77 7,326 ( 93%)      3 
Split 7     9,426 (75.4%)     14.7%        0.79 7,251 ( 92%)      4 
Split 8     9,485 (75.9%)     13.2%        0.80 7,262 ( 92%)      4 
Split 9     9,401 (75.2%)     15.5%        0.81 7,191 ( 91%)      3 
Split 10    8,947 (71.6%)     19.5%        0.80 7,153 ( 91%)      2 
Split 11    8,610 (68.9%)     11.7%        0.83 7,026 ( 89%)      3 
Split 12    8,278 (66.2%)     14.9%        0.86 6,646 ( 84%)      2 
Split 13    7,229 (57.8%)     13.5%        0.90 6,138 ( 78%)      2 
Split 14    4,816 (38.5%)     19.4%        0.93 5,907 ( 75%)      2 
Resample    2,233 (17.9%)       NA%       10.43 3,674 ( 46%)     NA 


CoryMcCartan commented 1 year ago

Looks great to me!

christopherkenny commented 1 year ago

This looks fabulous! Thanks @mzwu!

alarm-redist / fifty-states