alarm-redist / fifty-states

Redistricting analysis for all 50 U.S. states
https://alarm-redist.github.io/fifty-states/

Re-run 2020 California Congressional Districts #128

Closed: christopherkenny closed this issue 2 years ago

christopherkenny commented 2 years ago

Redistricting requirements

In California, under Article XXI of the California Constitution, districts must:

  1. be contiguous (2d3)
  2. have equal populations (2d1)
  3. be geographically compact (2d5)
  4. preserve city, county, neighborhood, and community of interest boundaries as much as possible (2d4)
  5. not favor or discriminate against incumbents, candidates, or parties (2e)
  6. comply with the Voting Rights Act (2d2)
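The validation summaries in this thread report `comp_polsby`, which we take to be the Polsby-Popper compactness score: 4πA/P² for a district with area A and perimeter P. It equals 1 for a circle and approaches 0 for elongated shapes. The base-R sketch below is illustrative only; the helper name is hypothetical, and it assumes area and perimeter have already been computed in consistent units.

```r
# Polsby-Popper compactness: 4 * pi * area / perimeter^2.
# Hypothetical helper; assumes area and perimeter use consistent units
# (e.g. square metres and metres).
polsby_popper <- function(area, perimeter) {
  stopifnot(all(area >= 0), all(perimeter > 0))
  4 * pi * area / perimeter^2
}

# A unit circle scores 1; a unit square scores pi / 4 (about 0.785).
pp <- polsby_popper(area = c(pi, 1), perimeter = c(2 * pi, 4))
pp
```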

Interpretation of requirements

We enforce a maximum population deviation of 0.5%. We add a pseudo-county constraint, described below, and VRA constraints that encourage Hispanic VAP and Asian VAP majorities in districts.
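A 0.5% maximum population deviation means that every district's population must fall within 0.5% of the ideal, which is the total population divided by the number of districts. The base-R sketch below shows that check. The function name and toy populations are hypothetical, and the sketch assumes that every unit is assigned to some district.

```r
# Largest relative deviation from the ideal district population.
# `district_pop` holds one population total per district; the ideal is the
# total population divided by the number of districts.
max_pop_dev <- function(district_pop) {
  ideal <- sum(district_pop) / length(district_pop)
  max(abs(district_pop - ideal) / ideal)
}

# Hypothetical toy plan with three districts.
pops <- c(100000, 100400, 99600)
max_pop_dev(pops)           # 0.004
max_pop_dev(pops) <= 0.005  # TRUE: within the 0.5% tolerance
```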

Data Sources

Data for California comes from the ALARM Project's 2020 Redistricting Data Files.

Pre-processing Notes

Islands were connected to the nearest mainland point within the same county.
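One way to implement this island fix is to add, for each island unit, an adjacency edge to the closest mainland unit in the same county. The base-R sketch below does this under several assumptions: it uses unit centroids as a stand-in for "nearest point", assumes projected coordinates so that Euclidean distance is meaningful, and uses a plain 1-indexed adjacency list. The function and argument names are hypothetical and are not the project's actual code.

```r
# Hypothetical sketch: link each island unit to the nearest mainland unit in
# the same county by adding a symmetric edge to a 1-indexed adjacency list.
# `x`, `y`: unit centroid coordinates (assumed projected);
# `county`: county label per unit; `is_island`: TRUE for island units.
connect_islands <- function(adj, x, y, county, is_island) {
  for (i in which(is_island)) {
    # Candidate partners: mainland units in the same county.
    cand <- which(!is_island & county == county[i])
    if (length(cand) == 0) next  # no mainland unit in this county
    d2 <- (x[cand] - x[i])^2 + (y[cand] - y[i])^2
    j <- cand[which.min(d2)]
    adj[[i]] <- union(adj[[i]], j)
    adj[[j]] <- union(adj[[j]], i)
  }
  adj
}
```

Restricting candidates to the same county avoids creating a cross-county edge that would force an extra county split, even when a unit in a neighboring county is closer.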

Simulation Notes

We sample 20,000 districting plans in each cluster across 2 independent runs of the SMC algorithm. We then sample 40,000 districting plans for the remainder of California across 2 independent runs of the SMC algorithm, and thin the sample down to 5,000 plans. To balance county and municipality splits, we create pseudocounties for use in the county constraint. The pseudocounty treatment applies to Alameda County, Contra Costa County, Fresno County, Kern County, Los Angeles County, Orange County, Riverside County, Sacramento County, San Bernardino County, San Diego County, San Francisco County, San Joaquin County, San Mateo County, Santa Clara County, and Ventura County, which are larger than a congressional district in population. Based on initial runs, we used a small population tempering value for each cluster to avoid losing diversity at the final step.
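The text does not spell out how pseudocounties are formed inside the large counties. The base-R sketch below assumes one common convention: within a county larger than one district, each municipality becomes its own pseudocounty and unincorporated areas are grouped together, while smaller counties stay whole. Treat this as an illustration of that assumed rule, not as the rule used here. The function name and labels are hypothetical.

```r
# Hypothetical pseudocounty sketch. ASSUMPTION: units in counties whose
# population exceeds `target` (one district's worth) are grouped by
# municipality, with unincorporated areas pooled per county; units in smaller
# counties keep their county label. The project's actual rule may differ.
make_pseudocounties <- function(county, muni, pop, target) {
  county <- as.character(county)
  muni <- as.character(muni)
  county_pop <- tapply(pop, county, sum)
  big <- as.vector(county_pop[county] > target)  # strip array attributes
  muni_lbl <- ifelse(is.na(muni), "unincorporated", muni)
  ifelse(big, paste(county, muni_lbl, sep = " / "), county)
}

# Toy example: county A (pop 120) exceeds a target of 100; county B does not.
pc <- make_pseudocounties(county = c("A", "A", "A", "B"),
                          muni = c("x", "y", NA, "z"),
                          pop = c(60, 50, 10, 40), target = 100)
pc
```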

1. Clustering Procedure

First, we run partial SMC in two pieces: the south and the Bay Area. The counties in each cluster are:

We sample in each region with a maximum population deviation of 0.5%: 27 districts in the southern region and 15 districts in the Bay Area. Because each cluster will have leftover population, we apply an additional constraint that encourages any unassigned areas to lie on the edges of the clusters, so that the leftover territory does not become discontiguous. For each cluster, we add VRA constraints encouraging Hispanic VAP and Asian VAP concentrations in districts, in line with the enacted plan.
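The edge constraint addresses a specific failure mode: a pocket of unassigned units surrounded entirely by finished districts, which the remainder simulation could never reach. The base-R sketch below is a hard pass/fail version of that condition. It checks whether every connected piece of a cluster's leftover area touches at least one unit outside the cluster. The constraint described above is a soft incentive, and how it is scored during sampling is not shown here. All names are hypothetical.

```r
# Hypothetical check: does every connected piece of the cluster's leftover
# (unassigned) area touch at least one unit outside the cluster?
# `adj`: 1-indexed adjacency list over all units;
# `in_cluster`: TRUE for cluster units;
# `assigned`: TRUE for cluster units already placed in a district.
leftover_on_edge <- function(adj, in_cluster, assigned) {
  leftover <- in_cluster & !assigned
  seen <- rep(FALSE, length(adj))
  for (start in which(leftover)) {
    if (seen[start]) next
    # Breadth-first search restricted to leftover units.
    queue <- start
    seen[start] <- TRUE
    touches_outside <- FALSE
    while (length(queue) > 0) {
      u <- queue[1]
      queue <- queue[-1]
      nbrs <- adj[[u]]
      if (any(!in_cluster[nbrs])) touches_outside <- TRUE
      nxt <- nbrs[leftover[nbrs] & !seen[nbrs]]
      seen[nxt] <- TRUE
      queue <- c(queue, nxt)
    }
    if (!touches_outside) return(FALSE)  # an enclosed pocket was found
  }
  TRUE
}
```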

2. Combination Procedure

These partial-map simulations are then combined to run statewide simulations, in which we sample the remaining 10 districts.
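The arithmetic is 52 − 27 − 15 = 10 districts left for the remainder. The base-R sketch below shows one way to merge a southern partial plan and a Bay Area partial plan into a single statewide assignment vector before the remainder is sampled. It assumes both partial plans are stored as per-unit district numbers, with 0 for units they leave unassigned. The function name is hypothetical, and how redist itself ingests the combined plans is not shown.

```r
# Hypothetical sketch: merge two partial plans into one statewide vector.
# `south` and `bay` give a district number per unit (0 = unassigned);
# Bay labels are shifted past the south's so no two districts share a label.
combine_partials <- function(south, bay, n_south) {
  stopifnot(length(south) == length(bay), !any(south > 0 & bay > 0))
  out <- south
  out[bay > 0] <- bay[bay > 0] + n_south
  out  # 0 marks units left for the remainder simulation
}

n_total <- 52
n_south <- 27
n_bay <- 15
n_total - n_south - n_bay  # 10 districts left for the remainder
```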

Validation

validation_20220715_2103

SMC: 40,000 sampled plans of 52 districts on 9,129 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.5
`est_label_mult`=1 • `pop_temper`=0

Plan diversity 80% range: 0.39 to 0.87

R-hat values for summary statistics:
   pop_overlap      total_vap       plan_dev      comp_edge    comp_polsby       pop_hisp 
     1.0206726      1.0006993      1.0003950      1.0031552      1.0063495      1.0042640 
     pop_white      pop_black       pop_aian      pop_asian       pop_nhpi      pop_other 
     1.0008380      1.0003189      0.9999905      1.0015601      1.0032105      1.0020453 
       pop_two       vap_hisp      vap_white      vap_black       vap_aian      vap_asian 
     1.0008742      1.0047342      1.0007322      1.0006367      1.0000615      1.0017646 
      vap_nhpi      vap_other        vap_two pre_16_dem_cli pre_16_rep_tru pre_20_dem_bid 
     1.0026712      1.0005083      1.0001460      1.0039224      1.0002809      1.0056975 
pre_20_rep_tru         arv_16         adv_16         arv_20         adv_20  county_splits 
     1.0049327      1.0002809      1.0039224      1.0049327      1.0056975      1.0014315 
   muni_splits            ndv            nrv        ndshare          e_dvs         pr_dem 
     1.0049442      1.0055980      1.0059087      1.0015216      1.0016879      1.0015121 
         e_dem          pbias           egap 
     1.0205089      1.0028718      1.0144812 

Sampling diagnostics for SMC run 1 of 2 (20,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd   Max. unique Est. k 
Split 1    15,843 (79.2%)     16.1%        0.64 11,775 ( 93%)      5 
Split 2    15,711 (78.6%)     15.5%        0.69 11,703 ( 93%)      4 
Split 3    16,245 (81.2%)     17.9%        0.68 11,672 ( 92%)      3 
Split 4    16,242 (81.2%)     15.8%        0.67 11,538 ( 91%)      3 
Split 5    16,190 (81.0%)     19.3%        0.67 11,564 ( 91%)      2 
Split 6    16,623 (83.1%)      7.1%        0.65 11,392 ( 90%)      5 
Split 7    16,739 (83.7%)      9.0%        0.63 11,074 ( 88%)      3 
Split 8    16,659 (83.3%)      6.9%        0.63 10,806 ( 85%)      3 
Split 9    15,981 (79.9%)      3.7%        0.68 10,003 ( 79%)      2 
Resample    7,662 (38.3%)       NA%        0.68 10,645 ( 84%)     NA 

Sampling diagnostics for SMC run 2 of 2 (20,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd   Max. unique Est. k 
Split 1    15,774 (78.9%)     13.6%        0.64 11,889 ( 94%)      6 
Split 2    15,540 (77.7%)     15.5%        0.70 11,719 ( 93%)      4 
Split 3    16,016 (80.1%)     17.8%        0.70 11,633 ( 92%)      3 
Split 4    16,169 (80.8%)      8.0%        0.68 11,650 ( 92%)      6 
Split 5    16,436 (82.2%)     10.3%        0.66 11,517 ( 91%)      4 
Split 6    16,317 (81.6%)      8.7%        0.66 11,306 ( 89%)      4 
Split 7    16,723 (83.6%)      9.3%        0.64 11,088 ( 88%)      3 
Split 8    16,812 (84.1%)     10.3%        0.63 10,889 ( 86%)      2 
Split 9    16,707 (83.5%)      1.9%        0.63 10,165 ( 80%)      4 
Resample   10,884 (54.4%)       NA%        0.63 11,055 ( 87%)     NA 

•  Watch out for low effective samples, very low acceptance rates (less than 1%), large std.
devs. of the log weights (more than 3 or so), and low numbers of unique plans. R-hat values
for summary statistics should be between 1 and 1.05.
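The base-R sketch below shows what two of these diagnostics measure. The first is the Kish effective sample size, (Σw)²/Σw², computed from importance weights. The second is the classic Gelman-Rubin R-hat, which compares between-run and within-run variance of a summary statistic. redist may use a different R-hat variant, and these helpers are illustrative stand-ins with hypothetical names, not redist functions. Values marginally below 1, such as `pop_aian` above, are ordinary sampling noise, so the flag only checks the upper limit.

```r
# Kish effective sample size from log importance weights:
# (sum w)^2 / sum(w^2). Subtracting max(log_w) avoids overflow and
# cancels out of the ratio.
kish_ess <- function(log_w) {
  w <- exp(log_w - max(log_w))
  sum(w)^2 / sum(w^2)
}

# Classic Gelman-Rubin potential scale reduction factor for one summary
# statistic `x`; `chain` says which independent run each draw came from.
rhat_basic <- function(x, chain) {
  chains <- split(x, chain)
  n <- min(lengths(chains))
  chains <- lapply(chains, function(v) v[seq_len(n)])
  means <- vapply(chains, mean, numeric(1))
  W <- mean(vapply(chains, var, numeric(1)))  # within-run variance
  B <- n * var(means)                         # between-run variance
  var_hat <- (n - 1) / n * W + B / n
  sqrt(var_hat / W)
}

# Rules of thumb from the note above: acceptance below 1%, log-weight
# s.d. above about 3, or R-hat above 1.05.
flag_diagnostics <- function(acc_rate, log_wgt_sd, rhat) {
  c(low_acceptance = any(acc_rate < 0.01),
    large_wgt_sd   = any(log_wgt_sd > 3),
    rhat_high      = any(rhat > 1.05))
}
```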

Checklist

@CoryMcCartan

Additional Notes:

Summary for south region:

SMC: 20,000 sampled plans of 28 districts on 4,873 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.95
`est_label_mult`=1 • `pop_temper`=0.005

Plan diversity 80% range: 0.38 to 0.99

Sampling diagnostics for SMC run 1 of 2 (10,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     3,168 (31.7%)     17.3%        0.28 6,395 (101%)     11 
Split 2     2,928 (29.3%)     26.2%        0.31 3,843 ( 61%)      7 
Split 3     2,776 (27.8%)     34.6%        0.37 3,713 ( 59%)      5 
Split 4     2,580 (25.8%)     32.9%        0.38 3,685 ( 58%)      5 
Split 5     2,457 (24.6%)     31.5%        0.39 3,583 ( 57%)      5 
Split 6     2,267 (22.7%)     29.9%        0.40 3,609 ( 57%)      5 
Split 7     2,145 (21.5%)     28.2%        0.42 3,615 ( 57%)      5 
Split 8     2,039 (20.4%)     32.6%        0.43 3,550 ( 56%)      4 
Split 9     1,816 (18.2%)     31.8%        0.45 3,471 ( 55%)      4 
Split 10    1,545 (15.4%)     37.5%        0.48 3,444 ( 54%)      3 
Split 11    1,510 (15.1%)     47.6%        0.49 3,395 ( 54%)      2 
Split 12    1,547 (15.5%)     26.5%        0.51 3,426 ( 54%)      4 
Split 13    1,613 (16.1%)     32.5%        0.53 3,518 ( 56%)      3 
Split 14    1,228 (12.3%)     41.3%        0.53 3,666 ( 58%)      2 
Split 15    1,176 (11.8%)     40.3%        0.63 3,422 ( 54%)      2 
Split 16       742 (7.4%)     37.9%        0.63 3,500 ( 55%)      2 
Split 17    1,097 (11.0%)     35.0%        0.62 3,549 ( 56%)      2 
Split 18    1,135 (11.4%)     24.4%        0.65 3,527 ( 56%)      3 
Split 19    1,245 (12.5%)     30.7%        0.63 3,551 ( 56%)      2 
Split 20       616 (6.2%)     28.2%        0.63 3,732 ( 59%)      2 
Split 21       825 (8.2%)     25.8%        0.66 3,550 ( 56%)      2 
Split 22       781 (7.8%)      9.6%        0.68 3,677 ( 58%)      5 
Split 23    1,084 (10.8%)     14.0%        0.70 3,759 ( 59%)      3 
Split 24       897 (9.0%)      9.2%        0.73 3,539 ( 56%)      4 
Split 25    1,092 (10.9%)      9.3%        0.73 3,687 ( 58%)      3 
Split 26    1,315 (13.1%)     17.9%        0.64 3,482 ( 55%)      2 
Split 27    1,525 (15.2%)     10.6%        0.66 2,779 ( 44%)      2 
Resample    1,497 (15.0%)       NA%       10.00 3,394 ( 54%)     NA 

Sampling diagnostics for SMC run 2 of 2 (10,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     3,169 (31.7%)     19.0%        0.28 6,348 (100%)     10 
Split 2     3,041 (30.4%)     22.6%        0.31 3,798 ( 60%)      8 
Split 3     2,821 (28.2%)     33.7%        0.37 3,796 ( 60%)      5 
Split 4     2,606 (26.1%)     24.0%        0.38 3,680 ( 58%)      7 
Split 5     2,463 (24.6%)     37.6%        0.39 3,616 ( 57%)      4 
Split 6     2,342 (23.4%)     36.5%        0.40 3,517 ( 56%)      4 
Split 7     2,116 (21.2%)     28.7%        0.41 3,506 ( 55%)      5 
Split 8     2,046 (20.5%)     33.4%        0.43 3,536 ( 56%)      4 
Split 9     1,767 (17.7%)     39.3%        0.45 3,555 ( 56%)      3 
Split 10    1,816 (18.2%)     29.7%        0.45 3,470 ( 55%)      4 
Split 11    1,681 (16.8%)     36.9%        0.48 3,572 ( 57%)      3 
Split 12    1,635 (16.4%)     26.7%        0.50 3,535 ( 56%)      4 
Split 13    1,663 (16.6%)     32.7%        0.51 3,541 ( 56%)      3 
Split 14    1,426 (14.3%)     23.9%        0.52 3,560 ( 56%)      4 
Split 15    1,225 (12.3%)     29.3%        0.61 3,510 ( 56%)      3 
Split 16    1,377 (13.8%)     37.6%        0.65 3,514 ( 56%)      2 
Split 17    1,259 (12.6%)     25.5%        0.65 3,661 ( 58%)      3 
Split 18    1,398 (14.0%)     18.1%        0.66 3,730 ( 59%)      4 
Split 19    1,389 (13.9%)     21.9%        0.82 3,666 ( 58%)      3 
Split 20    1,208 (12.1%)     27.2%        0.71 3,544 ( 56%)      2 
Split 21    1,289 (12.9%)     23.5%        0.70 3,690 ( 58%)      2 
Split 22    1,402 (14.0%)     20.7%        0.69 3,858 ( 61%)      2 
Split 23    1,270 (12.7%)      9.8%        0.73 3,753 ( 59%)      4 
Split 24    2,042 (20.4%)     11.7%        0.81 3,710 ( 59%)      3 
Split 25    1,158 (11.6%)      8.6%        0.74 3,530 ( 56%)      2 
Split 26    1,517 (15.2%)      8.0%        0.71 3,171 ( 50%)      2 
Split 27    1,074 (10.7%)      4.3%        0.70 2,921 ( 46%)      2 
Resample    1,161 (11.6%)       NA%        9.93 3,199 ( 51%)     NA 


Summary for Bay region:

SMC: 20,000 sampled plans of 16 districts on 2,891 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.5
`est_label_mult`=1 • `pop_temper`=0.0025

Plan diversity 80% range: 0.23 to 0.83

Sampling diagnostics for SMC run 1 of 2 (10,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k    
Split 1     2,538 (25.4%)     13.1%        0.37 6,340 (100%)      8    
Split 2     3,665 (36.6%)     20.7%        1.28 3,788 ( 60%)      5    
Split 3     2,724 (27.2%)     24.6%        1.18 4,450 ( 70%)      4    
Split 4     2,330 (23.3%)     23.0%        1.32 4,393 ( 69%)      4    
Split 5     2,346 (23.5%)     21.8%        1.35 4,251 ( 67%)      4    
Split 6     2,574 (25.7%)     20.4%        1.34 4,357 ( 69%)      4    
Split 7     3,002 (30.0%)     24.2%        1.30 4,680 ( 74%)      3    
Split 8     3,043 (30.4%)     30.5%        1.20 4,805 ( 76%)      2    
Split 9     4,262 (42.6%)     28.0%        1.20 4,982 ( 79%)      2    
Split 10    4,223 (42.2%)     26.2%        1.02 5,077 ( 80%)      2    
Split 11    3,356 (33.6%)     16.8%        1.05 5,151 ( 81%)      3    
Split 12    3,091 (30.9%)     21.9%        1.18 4,954 ( 78%)      2    
Split 13    4,464 (44.6%)     19.7%        1.10 4,748 ( 75%)      2    
Split 14    4,414 (44.1%)     11.2%        0.98 4,875 ( 77%)      3    
Split 15    2,819 (28.2%)     15.5%        0.99 4,569 ( 72%)      2    
Resample        49 (0.5%)       NA%        8.82 2,424 ( 38%)     NA  * 

Sampling diagnostics for SMC run 2 of 2 (10,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     2,595 (26.0%)     13.1%        0.37 6,342 (100%)      8 
Split 2     3,678 (36.8%)     16.8%        1.26 3,786 ( 60%)      6 
Split 3     2,596 (26.0%)     24.6%        1.16 4,515 ( 71%)      4 
Split 4     2,244 (22.4%)     30.1%        1.31 4,450 ( 70%)      3 
Split 5     2,517 (25.2%)     14.5%        1.38 4,168 ( 66%)      6 
Split 6     3,420 (34.2%)     20.3%        1.34 4,332 ( 69%)      4 
Split 7     3,362 (33.6%)     24.2%        1.16 4,881 ( 77%)      3 
Split 8     3,004 (30.0%)     22.1%        1.18 4,965 ( 79%)      3 
Split 9     3,130 (31.3%)     28.0%        1.24 4,968 ( 79%)      2 
Split 10    3,604 (36.0%)     26.2%        1.22 4,906 ( 78%)      2 
Split 11    4,439 (44.4%)     16.3%        1.12 4,953 ( 78%)      3 
Split 12    4,836 (48.4%)     20.7%        0.99 5,102 ( 81%)      2 
Split 13    4,535 (45.3%)     13.0%        1.00 5,070 ( 80%)      3 
Split 14    4,420 (44.2%)     17.6%        1.01 4,920 ( 78%)      2 
Split 15    5,348 (53.5%)     17.0%        0.91 4,842 ( 77%)      2 
Resample    1,505 (15.1%)       NA%        7.80 3,884 ( 61%)     NA 

• (*) Bottlenecks found: Consider weakening or removing constraints, or increasing the
population tolerance. If the acceptance rate drops quickly in the final splits, try
increasing `pop_temper` by 0.01. To visualize what geographic areas may be causing problems,
try running the following code. Highlighted areas are those that may be causing the
bottleneck.
    plot(<map object>, rowMeans(as.matrix(plans_bay) == <bottleneck iteration>))
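As we read it, the suggested snippet computes, for each unit, the share of sampled plans in which that unit is assigned to the district numbered by the bottleneck iteration. Units with high shares are the ones repeatedly drawn into the struggling district. This reading assumes that `as.matrix()` on a plans object returns a units-by-draws matrix of district labels. The toy base-R example below uses a hand-made matrix and a hypothetical bottleneck value of 2.

```r
# Toy stand-in for as.matrix(plans): 4 units (rows) x 3 sampled plans
# (columns); each entry is that unit's district label in that plan.
m <- matrix(c(1, 1, 2,
              2, 2, 2,
              3, 2, 3,
              3, 3, 3), nrow = 4, byrow = TRUE)
k <- 2  # hypothetical bottleneck iteration / district label
share <- rowMeans(m == k)
share  # fraction of plans putting each unit in district k
```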


CoryMcCartan commented 2 years ago

Can you paste the overall summary in the marked spot?

Also, could you show me a couple more random plans? Just want to spot-check the diversity, esp. in SoCal. I also notice that the easternmost big district connects to the SoCal cluster the same way in every example map (but the enacted map doesn't do this). Are we sure that the cluster-based sim process could yield the enacted plan?

christopherkenny commented 2 years ago

Yeah, just added the summary. Will get you more sample plans today

christopherkenny commented 2 years ago

Here are six more. I think that we may need to run a bit more for the south?

CoryMcCartan commented 2 years ago

Hm, you see what I mean about the south-central connection always going through that one precinct? Any way to dig into that? Is it the adj graph or the boundary check?

christopherkenny commented 2 years ago

Oh, I hope it's the adjacency! I have a `seam_sew` function for this kind of problem that never gets any use. Will look into it.
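To test whether every south-central connection really runs through one precinct, one could list the adjacency-graph edges that cross from the southern cluster to the rest of the state, then check how many distinct cluster-side units they touch. If only one unit appears, the adjacency graph itself forces the connection. The base-R sketch below is illustrative, and its names are hypothetical.

```r
# List edges (inside, outside) of a 1-indexed adjacency list where
# `inside` is a cluster unit and `outside` is a non-cluster neighbor.
crossing_edges <- function(adj, in_cluster) {
  rows <- lapply(which(in_cluster), function(i) {
    out <- adj[[i]][!in_cluster[adj[[i]]]]
    if (length(out) == 0) return(NULL)
    data.frame(inside = i, outside = out)
  })
  do.call(rbind, rows)  # NULL if the cluster has no outside neighbors
}
```

Usage: `length(unique(crossing_edges(adj, in_south)$inside))` counts the cluster units that border the remainder, where `in_south` is a hypothetical membership flag.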

christopherkenny commented 2 years ago

Okay, making the clusters a bit smaller looks hopeful @CoryMcCartan.
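The summaries that follow support this. The Bay region's plan diversity 80% range moves from 0.23 to 0.83 in the earlier run to 0.62 to 0.88, and the south's moves from 0.38 to 0.99 to 0.78 to 0.95. We believe redist's plan diversity is based on pairwise variation-of-information (VI) distances between sampled plans, rescaled in a way not reproduced here. The base-R sketch below computes the raw, population-weighted VI between two plans and is for intuition only. The function name is hypothetical.

```r
# Population-weighted variation of information between two plans:
# VI = 2 H(X, Y) - H(X) - H(Y), using population shares as probabilities.
# Identical plans (up to relabeling districts) have VI = 0.
vi_distance <- function(plan_a, plan_b, pop) {
  p <- pop / sum(pop)
  ent <- function(probs) {
    probs <- probs[!is.na(probs) & probs > 0]  # drop empty cells
    -sum(probs * log(probs))
  }
  h_a  <- ent(tapply(p, plan_a, sum))
  h_b  <- ent(tapply(p, plan_b, sum))
  h_ab <- ent(tapply(p, list(plan_a, plan_b), sum))
  2 * h_ab - h_a - h_b
}
```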

Validation: validation_20220725_1637

Bay region summary:

SMC: 20,000 sampled plans of 14 districts on 2,532 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.5
`est_label_mult`=1 • `pop_temper`=0.0025

Plan diversity 80% range: 0.62 to 0.88

Sampling diagnostics for SMC run 1 of 2 (10,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     4,216 (42.2%)     10.8%        0.46 6,334 (100%)      9 
Split 2     3,842 (38.4%)     15.6%        0.99 4,598 ( 73%)      6 
Split 3     3,655 (36.5%)     22.2%        1.09 4,517 ( 71%)      4 
Split 4     4,326 (43.3%)     20.8%        1.09 4,571 ( 72%)      4 
Split 5     4,733 (47.3%)     19.5%        1.01 4,961 ( 78%)      4 
Split 6     4,637 (46.4%)     23.3%        0.97 5,163 ( 82%)      3 
Split 7     5,025 (50.2%)     21.1%        0.99 5,258 ( 83%)      3 
Split 8     5,147 (51.5%)     19.5%        0.99 5,269 ( 83%)      3 
Split 9     4,243 (42.4%)     25.0%        0.99 5,285 ( 84%)      2 
Split 10    4,274 (42.7%)     11.7%        1.04 5,136 ( 81%)      4 
Split 11    4,276 (42.8%)     13.8%        1.07 4,903 ( 78%)      3 
Split 12    4,527 (45.3%)     17.5%        0.99 4,771 ( 75%)      2 
Split 13    4,499 (45.0%)     14.7%        1.02 4,730 ( 75%)      2 
Resample    1,124 (11.2%)       NA%        8.69 3,129 ( 50%)     NA 

Sampling diagnostics for SMC run 2 of 2 (10,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     4,173 (41.7%)     12.3%        0.46 6,335 (100%)      8 
Split 2     3,944 (39.4%)     18.5%        1.01 4,721 ( 75%)      5 
Split 3     3,746 (37.5%)     21.6%        1.08 4,512 ( 71%)      4 
Split 4     4,134 (41.3%)     27.4%        1.09 4,647 ( 74%)      3 
Split 5     4,520 (45.2%)     34.8%        1.05 4,988 ( 79%)      2 
Split 6     4,590 (45.9%)     22.9%        1.02 5,093 ( 81%)      3 
Split 7     4,951 (49.5%)     15.9%        1.01 5,262 ( 83%)      4 
Split 8     5,304 (53.0%)     18.8%        0.98 5,263 ( 83%)      3 
Split 9     5,212 (52.1%)     24.1%        0.94 5,327 ( 84%)      2 
Split 10    5,244 (52.4%)     14.9%        0.96 5,246 ( 83%)      3 
Split 11    4,304 (43.0%)     13.4%        0.97 5,160 ( 82%)      3 
Split 12    4,813 (48.1%)     17.0%        1.05 4,934 ( 78%)      2 
Split 13    4,053 (40.5%)     10.3%        1.00 4,695 ( 74%)      3 
Resample       579 (5.8%)       NA%        8.93 3,115 ( 49%)     NA

South region summary:

SMC: 20,000 sampled plans of 28 districts on 4,873 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.95
`est_label_mult`=1 • `pop_temper`=0.005

Plan diversity 80% range: 0.78 to 0.95

Sampling diagnostics for SMC run 1 of 2 (10,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k    
Split 1     3,396 (34.0%)     17.3%        0.29 6,315 (100%)     11    
Split 2     3,337 (33.4%)     25.9%        0.31 4,105 ( 65%)      7    
Split 3     3,117 (31.2%)     34.4%        0.36 4,005 ( 63%)      5    
Split 4     2,970 (29.7%)     39.7%        0.38 3,926 ( 62%)      4    
Split 5     2,889 (28.9%)     47.8%        0.39 3,935 ( 62%)      3    
Split 6     2,620 (26.2%)     29.7%        0.40 3,882 ( 61%)      5    
Split 7     2,495 (25.0%)     34.3%        0.42 3,874 ( 61%)      4    
Split 8     2,444 (24.4%)     41.3%        0.43 3,837 ( 61%)      3    
Split 9     2,281 (22.8%)     39.9%        0.46 3,830 ( 61%)      3    
Split 10    2,064 (20.6%)     49.9%        0.48 3,828 ( 61%)      2    
Split 11    1,904 (19.0%)     47.9%        0.49 3,785 ( 60%)      2    
Split 12    2,008 (20.1%)     21.6%        0.50 3,744 ( 59%)      5    
Split 13    2,035 (20.3%)     25.5%        0.53 3,897 ( 62%)      4    
Split 14    1,926 (19.3%)     24.2%        0.55 3,839 ( 61%)      4    
Split 15    1,447 (14.5%)     29.5%        0.62 3,871 ( 61%)      3    
Split 16    1,586 (15.9%)     27.5%        0.65 3,769 ( 60%)      3    
Split 17    1,494 (14.9%)     25.8%        0.65 3,808 ( 60%)      3    
Split 18    1,585 (15.8%)     32.9%        0.67 3,798 ( 60%)      2    
Split 19    1,626 (16.3%)     30.4%        0.64 3,998 ( 63%)      2    
Split 20    1,513 (15.1%)     26.7%        0.64 4,021 ( 64%)      2    
Split 21    1,448 (14.5%)     16.4%        0.66 4,098 ( 65%)      3    
Split 22      998 (10.0%)     14.8%        0.67 3,951 ( 63%)      3    
Split 23    1,993 (19.9%)     19.5%        0.69 3,741 ( 59%)      2    
Split 24    1,739 (17.4%)     15.6%        0.75 4,113 ( 65%)      2    
Split 25    1,433 (14.3%)      7.5%        0.85 3,651 ( 58%)      4    
Split 26    1,449 (14.5%)      7.4%        0.74 3,791 ( 60%)      3    
Split 27       785 (7.9%)      6.3%        0.66 2,780 ( 44%)      2    
Resample       430 (4.3%)       NA%        9.78 3,395 ( 54%)     NA  * 

Sampling diagnostics for SMC run 2 of 2 (10,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     3,455 (34.5%)     18.8%        0.29 6,333 (100%)     10 
Split 2     3,304 (33.0%)     26.4%        0.31 4,054 ( 64%)      7 
Split 3     3,176 (31.8%)     29.0%        0.37 4,035 ( 64%)      6 
Split 4     2,988 (29.9%)     27.5%        0.38 4,039 ( 64%)      6 
Split 5     2,822 (28.2%)     38.4%        0.40 3,914 ( 62%)      4 
Split 6     2,597 (26.0%)     35.9%        0.41 3,898 ( 62%)      4 
Split 7     2,489 (24.9%)     44.0%        0.42 3,832 ( 61%)      3 
Split 8     2,248 (22.5%)     41.2%        0.44 3,798 ( 60%)      3 
Split 9     2,234 (22.3%)     31.4%        0.46 3,836 ( 61%)      4 
Split 10    2,013 (20.1%)     38.3%        0.47 3,755 ( 59%)      3 
Split 11    1,707 (17.1%)     36.0%        0.48 3,873 ( 61%)      3 
Split 12    1,718 (17.2%)     45.4%        0.50 3,842 ( 61%)      2 
Split 13    1,666 (16.7%)     43.5%        0.50 3,852 ( 61%)      2 
Split 14    1,786 (17.9%)     30.8%        0.51 3,896 ( 62%)      3 
Split 15    1,355 (13.6%)     38.9%        0.61 3,983 ( 63%)      2 
Split 16    1,604 (16.0%)     26.2%        0.62 3,963 ( 63%)      3 
Split 17    1,811 (18.1%)     24.8%        0.63 3,881 ( 61%)      3 
Split 18    1,162 (11.6%)     22.6%        0.65 4,035 ( 64%)      3 
Split 19    1,142 (11.4%)     29.4%        0.65 3,706 ( 59%)      2 
Split 20    1,532 (15.3%)     11.9%        0.64 3,782 ( 60%)      5 
Split 21    1,047 (10.5%)     12.4%        0.63 4,051 ( 64%)      4 
Split 22    1,082 (10.8%)     13.9%        0.70 3,755 ( 59%)      3 
Split 23    1,315 (13.1%)     10.3%        0.73 3,472 ( 55%)      4 
Split 24    1,950 (19.5%)     12.1%        0.74 3,910 ( 62%)      3 
Split 25    2,052 (20.5%)     14.7%        0.69 3,897 ( 62%)      2 
Split 26    2,162 (21.6%)     13.9%        0.73 3,879 ( 61%)      2 
Split 27    1,640 (16.4%)      5.8%        0.85 2,973 ( 47%)      2 
Resample    1,823 (18.2%)       NA%       10.07 3,088 ( 49%)     NA 

• (*) Bottlenecks found: Consider weakening or removing constraints, or increasing the
population tolerance. If the acceptance rate drops quickly in the final splits, try
increasing `pop_temper` by 0.01. To visualize what geographic areas may be causing
problems, try running the following code. Highlighted areas are those that may be causing
the bottleneck.
    plot(<map object>, rowMeans(as.matrix(plans_south) == <bottleneck iteration>))

Remainder summary:

SMC: 5,000 sampled plans of 52 districts on 9,129 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.5
`est_label_mult`=1 • `pop_temper`=0

Plan diversity 80% range: 0.42 to 0.86

R-hat values for summary statistics:
   pop_overlap      total_vap       plan_dev      comp_edge    comp_polsby       pop_hisp 
     1.0045232      1.0112384      1.0003201      1.0175013      1.0146713      1.0091355 
     pop_white      pop_black       pop_aian      pop_asian       pop_nhpi      pop_other 
     1.0069664      1.0185153      1.0057457      1.0100283      1.0234747      1.0079842 
       pop_two       vap_hisp      vap_white      vap_black       vap_aian      vap_asian 
     0.9998509      1.0085254      1.0100374      1.0159567      1.0064864      1.0115489 
      vap_nhpi      vap_other        vap_two pre_16_dem_cli pre_16_rep_tru pre_20_dem_bid 
     1.0244848      1.0052043      1.0028274      1.0144599      1.0145071      1.0014293 
pre_20_rep_tru         arv_16         adv_16         arv_20         adv_20  county_splits 
     1.0287086      1.0145071      1.0144599      1.0287086      1.0014293      1.0102642 
   muni_splits            ndv            nrv        ndshare          e_dvs         pr_dem 
     1.0136140      1.0082826      1.0498364      1.0044934      1.0046764      1.0086745 
         e_dem          pbias           egap 
     1.0008221      1.0041871      0.9999721 

Sampling diagnostics for SMC run 1 of 2 (20,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd   Max. unique Est. k 
Split 1    15,300 (76.5%)     13.7%        0.70 12,041 ( 95%)      6 
Split 2    15,206 (76.0%)     13.4%        0.75 11,755 ( 93%)      5 
Split 3    15,514 (77.6%)     14.8%        0.74 11,692 ( 92%)      4 
Split 4    16,106 (80.5%)     13.6%        0.72 11,508 ( 91%)      4 
Split 5    16,234 (81.2%)     16.6%        0.69 11,572 ( 92%)      3 
Split 6    16,316 (81.6%)     14.8%        0.67 11,589 ( 92%)      3 
Split 7    16,236 (81.2%)     10.0%        0.68 11,489 ( 91%)      4 
Split 8    15,919 (79.6%)     11.2%        0.69 11,268 ( 89%)      3 
Split 9    16,306 (81.5%)     12.8%        0.68 11,110 ( 88%)      2 
Split 10   16,530 (82.6%)      6.6%        0.66 10,823 ( 86%)      3 
Split 11   16,146 (80.7%)      2.5%        0.67  9,975 ( 79%)      3 
Resample    8,262 (41.3%)       NA%        0.67 10,749 ( 85%)     NA 

Sampling diagnostics for SMC run 2 of 2 (20,000 samples)
         Eff. samples (%) Acc. rate Log wgt. sd   Max. unique Est. k 
Split 1    15,271 (76.4%)     16.6%        0.70 12,062 ( 95%)      5 
Split 2    15,274 (76.4%)     11.0%        0.75 11,770 ( 93%)      6 
Split 3    15,370 (76.8%)     11.7%        0.76 11,638 ( 92%)      5 
Split 4    15,557 (77.8%)     13.7%        0.75 11,554 ( 91%)      4 
Split 5    15,885 (79.4%)     12.6%        0.72 11,492 ( 91%)      4 
Split 6    16,349 (81.7%)     11.3%        0.69 11,525 ( 91%)      4 
Split 7    16,490 (82.5%)     13.4%        0.67 11,441 ( 90%)      3 
Split 8    16,489 (82.4%)     11.2%        0.66 11,491 ( 91%)      3 
Split 9    16,584 (82.9%)      6.6%        0.67 11,030 ( 87%)      4 
Split 10   16,760 (83.8%)      6.6%        0.65 10,725 ( 85%)      3 
Split 11   16,416 (82.1%)      2.5%        0.67 10,167 ( 80%)      3 
Resample   10,964 (54.8%)       NA%        0.67 10,935 ( 86%)     NA 

Additional validation plots:

Checking counts: [image]

Dots: [image]

@CoryMcCartan