emmaebowe commented 1 year ago

2010 Maryland Congressional Districts

Redistricting requirements

In Maryland, districts must:

be contiguous
have equal populations
be geographically compact
preserve county and municipality boundaries as much as possible
not consider incumbent or partisan information

Algorithmic Constraints

We enforce a maximum population deviation of 0.5%.
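
To make the 0.5% constraint concrete, here is a minimal Python sketch of one way to check it. It assumes deviation is measured as each district's absolute distance from the ideal (mean) district population, divided by that ideal. The function names and the example populations are hypothetical and are not part of the analysis code or the Maryland data.

```python
def max_population_deviation(district_pops):
    """Largest relative deviation of any district from the ideal (mean) population."""
    target = sum(district_pops) / len(district_pops)
    return max(abs(p - target) / target for p in district_pops)


def within_tolerance(district_pops, tol=0.005):
    """True if every district is within `tol` (0.5% by default) of the target."""
    return max_population_deviation(district_pops) <= tol


# Hypothetical populations for 8 districts, for illustration only.
example = [721_000, 722_500, 720_200, 723_100, 719_800, 721_900, 722_000, 720_500]
print(f"{max_population_deviation(example):.4%}", within_tolerance(example))
```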

Data Sources

Data for Maryland comes from the ALARM Project's 2020 Redistricting Data Files.

Pre-processing Notes

No manual pre-processing decisions were necessary.

Simulation Notes

We sample 5,000 districting plans for Maryland across 2 independent runs of the SMC algorithm. No special techniques were needed to produce the sample.

Validation

validation_20230106_1200

SMC: 5,000 sampled plans of 8 districts on 1,859 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.5
`est_label_mult`=1 • `pop_temper`=0

Plan diversity 80% range: 0.47 to 0.76

R-hat values for summary statistics:
   pop_overlap      total_vap       plan_dev      comp_edge    comp_polsby      pop_white      pop_black       pop_hisp 
      1.001665       1.000015       1.001483       1.002804       1.004733       1.003330       1.002716       1.004090 
      pop_aian      pop_asian       pop_nhpi      pop_other        pop_two      vap_white      vap_black       vap_hisp 
      1.003497       1.000721       1.007556       1.013627       1.000273       1.005389       1.002771       1.008020 
      vap_aian      vap_asian       vap_nhpi      vap_other        vap_two pre_16_dem_cli pre_16_rep_tru pre_20_dem_bid 
      1.004375       1.000559       1.006385       1.013290       1.000362       1.002651       1.002657       1.003850 
pre_20_rep_tru uss_16_dem_van uss_16_rep_sze uss_18_dem_car uss_18_rep_cam gov_18_dem_jea gov_18_rep_hog atg_18_dem_fro 
      1.001516       1.000491       1.002107       1.002184       1.001577       1.000952       1.010146       1.000936 
atg_18_rep_wol         adv_16         adv_18         adv_20         arv_16         arv_18         arv_20  county_splits 
      1.002116       1.001905       1.001189       1.003850       1.002075       1.003087       1.001516       1.008821 
   muni_splits            ndv            nrv        ndshare        e_dvs.x       pr_dem.x        e_dem.x        pbias.x 
      1.017111       1.001623       1.001425       1.001166       1.001165       1.000350       1.002592       1.001408 
        egap.x        e_dvs.y       pr_dem.y        e_dem.y        pbias.y         egap.y 
      1.002269       1.001165       1.000350       1.002592       1.001408       1.002269 

Sampling diagnostics for SMC run 1 of 2 (2,500 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     2,456 (98.2%)      8.0%        0.27 1,577 (100%)     10 
Split 2     2,420 (96.8%)     12.6%        0.37 1,587 (100%)      6 
Split 3     2,404 (96.2%)     17.0%        0.40 1,585 (100%)      4 
Split 4     2,367 (94.7%)     20.0%        0.47 1,536 ( 97%)      3 
Split 5     2,290 (91.6%)     17.4%        0.55 1,557 ( 99%)      3 
Split 6     2,310 (92.4%)     19.1%        0.54 1,468 ( 93%)      2 
Split 7     2,265 (90.6%)      6.5%        0.54 1,336 ( 85%)      2 
Resample    1,397 (55.9%)       NA%        0.56 1,414 ( 89%)     NA 

Sampling diagnostics for SMC run 2 of 2 (2,500 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     2,454 (98.2%)     12.9%        0.27 1,584 (100%)      6 
Split 2     2,415 (96.6%)     15.3%        0.38 1,593 (101%)      5 
Split 3     2,392 (95.7%)     13.5%        0.42 1,579 (100%)      5 
Split 4     2,360 (94.4%)     19.4%        0.48 1,541 ( 98%)      3 
Split 5     2,327 (93.1%)     24.8%        0.52 1,553 ( 98%)      2 
Split 6     2,285 (91.4%)      8.2%        0.57 1,497 ( 95%)      5 
Split 7     2,280 (91.2%)      4.8%        0.57 1,381 ( 87%)      3 
Resample    1,655 (66.2%)       NA%        0.58 1,443 ( 91%)     NA 

•  Watch out for low effective samples, very low acceptance rates (less than 1%), large std. devs. of the log weights
(more than 3 or so), and low numbers of unique plans. R-hat values for summary statistics should be between 1 and 1.05.
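
As a rough illustration of the two checks named in that note, the Python sketch below computes an effective sample size from log weights and flags R-hat values above 1.05. It assumes the Kish definition of effective sample size, (sum of weights)^2 / (sum of squared weights), which may not be exactly how the "Eff. samples" column is computed. Both function names are hypothetical.

```python
import math


def effective_sample_size(log_weights):
    """Kish effective sample size, (sum w)^2 / sum(w^2), from log weights."""
    shift = max(log_weights)  # subtract the max before exponentiating for stability
    weights = [math.exp(lw - shift) for lw in log_weights]
    return sum(weights) ** 2 / sum(w * w for w in weights)


def flag_rhats(rhats, upper=1.05):
    """Return the names of statistics whose R-hat exceeds `upper`."""
    return [name for name, value in rhats.items() if value > upper]


# A few R-hat values copied from the summary above.
rhats = {"pop_overlap": 1.001665, "pop_other": 1.013627, "muni_splits": 1.017111}
print(flag_rhats(rhats))  # prints [] because none exceed 1.05
```

Equal weights give an effective sample size equal to the number of samples; a single dominant weight drives it toward 1.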

Checklist

[x] I have followed the instructions
[x] I have updated the tracker
[x] All TODO lines from the template code have been removed
[x] I have merged in the master branch and then recalculated summary statistics
[x] I have run enforce_style() to format my code
[x] The documentation copied above is up-to-date
[x] There are no data files in this pull request
[x] None of the file output paths (for the redist_map and redist_plans objects, and summary statistics) have been edited

@CoryMcCartan @christopherkenny @tylersimko

christopherkenny commented 1 year ago

@emmaebowe, it looks like there are some code/GitHub issues:

There are a bunch of other files in this PR, but none for MD 2010. Can you revise the PR so that the MD 2010 code can be reviewed too?
The summary shows some duplicated columns with suffixed names (*.x and *.y). Do you know what happened there?
The documentation is missing from the PR.

The substance (summary stats, plans, and rhats) all look good to me! Thanks
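
For context on the `*.x` / `*.y` columns: R's `merge()` and dplyr's join functions append `.x` and `.y` by default when both tables carry a non-key column with the same name, so one plausible cause (an assumption, not confirmed in this thread) is that the summary statistics were joined onto the plans twice. The Python sketch below only illustrates how to spot such pairs from a list of column names; the function name is hypothetical.

```python
from collections import defaultdict


def find_suffixed_duplicates(columns, suffixes=(".x", ".y")):
    """Return base names that appear once with every suffix in `suffixes`."""
    seen = defaultdict(set)
    for col in columns:
        for suffix in suffixes:
            if col.endswith(suffix):
                seen[col[: -len(suffix)]].add(suffix)
    return sorted(base for base, found in seen.items() if found == set(suffixes))


# Column names taken from the end of the R-hat table in the PR description.
cols = [
    "ndshare", "e_dvs.x", "pr_dem.x", "e_dem.x", "pbias.x", "egap.x",
    "e_dvs.y", "pr_dem.y", "e_dem.y", "pbias.y", "egap.y",
]
print(find_suffixed_duplicates(cols))
```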

emmaebowe commented 1 year ago

@christopherkenny File issues should be resolved now! I got a warning about GEOID10 when I ran the code. Could that have caused the summary stats duplication?

emmaebowe commented 1 year ago

Summary stat duplication should be resolved!

christopherkenny commented 1 year ago

Great, can we just get a new rhat (i.e. summary(plans)) with that removed then? I think we're just about good to merge.

emmaebowe commented 1 year ago

✔ Saving <redist_plans> object ... done
SMC: 5,000 sampled plans of 8 districts on 1,859 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.5
`est_label_mult`=1 • `pop_temper`=0
ℹ Preparing MD shapefile
Plan diversity 80% range: 0.46 to 0.77
ℹ Preparing MD shapefile
R-hat values for summary statistics:
pop_overlap 
   1.010844 

Sampling diagnostics for SMC run 1 of 2 (2,500 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     2,454 (98.2%)      7.8%        0.27 1,545 ( 98%)     10 
Split 2     2,417 (96.7%)     12.4%        0.37 1,540 ( 97%)      6 
Split 3     2,390 (95.6%)     17.0%        0.42 1,591 (101%)      4 
Split 4     2,369 (94.7%)     19.9%        0.47 1,551 ( 98%)      3 
Split 5     2,348 (93.9%)     17.7%        0.49 1,529 ( 97%)      3 
Split 6     2,337 (93.5%)     19.6%        0.51 1,472 ( 93%)      2 
Split 7     2,316 (92.6%)      5.0%        0.53 1,353 ( 86%)      3 
Resample    1,780 (71.2%)       NA%        0.53 1,452 ( 92%)     NA 

Sampling diagnostics for SMC run 2 of 2 (2,500 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     2,455 (98.2%)     12.4%        0.27 1,587 (100%)      6 
Split 2     2,412 (96.5%)     14.6%        0.38 1,590 (101%)      5 
Split 3     2,387 (95.5%)     22.2%        0.43 1,535 ( 97%)      3 
Split 4     2,371 (94.8%)     27.9%        0.47 1,533 ( 97%)      2 
Split 5     2,345 (93.8%)     24.5%        0.49 1,542 ( 98%)      2 
Split 6     2,253 (90.1%)     20.0%        0.59 1,484 ( 94%)      2 
Split 7     2,170 (86.8%)      7.0%        0.68 1,306 ( 83%)      2 
Resample    1,341 (53.6%)       NA%        0.68 1,331 ( 84%)     NA 

•  Watch out for low effective samples, very low acceptance rates (less than 1%), large
std. devs. of the log weights (more than 3 or so), and low numbers of unique plans. R-hat
values for summary statistics should be between 1 and 1.05.
ℹ Preparing MD shapefile

christopherkenny commented 1 year ago

Looks like we are just about there. Can we just confirm that the other values are in the plans object? All I see is:

R-hat values for summary statistics:
pop_overlap
1.010844

emmaebowe commented 1 year ago

Apologies, here's the full thing!

✔ Preparing MD shapefile ... done
SMC: 5,000 sampled plans of 8 districts on 1,859 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.5
`est_label_mult`=1 • `pop_temper`=0
ℹ Preparing MD shapefile
Plan diversity 80% range: 0.50 to 0.76
ℹ Preparing MD shapefile
R-hat values for summary statistics:
   pop_overlap      total_vap       plan_dev      comp_edge    comp_polsby      pop_white 
     1.0013274      1.0159530      1.0001856      1.0037702      1.0023219      1.0003998 
     pop_black       pop_hisp       pop_aian      pop_asian       pop_nhpi      pop_other 
     1.0019676      1.0030604      1.0025138      1.0021432      1.0023543      1.0033762 
       pop_two      vap_white      vap_black       vap_hisp       vap_aian      vap_asian 
     1.0085428      1.0020536      1.0037418      1.0024889      1.0018992      1.0021807 
      vap_nhpi      vap_other        vap_two pre_16_dem_cli pre_16_rep_tru pre_20_dem_bid 
     1.0006373      1.0007845      1.0051680      1.0030311      1.0009814      0.9998664 
pre_20_rep_tru uss_16_dem_van uss_16_rep_sze uss_18_dem_car uss_18_rep_cam gov_18_dem_jea 
     1.0009971      0.9999793      1.0009498      1.0002865      1.0000787      1.0000215 
gov_18_rep_hog atg_18_dem_fro atg_18_rep_wol         adv_16         adv_18         adv_20 
     1.0000685      1.0016293      1.0003660      1.0012809      1.0020177      0.9998664 
        arv_16         arv_18         arv_20  county_splits    muni_splits            ndv 
     1.0031642      1.0002696      1.0009971      0.9998133      1.0228440      1.0014524 
           nrv        ndshare          e_dvs         pr_dem          e_dem          pbias 
     1.0006835      1.0008000      1.0018574      1.0001566      1.0016806      1.0016444 
          egap 
     1.0025664 

Sampling diagnostics for SMC run 1 of 2 (2,500 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     2,455 (98.2%)      8.0%        0.27 1,569 ( 99%)     10 
Split 2     2,415 (96.6%)     12.2%        0.38 1,588 (100%)      6 
Split 3     2,393 (95.7%)     17.2%        0.41 1,557 ( 99%)      4 
Split 4     2,366 (94.6%)     19.7%        0.47 1,569 ( 99%)      3 
Split 5     2,335 (93.4%)     13.5%        0.52 1,520 ( 96%)      4 
Split 6     2,292 (91.7%)     14.4%        0.56 1,463 ( 93%)      3 
Split 7     2,280 (91.2%)      7.0%        0.57 1,323 ( 84%)      2 
Resample    1,692 (67.7%)       NA%        0.58 1,428 ( 90%)     NA 

Sampling diagnostics for SMC run 2 of 2 (2,500 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     2,454 (98.1%)     13.2%        0.27 1,592 (101%)      6 
Split 2     2,423 (96.9%)     18.6%        0.36 1,578 (100%)      4 
Split 3     2,402 (96.1%)     16.8%        0.40 1,575 (100%)      4 
Split 4     2,376 (95.0%)     12.1%        0.47 1,555 ( 98%)      5 
Split 5     2,339 (93.6%)     12.8%        0.50 1,552 ( 98%)      4 
Split 6     2,306 (92.3%)     13.3%        0.55 1,489 ( 94%)      3 
Split 7     2,307 (92.3%)      4.8%        0.53 1,383 ( 88%)      3 
Resample    1,742 (69.7%)       NA%        0.55 1,461 ( 92%)     NA 

•  Watch out for low effective samples, very low acceptance rates (less than 1%), large std.
devs. of the log weights (more than 3 or so), and low numbers of unique plans. R-hat values
for summary statistics should be between 1 and 1.05.
ℹ Preparing MD shapefile
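
The R-hat values above compare the 2 independent runs. As a rough illustration of what such a statistic measures, here is a minimal Python sketch of the classic (non-split) Gelman-Rubin estimator. This is an assumption for illustration: the estimator behind `summary(plans)` may be a different variant, so the sketch is not expected to reproduce the numbers above. The simulated draws are hypothetical.

```python
import math
import random
import statistics


def gelman_rubin_rhat(runs):
    """Classic (non-split) Gelman-Rubin R-hat for equal-length independent runs."""
    n = len(runs[0])
    if len(runs) < 2 or n < 2 or any(len(r) != n for r in runs):
        raise ValueError("need at least 2 runs of equal length >= 2")
    # Mean of the within-run variances.
    within = statistics.fmean([statistics.variance(r) for r in runs])
    # n times the variance of the run means.
    between = n * statistics.variance([statistics.fmean(r) for r in runs])
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


# Simulated draws standing in for one summary statistic across 2 runs of 2,500
# plans each (hypothetical values, not taken from the Maryland sample).
rng = random.Random(0)
run_1 = [rng.gauss(0.0, 1.0) for _ in range(2500)]
run_2 = [rng.gauss(0.0, 1.0) for _ in range(2500)]
run_shifted = [x + 3.0 for x in run_2]

print(round(gelman_rubin_rhat([run_1, run_2]), 4))        # close to 1
print(round(gelman_rubin_rhat([run_1, run_shifted]), 4))  # far above 1.05
```

Under this particular estimator, runs of 2,500 draws can return values as low as sqrt(2499/2500), about 0.9998, when the run means agree almost exactly, so an R-hat marginally below 1 is not by itself a warning sign.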

alarm-redist / fifty-states