alarm-redist / fifty-states

Redistricting analysis for all 50 U.S. states
https://alarm-redist.github.io/fifty-states/

2010 Rhode Island Congressional Districts #172

Closed: mzhao80 closed this issue 1 year ago.

mzhao80 commented 1 year ago

Redistricting requirements

In Rhode Island, according to Chapter 106, Section 2 of the 2011 Rhode Island Laws, districts must:

  1. be contiguous
  2. have equal populations
  3. be geographically compact
  4. preserve state senate districts as much as possible
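
Requirement 1 (contiguity) is a graph property: each district's units must form a single connected component of the precinct adjacency graph. Below is a minimal Python sketch of that check. It assumes a plain dictionary adjacency list and a unit-to-district mapping as hypothetical inputs, not the ALARM data format or redist's own implementation:

```python
from collections import deque


def is_contiguous(units, adjacency):
    """Return True if `units` forms a single connected component.

    `adjacency` maps each unit id to an iterable of neighboring unit ids.
    """
    units = set(units)
    if not units:
        return False  # an empty district is treated as invalid
    start = next(iter(units))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            # Only step to neighbors that belong to the same district.
            if v in units and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == units


def plan_is_contiguous(assignment, adjacency):
    """`assignment` maps unit id -> district id (hypothetical input format)."""
    members = {}
    for unit, district in assignment.items():
        members.setdefault(district, set()).add(unit)
    return all(is_contiguous(units, adjacency) for units in members.values())


# Four precincts in a line: 0 - 1 - 2 - 3
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(plan_is_contiguous({0: 1, 1: 1, 2: 2, 3: 2}, adjacency))  # True
print(plan_is_contiguous({0: 1, 1: 2, 2: 1, 3: 2}, adjacency))  # False
```

A breadth-first search visits each unit and adjacency edge at most once, so the check is linear in the size of the graph.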

Algorithmic Constraints

We enforce a maximum population deviation of 0.5%.
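
A sketch of the 0.5% population check follows. It assumes the deviation is measured as the largest absolute gap between a district's population and the ideal (total population divided by the number of districts), relative to that ideal. The populations in the usage lines are made-up round numbers, not Rhode Island data:

```python
def max_population_deviation(district_pops):
    """Largest |population - ideal| relative to the ideal (assumed definition)."""
    ideal = sum(district_pops) / len(district_pops)
    return max(abs(p - ideal) for p in district_pops) / ideal


def within_tolerance(district_pops, tol=0.005):
    """True if every district is within `tol` (0.5% by default) of the ideal."""
    return max_population_deviation(district_pops) <= tol


# Hypothetical two-district populations, not real Rhode Island figures.
print(within_tolerance([550_000, 547_000]))  # True  (about 0.27%)
print(within_tolerance([552_000, 545_000]))  # False (about 0.64%)
```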

Data Sources

Data for Rhode Island comes from the ALARM Project's Redistricting Data Files.

Pre-processing Notes

No manual pre-processing decisions were necessary.

Simulation Notes

We sample 5,000 districting plans for Rhode Island across four independent runs of the SMC algorithm. We treat state senate districts as if they were counties, so that the simulations minimize the number of state senate district splits.
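
Treating state senate districts (SSDs) as counties means the split count is computed over SSDs instead of counties. Below is a minimal sketch of one common split definition: the number of groups whose units land in more than one congressional district. The function and argument names are hypothetical, and redist's own split metrics may count pieces differently:

```python
from collections import defaultdict


def count_splits(assignment, group):
    """Count groups whose units fall in more than one congressional district.

    assignment: unit id -> congressional district
    group:      unit id -> state senate district (standing in for a county)
    """
    cds_by_group = defaultdict(set)
    for unit, cd in assignment.items():
        cds_by_group[group[unit]].add(cd)
    return sum(1 for cds in cds_by_group.values() if len(cds) > 1)


plan = {0: 1, 1: 1, 2: 2, 3: 2}
ssd = {0: "S1", 1: "S1", 2: "S2", 3: "S1"}
print(count_splits(plan, ssd))  # 1 (S1 spans both districts)
```

An alternative definition counts the extra pieces a group is cut into (pieces minus one), which differs from this one only when a group touches three or more districts; with two congressional districts, the two coincide.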

Validation

[Figure: validation_20230209_0017]

SMC: 5,000 sampled plans of 2 districts on 244 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.5
`est_label_mult`=1 • `pop_temper`=0

Plan diversity 80% range: 0.31 to 0.69

R-hat values for summary statistics:
   pop_overlap      total_vap       plan_dev      comp_edge    comp_polsby 
      1.001005       1.002249       1.000207       1.000637       1.000967 
     pop_white      pop_black       pop_hisp       pop_aian      pop_asian 
      1.001288       1.000822       1.001146       1.000412       1.001699 
      pop_nhpi      pop_other        pop_two      vap_white      vap_black 
      1.000754       1.000956       1.000862       1.001325       1.001029 
      vap_hisp       vap_aian      vap_asian       vap_nhpi      vap_other 
      1.001301       1.000432       1.001856       1.000469       1.000946 
       vap_two pre_16_dem_cli pre_16_rep_tru pre_20_dem_bid pre_20_rep_tru 
      1.000653       1.000819       1.000625       1.000466       1.000649 
uss_18_dem_whi uss_18_rep_fla uss_20_dem_ree uss_20_rep_wat gov_18_dem_rai 
      1.000424       1.000471       1.001127       1.000737       1.000366 
gov_18_rep_fun atg_18_dem_ner sos_18_dem_gor sos_18_rep_cor         adv_16 
      1.000156       1.002423       1.000715       1.000524       1.000819 
        adv_18         adv_20         arv_16         arv_18         arv_20 
      1.000577       1.000564       1.000625       1.000366       1.000721 
 county_splits    muni_splits            ndv            nrv        ndshare 
      1.000040       1.000260       1.000495       1.000494       1.000400 
         e_dvs          e_dem           egap 
      1.000527       1.000143       1.001440 
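
R-hat compares the spread between the four independent SMC runs with the spread within each run. Values near 1 mean the runs agree. Below is a sketch of the classic Gelman-Rubin estimator for a single summary statistic. redist may compute a different variant (for example, a rank-normalized R-hat), so treat this as illustrative:

```python
import math
from statistics import mean, variance


def gelman_rubin_rhat(chains):
    """Classic potential scale reduction factor for one statistic.

    chains: list of equal-length lists of draws, one per independent run.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 runs of equal length >= 2")
    run_means = [mean(c) for c in chains]
    grand_mean = mean(run_means)
    # Between-run variance, scaled by the number of draws per run.
    b = n / (m - 1) * sum((rm - grand_mean) ** 2 for rm in run_means)
    # Average within-run sample variance (n - 1 denominator).
    w = mean(variance(c) for c in chains)
    if w == 0:
        raise ValueError("statistic is constant within every run")
    # Pooled estimate of the marginal variance.
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)
```

Because the pooled variance multiplies the within-run variance by (n - 1)/n, the estimate can land marginally below 1 when the runs agree closely, as with a few statistics in the later run of this thread.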

Sampling diagnostics for SMC run 1 of 4 (1,250 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     1,223 (97.9%)      4.7%        0.29   800 (101%)      4 
Resample    1,148 (91.8%)       NA%        0.29   772 ( 98%)     NA 

Sampling diagnostics for SMC run 2 of 4 (1,250 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     1,223 (97.8%)      4.7%         0.3   802 (101%)      4 
Resample    1,146 (91.7%)       NA%         0.3   761 ( 96%)     NA 

Sampling diagnostics for SMC run 3 of 4 (1,250 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     1,224 (97.9%)      4.7%        0.29   793 (100%)      4 
Resample    1,150 (92.0%)       NA%        0.29   784 ( 99%)     NA 

Sampling diagnostics for SMC run 4 of 4 (1,250 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     1,224 (97.9%)      4.8%        0.29   793 (100%)      4 
Resample    1,150 (92.0%)       NA%        0.29   767 ( 97%)     NA 
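
The "Eff. samples" column reports how many equally weighted draws the weighted sample is worth. Below is a sketch using Kish's effective sample size computed from log weights. This is a standard definition, assumed here rather than confirmed as redist's exact formula:

```python
import math


def kish_ess(log_weights):
    """Kish effective sample size, (sum w)^2 / sum(w^2), from log weights."""
    top = max(log_weights)
    # Subtract the maximum before exponentiating to avoid overflow;
    # the ratio is unchanged by this common rescaling.
    w = [math.exp(lw - top) for lw in log_weights]
    return sum(w) ** 2 / sum(x * x for x in w)


print(kish_ess([0.0] * 1250))  # 1250.0: equal weights lose nothing
```

If the log weights are roughly normal with standard deviation σ, Kish's ESS per draw is about exp(-σ²). For σ = 0.29 that gives about 0.92, in line with the roughly 92% in the Resample rows. This is why a small "Log wgt. sd" goes hand in hand with a high effective-sample percentage.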

•  Watch out for low effective samples, very low acceptance rates (less than 1%),
large std. devs. of the log weights (more than 3 or so), and low numbers of unique
plans. R-hat values for summary statistics should be between 1 and 1.05.
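
The rules of thumb in this note can be turned into a mechanical check on the printed diagnostics. Below is a sketch under stated assumptions. The 1% acceptance cutoff and the log-weight sd cutoff of 3 come from the note. The cutoffs for effective samples and unique plans are hypothetical, because the note gives no number for them. Only the upper R-hat bound is checked:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageDiagnostics:
    eff_frac: float            # effective samples / total samples
    acc_rate: Optional[float]  # acceptance rate as a fraction; None when "NA%"
    log_wgt_sd: float          # "Log wgt. sd" column
    unique_frac: float         # "Max. unique" / total samples (can exceed 1)


def flag_stage(d, min_eff_frac=0.5, min_unique_frac=0.5):
    """Return warnings for one SMC stage (min_* cutoffs are hypothetical)."""
    warnings = []
    if d.eff_frac < min_eff_frac:
        warnings.append("low effective samples")
    if d.acc_rate is not None and d.acc_rate < 0.01:
        warnings.append("very low acceptance rate")
    if d.log_wgt_sd > 3:
        warnings.append("large log-weight sd")
    if d.unique_frac < min_unique_frac:
        warnings.append("few unique plans")
    return warnings


def flag_rhats(rhats, upper=1.05):
    """Names of statistics whose R-hat exceeds `upper`.

    Values just below 1 are not flagged: finite-sample estimates can dip there.
    """
    return [name for name, r in rhats.items() if r > upper]


# Run 1, "Split 1" row from the output above.
print(flag_stage(StageDiagnostics(0.979, 0.047, 0.29, 1.01)))  # []
```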

Checklist

@christopherkenny

mzhao80 commented 1 year ago

Validation plot for minimizing state senate district splits:

[Figure: Rplot]

christopherkenny commented 1 year ago

Can we switch the state senate districts (SSDs) to a soft constraint so that we can use counties in the hard constraint? It looks like we're overshooting the county and municipality splits while undershooting the senate target.

mzhao80 commented 1 year ago


I tried adding SSDs as a separate constraint. It does not seem to be binding: the simulation results come out identical regardless of the constraint strength. This setup does much better on county splits, but seems to do worse on municipality splits and, as expected, worse on SSD splits.

[Figure: validation_20230211_1248]

[Figure: ssd splits]

christopherkenny commented 1 year ago

Hmm, that's tricky then. The split constraint strength might still be a bit low at 0.5. What ranges did you try?

mzhao80 commented 1 year ago


I tried a range of strengths: 0, values from 0.1 to 1, 10, and 100. There was no difference between any of these, which made me wonder whether I was applying the constraint correctly.
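
One way a soft constraint can leave results unchanged at every strength: if it multiplies each candidate plan's weight by exp(-strength × penalty) and the penalty is the same for every candidate, normalization cancels the factor. Below is a toy illustration under that assumed functional form. It is not a description of how redist applies constraints inside the SMC splitting step, nor a diagnosis of the runs above:

```python
import math


def penalized_weights(base_weights, penalties, strength):
    """Multiply each weight by exp(-strength * penalty), then renormalize."""
    raw = [w * math.exp(-strength * p) for w, p in zip(base_weights, penalties)]
    total = sum(raw)
    return [r / total for r in raw]


base = [0.2, 0.3, 0.5]
for strength in (0, 0.5, 10, 100):
    # Same penalty (2 splits) for every plan: the output equals `base`
    # up to floating-point rounding, whatever the strength.
    print(strength, penalized_weights(base, [2, 2, 2], strength))

# Different penalties: a large strength concentrates weight on the 1-split plan.
print(penalized_weights(base, [1, 2, 3], 10))
```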

christopherkenny commented 1 year ago

Okay, maybe we should think about this a little more precisely. The issue that we have to avoid is:

To the extent practicable, the commission should endeavor to avoid the division of state representative districts in the formation of state senate districts and the division of state senate districts in the formation of United States congressional districts in any manner which would result in the creation of voting districts composed of fewer than one hundred (100) potential voters.

We might be able to use the current setup with a rejection criterion to balance these. Can we get a third opinion on this, @CoryMcCartan?
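
A rejection criterion for the quoted rule could look at every piece formed by intersecting a state senate district with a congressional district and require each nonempty piece to hold at least 100 potential voters. Below is a sketch assuming voting-age population (VAP) as the stand-in for "potential voters" and dictionary inputs with hypothetical names:

```python
from collections import defaultdict


def smallest_piece(assignment, ssd, vap):
    """Smallest nonzero VAP among the SSD-by-CD pieces of one plan.

    assignment: unit -> congressional district
    ssd:        unit -> state senate district
    vap:        unit -> voting-age population (assumed proxy for potential voters)
    """
    pieces = defaultdict(int)
    for unit, cd in assignment.items():
        pieces[(ssd[unit], cd)] += vap[unit]
    nonzero = [v for v in pieces.values() if v > 0]
    return min(nonzero) if nonzero else 0


def complies(assignment, ssd, vap, floor=100):
    """Rejection criterion: every nonempty piece has at least `floor` voters."""
    return smallest_piece(assignment, ssd, vap) >= floor


plan = {0: 1, 1: 1, 2: 2, 3: 2}
ssd = {0: "A", 1: "A", 2: "A", 3: "B"}
vap = {0: 500, 1: 400, 2: 60, 3: 700}
print(smallest_piece(plan, ssd, vap))  # 60: SSD "A" leaves a 60-voter piece in CD 2
print(complies(plan, ssd, vap))        # False
```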

CoryMcCartan commented 1 year ago

What does the 100-voter part mean?

And did we discuss this elsewhere at all re: endogeneity? If the SSDs are biased somehow...

mzhao80 commented 1 year ago
SMC: 5,000 sampled plans of 2 districts on 244 units
`adapt_k_thresh`=0.985 • `seq_alpha`=0.5
`est_label_mult`=1 • `pop_temper`=0

Plan diversity 80% range: 0.21 to 0.67

R-hat values for summary statistics:
min_ssd_overlap     pop_overlap       total_vap        plan_dev       comp_edge 
      1.0003645       1.0010581       1.0016580       1.0005363       1.0004898 
    comp_polsby       pop_white       pop_black        pop_hisp        pop_aian 
      1.0006033       1.0003109       1.0009355       1.0004416       1.0027685 
      pop_asian        pop_nhpi       pop_other         pop_two       vap_white 
      0.9998443       1.0005188       1.0005560       1.0003874       1.0001865 
      vap_black        vap_hisp        vap_aian       vap_asian        vap_nhpi 
      1.0013162       1.0005929       1.0030503       0.9998665       1.0013882 
      vap_other         vap_two  pre_16_dem_cli  pre_16_rep_tru  pre_20_dem_bid 
      1.0007423       1.0008080       1.0015029       1.0014446       1.0015764 
 pre_20_rep_tru  uss_18_dem_whi  uss_18_rep_fla  uss_20_dem_ree  uss_20_rep_wat 
      1.0013305       1.0012120       1.0011985       1.0020001       1.0018792 
 gov_18_dem_rai  gov_18_rep_fun  atg_18_dem_ner  sos_18_dem_gor  sos_18_rep_cor 
      1.0005415       1.0000188       1.0017515       1.0014141       1.0009320 
         adv_16          adv_18          adv_20          arv_16          arv_18 
      1.0015029       1.0012967       1.0016625       1.0014446       1.0005596 
         arv_20     muni_splits             ndv             nrv         ndshare 
      1.0014602       1.0010220       1.0013903       1.0012115       1.0009471 
          e_dvs           e_dem            egap 
      1.0008868       1.0005558       1.0011900 

Sampling diagnostics for SMC run 1 of 4 (1,500 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     1,484 (99.0%)      4.0%        0.21   965 (102%)      4 
Resample    1,440 (96.0%)       NA%        0.21   925 ( 98%)     NA 

Sampling diagnostics for SMC run 2 of 4 (1,500 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     1,484 (98.9%)      4.0%        0.21   953 (101%)      4 
Resample    1,437 (95.8%)       NA%        0.21   945 (100%)     NA 

Sampling diagnostics for SMC run 3 of 4 (1,500 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     1,484 (98.9%)      3.9%        0.21   950 (100%)      4 
Resample    1,439 (95.9%)       NA%        0.21   940 ( 99%)     NA 

Sampling diagnostics for SMC run 4 of 4 (1,500 samples)
         Eff. samples (%) Acc. rate Log wgt. sd  Max. unique Est. k 
Split 1     1,484 (98.9%)      4.2%        0.21   937 ( 99%)      4 
Resample    1,439 (95.9%)       NA%        0.21   957 (101%)     NA 



So it turns out that all of the simulated plans comply with the 100-voter limit anyway (the smallest nonzero SSD-CD voting district has a population of 1,665). I still wrote code to start with a 6,000-plan sample and filter it down to 5,000 plans using the rejection criterion, just to formalize what we had discussed.
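
The oversample-and-filter step can be sketched as follows. The sketch assumes the plans have already been resampled to equal weights (as in the Resample stage) and that `accept` is a per-plan test such as a minimum SSD-CD piece size. All names are hypothetical:

```python
import random


def filter_plans(plans, accept, target, seed=None):
    """Keep plans that pass `accept`, then draw `target` of them uniformly.

    Assumes the plans are equally weighted (already resampled).
    Raises if too few plans pass, signalling that a larger oversample is needed.
    """
    kept = [p for p in plans if accept(p)]
    if len(kept) < target:
        raise ValueError(
            f"only {len(kept)} of {len(plans)} plans passed; need {target}"
        )
    rng = random.Random(seed)
    return rng.sample(kept, target)


# Plan ids 0..5999 standing in for four runs of 1,500 plans each.
final = filter_plans(list(range(6000)), lambda p: True, 5000, seed=1)
print(len(final))  # 5000
```

Drawing the 5,000 plans at random from the survivors, instead of keeping the first 5,000, avoids dropping most of one run. With four runs of 1,500 plans each, truncation would discard 1,000 of run 4's plans even if every plan passed.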

CoryMcCartan commented 1 year ago

Looks solid to me! @christopherkenny ?