alarm-redist / redist

Simulation methods for legislative redistricting.
https://alarm-redist.github.io/redist/
GNU General Public License v2.0

Trouble using vra constraints #109

Closed mjohnsonpa closed 2 years ago

mjohnsonpa commented 3 years ago

I'm assuming there's something I'm doing wrong here, rather than this being a bug, but I haven't been able to get the vra constraints to work, and I also haven't been able to find an example of running through a project using the constraints.

I'm using TX precinct data, and I can run redist_smc() just fine. When I try to add in the constraints, however, it just doesn't seem to do anything. If I try to use redist.constraint.helper(), the output is just a list of 1, only including the value entered for strength_vra. I've also tried building the list of lists with the required parameters (tgt_min, group_pop, total_pop, etc.), but regardless of how I build this constraint list, redist_smc() just seems to ignore it. I can see within the constraints attribute of the redist_plans object that's created by redist_smc() that there is a $vra portion that is a list of 5 (strength, tgt_vra_min, tgt_vra_other, pow_vra, and min_pop), which I notice is different from the lists that go into the redist.constraint.helper() tool. Regardless of how I have tried to add these variables in, constraints/vra/strength is always 0 in the redist_plans output.

Is there a working example of this workflow somewhere that I might be able to look at to better understand how these parameters are to be entered? Thank you!

christopherkenny commented 3 years ago

Hi Mike,

Yeah, the redist.constraint.helper() function is not super useful currently. We started working on it but never got beyond the idea. (Updates to that will be part of a more unified approach to constraints across algorithms that we are aiming for in the next major release of the package.) It was not really working at all, so I just pushed a small bugfix to master that should make it useful for the future.

Here's a working example that should illustrate how constraints can be used by setting up a constraint list.

# we load geomander for some Delaware data
library(geomander) # >= 1.1.3
library(redist)
#> 
#> Attaching package: 'redist'
#> The following object is masked from 'package:stats':
#> 
#>     filter
# set up data ----
# load the ALARM Project data for Delaware
de <- get_alarm('DE')
#> Rows: 412 Columns: 44
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (4): GEOID20, state, county, vtd
#> dbl (40): pop, pop_hisp, pop_white, pop_black, pop_aian, pop_asian, pop_nhpi...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Make a redist map with 21 districts (assuming state senate for this)
de_map <- redist_map(de, pop_tol = 0.05, ndists = 21)
#> Projecting to CRS 3857
# constraints ----
# calculate the non-MMD expected Black percent ----
tgt_vra_not_mmd <- redist.find.target(tgt_min = 0.55, # % black for a MMD 
                                      group_pop = de$pop_black, # Black population
                                      total_pop = de$pop,  # total population
                                      ndists = 21, # 21 districts
                                      nmmd = 5 # 5 majority minority
                                      )
# make a list of constraints ----
cons <- list(
    vra = list(
        strength = 100,
        min_pop = de$pop_black,
        tgt_vra_min = 0.55,
        tgt_vra_other = 0.1104732, # the value of tgt_vra_not_mmd computed above
        pow_vra = 1.5
    )
)

# run SMC ----
plans <- redist_smc(de_map, nsims = 50, counties = county, constraints = cons,
                    silent = TRUE) # make it quiet for a reprex.

# check plans output for expected values from cons
str(attributes(plans)$constraints$vra)
#> List of 5
#>  $ strength     : num 100
#>  $ tgt_vra_min  : num 0.55
#>  $ tgt_vra_other: num 0.11
#>  $ pow_vra      : num 1.5
#>  $ min_pop      : num [1:412] 1080 2933 911 76 2570 ...
# if we want multiple constraints ----

# make a list of constraints ----
cons2 <- list(
    vra = list(
        strength = 100,
        min_pop = de$pop_black,
        tgt_vra_min = 0.55,
        tgt_vra_other = 0.1104732, # the value of tgt_vra_not_mmd computed above
        pow_vra = 1.5
    ), # add more as lists separated by commas
    multisplits = list(
        strength = 5
    )
)

# run SMC ----
plans <- redist_smc(de_map, nsims = 50, counties = county, constraints = cons2,
                    silent = TRUE) # make it quiet for a reprex.

# check plans output for expected values from cons2
str(attributes(plans)$constraints$vra)
#> List of 5
#>  $ strength     : num 100
#>  $ tgt_vra_min  : num 0.55
#>  $ tgt_vra_other: num 0.11
#>  $ pow_vra      : num 1.5
#>  $ min_pop      : num [1:412] 1080 2933 911 76 2570 ...
str(attributes(plans)$constraints$multisplits)
#> List of 1
#>  $ strength: num 5

Created on 2021-11-02 by the reprex package (v2.0.0)
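For readers wondering where a value like tgt_vra_other comes from, here is a minimal base R sketch of the arithmetic. It assumes that redist.find.target() simply balances the group population: nmmd equal-sized districts sit at tgt_min, and whatever is left of the group is spread evenly over the remaining districts. The helper name find_target_sketch and the toy numbers are hypothetical; this is not the package's own code.

```r
# Hypothetical helper: mirrors the arguments of redist.find.target(),
# but is an illustration only, not the package implementation.
find_target_sketch <- function(tgt_min, group_pop, total_pop, ndists, nmmd) {
    district_pop <- sum(total_pop) / ndists        # ideal (equal) district size
    group_in_mmd <- nmmd * tgt_min * district_pop  # group members placed in the MMDs
    group_left   <- sum(group_pop) - group_in_mmd  # group members left for the rest
    group_left / ((ndists - nmmd) * district_pop)  # their share of the other districts
}

# Toy numbers (made up): 1,000,000 people, 215,000 of them in the group,
# 21 districts, 5 of which should be 55% group.
find_target_sketch(tgt_min = 0.55, group_pop = 215000, total_pop = 1e6,
                   ndists = 21, nmmd = 5)
```

Under this assumption, nmmd only affects the second target share; it is not itself a count that the sampler is asked to hit.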

CoryMcCartan commented 3 years ago

At this point I think we also recommend the hinge constraint for generic VRA analyses, as it tends to work better.
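As a rough illustration of what a hinge-style penalty looks like, here is a small base R sketch. The functional form (square root of the shortfall below a single target) is an assumption made for illustration, and hinge_penalty is a hypothetical name; check the package documentation for the actual definition.

```r
# Assumed shape of a hinge penalty for one district's group share.
# Districts at or above the target are not penalized at all;
# districts below it are penalized by how far they fall short.
hinge_penalty <- function(share, target = 0.55) {
    sqrt(pmax(0, target - share))
}

# Made-up group shares for four districts
shares <- c(0.30, 0.50, 0.55, 0.70)
data.frame(share = shares, penalty = hinge_penalty(shares))
```

The point of the sketch is only that a hinge is one-sided: it does not push districts that already exceed the target back down toward it.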

mjohnsonpa commented 3 years ago

That's very helpful, and that's working for getting the constraints attribute to accept the values from the list... it leads to two more questions about the vra constraint, however:

1 - Is there somewhere that one can set the target number of VRA districts? It looks like the nmmd parameter in redist.find.target() is just for defining tgt_vra_other, and doesn't actually feed into redist_smc() anywhere. The redist.constraint.helper() function teased an nmmd variable that looked like it would set a target for the number of VRA districts.

2 - For parity between districts, I'm using the overall total population, but for VRA considerations, I believe the goal is to use a percent of the voting age population. The redist.constraint.helper() looked like it would have had parameters for total and minority populations, allowing the consideration of total voting age pop vs. minority voting age pop for the vra constraint, while still using the total population for the population variance between districts. Here it seems like when I put in the min_pop variable, it's probably comparing that to the already-established total population, which would lock me into using total minority population rather than the voting age population. Is that accurate?

So far while putting in the parameters as you have them specified, I am getting the redist_plans object to take on the proper list values, but my output isn't giving a larger number of vra districts.

A related question, I'm assuming you can only give one set of vra lists to the model, so I wouldn't currently be able to set a target of maintaining one majority Black district and 9 majority Hispanic districts at the same time?

Thanks so much, really appreciate your responses.

christopherkenny commented 3 years ago

So, we don't currently have a way to directly target a number of MMDs, unfortunately.

(1) The benefit of using the vra constraint is that it has two peaks to the penalty function, so you can figure out what the minority target percents would be for the MMDs and non-MMDs, and thus target sampling districts that get there. You will generally need to tune this a bit. (I've found moving the lower value down an additional 10-15% of the calculated target can sometimes help.)

(2) The constraint is comparing to the total population. That was an old choice that should be updated when we refactor the constraints backend to make it more maintainable. (Currently tweaking the implementation of constraints to add new ones can take a few hours to get right, so it is a priority to fix that design to allow for more flexibility.)
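To make point (1) concrete, here is a base R sketch of a two-target penalty. It assumes the penalty for a district is the product of its distances to the two targets, raised to pow_vra; that form, the function name pow_penalty, and the numbers are illustrative assumptions rather than the package's code.

```r
# Assumed two-target penalty: zero when a district's group share sits at
# either target, positive everywhere else.
pow_penalty <- function(share, tgt_min = 0.55, tgt_other = 0.11, pow = 1.5) {
    (abs(share - tgt_min) * abs(share - tgt_other))^pow
}

# Penalty at the two targets and at a share in between
pow_penalty(c(0.11, 0.30, 0.55))

# Tuning idea from (1): move the lower target down by roughly 10-15%.
# A district stuck in the middle (30% here) is then penalized more.
lowered <- 0.11 * 0.875
pow_penalty(0.30, tgt_other = lowered) > pow_penalty(0.30, tgt_other = 0.11)
```

Under this assumption, lowering the second target makes mid-range districts costlier relative to ones that reach tgt_min, which is one way to read the tuning advice above.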

On your related question, this is definitely a drawback of the current implementation. For working with TX and CA, it's a bit of an issue. Ideally, I think we would change it to take a denominator (e.g. vap) and a matrix of vap_* columns for different races/ethnicities.

Thank you for the feedback and questions on this, by the way. It's helpful to get an idea of what people that aren't actively maintaining and developing features for the package are thinking.

CoryMcCartan commented 3 years ago

Just an update — we are just finishing a major reworking of our constraints interface.

Now you can specify multiple VRA-type constraints (e.g. for Black and Hispanic), and each can have its own numerator and denominator.

This is live in the dev branch, with documentation at ?redist_constr and ?constraints, in case you want to check it out.

christopherkenny commented 3 years ago

cc @jpredist

mjohnsonpa commented 3 years ago

@christopherkenny & @CoryMcCartan

Thanks for your replies, super helpful. Yes, I had been playing with adjusting the non-MMD percentages downward, seeing if that could get my MMD count up, but it's been tough. I can sometimes get up to seven, but we really should be at around nine. The newly adopted maps in TX are not only really skewed from a partisan perspective, but also reduce the number of Hispanic-majority districts and eliminate the Black-majority district. While I'm interested in putting some analysis out there that can comment on the degree of the gerrymander, I can't do that if the VRA-district counts on my modeled maps are even more problematic than what the legislature passed.

The new update to the constraints interface sounds great. Looking at the documentation, it seems like I can just do this for multiple vra-type constraints:

con <- list(
    pow = list(
        hisp = list(
            strength = 100,
            group_pop = d$hispvap,
            total_pop = d$vap,
            tgt_grp = 0.55,
            tgt_other = 0.20,
            pow = 1.5
        ),
        blk = list(
            strength = 100,
            group_pop = d$blackvap,
            total_pop = d$vap,
            tgt_grp = 0.55,
            tgt_other = 0.15,
            pow = 1.5
        )
    )
)

And then within the constraints attribute on my redist_plans object, I see that the values have been passed along...

[image: screenshot of the constraints attribute of the redist_plans object]

Should that mean that both of these constraints are now being considered when the package draws districts, and it's just a matter of trying to tweak the strength, power, and tgt_other to try and get the vra district counts up?

Thank you!

CoryMcCartan commented 3 years ago

If you're using the new dev version, then we recommend making constraints with the redist_constr() and add_constr_grp_pow()/_hinge() functions, which will make sure everything is correctly formatted. Then it is just a matter of adjusting strengths etc.

In general, TX is a real VRA challenge for samplers. We don't have foolproof answers for increasing the number of MMDs. Sometimes it can be useful to pre-merge adjacent precincts with a high minority VAP percentage into super-precincts, which prevents them from being split and may nudge the algorithm in the right direction. That would be accomplished by creating an indicator variable for high VAP percentage and then running merge_by(vap_indicator, county) on it.
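A minimal base R sketch of the indicator step, on a made-up precinct table. The column names (county, vap, hvap), the 60% cutoff, and the object name tx_map in the closing comment are all hypothetical; only the idea of flagging high minority VAP precincts comes from the comment above.

```r
# Toy precinct table (made-up numbers; column names are hypothetical)
precincts <- data.frame(
    county = c("A", "A", "A", "B", "B"),
    vap    = c(1000, 1200, 800, 1500, 900),
    hvap   = c(700, 300, 640, 1200, 200)
)

# Arbitrary cutoff for what counts as a "high" minority VAP share
threshold <- 0.6

# TRUE for precincts whose minority share of VAP exceeds the cutoff
precincts$vap_indicator <- precincts$hvap / precincts$vap > threshold
precincts

# With a redist_map that carries this column, the merge step described above
# would be something along the lines of:
#   merged_map <- merge_by(tx_map, vap_indicator, county)
# (tx_map is a placeholder; see ?merge_by for the exact behaviour)
```

The cutoff is a tuning choice, so it is probably worth trying a few values and checking how many super-precincts result before sampling.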