alarm-redist / redist

Simulation methods for legislative redistricting.
https://alarm-redist.github.io/redist/
GNU General Public License v2.0

Documenting the counties argument and splits constraints #99

Closed. kuriwaki closed this issue 2 years ago

kuriwaki commented 3 years ago

Some of the documentation for the counties argument is a bit hard to navigate.

    • The documentation says specifying counties will only generate maps which split up to ndists-1 counties. Is there a parameter to change that ndists-1 threshold?
    • How is this different from the splits constraint? That constraint is not mentioned in redist_smc but it is in redist_mergesplit. What happens if I set splits = list(strength = 100) in redist_smc together with counties?
    • If splits is simply another name for the county constraint, I'd suggest using that name for the argument. Incidentally, the counties naming is too US-context-specific: we also use it for city boundaries in Japan. Could there be an argument (e.g. splits) that simply duplicates counties so users can use either?

cc @tylersimko

christopherkenny commented 3 years ago

  1. The ndists-1 limit is a hard constraint built into the algorithm, so there is no parameter to change it. You can do some mechanical pre-processing instead, like merging counties or passing a different input to counties. If you merge small counties that you really don't want split, this will generally lower the total number of splits, though that is not guaranteed.
  2. splits is not currently an option in redist_smc, because the arguments are not passed on to the Rcpp code that actually runs SMC. If you set constraints(splits = list(strength = 100)), it won't break anything, but it also won't be used. splits is a soft Gibbs constraint rather than a hard algorithmic constraint, so the two are very different.
  3. I don't think we'd want to name it splits, as they do have different meanings. We could call it the more generalized term, like political_subdivision, but I'm not sure if that adds more value than it costs. Happy to hear other ideas for names, if this is something important.
  4. Will fix that, thanks!
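
To make the soft-versus-hard distinction concrete, here is a minimal Python sketch (not redist code). It assumes a Gibbs-style penalty of the form exp(-strength * number of split counties); the thread does not state the exact penalty redist uses, so treat that form and the helper names `count_split_counties` and `soft_weight` as hypothetical. The point is only that a soft constraint down-weights plans with more splits without ever ruling them out, whereas the ndists-1 limit from counties is built into how plans are constructed.

```python
import math


def count_split_counties(district, county):
    """Count counties whose units fall in more than one district.

    district: dict mapping unit -> district label
    county:   dict mapping unit -> county label
    """
    districts_seen = {}
    for unit, c in county.items():
        districts_seen.setdefault(c, set()).add(district[unit])
    return sum(1 for ds in districts_seen.values() if len(ds) > 1)


def soft_weight(n_splits, strength):
    """Assumed Gibbs-style weight: positive for every plan, smaller with more splits."""
    return math.exp(-strength * n_splits)


# Four units in two counties, two candidate plans with two districts each.
county = {0: "A", 1: "A", 2: "B", 3: "B"}
plan_whole = {0: 1, 1: 1, 2: 2, 3: 2}   # districts follow county lines
plan_split = {0: 1, 1: 2, 2: 2, 3: 2}   # county A is divided

splits_whole = count_split_counties(plan_whole, county)
splits_split = count_split_counties(plan_split, county)
weight_whole = soft_weight(splits_whole, strength=100)
weight_split = soft_weight(splits_split, strength=100)
```

Under this assumed form, even strength = 100 leaves the split plan with a tiny but nonzero weight, which is why a soft constraint cannot stand in for the hard limit.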

kuriwaki commented 3 years ago

I see, thank you.

  1. If there are 4 districts and only 2 counties (city vs. non-city), then the constraint would be to "only generate maps which split up to 3 counties", which is never binding here (there are only 2 counties to split). And yet, we see differences in simulations. Is the counties argument doing something else?

  2. Got it. I think noting your answer here in the manual would be clarifying. Or throw a warning if a user sets splits in SMC. We were about to make this mistake.

CoryMcCartan commented 3 years ago

Re: 1, what's going on is that spanning trees are drawn first within each county and then joined together with a meta-spanning tree across counties. This leads to partitions that tend to follow county boundaries. It also guarantees the upper bound on the number of county splits (ndists-1). But as you note, even with more districts than counties, the way the trees are drawn still makes it useful.
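
The two-level tree drawing can be sketched in Python as follows. This is an illustration of the idea only, not the package's Rcpp implementation: for brevity it builds each tree with Kruskal's algorithm on shuffled edges, which does not sample spanning trees uniformly, and the function names and the toy 2x4 grid are made up for the example.

```python
import random
from collections import defaultdict


def random_spanning_tree(nodes, edges, rng):
    """Spanning tree via Kruskal on shuffled edges (illustrative, not uniform)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    shuffled = list(edges)
    rng.shuffle(shuffled)
    tree = []
    for u, v in shuffled:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
    return tree


def hierarchical_spanning_tree(edges, county, rng):
    """Draw a tree inside each county, then join counties with a meta-tree."""
    within = defaultdict(list)   # county -> edges inside that county
    across = defaultdict(list)   # (county, county) -> edges crossing between them
    for u, v in edges:
        if county[u] == county[v]:
            within[county[u]].append((u, v))
        else:
            across[tuple(sorted((county[u], county[v])))].append((u, v))

    members = defaultdict(list)
    for unit, c in county.items():
        members[c].append(unit)

    tree = []
    # Step 1: one spanning tree per county, using only within-county edges.
    for c, units in members.items():
        tree.extend(random_spanning_tree(units, within[c], rng))
    # Step 2: a spanning tree over the counties themselves; each meta-edge
    # is realized by one randomly chosen crossing edge.
    for pair in random_spanning_tree(list(members), list(across), rng):
        tree.append(rng.choice(across[pair]))
    return tree


# Toy 2x4 grid: the left two columns are county "A", the right two are "B".
cols = 4
units = range(2 * cols)
county = {u: ("A" if u % cols < 2 else "B") for u in units}
edges = [(u, u + 1) for u in units if u % cols < cols - 1]
edges += [(u, u + cols) for u in range(cols)]

tree = hierarchical_spanning_tree(edges, county, random.Random(1))
crossing = [e for e in tree if county[e[0]] != county[e[1]]]
```

In the resulting tree each county is a connected subtree, so every edge that is cut to form districts either lies between two counties (splitting none) or inside exactly one county. Cutting ndists-1 edges therefore splits at most ndists-1 counties, and because most tree edges lie inside a county, the districts tend to follow county lines even when that maximum is not binding.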

christopherkenny commented 3 years ago

Comment 2 is closely linked to #96. The current constraint checkers are designed largely to avoid breaking errors in the Rcpp, not to detect user errors. I think we can make #96 a 3.2 priority to ensure that people aren't doing bad things without knowing it (or silently not doing what they intended).

On 1, yeah, the idea of drawing the spanning trees within the counties (as Cory mentions) is really important. You get much more realistic plans this way. Plans drawn without counties are fairly far from what states actually enact, whereas with counties we can sample plans that look quite similar. That holds even when the ndists-1 maximum isn't binding, because the county structure is still useful district-by-district.

kuriwaki commented 3 years ago

I think noting this mechanism in the docs could be helpful.

In the @param counties in redist_smc or the details, I'd specify something along the lines of "this will draw spanning trees within each of the counties specified. There is no strength parameter associated with this option. Even if there are fewer counties than ndists - 1, the spanning trees will change the results of the simulations."

The @param counties in redist_mergesplit can note that the strength of the "soft" Gibbs constraint is set through the "splits" constraint.

An error or warning for mismatching constraints would be great.