BrightSpots / rcv

Ranked Choice Voting Universal Tabulator
Mozilla Public License 2.0
71 stars 19 forks source link

random seed for tiebreaking #170

Open tarheel opened 5 years ago

tarheel commented 5 years ago

If tiebreakMode = random, we should require the config to specify a random seed also. This will ensure consistent results across runs.

HEdingfield commented 5 years ago

Also recommend we add a button in the GUI to randomly generate a seed.

CalebKleppner commented 5 years ago

I am not sure we need to sweat this detail. Random can be random, and the way you test whether it's random is to run the same election with a tie many times to see if the tie gets broken an approx. same number of times for each choice (could be more than 2).

nealmcb commented 5 years ago

I'm not positive this is relevant, but for auditability, it is critical that software can use publicly-verifiable, truly random seeds, e.g. via dice rolls in a public setting, along with any pseudo-random number generation algorithm used to stretch a seed to deal with multiple ties.. Documenting those seeds and algorithms in both input config files and output result files is necessary for interoperable replication and verification.

gngilbert commented 5 years ago

Not sure what our resolution should be here but think it needs to be discusses and resolved. The issues of documentation and reproducible results are critical.

tarheel commented 5 years ago

Need to use supplied seed for both Math.random and Collections.shuffle.

HEdingfield commented 5 years ago

Agree that a seed is essential if we're going to allow random as an option. If we end up not having time to complete this before certification, we should probably disable this option entirely.

HEdingfield commented 5 years ago

This is essentially closed with #313, but it'd be nice to see tests for all three random-related tiebreakers, not just RANDOM as was done in tiebreak_seed_test.

tarheel commented 5 years ago

Yeah, I'll add the other two.

nealmcb commented 5 years ago

I'm always humbled by the complexities of properly addressing randomness, and especially verifiable randomness, in computer software. Thanks for your work here!

I'm sorry I didn't notice the work in #313 or dive deeper into this before, beyond referencing Rivest's "sampler" PRNG at https://github.com/nealmcb/rivest-sampler-tests/tree/python3-port in https://github.com/BrightSpots/rcv/issues/43#issuecomment-480910929, because I think finding a solution that is suitable for a reference implementation and auditing would benefit from more discussion.

I haven't looked at the details or tried it out, but my guess is that the current approach might work, given some more documentation.

As I noted above, to achieve results that others can check and reproduce (e.g. with different software), we need not just the ability to specify an initial random seed, but also a description of the algorithm used to produce random numbers via that seed, and a description of how and when the algorithm calls for those numbers.

What algorithm is the code currently using, and is it secure enough for the uses here? It looks like it uses Java's linear congruential formula as documented at Random (Java Platform SE 6) though a more permanent or up-to-date reference might be appropriate.

Best practices for auditing currently involve a public ceremony for producing a random seed via 20 throws of a 10-sided die, though the algorithm I referenced by Rivest can arbitrary length input using any Unicode characters. Being able to use the same form of input would be ideal, so that we could accommodate related current and future practices, but probably not essential.

It looks like the current Java approach limits seeds to 48 bits (via a seed between 0 and 281,474,976,710,656), which is probably sufficient for the less-stringent requirements of randomness in IRV/STV tallying, though I haven't looked at that very deeply. E.g. what would we need to ensure that all possible permutations of random choices needed in any STV election with up to some reasonable number of candidates and seats would be equally likely? @ron-rivest probably has insights here.

If that's true, then adding documentation to this effect, and pointing to related compatibility code like Python's java-random · PyPI, might work. In addition, it's presumably necessary to specify the exact manner in which random numbers are used during the tally, and the exact order in which they are requested for that use. And I'm guessing that care has to be taken to ensure that no other calls to get random numbers can happen in the whole process during the tally, since otherwise they would throw off subsequent choices. E.g. if some library involved in the GUI does so, the results might vary randomly ;) .

Does this make sense? Can this be reopened to deal with the documentation and any other needed work?

HEdingfield commented 5 years ago

Thanks for the comments, Neal. Hopefully we'll have time before our Sunday night deadline to address this.

tarheel commented 5 years ago

Not sure if we should worry about iterating further on this code for tomorrow's deadline -- I'm guessing that the jurisdictions using our software based on this initial certification will want to use the interactive tie-breaking option and handle the random choices outside of the software -- but @nealmcb's suggestions sound like a great next step toward being able to tell people that they can trust our software to handle random tie-breaking.

And if it's just a matter of documentation, then it's OK if we don't sort this out in the next 24 hours anyway.

gngilbert commented 5 years ago

Agreed. For most elections offices a coin flip is as random as they care about. Definitely want to address it later, however, if we don't get it now.

On Sat, Jul 6, 2019, 6:19 PM Louis Eisenberg notifications@github.com wrote:

Not sure if we should worry about iterating further on this code for tomorrow's deadline -- I'm guessing that the jurisdictions using our software based on this initial certification will want to use the interactive tie-breaking option and handle the random choices outside of the software -- but @nealmcb https://github.com/nealmcb's suggestions sound like a great next step toward being able to tell people that they can trust our software to handle random tie-breaking.

And if it's just a matter of documentation, then it's OK if we don't sort this out in the next 24 hours anyway.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/BrightSpots/rcv/issues/170?email_source=notifications&email_token=AJODE7QFXYTTUBPD6CJANVDP6ELATA5CNFSM4FVUYKG2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODZLBE5Q#issuecomment-508957302, or mute the thread https://github.com/notifications/unsubscribe-auth/AJODE7W5C7Z3M6FIU55VIELP6ELATANCNFSM4FVUYKGQ .

tarheel commented 5 years ago

I'm removing the certification-deliverable tags on this, but leaving it open so we can follow up on Neal's suggestions when we have time.

HEdingfield commented 5 years ago

@nealmcb please check out the documentation that @moldover added in PR #356 if you get a chance and let us know what you think.

nealmcb commented 5 years ago

Thanks for that! I left a bit of feedback on #356, even though it has been merged.

Are there any examples we can point to for people to test for interoperable implementations? Ideally we'd have a worked example with at least 3 ties among let's say 3 candidates each.

Why is the random seed limited to 31 bits [0..2147483647]? I'd recommend allowing at least the 48 bits that Random leverages. Probably not any practical risks, but the permutations for more than about 11 candidates would be biased, and I've been surprised by related issues before.

moldover commented 5 years ago

Not sure what you mean by interoperable implementations. What would be an example of that?

The randomSeed field is a text field which gets converted to a non-negative Integer by Java, that's why the funky upper limit.

tarheel commented 5 years ago

The randomSeed field is a text field which gets converted to a non-negative Integer by Java, that's why the funky upper limit.

The Random class supports using a long instead of a regular int, though, so it's not hard for us to accommodate Neal's request.

HEdingfield commented 5 years ago

We can allow negative numbers now too for the extra bit, since I got rid of the non-negative integer listener in the GUI that prevented that before. Very surprising that 31 bits has bias with only 11 candidates! That's crazy.

HEdingfield commented 5 years ago

Hi @nealmcb, after thinking about this a bit more, I also wonder if allowing a full long potentially introduces bias in the outer ranges, since it's 64-bit, but must be shifted to 48-bit. Per the docs:

(seed ^ 0x5DEECE66DL) & ((1L << 48) - 1)

For example, if a user enters a number in the 48+ bit range, is this shifting process truly resulting in random truncation? In other words: should we actually restrict the randomSeed to be from -140,737,488,355,328 and 140,737,488,355,327 or is it fine to allow the whole range of a long?

I'm also curious to hear your responses to the follow-up comments in #356, particularly what @moldover mentioned about specifying version 12, and your desire to contribute to the code a bit.

nealmcb commented 5 years ago

The long is not shifted, it is masked to obtain only the bottom 48 bits. So so long as the seed can contain at least 48 bits of usable entropy in its least-significant bits, I think users have the ability to at least get as much entropy as these Java randoms allow. I see no need to constrain negative values. I added some comments to #356 and #43. I'm not a Java programmer, so won't contribute code, but can hopefully find some time to help with tests and documentation, though I haven't been finding as much time as I've wanted to!

HEdingfield commented 5 years ago

370 implemented 48-bit random seed. I think we're ready to close this issue, but I'm not the primary driver here, so will let someone else make that call.

tarheel commented 5 years ago

After reviewing this thread again, I think the one follow-up before closing is that we should just allow the random seed to be any legal long value instead of constraining it to be from -2^47 to 2^47 - 1 (since, as Neal pointed out, allowing those additional values doesn't introduce bias).

HEdingfield commented 5 years ago

What would the benefit be of doing that, considering it's masked down to 48 bits?

tarheel commented 5 years ago

I figured it might be less confusing to the user since they're more likely to have an input that's an arbitrary long value vs. an arbitrary 48-bit value. But I'm fine with whatever here.

ron-rivest commented 5 years ago

Hi all --

I'm sure that I'm unaware of most of the relevant context here for tie-breaking, but here are my thoughts/recommendations:

-- Use a cryptographic PRNG such as one based on SHA-256 as your PRNG; in this way you are not tied to a specific language and a language-specific PRNG. Furthermore, you can allow arbitrary-length "seeds". For an RCV method, my approach would be: -- accept an input random seed from the user (an arbitrary string) -- create a pseudorandom permutation of the candidate list, based on the seed. For example (in Julia-like code; note that means string concatenation): rand(seed, i) = integer(hexdigest(sha256(string(seed)string(i)), 16) # convert hex to int function shuffle(candidates, seed) for i in 2:length(candidates) j = mod1(rand(seed, i)) # returns random in range 1 to i, inclusive swap candidates[i] <==> candidates[j] end return(candidates) end candidates = shuffle(candidates, seed) -- Then, whenever you have a choice as to which candidate to eliminate (in a tie), you eliminate the one earlier in the (shuffled) candidates list.

Hope this is helpful...

Cheers, Ron

On Mon, Sep 2, 2019 at 8:56 PM Louis Eisenberg notifications@github.com wrote:

I figured it might be less confusing to the user since they're more likely to have an input that's an arbitrary long value vs. an arbitrary 48-bit value. But I'm fine with whatever here.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/BrightSpots/rcv/issues/170?email_source=notifications&email_token=ABQ765LM5F46H73DLASYBK3QHWY4TA5CNFSM4FVUYKG2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5WXBAY#issuecomment-527265923, or mute the thread https://github.com/notifications/unsubscribe-auth/ABQ765JX6QEWCNWOBGXZSV3QHWY4TANCNFSM4FVUYKGQ .

-- The climate crisis isn't about who's right. It's about who's helping.

nealmcb commented 5 years ago

I like Ron's approach, as I've suggested before.

If you chose not to go with that much more flexible approach, then, while I can see reasons for allowing any long as input, I actually think you'll avoid misleading people if you limit it to 48 bits. You don't want someone to think that entropy in the upper bits, which are stripped off in the current approach, will actually be used. It would be sort of like letting people enter a long password but only paying attention the the first 8 - a mistake from the 46-year-old history of Unix passwords that STILL persists in places and leads to lots of pain.