FreeAndFair / ColoradoRLA

Software to facilitate risk-limiting audits at the state level, developed for the state of Colorado.
Other
21 stars 8 forks source link

Robust estimation of ballots for next round #695

Open nealmcb opened 7 years ago

nealmcb commented 7 years ago

There are some corner cases of unusual inputs, encountered during testing, in which the calculations from the nminFromRates() function in https://www.stat.berkeley.edu/~stark/Vote/auditTools.htm are invalid. An interim robust-but-inaccurate algorithm which leverages the robust and accurate results from the nmin() function is in use now for estimating ballots-to-audit for future rounds.

Replace this interim algorithm with a better one, which takes into account the currently observed discrepancy rate.

See also #422 , #482

sfsinger19103 commented 7 years ago

I think I have a formula, but it needs another set of eyes. I'm starting with something I derived from Equation 10 in Stark's "Super Simple Simultaneous..." paper, namely, that (looking only at 1- and 2-vote overstatements, ignoring understatements) the stopping size N after looking at K ballotsis:

the minimum N such that N >= A + B x + C y

where x is the number of 2-vote overstatements found in the K ballots, y is the number of one-vote overstatements, and A, B and C are positive constants depending only on gamma, the risk limit and the diluted margin.

Suppose the above analysis is valid (which needs a second pair of eyes, please!). If we assume further that overstatements appear at the same rate in the rest of the sample, then after looking at M ballots (M>K) we will have a stopping size of:

the minimum N such that N >= A + BMx/K + CMy/K

We expect the audit to finish when we've counted M ballots and M is greater than the stopping size calculated after M ballots. In other words:

M ( 1 - Bx/K - Cy/K ) >= A.

If the quantity in parentheses is strictly positive, we can divide both sides to get M, which is the stopping size (assuming errors continue at the observed rate). If the quantity in parentheses is zero or strictly negative, then we might as well go to a hand count.

sfsinger19103 commented 7 years ago

Better formula, including understatements.

From https://www.stat.berkeley.edu/~stark/Java/Html/auditTools.htm

stopping sample size = -2g(log(a) + o1log(1-1/(2g)) + o2log(1 - 1/g) + u1log(1+1/(2g)) + u2log(1+1/g)) / m)

There's an extra close paren in this formula, and I think it's at the end. I think the formula is actually

stopping sample size = -2g(log(a) + o1log(1-1/(2g)) + o2log(1 - 1/g) + u1log(1+1/(2g)) + u2log(1+1/g)) / m) =(-2 g / m) * [ log(a) + o1log(1-1/(2g)) + o2log(1 - 1/g) + u1log(1+1/(2g)) + u2log(1+1/g) ]

m = diluted margin, g = "gamma" = error inflation factor from "Super-Simple Simultaneous…"

log is natural log.

Set: A := (-2g/m ) log(a) B := (-2g/m ) ( log(1-1/(2g)) ) C := (-2g/m ) ( log(1 - 1/g) ) D := (-2g/m ) ( log(1+1/(2g)) ) E := (-2g/m ) * ( log(1+1/g) )

In this language, the stopping size after K ballots is the minimum N such that

N >= A + B o1 + C o2 +  D u1 +  Eu2

 If we assume further that overstatements appear at the same rate in the rest of the sample, then after looking at M ballots (M>K) we will have a stopping size of:

the minimum N such that N > A + (B M o1 )/K + (C M o2)/K + (D M u1)/K + (E M u2)/K

We expect the audit to finish when we've counted M ballots and M is greater than the stopping size calculated after M ballots. In other words:

Edited (there were extra Ms, now removed): M ( 1 - (B o1 )/K - (C o2)/K - (D u1)/K - (E u2)/K) >= A.

If the quantity in parentheses is strictly positive, we can divide both sides to get M, which is the stopping size (assuming errors continue at the observed rate). If the quantity in parentheses is zero or strictly negative, then we might as well go to a hand count.

sfsinger19103 commented 7 years ago

Per 9/20 phone conversation with Phil Stark, the formula above was meant for a demo website, not necessarily as a reference. It works when the error rate is low (compared to what?). Maybe that's OK in the current case, since presumably large error rates will lead fairly soon to a full hand count.

And, as Phil pointed out, the end goal of the audit depends only on the risk level calculation, not the stopping size estimate. The stopping size estimate is important for determining round size, so the cost of a "bad" stopping size estimate will be paid either in extra rounds or in too many ballots in a round. But the integrity of the audit will not be affected.

dmzimmerman commented 7 years ago

Yes, of course. But it’d still be good to get as reasonable an estimate as we can... I think your other formula was interesting in that respect. It’s not at all clear how well what we’re using behaves in that respect, and I’m not even really sure how to evaluate that. (some sort of simulation in which we repeatedly run audits with some randomness in the data with various error rates and measure how accurate the round 2 stopping sizes are, perhaps?)

pbstark commented 7 years ago

"High" means "so high that it implies that the outcome is wrong, or that the audit will never terminate." If the expected error rates is so high error rate that the audit will not terminate (e.g., if the error rate implies that the outcome is wrong), it gives nonsense--and there's no point to auditing, you should just do a full hand count. The fix is simple: if the denominator is not negative, throw an exception (or set the expected sample size equal to the population size). A detailed derivation of the expected $n$ calculation is here: https://github.com/pbstark/S157F17/blob/master/audit.ipynb

The result is $$ n \ge \frac{\log \alpha}{\log \left ( 1 - \frac{\mu}{2\gamma} \right ) -\omega_1 \log \left ( 1 - \frac{1}{2\gamma}\right ) -\omega_2 \log \left ( 1 - \frac{1}{\gamma}\right ) -\upsilon_1 \log \left ( 1 + \frac{1}{2\gamma}\right ) -\upsilon_2 \log \left ( 1 + \frac{1}{\gamma}\right )}, $$ where $(\omega_1, \omega_2, \upsilon_1, \upsilon_2)$ are the expected rates of 1 and 2 vote overstatements and 1 and 2 vote understatements, respectively.

nealmcb commented 7 years ago

Thank you Philip! The formatted version of that LaTeX math is:

sfsinger19103 commented 7 years ago

Reconciliation of Philip's formula and my formula:

Philip's formula and mine agree except for the first term in the denominator on the RHS of Philip's formula. Where his formula has log(1-mu/2gamma) mine has must -mu/2gamma. For small values of mu/2gamma these two quantities are close to equal. Since for any mu and any gamma

-mu/2gamma >= log(1-mu/2gamma),

it follows that Philip's formula will yield a larger stopping size n than my formula will.

nealmcb commented 7 years ago

Thanks, Philip! I've incorporated your Python code for your new equations at https://github.com/nealmcb/audit_cvrs/blob/master/audit_cvrs/rlacalc.py I also included unit tests, code to check validity of the arguments etc..

I haven't compared the results to those from Stephanie's math yet.

I note for others that as we've discussed offline, your new function KM_Expected_sample_size() which implements your math from above addresses many of our requirements, but it is not quite a drop-in replacement for nminFromRates(). By itself it does not support the roundUp options. In rlacalc.py I also included KM_Expected_sample_size_rounded() which combines your function with a more robust version of the loop in nminFromRates() which you use to implement the rounding calculations at https://www.stat.berkeley.edu/~stark/Vote/auditTools.htm.

I think these functions are a good basis for the Java code we need for this issue.

nealmcb commented 7 years ago

For the record, the current algorithm in ColoradoRLA-1.0.0 is in CountyContestComparisonAudit.java and works like this:

    def recalculateSamplesToAudit(num_audited, a, gamma, m, o1, o2, u1, u2):
        nm = nmin(a, gamma, m, o1, o2, u1, u2)
        return ceil(nm * (1 + ((o1 + o2) * 1.0 / num_audited)))

where the number of ballots audited so far (thru a round we'll call n) is num_audited. The tool thus sets the size of round n+1 so that it will cover all discrepancies seen so far, in that sample, and multiplies the result by a small fraction based on just the overstatements (treated alike), without further considering the margin. This is used to calculate estimated_samples_to_audit.