nealmcb opened 7 years ago
I think I have a formula, but it needs another set of eyes. I'm starting with something I derived from Equation 10 in Stark's "Super-Simple Simultaneous..." paper, namely, that (looking only at 1- and 2-vote overstatements, ignoring understatements) the stopping size N after looking at K ballots is:
the minimum N such that N >= A + B x + C y
where x is the number of 2-vote overstatements found in the K ballots, y is the number of 1-vote overstatements, and A, B, and C are positive constants depending only on gamma, the risk limit, and the diluted margin.
Suppose the above analysis is valid (which needs a second pair of eyes, please!). If we assume further that overstatements appear at the same rate in the rest of the sample, then after looking at M ballots (M>K) we will have a stopping size of:
the minimum N such that N >= A + BMx/K + CMy/K
We expect the audit to finish when we've counted M ballots and M is greater than the stopping size calculated after M ballots. In other words:
M ( 1 - Bx/K - Cy/K ) >= A.
If the quantity in parentheses is strictly positive, we can divide both sides to get M, which is the stopping size (assuming errors continue at the observed rate). If the quantity in parentheses is zero or strictly negative, then we might as well go to a hand count.
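To make the arithmetic concrete, here's a minimal Python sketch of that projection (a hypothetical function, not project code; it assumes A, B, and C have already been computed from gamma, the risk limit, and the diluted margin):

```python
def projected_stopping_size(A, B, C, K, x, y):
    """Solve M * (1 - B*x/K - C*y/K) >= A for M, assuming overstatements
    keep appearing at the rate observed in the first K audited ballots."""
    denom = 1 - B * x / K - C * y / K
    if denom <= 0:
        return None  # no finite stopping size: go to a full hand count
    return A / denom
```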
Better formula, including understatements.
From https://www.stat.berkeley.edu/~stark/Java/Html/auditTools.htm
stopping sample size = -2g(log(a) + o1log(1-1/(2g)) + o2log(1 - 1/g) + u1log(1+1/(2g)) + u2log(1+1/g)) / m)
There's an extra close paren in this formula, and I think it's at the end. I think the formula is actually
stopping sample size = -2g(log(a) + o1 log(1 - 1/(2g)) + o2 log(1 - 1/g) + u1 log(1 + 1/(2g)) + u2 log(1 + 1/g)) / m = (-2g/m) * [ log(a) + o1 log(1 - 1/(2g)) + o2 log(1 - 1/g) + u1 log(1 + 1/(2g)) + u2 log(1 + 1/g) ]
m = diluted margin, g = "gamma" = error inflation factor from "Super-Simple Simultaneous…"
log is natural log.
Set:
A := (-2g/m) log(a)
B := (-2g/m) log(1 - 1/(2g))
C := (-2g/m) log(1 - 1/g)
D := (-2g/m) log(1 + 1/(2g))
E := (-2g/m) log(1 + 1/g)
In this language, the stopping size after K ballots is the minimum N such that
N >= A + B o1 + C o2 + D u1 + E u2
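As an aside, here's how those constants and this stopping size could be computed in Python (an illustration of the formula above, with hypothetical names; not the project's actual code):

```python
from math import log, ceil

def stopping_size(alpha, gamma, m, o1, o2, u1, u2):
    """Minimum N such that N >= A + B*o1 + C*o2 + D*u1 + E*u2, per the
    definitions above (natural logs; requires gamma > 1 and 0 < m < 1)."""
    k = -2 * gamma / m
    A = k * log(alpha)                # > 0, since log(alpha) < 0
    B = k * log(1 - 1 / (2 * gamma))
    C = k * log(1 - 1 / gamma)
    D = k * log(1 + 1 / (2 * gamma))  # < 0: understatements reduce N
    E = k * log(1 + 1 / gamma)        # < 0
    return ceil(A + B * o1 + C * o2 + D * u1 + E * u2)
```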
If we assume further that discrepancies appear at the same rates in the rest of the sample, then after looking at M ballots (M > K) we will have a stopping size of:
the minimum N such that N >= A + (B M o1)/K + (C M o2)/K + (D M u1)/K + (E M u2)/K
We expect the audit to finish when we've counted M ballots and M is greater than the stopping size calculated after M ballots. In other words:
M ( 1 - (B o1)/K - (C o2)/K - (D u1)/K - (E u2)/K ) >= A.
If the quantity in parentheses is strictly positive, we can divide both sides to get M, which is the stopping size (assuming errors continue at the observed rate). If the quantity in parentheses is zero or strictly negative, then we might as well go to a hand count.
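Extending the earlier sketch to all four discrepancy types (again hypothetical; note that D and E are negative, so understatements shrink the projection):

```python
def projected_stopping_size_full(A, B, C, D, E, K, o1, o2, u1, u2):
    """Solve M * (1 - (B*o1 + C*o2 + D*u1 + E*u2)/K) >= A for M, assuming
    discrepancies continue at the rates seen in the first K ballots."""
    denom = 1 - (B * o1 + C * o2 + D * u1 + E * u2) / K
    if denom <= 0:
        return None  # no finite stopping size: go to a full hand count
    return A / denom
```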
Per a 9/20 phone conversation with Phil Stark, the formula above was meant for a demo website, not necessarily as a reference. It works when the error rate is low (compared to what?). Maybe that's OK in the current case, since presumably large error rates will lead fairly soon to a full hand count.
And, as Phil pointed out, the end goal of the audit depends only on the risk level calculation, not the stopping size estimate. The stopping size estimate is important for determining round size, so the cost of a "bad" stopping size estimate will be paid either in extra rounds or in too many ballots in a round. But the integrity of the audit will not be affected.
Yes, of course. But it’d still be good to get as reasonable an estimate as we can... I think your other formula was interesting in that respect. It’s not at all clear how well what we’re currently using behaves, and I’m not even really sure how to evaluate that. (Some sort of simulation in which we repeatedly run audits, with randomness in the data at various error rates, and measure how accurate the round-2 stopping sizes are, perhaps?)
"High" means "so high that it implies that the outcome is wrong, or that the audit will never terminate." If the expected error rates is so high error rate that the audit will not terminate (e.g., if the error rate implies that the outcome is wrong), it gives nonsense--and there's no point to auditing, you should just do a full hand count. The fix is simple: if the denominator is not negative, throw an exception (or set the expected sample size equal to the population size). A detailed derivation of the expected $n$ calculation is here: https://github.com/pbstark/S157F17/blob/master/audit.ipynb
The result is $$ n \ge \frac{\log \alpha}{\log \left ( 1 - \frac{\mu}{2\gamma} \right ) -\omega_1 \log \left ( 1 - \frac{1}{2\gamma}\right ) -\omega_2 \log \left ( 1 - \frac{1}{\gamma}\right ) -\upsilon_1 \log \left ( 1 + \frac{1}{2\gamma}\right ) -\upsilon_2 \log \left ( 1 + \frac{1}{\gamma}\right )}, $$ where $(\omega_1, \omega_2, \upsilon_1, \upsilon_2)$ are the expected rates of 1 and 2 vote overstatements and 1 and 2 vote understatements, respectively.
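A direct Python transcription of that expression might look like the sketch below (hypothetical name, not necessarily the rlacalc.py implementation; the guard implements the "throw an exception" fix described above):

```python
from math import log, ceil

def km_expected_sample_size(alpha, gamma, mu, w1, w2, v1, v2):
    """Expected sample size n per the formula above, where w1, w2 (v1, v2) are
    the expected rates of 1- and 2-vote overstatements (understatements)."""
    denom = (log(1 - mu / (2 * gamma))
             - w1 * log(1 - 1 / (2 * gamma))
             - w2 * log(1 - 1 / gamma)
             - v1 * log(1 + 1 / (2 * gamma))
             - v2 * log(1 + 1 / gamma))
    if denom >= 0:
        # the expected rates imply the audit cannot terminate: full hand count
        raise ValueError("expected error rates too high for the audit to terminate")
    return ceil(log(alpha) / denom)
```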
Thank you, Philip!
Reconciliation of Philip's formula and my formula:
Philip's formula and mine agree except for the first term in the denominator on the RHS of Philip's formula. Where his formula has log(1 - mu/(2 gamma)), mine has just -mu/(2 gamma). For small values of mu/(2 gamma) these two quantities are nearly equal. Since for any mu and any gamma
-mu/(2 gamma) >= log(1 - mu/(2 gamma)),
Philip's denominator is at least as negative as mine, so his formula will yield a slightly smaller stopping size n than my formula will.
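A quick numerical check with illustrative values (risk limit 10%, gamma = 1.03905, diluted margin 5%):

```python
from math import log

alpha, gamma, mu = 0.1, 1.03905, 0.05   # illustrative risk limit, gamma, diluted margin
t = mu / (2 * gamma)
print(-t, log(1 - t))                   # -0.02406 vs. -0.02435: -t is less negative
# Stopping sizes with zero observed errors:
print(log(alpha) / -t)                  # ~95.7 with the -mu/(2 gamma) term
print(log(alpha) / log(1 - t))          # ~94.6 with Philip's log term: slightly smaller
```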
Thanks, Philip! I've incorporated your Python code for your new equations at https://github.com/nealmcb/audit_cvrs/blob/master/audit_cvrs/rlacalc.py I also included unit tests, code to check the validity of the arguments, etc.
I haven't compared the results to those from Stephanie's math yet.
I note for others that, as we've discussed offline, your new function KM_Expected_sample_size(), which implements your math from above, addresses many of our requirements, but it is not quite a drop-in replacement for nminFromRates(). By itself it does not support the roundUp options. In rlacalc.py I also included KM_Expected_sample_size_rounded(), which combines your function with a more robust version of the loop in nminFromRates() which you use to implement the rounding calculations at https://www.stat.berkeley.edu/~stark/Vote/auditTools.htm.
I think these functions are a good basis for the Java code we need for this issue.
For the record, the current algorithm in ColoradoRLA-1.0.0 is in CountyContestComparisonAudit.java and works like this:
from math import ceil

def recalculateSamplesToAudit(num_audited, a, gamma, m, o1, o2, u1, u2):
    nm = nmin(a, gamma, m, o1, o2, u1, u2)  # nmin(): the robust stopping-size calculation, defined elsewhere
    return ceil(nm * (1 + (o1 + o2) / num_audited))  # scale up by 1 + the observed overstatement rate
where num_audited is the number of ballots audited so far (through a round we'll call n). The tool thus sets the size of round n+1 so that it will cover all discrepancies seen so far in that sample, and inflates the result by one plus the observed overstatement rate (treating 1- and 2-vote overstatements alike), without further considering the margin. This is used to calculate estimated_samples_to_audit.
There are some corner cases of unusual inputs, encountered during testing, in which the calculations from the nminFromRates() function in https://www.stat.berkeley.edu/~stark/Vote/auditTools.htm are invalid. An interim robust-but-inaccurate algorithm, which leverages the robust and accurate results from the nmin() function, is in use now for estimating ballots-to-audit for future rounds. Replace this interim algorithm with a better one, which takes into account the currently observed discrepancy rate.
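One possible shape for that replacement, sketched in Python (hypothetical; it reuses the km_expected_sample_size() sketch from above rather than the actual rlacalc.py code):

```python
def estimated_samples_to_audit(num_audited, alpha, gamma, mu, o1, o2, u1, u2):
    """Estimate total ballots to audit by feeding the discrepancy rates observed
    in the first num_audited ballots into the expected-sample-size formula."""
    try:
        return km_expected_sample_size(alpha, gamma, mu,
                                       o1 / num_audited, o2 / num_audited,
                                       u1 / num_audited, u2 / num_audited)
    except ValueError:
        return None  # observed rates imply no finite stopping size: full hand count
```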
See also #422, #482