[Open] akazukin5151 opened 1 year ago
I really like how comprehensive your post is, @akazukin5151! Thank you! 👍🏼
I've used numpy before, but don't have the deep technical knowledge needed to fix the issue you described, hope others can chime in.
That said, I think a key, basic step we should take is to be upfront about this (possibly) rare issue in the documentation. Speaking of which, the README should probably be expanded into better documentation for this reference code. 😄
I don't think it's fixable in numpy. I did some searches for "numpy exact decimal", "exact fraction", and "exact rational" with nothing relevant. Python does have decimal and fraction types, but you lose the speed and terseness of numpy.
This is indeed a big change, so I agree that we should at least document it.
Thanks, @akazukin5151 I think issues like this are particularly important for reference implementations like this repository.
Would you (and others) consider this to be a rare, but critical issue? Or might it happen quite often?
Yeah we probably can't directly fix this in numpy. But could there be a workaround for just this part using Python's internal functions? Would using Python itself (instead of an external library) make any difference?
I think it is not rare but also not that critical. I have no evidence, but considering that you can observe floating point behavior with 0.1 + 0.2 = 0.30000000000000004, I think it is not rare. But averaging many situations together, it is "probably good enough". It depends on your use case. Actual governments using this will need all edge cases to be covered, but not everyone is a government. This example started because there was a tie, which should be rare in the real world. But all you need is a single tie and mistakes will cascade. Electing the wrong candidate does sound pretty serious to me; after all, the point of investing in alternate electoral systems is better accuracy.
My understanding of floating point arithmetic is that if you really need exact guarantees, you cannot rely on floating points [1]. So there's no point in using "a different float", it just makes different mistakes.
The decimal and fractions modules are in the Python standard library ("internal functions"/"using Python itself"), and they would fix it. But using plain Python floats instead of numpy floats won't make a difference.
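A quick illustration of the difference, using only the standard library:

```python
from fractions import Fraction
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly, so error creeps in:
float_sum = 0.1 + 0.2
print(float_sum)  # 0.30000000000000004

# The stdlib fractions module keeps values exact:
frac_sum = Fraction(1, 10) + Fraction(2, 10)

# decimal is exact too, when constructed from strings:
dec_sum = Decimal("0.1") + Decimal("0.2")
```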
Reading my post again, I think pandas/numpy is actually "more right" by using 6.6 instead of 6.599999999999999. But ultimately the ideal solution is to use fixed/infinite-precision decimals in both Python and JS.
At least Python has it in the standard library, but JS would need an external library like decimal.js. On the other hand, the JS code is written with loops so I think it would mostly be a drop in replacement. The difficulty in fixing it in Python is because the code is written in a vectorized pandas style, which has to be rewritten to loops for the decimal type.
I'm surprised how much I've written. The code is not that long (simple implementation was one of the factors in choosing STAR, after all), so it's odd to spend a week discussing something that could be fixed in a few days. (I don't have time right now, but I might look at it later.)
Voter satisfaction efficiency (VSE) appears to use floats as well, I wonder how much that affects results. Hopefully not significant enough!
First, floats are inherently imprecise and should not be used in applications where they need to be tested for equality, which is what ties are. The scores, number of candidates, number of winners, number of voters are all integers, so this algorithm is inherently based on rational numbers and should calculate using those ideally.
Second, the line:
#Select winner
w = weighted_scores.sum().idxmax()
doesn't have any way to deal with ties. If there are multiple candidates with the same score at this point, it will always pick the first one in the list(?), which could lead to a systemic bias if the candidates are arranged alphabetically. How should ties be handled?
If I print the results of each round, it looks like these are actually tied, even in the floating-point representation:
Weighted scores:
0 5.80000000000000159872
1 4.59999999999999964473
2 6.59999999999999964473 [tied]
3 5.59999999999999964473
4 4.80000000000000071054
5 6.59999999999999964473 [tied]
6 6.59999999999999964473 [tied]
7 5.99999999999999911182
dtype: object
Winner: 2
Weighted scores:
0 4.20000000000000017764
1 3.40000000000000079936
3 2.89999999999999991118
4 2.99999999999999955591
5 4.50000000000000000000
6 4.69999999999999928946 [highest, no ties]
7 3.79999999999999982236
dtype: object
Winner: 6
Weighted scores:
0 1.80000000000000004441
1 1.40000000000000013323
3 1.89999999999999991118
4 1.60000000000000008882
5 1.70000000000000017764
7 2.39999999999999991118 [highest, no ties]
dtype: object
Winner: 7
[2, 6, 7]
If I convert the code to use Python Fraction objects, the results are exact, but the tie still results in the same outcome:
Weighted scores
0 29/5
1 23/5
2 33/5 [tied]
3 28/5
4 24/5
5 33/5 [tied]
6 33/5 [tied]
7 6
dtype: object
Winner: 2
Weighted scores
0 21/5 = 4.20
1 17/5 = 3.40
3 29/10 = 2.90
4 3
5 9/2 = 4.50
6 47/10 = 4.70 [highest, no ties]
7 19/5 = 3.80
dtype: object
Winner: 6
Weighted scores
0 9/5 = 1.80
1 7/5 = 1.40
3 19/10 = 1.90
4 8/5 = 1.60
5 17/10 = 1.70
7 12/5 = 2.40 [highest, no ties]
dtype: object
Winner: 7
[2, 6, 7]
But I think both issues need to be dealt with. Ties need to be handled somehow, and floats should not be used.
@akazukin5151
In JS, this is 6.599999999999999. In my Python debugger and script, it is also 6.599999999999999. However when pandas actually sums it up, it becomes 6.6.
Here's what my debugger session looks like:
> /home/me/star_pr.py(187)Allocated_Score()
-> w = weighted_scores.sum().idxmax()
(Pdb) weighted_scores.sum()
0    5.8
1    4.6
2    6.6
…
It's not really 6.6; it's actually 6.599999999999999644728632119950… They're just rounded for printing as a convenience (because floats are a "close enough" data type). If you need to print them more precisely, though, you can add digits:
print("\nWeighted scores:\n", weighted_scores.sum().map('{:.20f}'.format))
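You can also see the exact stored value without pandas at all, via the standard library's decimal module:

```python
from decimal import Decimal

# Decimal(float) converts the float's exact binary value, with no
# decimal rounding at all:
exact = Decimal(6.6)
print(exact)  # begins 6.5999999999999996...

# repr() merely prints the shortest decimal string that round-trips
# back to the same float, which is why 6.6 "looks" exact:
shortest = repr(6.6)
```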
If there are multiple candidates with the same score at this point, it will always pick the first one in the list(?) Which could lead to a systemic bias if the candidates are arranged alphabetically. How should ties be handled?
This hasn't been added to the python implementation yet, but how it's done in the JS version is we make a list of all candidates with the scores equal to the max score and pick randomly from that. Currently we don't have a better ties protocol for this method.
If I convert the code to use Python Fraction objects, the results are exact, but the tie still results in the same outcome:
Interesting, I didn't know that existed. It looks like there's a Fraction library for JS too.
If there are multiple candidates with the same score at this point, it will always pick the first one in the list(?) Which could lead to a systemic bias if the candidates are arranged alphabetically. How should ties be handled?
This hasn't been added to the python implementation yet, but how it's done in the JS version is we make a list of all candidates with the scores equal to the max score and pick randomly from that. Currently we don't have a better ties protocol for this method.
Ok. In my single winner implementation I have three options:
I would suggest doing it the second way: Specify that the list of candidates be randomized before tabulating the election, and then favor the first one in the list in case of ties, so it's deterministic but fair?
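A sketch of that second option (the names and seed here are hypothetical, not from starpy): shuffle the candidate list once before tabulation, optionally from a published seed so the run is reproducible, then always break ties in favor of whoever appears first in the shuffled order.

```python
import random

candidates = ["Alice", "Bob", "Carol", "Dave"]

# Shuffle once, before tabulation. Publishing the seed makes the
# shuffle reproducible, so the whole election is deterministic.
rng = random.Random(20230101)  # hypothetical published seed
tiebreak_order = candidates[:]
rng.shuffle(tiebreak_order)

def break_tie(tied_candidates, order):
    # Deterministic given the same pre-shuffled order.
    return min(tied_candidates, key=order.index)

winner = break_tie(["Bob", "Dave"], tiebreak_order)
```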
Here I've got the version with Fraction objects: https://github.com/Equal-Vote/starpy/compare/main...endolith:starpy:I_got_98.99999_problems
Fraction objects are the way to go. Then, e.g., you could add the very valuable (for both internal documentation purposes and for catching logic errors) assertion that, at the start of a round, the sum of the ballot weights is exactly equal to the quota times the number of winners remaining to be picked.
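That invariant could look something like this; a sketch only, where the Ballot class, the weight attribute, and the quota calculation are my own assumptions rather than starpy's actual names:

```python
from fractions import Fraction

class Ballot:
    # Minimal hypothetical ballot: only the weight matters here.
    def __init__(self, weight):
        self.weight = weight

def check_weight_invariant(ballots, quota, seats_remaining):
    # With exact Fraction arithmetic this holds exactly; with floats it
    # would only hold approximately, hiding logic errors.
    total = sum(b.weight for b in ballots)
    assert total == quota * seats_remaining, (total, quota, seats_remaining)

# 15 ballots, 3 seats, Hare quota of 15/3 = 5 weight per seat:
ballots = [Ballot(Fraction(1)) for _ in range(15)]
quota = Fraction(15, 3)
check_weight_invariant(ballots, quota, 3)
```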
IMO it would be good to get away from numpy/Pandas here regardless. Their use greatly reduces the audience able to read the code, and buys approximately nothing worth getting: the code is not complex and the computational expense is minor. Simple pure-Python loops are more than up to the task of specifying the exact computation clearly.
Using Fraction objects removes the vectorization anyway.
Though the ballots are inherently a 2D object which pure Python doesn't really support, so removing numpy might actually make it less readable?
I agree with tim-one.
I used Fraction objects in my implementation:
https://github.com/larryhastings/starvote
It does slow things down a bit; I repurposed one of the bigger test elections I could download from https://star.vote/ to test Bloc STAR and Allocated Score, and there's a perceptible pause before it produces results. I don't think anybody would actually care about the performance impact, especially if you tell them "it's a little slower because it's 100% accurate".
Though the ballots are inherently a 2D object which pure Python doesn't really support, so removing numpy might actually make it less readable?
A ballot is a mapping of candidates to scores, which is very clearly expressed as a Python dict. Then, e.g., there's no added obfuscation due to needing to remember which axis means "candidates" and which "scores". But rather than a dict, it's clearer still to use Ballot objects, which contain a candidate-to-score dict. Then also have a score(self, cand) method, which applies the ballot's current weight (a member of the Ballot object) to the dict's score. Sticking to that, it's then impossible to "forget" to use weighted scores instead of raw scores.

That's a bit slower than 2D float array indexing too, but IMO also clearer on all counts. BTW, note that a dict can be indexed directly by a candidate's name (as a string) - no need to map candidates to "little integers", another little layer of obfuscation needed when working with arrays.
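A minimal sketch of the Ballot design described above (the names here are illustrative, not taken from any existing codebase):

```python
from fractions import Fraction

class Ballot:
    """A raw candidate->score dict plus a current reweighting factor.

    The raw dict is never mutated; only `weight` changes between rounds.
    """
    def __init__(self, raw_scores):
        self.raw = dict(raw_scores)
        self.weight = Fraction(1)

    def score(self, cand):
        # Always weighted, so callers can't accidentally use raw scores.
        return self.weight * self.raw.get(cand, Fraction(0))

b = Ballot({"Alice": Fraction(5), "Bob": Fraction(3)})
b.weight = Fraction(1, 2)  # e.g. after the ballot helped elect someone
```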
[...] it's clearer still to use Ballot objects, which [...] also have a score(self, cand) method
FWIW, I didn't use Ballot objects in my starvote library. I didn't want to inflict a custom object on my users, and I felt like I didn't need the abstraction internally.

After all, the score method described above is a one-liner:

return self.ballot_dict.get(cand, 0)

And I have functions to compute scores on iterables of dicts. So I don't have that many places that directly access a score. I just manually operate on the dict and hand-inline the get method call as needed.

But I would say, either operate 100% of the time on dict objects, or operate 100% of the time on custom Ballot objects. Consistency is important--don't make your tabulation code handle both.
p.s.
it's clearer still to use Ballot objects, which contain a candidate-to-score dict

I'd actually opt for a Ballot that inherited from dict. The only difference I can think of between a Ballot object and a dict is the addition of the score method. I don't prefer "is-a" over "has-a" often, but in this case it seems reasonable.
After all, the score method described above is a one-liner: return self.ballot_dict.get(cand, 0)

I don't understand how that can work unless you're also mutating the dict to keep on reducing the scores by the ballot's reweighting factor. My Ballot.score() method:
def score(self, cand):
assert cand in candset
return self.weight * self.raw.get(cand, _zero)
where _zero is a private module constant for Fraction(0). The dict (raw) is never mutated after construction, so, e.g., it's fine if a million ballots with the same scores all use the same dict object.
I'd actually opt for a Ballot that inherited from dict

Why? I'm aiming to maximize clarity, not cleverness :wink:. My Ballot objects also, e.g., have a merit(self, winners) method to score a ballot against a candidate set of winners, for use in a non-sequential RRV election method.

Ballots are fundamental to all election schemes - modeling them with a Ballot object is just screamingly natural to me.
BTW, note that a dict can be indexed directly by a candidate's name (as a string) - no need to map candidates to "little integers", another little layer of obfuscation needed when working with arrays.
For deterministic replication in case of ties, I think pre-randomizing the list of candidates and assigning integers to them would be more reliable/trustworthy. Especially since names can have special characters, multiple candidates can have the same name, etc. Python has ordered dicts now, so we could count on that ordering for the tiebreaker(?), but I think explicit numbering is more clear.
I'm aiming to maximize clarity, not cleverness 😉.
Inheriting from a conceptually-similar established type is good for both clarity and minimizing bugs, no?
Sort of related:
https://electowiki.org/wiki/Talk:Allocated_Score#Frustrating
What the page desperately needs is an example. Say, a simple fake election, with 3 to 5 candidates and 1000 voters (or scores adding up to 1000 maybe?), and going step by step through each round, showing the entire process.
Sadly, it doesn't have that. It has Python code, which I for one find totally unhelpful because it involves library functions and I don't know what they do. And anyway, one shouldn't have to read code to understand an electoral system. Not everybody can read code.
I don't understand how that can work unless you're also mutating the dict to keep on reducing the scores by the ballot's reweighting factor.
For electoral systems with weighting I internally create a second dict and keep the two paired with each other. I often need various other bits of per-ballot data too, e.g. a weight, for RRV a total of maximum_score + score_contributed_so_far, for SSS the number of stars the ballot has yet to spend. Rather than festooning a Ballot object with all that, I lash all those disparate objects together into a small array, the contents of which are ad hoc depending on the needs of the election system. This all seemed fine, and was okay to work with, but if I got a PR "cleaning up" this approach in favor of a custom object or something I suppose I'd consider it.
For deterministic replication in case of ties, I think pre-randomizing the list of candidates and assigning integers to them would be more reliable/trustworthy.
I don't think the integers thing is necessary, just pass in a list of candidates. My starvote library supports that; if you pass in a pre-randomized list of candidates to the predefined_permutation_tiebreaker, the election is 100% deterministic.
I had a "register candidate" step in an early prototype of my library but I abandoned it. I felt like a lot of that sort of data validation should live external to my tabulation code--the code compiling the ballots should do its own validation. And removing it made my library easier to use.
FWIW I'd handle that by creating a Candidate object, which contained both the name and a serial number, and make their __hash__ and __str__ incorporate both those values. Then I'd use those Candidate objects in the ballots.
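As a sketch (hypothetical code, not taken from any of the libraries discussed):

```python
class Candidate:
    # A name plus a serial number, so identically-named candidates stay
    # distinct, and hashable for use as dict keys in ballots.
    def __init__(self, name, serial):
        self.name = name
        self.serial = serial

    def __hash__(self):
        return hash((self.name, self.serial))

    def __eq__(self, other):
        return (self.name, self.serial) == (other.name, other.serial)

    def __str__(self):
        return f"{self.name} (#{self.serial})"

a = Candidate("Pat Smith", 1)
b = Candidate("Pat Smith", 2)
```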
think pre-randomizing the list of candidates and assigning integers to them would be more reliable/trustworthy. Especially since names can have special characters, multiple candidates can have the same name, etc.

I don't want to get too deep in largely irrelevant weeds here. Nobody is going to run an election with exactly the same names for multiple candidates. They're going to add some marking to distinguish them. Fine. A Ballot object can use the same markings in its string names. "Special characters" are also irrelevant - Python strings can hold anything Unicode can express.
This is my find-winner-and-break-ties code:
c2s = {c : sum(b.score(c) for b in self.ballots)
for c in avail}
highest = max(c2s.values())
winners = [c for c, s in c2s.items() if s == highest]
if len(winners) > 1:
print("tied", winners)
winner = min(winners, key=lambda c: cands.index(c))
print("winner", winner)
cands is a list of all candidates. If there's a tie for highest score, the winner is whoever appears first in that list. It's up to the user to supply that list, in whatever order they want.

If they want, e.g., an election official to predetermine a preference order for ties, fine. If they want to use random.shuffle(), fine too. Or, in case of ties, they could temporarily pause the election, wait until the next business day, and seed a cryptographic hash function with the open, high, and low values reached by SPY (the most heavily traded security in the world) during the first trading hour. That can then be used in various ways to assign a unique, totally ordered, but wholly unpredictable-in-advance value to each candidate. Etc. :wink:.
Inheriting from a conceptually-similar established type is good for both clarity and minimizing bugs, no?
Depends on the context. In this context I emphatically do not want the scoring dicts to mutate after creation. Inherit from dict, and then all the ways of spelling "mutate this dict" come along for the ride (unless explicitly overridden in the subclass). Yes, a ballot has a dict, but it also has a weight, and only weighted scores are of any use. Stick to using the score() method and it's not possible to make a mistake.
Seems like we've already decided to remove numpy (and pandas too?), but I'll toss in my two cents as well. There's always one good reason to remove a dependency in Python: that's one less thing beginners have to trip over, given the ecosystem's horrible dependency management.
Sadly, it doesn't have that. It has Python code, which I for one find totally unhelpful because it involves library functions and I don't know what they do. And anyway, one shouldn't have to read code to understand an electoral system. Not everybody can read code.
I had this problem too; I had to run the code in a debugger to understand what pandas is doing, because it's really terse and the pandas docs are relatively sparse on examples. Terser code might make the method look simpler, but all it does is make it harder to understand.
The discussion about Ballot objects seems a bit off topic but I'll comment as well. I don't use Python for my personal needs, so I don't have a horse in this race. My general advice is to avoid inheritance in favour of composition. I would favour having a Ballot class/struct/record over a plain dict. But that's also because in the language I use, structs are "just data" with no overhead after compilation (not even heap allocation overhead - it's stored on the stack). It also gives type safety at compile time (ensuring an ill-formed dict cannot be passed in), but that involves boilerplate in Python.
Nevertheless, a class/struct signals intent that some data (and behaviour) is semantically grouped together. I think that's a net benefit as long as you aren't making a spaghetti mess.
I would favour having a Ballot class/struct/record over a plain dict.
Because these are weighted ballots, a plain dict is useless unless you're endlessly mutating all the dict's values to reflect the latest reweighting. So you either maintain a list of ballots and a list of corresponding weights "in parallel but implicitly", or you do a sane :wink: thing and recognize up front that - no - a ballot is not just another name for a dict. It's more than a dict.

To me, it makes no more sense to make a ballot a subclass of dict than to make it a subclass of Fraction (because it's equally as much "a rational" (weight) as it is "a dict").

Keep it simple and obvious. Larry said he "didn't want to inflict a custom object on my users", but nothing says the class needs to be exposed to users at all. As a developer, it's handy for me to construct example elections by creating Ballots by hand, but users will almost certainly read ballot data out of some kind of structured text file, and have no need to know anything about how the code stores them.
Terser code might make the method look simpler, but all it does is to make it harder to understand
Agreed, but with a qualification: it depends a lot on background. For example, one of the all-time champs for brevity is APL, which makes Pandas look grossly verbose. Actual APL experts find this brevity helpful, but APL code is flatly incomprehensible to everyone else - by far the bulk of the world.
It's not necessarily the case that the terser code is simpler, either. For example, to locate scores at the cusp, it does a running cumulative sum. But after sorting the ballots in order of the scores they give to the winner, it's simpler to iterate over them, using bisect.bisect_left() to find the leftmost ballot with the same score. That's "wordier" for sure, but each step is simple to understand on its own, and quite explicit. As a bonus, the more seats there are to fill, the more likely mounds of ballots will be thrown away along the way, making the code faster and faster as the number of ballots remaining to look at decreases (whereas the Pandas code spends an ever-higher percentage of its time computing useless 0 scores for "conceptually - but not actually - discarded" ballots).
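A rough sketch of the sort-then-bisect idea (the exact steps and data here are my own assumptions, not Tim's code):

```python
import bisect

# Ballots sorted ascending by the score they give the round's winner
# (hypothetical data; in practice these come from the real ballots):
scores = [0.0, 0.2, 0.2, 0.4, 0.4, 0.4, 0.6, 0.8]

# Given the index of a cusp ballot, bisect_left finds the leftmost
# ballot with the same score, i.e. the start of the tied run:
i = 5  # a ballot scoring 0.4
left = bisect.bisect_left(scores, scores[i])
```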
Fixed by #303, closing
Edit: wait, that was for a different repo...whatever I'll use that
So this isn't fixed in starpy?
Sorry yea this was fixed in typescript, haven't gotten around to python yet.
I think the issue was closed by mistake, hence I am re-opening it.
See "fractions" solution at https://github.com/larryhastings/starvote.
On a separate note / topic (maybe I should open another issue / topic): Larry's solution is so much superior to our current solution (not only fractions) - hence I propose to request Larry's permission to refresh our STARPY solution using Larry's code as a foundation.
TLDR: the 1000th example of why floating point arithmetic is bad, in an excessively long post
Running the same election with starpy (Python) and Equal-Vote's star-core (Javascript) gives different results.
I am using these ballots from the star-core test cases, for STAR-PR:
https://github.com/Equal-Vote/star-core/blob/9a554932e4fc7ae9d6eb295fa1076437c812f5ce/src/Tests/pr.test.js#L105-L122
After spending an hour staring at two different debuggers simultaneously, I think the reason is that pandas in starpy somehow has different floating point behavior, causing a different candidate to be elected in the first round. This cascades through to the end and we get a different election result overall.
Steps to reproduce
The ballot data is copied from star-core (JS) tests as linked above
The answer I get is [2, 6, 7]. This is 0-indexed, so candidates 3, 7, and 8 are elected. This does not match the JS tests, where candidates 1, 3, and 7 are elected: https://github.com/Equal-Vote/star-core/blob/9a554932e4fc7ae9d6eb295fa1076437c812f5ce/src/Tests/pr.test.js#L129
What is happening
Let's look at the normalized ballots. Each row is a voter and each column is a candidate. There are 15 voters and eight candidates for a 3 seat election:
The first problem happens in column 2. We are summing to find the candidate with the highest score, so for column 2, we do this: 0.6+0.8+0.4+0.2+0.2+0.4+0.6+0.4+0.0+0.8+0.6+0.6+0.0+0.2+0.8.
https://github.com/Equal-Vote/starpy/blob/77e00d97ba6587b04a6c9aa0d83d7c5de38f4405/starpy/Allocated_Score.py#L21
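This discrepancy can be reproduced without pandas at all. Plain left-to-right float summation and exact Fractions give different pictures of the same column; pandas/numpy's pairwise summation groups the additions differently again, which is (I believe) why it happens to land on 6.6:

```python
from fractions import Fraction

# Column 2's normalized scores, copied from the ballots above:
vals = [0.6, 0.8, 0.4, 0.2, 0.2, 0.4, 0.6, 0.4, 0.0,
        0.8, 0.6, 0.6, 0.0, 0.2, 0.8]

# Plain left-to-right float summation accumulates rounding error,
# giving 6.599999999999999 rather than exactly 6.6:
float_total = sum(vals)

# Exact rational arithmetic gives the true answer, 33/5 = 6.6:
exact_total = sum(Fraction(str(v)) for v in vals)
```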
In JS, this is 6.599999999999999. In my Python debugger and script, it is also 6.599999999999999. However, when pandas actually sums it up, it becomes 6.6.

Here's what my debugger session looks like:
Hence it is not an artifact of pandas/numpy rounding for printing.
Compare the sums with the sums in JS (emphasis mine):
(Setting a breakpoint here: https://github.com/Equal-Vote/star-core/blob/9a554932e4fc7ae9d6eb295fa1076437c812f5ce/src/StarResults.js#L442)
For comparison, normal Python sums column 5 to 6.6, and column 6 to 6.6000000000000005. So this is a problem with pandas/numpy.

The consequence is that indices 2, 5, and 6 get the same value of 6.6 in Python, but in JS they are all different. In JS, the highest score belongs to candidate 6, but since there is a tie in Python, it arbitrarily selects index 2.

Python elects index 2 then 6. JS elects index 6 then 2. But the error has cascaded too much for the final seat. The out-of-order election resulted in different re-weightings. See the debugging values:
Raw data (hopefully I transcribed it correctly)
The sum of scores at the end of the 2nd round is completely different, causing Python to elect candidate 8 and JS to elect candidate 1.
What can we do about this?
Some projects, such as tallystick, offer the ability to use exact representations for rational fractions, or fixed-point decimals. (For example, Scottish councils use 5 d.p. for STV.)
I'm not engaged in the STAR project, but I think if the code is supposed to be a "gold standard" then it might be worth showing that it is rigorous, production-ready, and ready for the real world.

I know it's not easy to change something so fundamental, and I totally understand because I haven't bothered with more exact decimals in my own project either. But at least people can see this issue and evaluate it for their own needs. I might not need this issue to be fixed, but hopefully it will save time for someone else who does.
(I stumbled on to this issue, because I was going to use the JS tests to validate my own implementation. This has implications for #2 and #9 as the JS tests couldn't be simply copied to Python)
Misc

- requirements.txt states pandas>=1.3.4
- requirements.txt states numpy>=1.21.4

This issue is appropriate for this repo or the star-core repo, but I think this repo is the most appropriate because only pandas shows this behavior.
See also