bradbeattie / python-vote-core

Python libraries for various electoral methods
http://modernballots.com

SchulzeSTV @ 10 candidates, 3 winners #11

Closed bradbeattie closed 13 years ago

bradbeattie commented 14 years ago

Slows down to a crawl.

bradbeattie commented 14 years ago

One possibility is to remove any candidates that no one prefers.

    input = [
        { "count":1, "ballot":{"A":9, "B":1, "C":1, "D":9, "E":9, "F":2, "G":9, "H":9, "I":9, "J":9 }},
        { "count":1, "ballot":{"A":3, "B":2, "C":3, "D":1, "E":9, "F":9, "G":9, "H":9, "I":9, "J":9 }},
        { "count":1, "ballot":{"A":9, "B":9, "C":9, "D":9, "E":1, "F":9, "G":9, "H":9, "I":9, "J":9 }}
    ]
    output = SchulzeSTV.calculate_winner(input, 5, "ranking")  # 10 candidates, 5 winners

could become

    input = [
        { "count":1, "ballot":{"A":9, "B":1, "C":1, "F":2, "G":9, "H":9, "I":9, "J":9 }},
        { "count":1, "ballot":{"A":3, "B":2, "C":3, "F":9, "G":9, "H":9, "I":9, "J":9 }},
        { "count":1, "ballot":{"A":9, "B":9, "C":9, "F":9, "G":9, "H":9, "I":9, "J":9 }}
    ]
    output = SchulzeSTV.calculate_winner(input, 3, "ranking")  # 8 candidates, 3 winners

which takes a lot less time. Still, this seems like a patch that hides the real underlying performance issue rather than fixing it.
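
For illustration, the ballot transformation above is just a filtering step (a sketch; remove_candidates is a hypothetical helper, and deciding which candidates are actually safe to drop is the open question):

    def remove_candidates(ballots, dropped):
        # Strip the given candidates from each ballot, leaving counts and
        # the remaining rankings untouched.
        return [{
            "count": entry["count"],
            "ballot": {
                candidate: rank
                for candidate, rank in entry["ballot"].items()
                if candidate not in dropped
            },
        } for entry in ballots]

    # Reproduces the second input above from the first
    pruned = remove_candidates(input, {"D", "E"})

Note that the example also drops the required winners from 5 to 3, which the pruning step alone doesn't account for.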

bradbeattie commented 14 years ago

I think this chunk could be extracted:

    # Generate the list of patterns we need to complete
    completion_patterns = []
    for i in range(len(other_candidates)):
        for j in range(i + 1):
            # Permuting a multiset yields duplicates; set() deduplicates them
            perms = itertools.permutations(
                [2] * (len(other_candidates) - i) + [1] * j + [3] * (i - j))
            completion_patterns.append(list(set(perms)))
    # Flatten the per-(i, j) groups into a single list of patterns
    completion_patterns = [item for innerlist in completion_patterns for item in innerlist]

It runs every time the function does, but it always generates the same data for a given number of candidates, so it only needs to be computed once.
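
A minimal sketch of that extraction, assuming the output depends only on the number of other candidates (the helper name and module-level cache are hypothetical, not the repo's API):

    import itertools

    _COMPLETION_PATTERN_CACHE = {}

    def completion_patterns_for(num_other_candidates):
        # Compute the flattened pattern list at most once per candidate
        # count; the result never varies for a given count.
        if num_other_candidates not in _COMPLETION_PATTERN_CACHE:
            patterns = []
            for i in range(num_other_candidates):
                for j in range(i + 1):
                    perms = itertools.permutations(
                        [2] * (num_other_candidates - i) + [1] * j + [3] * (i - j))
                    patterns.extend(set(perms))
            _COMPLETION_PATTERN_CACHE[num_other_candidates] = patterns
        return _COMPLETION_PATTERN_CACHE[num_other_candidates]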

bradbeattie commented 14 years ago

Testing with

    clear; nosetests tests/test_schulze_stv.py; nosetests tests/test_schulze_stv_performance.py -s

to ensure the validity of the results while boosting performance.

bradbeattie commented 14 years ago

Schulze's implementation in C works fine at 10 candidates, 5 winners (under 1 second to process). It chokes on 15 candidates, 7 winners though.

bradbeattie commented 14 years ago

The most recent modification generates unique lexicographic completion patterns without generating all possibilities. 10 candidates / 9 winners runs fine now. 10 candidates / 5 winners runs in 8 seconds. Will continue to look for performance optimizations.
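
One standard way to generate distinct multiset permutations without enumerating duplicates first (a general sketch of the technique, not necessarily the repo's __unique_permutations__; it yields lexicographic order when the input list is sorted):

    def unique_permutations(elements):
        # Yield each distinct permutation of a sorted multiset exactly once,
        # rather than producing all n! permutations and deduplicating.
        if len(elements) <= 1:
            yield tuple(elements)
            return
        seen = set()
        for i, first in enumerate(elements):
            if first in seen:
                continue  # a duplicate choice for this position
            seen.add(first)
            rest = elements[:i] + elements[i + 1:]
            for tail in unique_permutations(rest):
                yield (first,) + tail

    # e.g. for the [1, 2, 2, 3]-style patterns above:
    # list(unique_permutations(sorted([2, 2, 1, 3])))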

bradbeattie commented 14 years ago

Tried shifting the adjacency matrix implementation over to pygraph's max flow algorithm (certainly better from a code-reuse standpoint). 10c5w now runs in 24 seconds, three times slower than before, and pretty much all of that time is spent inside their implementation of Edmonds-Karp.

Profiling the code (nosetests --with-profile) confirms that this is the major killer.

         3516555 function calls (3216445 primitive calls) in 8.119 CPU seconds

   Ordered by: internal time, call count
   List reduced from 330 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     9993    4.565    0.000    6.659    0.001 pygraph/algorithms/minmax.py:310(maximum_flow)
   519680    0.596    0.000    0.854    0.000 pygraph/mixins/labeling.py:66(edge_weight)
   316966    0.546    0.000    0.546    0.000 pygraph/classes/digraph.py:59(nodes)
     1843    0.369    0.000    0.386    0.000 schulze_stv.py:128(__proportional_completion_round__)

bradbeattie commented 14 years ago

Reverting back to the capacity matrix implementation and running the profiling yields the following output:

         809981 function calls (509871 primitive calls) in 2.932 CPU seconds

   Ordered by: internal time, call count
   List reduced from 332 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    35777    1.480    0.000    1.480    0.000 schulze_stv.py:268(__bfs__)
     9993    0.346    0.000    1.910    0.000 schulze_stv.py:243(__edmonds_karp__)
325075/30607    0.342    0.000    0.342    0.000 schulze_stv.py:93(__unique_permutations__)
     1811    0.337    0.000    0.353    0.000 schulze_stv.py:131(__proportional_completion_round__)

Could this just be a matter of the C implementation being faster than the Python one, or is there something algorithmically different in how I've coded it? Hrm...
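
For reference, here is a textbook Edmonds-Karp over a dense capacity matrix (a sketch of the general technique, not necessarily identical to the repo's __edmonds_karp__ and __bfs__). BFS always finds a shortest augmenting path, which is what bounds the number of augmentations:

    from collections import deque

    def edmonds_karp(capacity, source, sink):
        # Repeatedly find a shortest augmenting path with BFS and push the
        # bottleneck flow along it until no augmenting path remains.
        n = len(capacity)
        residual = [row[:] for row in capacity]
        max_flow = 0
        while True:
            parent = [-1] * n
            parent[source] = source
            queue = deque([source])
            while queue and parent[sink] == -1:
                u = queue.popleft()
                for v in range(n):
                    if parent[v] == -1 and residual[u][v] > 0:
                        parent[v] = u
                        queue.append(v)
            if parent[sink] == -1:
                return max_flow  # no augmenting path left
            # Walk back along the path for the bottleneck, then augment
            bottleneck = float("inf")
            v = sink
            while v != source:
                bottleneck = min(bottleneck, residual[parent[v]][v])
                v = parent[v]
            v = sink
            while v != source:
                residual[parent[v]][v] -= bottleneck
                residual[v][parent[v]] += bottleneck
                v = parent[v]
            max_flow += bottleneck

A dense BFS like this scans every vertex pair, so it costs O(V^2) per augmentation; if the C implementation uses adjacency lists, the constant-factor gap alone could plausibly explain the difference.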

bradbeattie commented 13 years ago

The edge-weight cutoff has produced a significantly faster algorithm as we don't need to waste time on max-flow iterations that will simply result in 0. I can't guarantee that this performance tweak will produce the same result in every case, but it passes all current functionality tests. Might want to consider bringing in Tideman's A0X set.