algorithm-archivists / algorithm-archive

A collaborative book on algorithms
https://www.algorithm-archive.org
MIT License

Stable marriage Python implementation is cubic #1005

Open pgdr opened 2 years ago

pgdr commented 2 years ago

Bug Report

Description

The Python implementation in the stable marriage chapter is worst-case Θ(n³), but it should be O(n²).

I have attached an alternative implementation that runs in worst-case quadratic time. I can make a PR if you want to replace the current algorithm.

Steps to Reproduce

Run the code with N = 3000.

Expected behavior

Terminate within seconds.

Screenshots

   N    current       new      (wall-clock time, m:ss.cc)
   0    0:00.11    0:00.08
 100    0:00.10    0:00.07
 200    0:00.14    0:00.09
 300    0:00.18    0:00.13
 400    0:00.28    0:00.16
 500    0:00.24    0:00.20
 600    0:00.90    0:00.27
 700    0:00.71    0:00.35
 800    0:01.38    0:00.43
 900    0:01.95    0:00.55
1000    0:01.37    0:00.65
1100    0:04.69    0:00.80
1200    0:02.42    0:00.95
1300    0:04.20    0:01.15
1400    0:05.49    0:01.29
1500    0:09.23    0:01.55
1600    0:15.27    0:01.76
1700    0:09.13    0:01.93
1800    0:10.11    0:02.19
1900    0:11.64    0:02.27
2000    0:18.57    0:02.49
2100    0:22.63    0:02.89
2200    0:19.52    0:03.13
2300    0:26.70    0:03.43
2400    0:54.32    0:03.72
2500    1:07.84    0:04.16
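A table like this can be reproduced with something along these lines (a sketch, assuming GNU time is available; the exact loop I used may have differed):

# Print N followed by the elapsed time (%E, formatted m:ss.cc) for each run,
# discarding the matching output on stdout; GNU time reports on stderr.
for n in $(seq 0 100 2500); do
    echo -n "$n "
    /usr/bin/time -f "%E" python sm.py "$n" > /dev/null
done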

Additional context


from random import sample as shuffle  # sample(seq, len(seq)) returns a new shuffled list

def stable_matching(hospital_preferences, student_preferences):
    students = list(student_preferences)
    hospitals = list(hospital_preferences)
    proposals = {h: 0 for h in hospitals}  # index of h's next proposal
    unmatched_hospitals = list(hospitals)
    student = {h: None for h in hospitals}  # hospital -> its matched student
    hospital = {s: None for s in students}  # student -> its matched hospital
    inrank = {s: {} for s in students}  # inrank[s][h] = s's rank of hospital h
    for s in students:
        for idx, h in enumerate(student_preferences[s]):
            inrank[s][h] = idx

    while unmatched_hospitals:
        h = unmatched_hospitals.pop()
        # h proposes to its best student not yet proposed to
        nxt = proposals[h]
        s = hospital_preferences[h][nxt]
        proposals[h] += 1
        if hospital[s] is None:
            # s is available
            hospital[s] = h
            student[h] = s
        else:
            sh = hospital[s]
            rank_sh = inrank[s][sh]
            rank_h = inrank[s][h]
            if rank_h < rank_sh:
                # s dumps sh for h
                hospital[s] = h
                student[sh] = None
                student[h] = s
                unmatched_hospitals.append(sh)
            else:
                # s is taken
                unmatched_hospitals.append(h)
    return student

def _generate_instance(N):
    HOSPITALS = [f"h{i}" for i in range(N)]
    STUDENTS = [f"s{i}" for i in range(N)]

    hospital_preferences = {h: shuffle(STUDENTS, N) for h in HOSPITALS}
    student_preferences = {s: shuffle(HOSPITALS, N) for s in STUDENTS}

    return hospital_preferences, student_preferences

if __name__ == "__main__":
    import sys

    hospital_preferences, student_preferences = _generate_instance(int(sys.argv[1]))
    #print(hospital_preferences)
    #print(student_preferences)

    M = stable_matching(hospital_preferences, student_preferences)
    for h in M:
        print(f"Hospital {h} + Student {M[h]}")


Amaras commented 2 years ago

Hi! Thank you for your interest in the Algorithm Archive; I see this is your first contribution to our repo. While I recognize our stable marriage code is not the most efficient, I don't think it is a big problem just yet. Could you please provide a fair comparison between the two algorithms? This would mean:

  1. Checking that the results are the same in (nearly) all possible cases (as the current Python version should have the correct behaviour)
  2. Providing fair performance-comparison code, so we can use tools like hyperfine to compare the two versions easily.

If you have done all that, I think you can open a PR, in which we'll do a code review. As the code you provided in the issue is confusing (in my opinion), we'll definitely need to improve it before we can merge it in.

pgdr commented 2 years ago

Checking that the results are the same in (nearly) all possible cases (as the current Python version should have the correct behaviour)

The results are identical. They both implement the Gale–Shapley algorithm, and interestingly, every G–S implementation has the same output regardless of execution order: the algorithm always yields the proposer-optimal stable matching. I have confirmed that the produced matching is the same for both implementations.
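For reference, the check amounts to something like this (a sketch; stable_matching_current and its module are hypothetical names for a wrapper that gives the current AAA implementation the same preferences-dict interface as sm.py):

# Sketch of the equality check. stable_matching and _generate_instance
# come from the sm.py code attached above; stable_matching_current is a
# hypothetical wrapper around the current AAA implementation.
from sm import stable_matching, _generate_instance
from aaa_wrapper import stable_matching_current  # hypothetical

def matchings_agree(n, trials=100):
    # Generate random instances and compare the two matchings directly.
    for _ in range(trials):
        hospital_prefs, student_prefs = _generate_instance(n)
        if stable_matching(hospital_prefs, student_prefs) != \
                stable_matching_current(hospital_prefs, student_prefs):
            return False
    return True

print(matchings_agree(50))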

Providing fair performance-comparison code, so we can use tools like hyperfine to compare the two versions easily.

(e39) [~/.../python]$ hyperfine --warmup 3 "python sm.py 1000" "python stable_marriage.py 1000"
Benchmark 1: python sm.py 1000
  Time (mean ± σ):     745.5 ms ±   5.8 ms    [User: 721.9 ms, System: 23.2 ms]
  Range (min … max):   739.6 ms … 758.7 ms    10 runs

Benchmark 2: python stable_marriage.py 1000
  Time (mean ± σ):     45.998 s ±  1.857 s    [User: 45.985 s, System: 0.011 s]
  Range (min … max):   43.186 s … 49.517 s    10 runs

Summary
  'python sm.py 1000' ran
   61.70 ± 2.54 times faster than 'python stable_marriage.py 1000'

sm.py is the quadratic-time algorithm attached above, and the argument 1000 means that N = 1000 in every run.

They both used the same seed in all ten runs. If necessary, I can try to get hyperfine to use a different seed for each run.
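For instance, hyperfine's --parameter-list would give one benchmark per seed value (this assumes both scripts are modified to accept the seed as a second argument, which they currently aren't):

# assumes both scripts take the seed as a second argument (hypothetical)
hyperfine --warmup 3 -L seed 1,2,3,4,5 \
    "python sm.py 1000 {seed}" "python stable_marriage.py 1000 {seed}"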

Amaras commented 2 years ago

The results are identical. They both implement the Gale–Shapley algorithm, and interestingly, every G–S implementation has the same output regardless of execution order: the algorithm always yields the proposer-optimal stable matching. I have confirmed that the produced matching is the same for both implementations.

OK, good. We won't have to check too much, then.

The thing I don't really understand is where your stable_marriage.py accepting an argument comes from, since the version in the AAA doesn't take one. If that's our code that you modified, you need to provide the modifications you made, because right now the code just can't accept more than 26 people due to the data representation we chose.

While in general I'd prefer fast code, in this case I'm not sure it's necessary, as the AAA has mainly pedagogical value, although it should definitely talk about performance in a dedicated section.

EDIT: What I failed to mention is that we have to assume people don't have a background in programming or algorithms, so clarity is key, sometimes even at the cost of performance.

leios commented 2 years ago

This looks good to me. To be honest, the stable marriage chapter is one of the few that need a major overhaul. There does not seem to be any standard implementation that spans the different languages, so the Python version is radically different from the Julia one (for example).

Right now, I find the current Python implementation a bit harder to read than the one you provided, but we might need to work on it to standardize it across the other languages as well.

pgdr commented 2 years ago

you need to provide the modifications you made

Here is the modification that allowed me to run it for an arbitrary number of pairs

-def main():
+def main(num_pairs):
-    # Set this to however many men and women you want, up to 26
-    num_pairs = 5

     # Create all Person objects
-    men = [Person(name) for name in ascii_uppercase[:num_pairs]]
-    women = [Person(name) for name in ascii_lowercase[:num_pairs]]
+    men = [Person(f"M{name}") for name in range(num_pairs)]
+    women = [Person(f"W{name}") for name in range(num_pairs)]

     # Set everyone's preferences
     for man in men:
@@ -136,4 +135,5 @@ class Person:

 if __name__ == '__main__':
-    main()
+    import sys
+    main(int(sys.argv[1]))

the AAA has mainly pedagogical value

I realize that, and I think that's a good thing. I just don't know if the current SM implementation is optimal in that regard either. I think the OOP-based approach to a relatively simple algorithm makes it more difficult to see what's going on.

The stable marriage algorithm works as follows:

while some man is unmatched:
  propose to best woman you haven't proposed to yet
  woman accepts if unmatched or this man is better than her current

I think that an implementation in the AAA should stay as close in spirit to that one as possible, something like the sketch below.
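To illustrate (a rough sketch, not PR-ready code; man_pref and woman_rank are assumed inputs: each man's preference list, and a precomputed table of each woman's rank for every man):

# Near-literal translation of the pseudocode above.
# man_pref[m]: m's preference list of women, best first.
# woman_rank[w][m]: w's rank of man m (lower is better), precomputed.
def gale_shapley(man_pref, woman_rank):
    next_choice = {m: 0 for m in man_pref}  # index of m's next proposal
    fiance = {w: None for w in woman_rank}  # w's current partner
    free_men = list(man_pref)
    while free_men:                           # while some man is unmatched
        m = free_men.pop()
        w = man_pref[m][next_choice[m]]       # best woman not yet proposed to
        next_choice[m] += 1
        if fiance[w] is None:                 # w accepts if unmatched...
            fiance[w] = m
        elif woman_rank[w][m] < woman_rank[w][fiance[w]]:
            free_men.append(fiance[w])        # ...or dumps her current for m
            fiance[w] = m
        else:
            free_men.append(m)                # m is rejected and stays free
    return fiance

The precomputed rank table is what makes each acceptance test O(1), which is what keeps the whole run at O(n²).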

A different issue is that more modern expositions of the stable matching problem use "hospitals" and "students" rather than "men" and "women". But that's a different discussion.

stable marriage chapter is one of the few that need a major overhaul

I could also take a shot at that while I'm at it (if I'm creating a PR anyway).

PS: I don't think this belongs under decision problems (which typically have yes/no answers); the stable marriage problem always has a solution. Perhaps it would be better suited under a flows & matchings category?

leios commented 2 years ago

Yeah, I agree with the points here. I'm also happy to change the terminology to stable matching instead of stable marriage (and also take down my YouTube video on the topic). At the time, I had not seen the stable matching terminology. I wish I had, because it kind of caused a rift in the community and I didn't know how to fix it.

I think I would ideally like to revamp the entire chapter to be more in line with current standards.

Amaras commented 2 years ago

@pgdr Yep, I'm okay with those modifications for benchmarking, although we don't typically take input from the command line (because of an ongoing-ish effort to standardize inputs and outputs).

I also agree that OOP might not be the best approach, although it was the approach I went for when I wanted to improve the code (which might have been misguided). It is also possible that output to stdout takes a long time, since you're writing 3000 lines (for N = 1000). A quick test without output gave me around 30 seconds with the current version (N = 1000), well under the ~46 s you measured, but still significant.
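For reference, this is the kind of no-output timing I mean, shown here against your sm.py version since its interface is in this thread (the current version would need the same treatment):

# Time just the matching, excluding all printing. Assumes stable_matching
# and _generate_instance are importable from the sm.py attached above.
import time
from sm import stable_matching, _generate_instance

hospital_prefs, student_prefs = _generate_instance(1000)
start = time.perf_counter()
stable_matching(hospital_prefs, student_prefs)
print(f"matching alone: {time.perf_counter() - start:.3f} s")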

If both of you want a revamp of the chapter, we'll need to rewrite all the implementations anyway, so we might as well go for a compromise solution (your code, but a bit less confusing).

Amaras commented 2 years ago

After a "quick" profile, these are the two longest-running functions:

python3 -m cProfile stable_marriage.py 1000
         250785690 function calls (250785602 primitive calls) in 955.653 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
[...]
238405442  425.530    0.000  425.530    0.000 stable_marriage.py:104(partner)
[...]
  1487000  484.683    0.000  909.195    0.001 stable_marriage.py:89(pick_preferred)

So the method call overhead seems to be part of the problem, since the time per call of those two functions is really low.
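Tip for anyone reproducing this: cProfile can sort the report directly, e.g. by total time, which puts the hot spots at the top instead of ordering by name:

python3 -m cProfile -s tottime stable_marriage.py 1000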

pgdr commented 2 years ago

After a "quick" profile, these are the two longest-running functions

My point is that the worst-case complexity of the current implementation is Θ(n³), so it's not really the method call overhead that's doing it; it's just the sheer number of operations. When N = 3000, we have something like n³ = 2.7 × 10¹⁰ "operations", and if each "operation" takes at least a couple of CPU cycles, then time starts to fly rather quickly. So I don't think there's any point in profiling the implementation.

pgdr commented 2 years ago

Indeed, just looking at the number of function calls (for N = 3000):

Current implementation:

         2833248468 function calls (2833248370 primitive calls) in 527.844 seconds

Alternative implementation:

         62561677 function calls (62561636 primitive calls) in 12.986 seconds

That's 2833M vs 62M function calls, roughly 45 times as many.

Amaras commented 2 years ago

My point is that the worst-case complexity of the current implementation is Θ(n³), so it's not really the method call overhead that's doing it; it's just the sheer number of operations.

Yeah, that's what I meant but didn't articulate properly. The time cost is dominated by method call (and profiling) overhead because of the sheer number of operations. It also seems like the average-case running time is close to Θ(n³).

It's been a while since I computed complexities, so I'll have to take some time to do that on my own. Okay, I see where that comes from: a linear search (pick_preferred) inside a double loop... yep, obviously O(n³): up to O(n²) proposals in total, each doing an O(n) scan. With the current architecture it will be basically impossible to go below O(n² log n), so we should not keep it unless it's vastly superior in clarity to any improvement.

We'll see when we get around to rewriting the chapter. If @leios is comfortable enough letting you do this chapter rewrite, then please do take a few days to do it, and we'll review both the new code and the new chapter fairly, although that has historically been a very long process. Otherwise, @leios should (probably) get some AAA work done before Oct 15, so feel free to be part of the chapter review process and to contribute Python code once the chapter is merged.