Lattay commented 4 years ago

Hi, I just came across this repo and thought it was cool but the first thing I read was the group_by snippet and I could not make a comment. So the problem is this snippet have a quadratic time complexity for no reason, hence it is terribly slow. Just look at this very basic benchmark:

from math import floor
from time import time
from random import random

def timeit(f):
    def wrapper(*args):
        t0 = time()
        res = f(*args)
        print(f'Timing for {f.__name__}: {time() - t0:2.2}s')
        return res
    return wrapper

@timeit
def group_by_linear(lst, fn):
    d = {}
    for el in lst:
        key = fn(el)
        if key in d:
            d[key].append(el)
        else:
            d[key] = [el]
    return d

@timeit
def group_by_quadratic(lst, fn):
    return {key: [el for el in lst if fn(el) == key] for key in map(fn, lst)}

for n in (10, 100, 1000, 10000):
    print(f'Test with {n} elements')
    lst = [random() * 10 for _ in range(n)]
    group_by_linear(lst, floor)
    group_by_quadratic(lst, floor)

Test with 10 elements
Timing for group_by_linear: 1.3e-05s
Timing for group_by_quadratic: 2.7e-05s
Test with 100 elements
Timing for group_by_linear: 3.2e-05s
Timing for group_by_quadratic: 0.0014s
Test with 1000 elements
Timing for group_by_linear: 0.00029s
Timing for group_by_quadratic: 0.11s
Test with 10000 elements
Timing for group_by_linear: 0.0023s
Timing for group_by_quadratic: 1.1e+01s

Basically with the quadratic version, if your list is n element long, you do n squared access and n squared fn calls, which is pretty inneficient. One liners list comprehensions are fine and all but it should not be abused.

Since this repo has an educational purpose I think it is important to show the right way of doing things. There is too many people saying "Python is slow" when in fact it is their algorithm that is slow.

Chalarangelo commented 4 years ago

This might be related to #201

Lattay commented 4 years ago

201 and my proposal both use the same algorithm, while #201 introduce a little know feature of python which is quite nice.

30secondsofcode commented 4 years ago

Duplicate of #201

Chalarangelo / 30-seconds-of-python

Unjustified quadratic complexity #213

201 and my proposal both use the same algorithm, while #201 introduce a little know feature of python which is quite nice.