Hi,
I just came across this repo and thought it was cool but the first thing I read was the group_by snippet and I could not make a comment.
So the problem is this snippet have a quadratic time complexity for no reason, hence it is terribly slow.
Just look at this very basic benchmark:
from math import floor
from time import time
from random import random
def timeit(f):
def wrapper(*args):
t0 = time()
res = f(*args)
print(f'Timing for {f.__name__}: {time() - t0:2.2}s')
return res
return wrapper
@timeit
def group_by_linear(lst, fn):
d = {}
for el in lst:
key = fn(el)
if key in d:
d[key].append(el)
else:
d[key] = [el]
return d
@timeit
def group_by_quadratic(lst, fn):
return {key: [el for el in lst if fn(el) == key] for key in map(fn, lst)}
for n in (10, 100, 1000, 10000):
print(f'Test with {n} elements')
lst = [random() * 10 for _ in range(n)]
group_by_linear(lst, floor)
group_by_quadratic(lst, floor)
Test with 10 elements
Timing for group_by_linear: 1.3e-05s
Timing for group_by_quadratic: 2.7e-05s
Test with 100 elements
Timing for group_by_linear: 3.2e-05s
Timing for group_by_quadratic: 0.0014s
Test with 1000 elements
Timing for group_by_linear: 0.00029s
Timing for group_by_quadratic: 0.11s
Test with 10000 elements
Timing for group_by_linear: 0.0023s
Timing for group_by_quadratic: 1.1e+01s
Basically with the quadratic version, if your list is n element long, you do n squared access and n squared fn calls, which is pretty inneficient.
One liners list comprehensions are fine and all but it should not be abused.
Since this repo has an educational purpose I think it is important to show the right way of doing things.
There is too many people saying "Python is slow" when in fact it is their algorithm that is slow.
Hi, I just came across this repo and thought it was cool but the first thing I read was the group_by snippet and I could not make a comment. So the problem is this snippet have a quadratic time complexity for no reason, hence it is terribly slow. Just look at this very basic benchmark:
Basically with the quadratic version, if your list is n element long, you do n squared access and n squared
fn
calls, which is pretty inneficient. One liners list comprehensions are fine and all but it should not be abused.Since this repo has an educational purpose I think it is important to show the right way of doing things. There is too many people saying "Python is slow" when in fact it is their algorithm that is slow.