astral-sh / ruff

An extremely fast Python linter and code formatter, written in Rust.
https://docs.astral.sh/ruff
MIT License

New Rule: Prefer list comprehension over generator comprehensions to create tuples #11839

Open Avasam opened 2 months ago

Avasam commented 2 months ago

I was recently working on some code where most of my data had to be "readonly" (so I'm using immutable types like frozen dataclasses, frozensets, tuples, etc.) while also using plenty of comprehensions. That made me wonder, since there's no "tuple comprehension" in Python, how I should be writing this code. I did a bit of performance testing, and here are the results:

import sys
from timeit import timeit

print(sys.version)
big_list = ["*"] * 99

def foo(value: str): return value

def test_list_comprehension():
    return [foo(value) for value in big_list]

def test_tuple_from_list_comprehension():
    return tuple([foo(value) for value in big_list])

def test_tuple_from_generator_comprehension():
    return tuple(foo(value) for value in big_list)

def test_unpack_generator_comprehension():
    return (*(foo(value) for value in big_list),)

print(
    "test_list_comprehension",
    timeit(test_list_comprehension),
)
print(
    "test_tuple_from_list_comprehension",
    timeit(test_tuple_from_list_comprehension),
)
print(
    "test_tuple_from_generator_comprehension",
    timeit(test_tuple_from_generator_comprehension),
)
print(
    "test_unpack_generator_comprehension",
    timeit(test_unpack_generator_comprehension),
)

3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)]
test_list_comprehension 6.4194597
test_tuple_from_list_comprehension 6.9672235
test_tuple_from_generator_comprehension 8.996260200000002
test_unpack_generator_comprehension 11.207814599999999
3.12.0 (tags/v3.12.0:0fb18b0, Oct  2 2023, 13:03:39) [MSC v.1935 64 bit (AMD64)]
test_list_comprehension 5.656617900000128
test_tuple_from_list_comprehension 6.026029500000277
test_tuple_from_generator_comprehension 9.207803900000272
test_unpack_generator_comprehension 10.375420500000018

Unsurprisingly, the difference is even greater in 3.12, where list comprehensions are inlined (PEP 709).

Because of the tuple() call, the generator is consumed immediately, so you get no benefit from its "laziness". This is probably true for other stdlib collections that don't have a comprehension syntax; tuple is just the only one I can think of atm.
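
To make the laziness point concrete, here is a minimal sketch that reuses `foo` and `big_list` from the benchmark above. The `calls` counter and the comparison against `any()` are additions for illustration only. It counts how often `foo` runs when a consumer can stop early versus when `tuple()` has to exhaust the generator.

```python
calls = 0  # illustrative counter: how many times foo() actually runs


def foo(value: str) -> str:
    global calls
    calls += 1
    return value


big_list = ["*"] * 99

# any() can stop at the first truthy item, so laziness pays off here.
calls = 0
any(foo(value) for value in big_list)
lazy_calls = calls

# tuple() must consume every item, so the generator is fully iterated.
calls = 0
tuple(foo(value) for value in big_list)
eager_calls = calls

print(lazy_calls, eager_calls)  # prints: 1 99
```

With `tuple()` every element is produced regardless, which is why the generator's per-item overhead is paid in full without any offsetting benefit in running time.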

For this reason, I'm asking for a performance rule with an autofix that transforms code like this:

tuple(a for a in b)

into

tuple([a for a in b])

Which, unless I'm missing something, is free performance whilst staying readable and pythonic.

It seems this would fit well in the flake8-comprehensions or refurb family of rules.
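
As a rough illustration of what such an autofix could do, here is a sketch built on Python's standard `ast` module. It is not how Ruff implements rules (Ruff is written in Rust), and the names `TupleGeneratorToList` and `rewrite` are hypothetical. It also assumes the name `tuple` refers to the builtin, which a real rule would need to verify.

```python
import ast


class TupleGeneratorToList(ast.NodeTransformer):
    """Hypothetical rewriter: tuple(<genexp>) -> tuple([<listcomp>])."""

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # handle nested calls first
        # Assumption: a bare name `tuple` is the builtin (shadowing is not checked).
        is_tuple_call = isinstance(node.func, ast.Name) and node.func.id == "tuple"
        if (
            is_tuple_call
            and len(node.args) == 1
            and not node.keywords
            and isinstance(node.args[0], ast.GeneratorExp)
        ):
            gen = node.args[0]
            # Keep the element expression and for/if clauses; only swap the node type.
            node.args[0] = ast.ListComp(elt=gen.elt, generators=gen.generators)
        return node


def rewrite(source: str) -> str:
    tree = TupleGeneratorToList().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+


print(rewrite("tuple(a for a in b)"))  # prints: tuple([a for a in b])
```

Calls such as `tuple(x)`, where the argument is not a generator expression, are left untouched. Note that `ast.unparse` discards comments and original formatting, so this only demonstrates the shape of the transformation.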

tdulcet commented 2 months ago

Using your script, I see less of a difference on Linux with CPython:

3.12.3 (main, Apr 10 2024, 05:33:47) [GCC 13.2.0]
test_list_comprehension 3.4234498779999853
test_tuple_from_list_comprehension 3.7473174160000156
test_tuple_from_generator_comprehension 4.684340659999975
test_unpack_generator_comprehension 4.938177772000017
3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]
test_list_comprehension 5.881427110000001
test_tuple_from_list_comprehension 6.184221321999999
test_tuple_from_generator_comprehension 6.949574859000002
test_unpack_generator_comprehension 7.213964431000001
3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0]
test_list_comprehension 5.159386299999994
test_tuple_from_list_comprehension 5.68906659999999
test_tuple_from_generator_comprehension 6.2835374
test_unpack_generator_comprehension 6.521585700000003
3.6.9 (default, Dec  8 2021, 21:08:43)
[GCC 8.4.0]
test_list_comprehension 5.325352799999997
test_tuple_from_list_comprehension 5.670514699999998
test_tuple_from_generator_comprehension 6.860152300000003
test_unpack_generator_comprehension 7.0944126999999995
2.7.17 (default, Feb 27 2021, 15:10:58)
[GCC 7.5.0]
('test_list_comprehension', 5.888335943222046)
('test_tuple_from_list_comprehension', 6.135804891586304)
('test_tuple_from_generator_comprehension', 6.965441942214966)

But much more of a difference with PyPy:

3.9.18 (7.3.15+dfsg-1build3, Apr 01 2024, 03:12:48)
[PyPy 7.3.15 with GCC 13.2.0]
test_list_comprehension 0.2822986920000403
test_tuple_from_list_comprehension 0.40187594900010026
test_tuple_from_generator_comprehension 0.9802658359999441
test_unpack_generator_comprehension 1.0730282659999375

ivanychev commented 2 months ago

I think the tuple-from-list-comprehension approach will lead to 2x higher peak memory consumption, won't it?
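
One way to check this is to measure peak allocations with the standard `tracemalloc` module. The sketch below is only a measurement harness under a few assumptions: the helper `peak_bytes` and the list size are made up for illustration, and every element is the same `"*"` object (as in the benchmark above) so that only container overhead is counted. The actual ratio depends on the interpreter and version, so no expected numbers are given.

```python
import tracemalloc


def foo(value: str) -> str:
    return value


def peak_bytes(build) -> int:
    """Return the peak traced allocation (in bytes) while build() runs."""
    tracemalloc.start()
    try:
        build()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# Illustrative size; all items share one string object, so the elements
# themselves allocate nothing new.
big_list = ["*"] * 100_000

peak_from_list = peak_bytes(lambda: tuple([foo(v) for v in big_list]))
peak_from_gen = peak_bytes(lambda: tuple(foo(v) for v in big_list))

print("tuple([...]) peak bytes:", peak_from_list)
print("tuple(...)   peak bytes:", peak_from_gen)
```

The list-comprehension variant has to hold the temporary list and the final tuple at the same time, while the generator variant only holds the growing tuple (plus whatever over-allocation the interpreter uses), so comparing the two printed values shows how large the difference is on a given setup.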