Open markshannon opened 1 year ago
Is this done now, with the PEP 709 implementation as it landed?
No, PEP 709 only handles list/set/dict comprehensions, not generator expressions.
Unless we can implement `LIST_TO_TUPLE` in a way that doesn't double the necessary space, I think memory might be an issue for large outputs.
`PySequence_Tuple` also needs to perform a number of copies. It is mainly a matter of luck whether an additional allocation is needed at the end.
The main difference is likely to be the number of resizes building the list.
`PySequence_Tuple` doubles, but `PyList_Append()` only scales by 9/8.
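As a rough illustration of the trade-off (a sketch of the growth policies, not CPython's actual resize formulas), we can count how many reallocations each policy needs to reach a given size:

```python
# Hypothetical sketch: count reallocations needed to grow a buffer to n
# elements under two growth policies (not CPython's exact resize logic).
def realloc_count(n, grow):
    capacity, reallocs = 1, 0
    while capacity < n:
        capacity = grow(capacity)
        reallocs += 1
    return reallocs

n = 1_000_000
# Doubling, as in PySequence_Tuple's strategy described above.
doubling = realloc_count(n, lambda c: c * 2)
# ~9/8 growth, as in PyList_Append; max(c + 1, ...) avoids stalling at small sizes.
nine_eighths = realloc_count(n, lambda c: max(c + 1, c * 9 // 8))

print(doubling, nine_eighths)
```

Doubling needs far fewer reallocations, but each one copies up to twice as much live data and holds a larger over-allocation, which is the memory concern raised above.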
1/n allocations and copying one pointer is going to be much cheaper than a round trip between `PySequence_Tuple`, the tower of `gen_iternext`, `gen_send`, ..., and the interpreter.
With a growth factor of 9/8 we need 6 reallocations per doubling, so that's ~10 additional copies per element for a large tuple. I suspect that will still be faster than the round trip through `PySequence_Tuple`, but we should measure it.
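An informal way to measure the two paths (a micro-benchmark sketch; results vary by CPython version and machine, and this makes no claim about which wins):

```python
# Compare tuple(genexp), which round-trips through PySequence_Tuple and the
# generator protocol, against tuple([listcomp]), which builds a list in the
# interpreter loop and converts it once at the end.
import timeit

via_generator = timeit.timeit("tuple(x for x in range(10_000))", number=200)
via_list = timeit.timeit("tuple([x for x in range(10_000)])", number=200)

print(f"generator: {via_generator:.4f}s  list-first: {via_list:.4f}s")
```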
We can look to change the growth factor for lists in another issue.
Consider the following function containing a "tuple comprehension":
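The original code block is not preserved in this copy; a minimal example of the kind of function being discussed (illustrative only, not the exact code from the issue) might be:

```python
# A "tuple comprehension": a generator expression passed directly to tuple().
def f(seq):
    return tuple(x * 2 for x in seq)

print(f([1, 2, 3]))  # -> (2, 4, 6)
```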
Which produces the following code:
What we would like is:
For list comprehensions, we can use escape analysis and inlining to remove the overhead. @carljm is working on doing just that.
However, the above "tuple comprehension" is resistant to escape analysis, as `tuple` could hypothetically be anything. We could use something like the approach described in @vstinner's FAT Python and convert:

into:
where `LIST_TO_TUPLE` is a VM instruction, not a call. With escape analysis and inlining, we should get the ideal code, with the small prefix:
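A Python-level sketch of the FAT-Python-style guard idea (hypothetical, illustrative only): take the specialized path only while `tuple` still refers to the builtin, and fall back to a generic call otherwise.

```python
import builtins

# Hypothetical sketch of a guarded specialization: the fast path is valid
# only if the name being called is still the builtin tuple.
def specialized_call(tuple_candidate, iterable):
    if tuple_candidate is builtins.tuple:
        # Fast path: build a list in-line (stand-in for the inlined
        # comprehension) and convert once, mimicking LIST_TO_TUPLE.
        acc = []
        for x in iterable:
            acc.append(x)
        return tuple(acc)
    # Guard failed: generic call through the normal generator protocol.
    return tuple_candidate(x for x in iterable)

print(specialized_call(tuple, range(3)))  # -> (0, 1, 2)
print(specialized_call(list, range(3)))   # guard fails, falls back
```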
We would expect the tier 2 optimizer to eliminate the prefix.
The above optimization also applies to `any` and `all`, as well as `tuple`. The potential for performance improvements is even greater in those cases, as they tend not to exhaust the generator.
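The short-circuiting behaviour can be demonstrated directly: `any()` stops consuming the generator at the first truthy value, so most of the iteration (and hence most of the generator-protocol overhead) never happens.

```python
# Show that any() does not exhaust the generator: it stops at the
# first truthy value, leaving the rest of the items unconsumed.
def flags(n, seen):
    for i in range(n):
        seen.append(i)       # record every item actually consumed
        yield i > 2          # truthy from i == 3 onward

seen = []
result = any(flags(1000, seen))
print(result, len(seen))  # only 4 of 1000 items were consumed
```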