faster-cpython / ideas


Get performance of async generators on a par with normal generators. #570

Open markshannon opened 1 year ago

markshannon commented 1 year ago

Async generators are implemented in a way that involves a Jenga tower of state machines, wrappers and special exceptions. I won't attempt to describe it further.

Instead I'll explain how they should be implemented.

Anyone interested in this should read https://github.com/faster-cpython/ideas/issues/448 first. It explains how we extend the simple two-way exit of return and raise to include yield.

Async generators extend the three-way exits of generators to a four-way exit. Async generators may exit by yielding, awaiting, returning or raising.
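
To make the four exits concrete, here is a minimal sketch that drives an async generator by hand, without an event loop, and records how each exit surfaces to the caller in current CPython. The names `suspend`, `agen`, `failing` and `drive` are made up for illustration. An await shows up as a plain value returned from `send()`, an async yield as `StopIteration` carrying the value, a return as `StopAsyncIteration`, and a raise as the exception itself.

```python
import types


@types.coroutine
def suspend(tag):
    # A bare yield in a generator-based coroutine is how an await suspends.
    result = yield tag
    return result


async def agen():
    await suspend("awaiting")  # exit by awaiting
    yield "item"               # exit by (async) yielding
    # falling off the end: exit by returning


async def failing():
    yield 1
    raise ValueError("boom")   # exit by raising


def drive(agen_obj):
    """Step an async generator manually and record each kind of exit."""
    events = []
    while True:
        step = agen_obj.__anext__()  # awaitable for the next item
        try:
            while True:
                # A normal return from send() means the generator awaited.
                events.append(("await", step.send(None)))
        except StopIteration as exc:
            # The awaitable finished: the generator yielded exc.value.
            events.append(("yield", exc.value))
        except StopAsyncIteration:
            events.append(("return", None))
            return events
        except Exception as exc:
            events.append(("raise", type(exc).__name__))
            return events


print(drive(agen()))     # [('await', 'awaiting'), ('yield', 'item'), ('return', None)]
print(drive(failing()))  # [('yield', 1), ('raise', 'ValueError')]
```

The driver has to tell the four cases apart using two nested loops and three exception types, which is the kind of layering the proposal aims to remove.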

The first thing to note is that being awaited is implemented the same way as yield. This is fine for generators (which can yield, but cannot be awaited) and for coroutines (which can be awaited, but cannot yield). To avoid having to change that, we treat yielding in an async generator as a special operation, which I will call async yield. Currently async yield is implemented in the bytecode as:

CALL_INTRINSIC_1      (_PyAsyncGenValueWrapperNew)
YIELD_VALUE 
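
The pair can be inspected with the `dis` module. This is a small sketch, not part of the proposal. `ticker` and `around_yields` are illustrative names, and the exact opcode names and argument text depend on the CPython version (older releases use a dedicated wrap opcode instead of `CALL_INTRINSIC_1`), so no output is shown.

```python
import dis


async def ticker():
    yield 1


def around_yields(func):
    """Return (opname, argrepr) of the instruction before each YIELD_VALUE."""
    instrs = list(dis.get_instructions(func))
    pairs = []
    for prev, cur in zip(instrs, instrs[1:]):
        if cur.opname == "YIELD_VALUE":
            pairs.append((prev.opname, prev.argrepr))
    return pairs


# On versions that match the listing above, the preceding instruction
# should be the intrinsic call that wraps the yielded value.
print(around_yields(ticker))
```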

First let's merge those into a single instruction: ASYNC_YIELD_VALUE.

In much the same way that YIELD_VALUE and RETURN_VALUE jump to different locations in the caller, ASYNC_YIELD_VALUE will need to jump to yet another location. There is space for an 8 bit "async yield" offset to be added to the frame. Since the yield from loop is small, 8 bits will be a sufficient offset from the yield target.

So we have:

To take advantage of this, we will also need to change the bytecode of the async for loop code to contain all the necessary offsets. Currently the inner SEND has the async_yield_offset, but the yield_offset is implicit in the exception handling table.
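
For reference, the current shape of the async for loop can be listed the same way. This is a rough sketch with made-up names (`consume`, `loop_opcodes`). Opcode names vary between CPython versions, so the filter only keeps the ones that exist on the running interpreter, and no output is shown.

```python
import dis


async def consume(source):
    total = 0
    async for item in source:
        total += item
    return total


# Opcodes that make up the loop header, the inner send loop and the loop exit.
LOOP_OPCODES = {"GET_AITER", "GET_ANEXT", "SEND", "YIELD_VALUE", "END_ASYNC_FOR"}


def loop_opcodes(func):
    """Return the async-for related opcode names in bytecode order."""
    return [
        ins.opname
        for ins in dis.get_instructions(func)
        if ins.opname in LOOP_OPCODES
    ]


print(loop_opcodes(consume))
# Running dis.dis(consume) additionally prints the exception table on 3.11+,
# which is where the loop exit is currently encoded.
```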