Closed iritkatriel closed 1 year ago
It seems that about 11-12% of the code objects never access any consts. Maybe the compiler can detect this and not copy the consts in that case?
Note that we often have a consts array with just the docstring in this case. We considered in the past to move the docstring off the co_consts into a separate field.
We're going to replace PyTuple_GetSize()
with PyTuple_GET_SIZE()
and re-run the benchmarks.
We also have a few other ideas:
RETURN_NONE
opcode. We speculate that this is often the only constant used in a function (needs checking, all we know is that None
is by far the most common constant).INITIALIZE_CONSTANTS
instruction just before the first instruction that needs a constant. This requires some control flow analysis. It would be useful in functions that have a fast path that doesn't use constants and a slow path (e.g. cache miss or error) that does.With PyTuple_GET_SIZE it's 3% slower.
Odd. That sounds like benchmark noise.
Is there an option to force re-running the "reference" benchmark (i.e. where your branch forked off main)? @mdboom
Anyway, we can talk tomorrow about implications of this negative result.
Is there an option to force re-running the "reference" benchmark (i.e. where your branch forked off main)? @mdboom
It doesn't ever re-run a base commit if it already has that commit hash. However, if you specify the base's commit hash directly in its own run, it will overwrite the old results. (I'm not saying that's good UI design, but it would work with what we have today).
To isolate the cost of copying the consts to the frame locals array, I ran the benchmarks over a branch where we do that, but LOAD_CONST and KW_NAMES still take the value from co_consts as before. This showed a 2% slowdown.
To check if this is benchmarking noise, I am now running the benchmarks on a branch in my repo with no changes compared to main.
I think I've been looking at this in the wrong way.
Copying the consts in bulk at the beginning of the function is supposed to replace all the LOAD_CONST instructions. The performance gain comes from (1) copying each const once, no matter how many times it is accessed and (2) copying them in bulk.
So while it is perhaps not realistic to think that copying the consts will somehow be free, it is cheaper than all the individual LOAD_CONSTs, and we will see a gain once we remove them.
The idea of a bytecode that lazily initialises the consts before the first const is loaded is not easy to get right/efficient. Suppose that the first const access is in a loop. Maybe the loop won't execute at all, so we don't want an INITIALIZE_CONSTANTS before the loop. But we also don't want to execute INITIALIZE_CONSTANTS inside the loop.
Couldn’t it have a flag on the frame that it’s done it already?
But agreed we should put that off until we need it.
Couldn’t it have a flag on the frame that it’s done it already?
Sure, it won't memcpy again but you still pay for dispatch and checking the flag.
Yup, so better lift it out of the loop, even if sometimes it doesn’t iterate.
I created this branch which:
The benchmark results came back with a 2% slowdown. Did I run them correctly? Can we understand/improve that result?