JeffersGlass opened this issue 5 months ago
I'm not sure it's relevant to CPython's JIT (@brandtbucher should probably weigh in there), but it seems like this code is intended to do more repetitions in the same process to increase the likelihood of code warming up. I think as an experiment, it's probably worth turning this on for a JIT build and seeing what happens to the numbers.
My broader concern would be whether this introduces more uncontrolled variables between JIT and non-JIT runs. A big part of what we want to answer is "is CPython faster with the JIT enabled than without?", and if the code being run differs between the two, I worry that would muddy the answer (even if it were mathematically compensated for).
Yeah, let's not do this (at least not until the JIT is on in most builds and we can do this for every "CPython" run).
I've never been a huge fan of the tendency to let JITs "warm up" before running benchmarks, since it amounts to comparing one implementation's peak performance against another's "average" performance. pyperf already does a bit of warmup for us anyway to populate caches and such, so I'm not sure we have much to gain by just increasing how much warmup we allow ourselves when measuring these things.
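(For reference, pyperf's warmup is configurable on its `Runner`; a minimal sketch, where the warmup count and the toy workload are placeholders rather than anything from the benchmark suite:)

```python
import pyperf

def workload():
    # Toy stand-in for a real benchmark body.
    return sum(i * i for i in range(10_000))

# warmups= sets how many untimed warmup values each worker process collects
# before pyperf starts recording measurements.
runner = pyperf.Runner(warmups=1)
runner.bench_func("toy_workload", workload)
```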
I might be interested in just seeing if there's a perf difference running CPython under both modes, with the JIT. We work pretty hard to avoid an expensive warmup period, so it could be validating to see that they're both similar.
IMO, "warmup" periods are a kind of cheating; a way for heavyweight JITs, like Graal or LLVM based compilers, to claim better performance than they really have. So no "warmup"s within a benchmark run.
A single warmup iteration of the whole benchmark does make sense, though: we need to compile .pyc files and warm up OS disk caches, and those are things we don't want to include in the measurement.
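(A minimal sketch of handling the first of those up front: precompiling bytecode with the standard-library `compileall` module before the measured run; the path is a placeholder.)

```python
import compileall

# Precompile .pyc files for the benchmark code ahead of time so that
# bytecode compilation happens before, not during, the measured iterations.
# "benchmarks/" is a placeholder path.
compileall.compile_dir("benchmarks/", quiet=1)
```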
Poking around in pyperf, I see that it has some hardcoded options for whether a particular implementation has a JIT or not (see _utils.py:192-200).
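(Roughly, the check there has this shape; the snippet below is an illustrative paraphrase rather than a verbatim copy of pyperf's _utils.py, and it omits the exact list of implementations pyperf treats as JIT-backed.)

```python
import platform
import sys

def python_has_jit():
    # Illustrative approximation of the hardcoded check referenced above;
    # see pyperf/_utils.py for the real logic.
    impl = platform.python_implementation().lower()
    if impl == "pypy":
        # PyPy records at translation time whether its JIT was built in.
        return sys.pypy_translation_info["translation.jit"]
    # Other implementations pyperf knows to ship a JIT would be listed here;
    # plain CPython is not treated as having one.
    return False
```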
The upshot is that implementations with a JIT are run with fewer total processes, but with more values extracted per process (see _runner.py:100-114).
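(To experiment with that shape of run on a CPython JIT build, the same knobs can be set explicitly; the counts below are placeholders, not pyperf's actual hardcoded values.)

```python
import pyperf

def workload():
    # Toy stand-in for a real benchmark body.
    return sum(i * i for i in range(10_000))

# Fewer worker processes and more values per process, mimicking the shape
# of the JIT configuration without claiming to match pyperf's exact numbers.
runner = pyperf.Runner(processes=6, values=10)
runner.bench_func("toy_workload", workload)
```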
I imagine this is mainly relevant when comparing across implementations, but I'm curious what effect running with fewer processes and more values per process would have on measured JIT performance versus base CPython, or whether this is relevant to CPython's JIT at all.