faster-cpython / ideas


Should `pyperf` detect when CPython is using a JIT? #683


JeffersGlass commented 5 months ago

Poking around in pyperf, I see that it has some hardcoded options for whether a particular implementation has a JIT or not:

_utils.py:192-200

def python_has_jit():
    implementation_name = python_implementation()
    if implementation_name == 'pypy':
        # PyPy records at translation time whether the JIT was built in
        return sys.pypy_translation_info["translation.jit"]
    elif implementation_name in ['graalpython', 'graalpy']:
        # GraalPy always has a JIT
        return True
    elif hasattr(sys, "pyston_version_info") or "pyston_lite" in sys.modules:
        # Pyston (full build or the pyston_lite extension) always has a JIT
        return True
    # Note: there is no branch for CPython's experimental JIT
    return False
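
(For reference, a minimal, hypothetical sketch of what a CPython branch might look like, assuming detection via the build's recorded compiler flags: JIT-enabled CPython builds define the `_Py_JIT` macro. Note that the `PYTHON_JIT` environment variable can still disable the JIT at runtime, which a build-time check like this cannot see. The helper name is made up for illustration.)

import sysconfig

def cpython_has_jit_build():
    # Hypothetical helper (not part of pyperf): a CPython build configured
    # with --enable-experimental-jit defines the _Py_JIT macro, which shows
    # up in the recorded core compiler flags. The config var may be absent
    # on some platforms, hence the `or ""`.
    cflags = sysconfig.get_config_var("PY_CORE_CFLAGS") or ""
    return "_Py_JIT" in cflags

Even then, a build-time check only says the JIT exists in the binary, not that it is active for a given run.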

The upshot is that implementations with a JIT are run with fewer total processes, but with more values extracted per process:

_runner.py:100-114

has_jit = pyperf.python_has_jit()
if not values:
    if has_jit:
        # Since the PyPy JIT runs with fewer processes,
        # run more values per process
        values = 10
    else:
        values = 3
if not processes:
    if has_jit:
        # Use fewer processes than non-JIT, because the JIT requires
        # more warmups and so each worker is slower
        processes = 6
    else:
        processes = 20
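
(For scale: both defaults collect the same number of values overall, 6 × 10 = 60 with a JIT versus 20 × 3 = 60 without; what changes is how those values are distributed across processes, and therefore how much in-process warmup precedes each one.)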

I imagine this is mainly relevant when comparing across implementations, but I am curious what effect running with fewer processes and more values per process would have on measured JIT performance versus base CPython, or whether this is even relevant to CPython's JIT.

mdboom commented 5 months ago

I'm not sure it's relevant to CPython's JIT (@brandtbucher should probably weigh in there), but it seems like this code is intended to do more repetitions in the same process to increase the likelihood of the code warming up. As an experiment, it's probably worth turning this on for a JIT build and seeing what happens to the numbers.

My broader concern would be whether this introduces more uncontrolled variables between JIT and non-JIT runs. A big part of what we want to answer is "is CPython faster with the JIT enabled than without" and if the code being run is different, I worry that would muddy the answer (even if it was mathematically compensated for).

brandtbucher commented 5 months ago

> My broader concern would be whether this introduces more uncontrolled variables between JIT and non-JIT runs. A big part of what we want to answer is "is CPython faster with the JIT enabled than without" and if the code being run is different, I worry that would muddy the answer (even if it was mathematically compensated for).

Yeah, let's not do this (at least not until the JIT is on in most builds and we can do this for every "CPython" run).

I've never been a huge fan of the tendency to let JITs "warm up" before running benchmarks, since it compares one implementation's peak performance against another's "average" performance. Pyperf already does a bit of warmup for us anyway, to populate caches and such, so I'm not sure we have much to gain by increasing how much warmup we allow ourselves when measuring these things.

brandtbucher commented 5 months ago

I might be interested in seeing whether there's a perf difference between running a JIT-enabled CPython under both modes. We work pretty hard to avoid an expensive warmup period, so it would be validating to see that the results are similar.
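
(A sketch of what that experiment could look like, as an illustration rather than an agreed-on setup: pyperf's `Runner` accepts `processes` and `values` directly, so both modes can be forced explicitly on the same JIT-enabled build. The benchmark name and workload below are placeholders.)

import pyperf

# Force the JIT-style defaults (6 processes x 10 values per process); run
# again with processes=20, values=3 and a different -o output file, then
# compare the two JSON results with `python -m pyperf compare_to`.
runner = pyperf.Runner(processes=6, values=10)
runner.timeit(
    "sum_range",                  # placeholder benchmark name
    stmt="sum(range(100_000))",   # placeholder workload
)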

markshannon commented 5 months ago

IMO, "warmup" periods are a kind of cheating: a way for heavyweight JITs, like Graal or LLVM-based compilers, to claim better performance than they really have. So: no warmups within a benchmark run.

A single iteration of the whole benchmark as a warmup does make sense, though: it compiles .pyc files and warms up OS disk caches, which are things we don't want to measure.
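
(One of those one-time costs can also be paid up front. A hypothetical sketch using the standard library, with a placeholder path; this covers bytecode compilation only, while OS disk caches still need a warm pass to populate.)

import compileall

# Precompile a benchmark's sources to .pyc ahead of time so not even the
# first iteration pays for bytecode compilation ("benchmarks/" is a
# placeholder path; quiet=1 suppresses the per-file listing).
compileall.compile_dir("benchmarks/", quiet=1)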