CodSpeedHQ / pytest-codspeed

Pytest plugin to create CodSpeed benchmarks
https://codspeed.io
MIT License

On PyPy (and maybe future CPython), you probably want to make sure the function is JITed #33

Open itamarst opened 2 months ago

itamarst commented 2 months ago

PyPy uses a JIT, so the initial run of a function may be misleading: after some number of runs the JIT kicks in, and then it will run much faster. CPython is working on a JIT too, so this may be relevant to future CPython releases as well (possibly even 3.13).

The code would likely look like:

import gc        # already imported by the plugin module
import platform  # NEW
from typing import Any, Callable

import pytest

IS_JIT = platform.python_implementation().lower() == "pypy"  # NEW

def _run_with_instrumentation(
    lib: LibType,
    nodeId: str,
    config: pytest.Config,
    fn: Callable[..., Any],
    *args,
    **kwargs,
):
    is_gc_enabled = gc.isenabled()
    if is_gc_enabled:
        gc.collect()
        gc.disable()

    result = None

    def __codspeed_root_frame__():
        nonlocal result
        result = fn(*args, **kwargs)

    if SUPPORTS_PERF_TRAMPOLINE:
        # Warmup CPython performance map cache
        __codspeed_root_frame__()

    if IS_JIT:                         # NEW
        # Warm up the JIT              # NEW
        for _ in range(100):           # NEW
            __codspeed_root_frame__()  # NEW

    lib.zero_stats()
    lib.start_instrumentation()
    __codspeed_root_frame__()
    lib.stop_instrumentation()
    uri = get_git_relative_uri(nodeId, config.rootpath)
    lib.dump_stats_at(uri.encode("ascii"))
    if is_gc_enabled:
        gc.enable()

    return result
itamarst commented 2 months ago

The worry is that the JIT compilation will ... "over-optimize" the code. See e.g. this explanation of why this Rust benchmark framework requires the use of the black_box() function: https://bheisler.github.io/criterion.rs/book/getting_started.html#details

I will do some investigation and report back.

itamarst commented 2 months ago

This is definitely going to be a problem (and likely for pytest-benchmark as well). There's an equivalent of black_box in PyPy (pypyjit.residual_call()), but ... it may also require intervention by benchmark authors.
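A guarded wrapper could hide the PyPy-specific call behind a portable helper; this is a sketch (assuming `residual_call` forwards positional arguments to the callable, and `opaque_call` is a hypothetical name):

```python
try:
    import pypyjit  # only available on PyPy

    def opaque_call(fn, *args):
        # Force a residual (non-traced) call: the JIT invokes fn
        # without following it, so it cannot constant-fold the
        # benchmarked work away.
        return pypyjit.residual_call(fn, *args)
except ImportError:
    def opaque_call(fn, *args):
        # Plain call on interpreters without pypyjit (e.g. CPython).
        return fn(*args)
```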

itamarst commented 2 months ago

Further discussion with the PyPy authors suggests that over-optimization (where you end up benchmarking code that won't match real-world code) is not really solvable by the framework alone. It will likely require framework changes that make things easier for benchmark authors, documentation changes, and benchmarks written in a particular way.

If CPython gets a JIT, this will likely become an even bigger issue.

I am therefore going to do some more research and then make suggestions that will presumably require pytest-benchmark changes too. Will report back.

art049 commented 2 months ago

Thanks a lot for the investigation. I totally agree with your point of view, and this is already quite painful in compiled languages. When that happens, we will measure both the cold and the JIT-optimized function, since comparing both sets of data will make sense.

Definitely, this optimization topic will be a big problem. One way to "solve" it would be to ignore the problem and disable the JIT, but then the results won't match real-world data.

You also mentioned warming up so that the JIT optimizes the function; this might be hard. We ran some experiments with V8 but didn't find a definitive way to ensure the optimization of the whole execution path. Is there anything to force optimization with PyPy? I'm not aware of such a thing being planned for CPython either.

itamarst commented 2 months ago

With PyPy you run the function ~3000 times and that triggers JIT compilation.
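A warmup helper keyed on the interpreter could look like this sketch (the ~3000 figure comes from the comment above; `JIT_WARMUP_ROUNDS` and `warm_up` are illustrative names, not pytest-codspeed API):

```python
import platform

# ~3000 calls should comfortably exceed PyPy's default JIT
# thresholds (per the discussion above); other interpreters
# get no warmup.
JIT_WARMUP_ROUNDS = (
    3000 if platform.python_implementation() == "PyPy" else 0
)

def warm_up(fn, *args, **kwargs):
    """Call fn repeatedly so a tracing JIT compiles it before
    instrumentation starts."""
    for _ in range(JIT_WARMUP_ROUNDS):
        fn(*args, **kwargs)
```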