baverman / flameprof

Flamegraph generator for cProfile stats
MIT License
237 stars 15 forks source link

flameprof sometimes records impossible stacks #11

Open nickodell opened 1 year ago

nickodell commented 1 year ago

Hi, I'm attempting to benchmark a project, and I'm getting a really weird result. There are call paths in the flame graph that cannot really be part of the program. Here's a minimal, self-contained example of my problem:

import cProfile
import subprocess

def a():
    s = 0
    for i in range(1000):
        s += i
    return s

def b():
    s = 0
    for i in range(1000):
        s += i ** 2
    return s

def dispatch(func):
    return func()

def a_caller():
    return dispatch(a)

def b_caller():
    return dispatch(b)

def to_profile():
    return a_caller() + b_caller()

stats_file = 'test.prof'
flame_file = 'test.svg'
cProfile.run('to_profile()', stats_file)
subprocess.check_call(f'flameprof {stats_file} > {flame_file}', shell=True)

In this program, the function a_caller() calls a(), and the function b_caller() calls b(). There is no path for a_caller() to call b(), although the function dispatch() will appear on both stacks.

This is what the profile looks like:

test

The profile reports that a_caller() spends most of its time inside of b(). Removing dispatch() and calling a() or b() directly provides the correct result.

This is a problem for me because any numpy operation I do goes through a single numpy function, <built-in method numpy.core._multiarray_umath.implement_array_function>. For example, in this part of the trace, I'm calling np.repeat(). As far as I know, that shouldn't invoke cumsum(), hstack(), or delete().

Screenshot from 2022-12-20 14-05-53

Any ideas on how to improve this, or things I might be doing wrong? Is this just an intrinsic limitation of cProfile?