data-apis / python-record-api

Inferring Python API signatures from tracing usage.
MIT License

Add profiling for C calls #5

Open saulshanabrook opened 4 years ago

saulshanabrook commented 4 years ago

Currently, if a library calls another library through its C API, we are unable to trace it. This includes any call made from Cython. This is too bad, because a lot of calls to NumPy come from Cython or C libraries.
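
A minimal sketch of this blind spot, using only the standard library rather than this project's own tracer: under CPython's `sys.setprofile`, a call from Python into a C function is reported as a `c_call` event, but a call that the C function then makes itself is not. The helper `record_profile_events` and the `workload` function are hypothetical names used only for illustration.

```python
import sys


def record_profile_events(func):
    """Run func under sys.setprofile and collect (event, name) pairs."""
    events = []

    def profiler(frame, event, arg):
        if event in ("c_call", "c_return", "c_exception"):
            # For C-level events, arg is the C function object involved.
            events.append((event, getattr(arg, "__name__", repr(arg))))
        else:
            # "call" / "return" events refer to a Python frame.
            events.append((event, frame.f_code.co_name))

    sys.setprofile(profiler)
    try:
        func()
    finally:
        sys.setprofile(None)
    return events


def workload():
    # sorted() is called from Python bytecode; len() is then called
    # from inside sorted()'s C implementation, once per element.
    return sorted(["ccc", "a", "bb"], key=len)


events = record_profile_events(workload)
c_calls = [name for event, name in events if event == "c_call"]
print("sorted" in c_calls)  # the Python -> C call is reported
print("len" in c_calls)     # the C -> C call made inside sorted() is not
```

The same thing happens when a Cython or C extension calls NumPy: the profiler sees the entry into the extension, but nothing the extension does afterwards.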

One idea for how to achieve this, from talking to @scopatz, is to use lldb's Python API. It now builds on Conda Forge for macOS, so I can start exploring this.

amueller commented 4 years ago

FWIW, most calls to NumPy in sklearn are made from Python. I think our Cython code either calls BLAS more directly or implements its own logic.

saulshanabrook commented 4 years ago

Looking through the skimage codebase, I saw a bunch of code that is basically just calling out to the normal NumPy API, but from Cython, which we totally miss. For example: https://github.com/scikit-image/scikit-image/blob/f71be82423e73cda4f3026a0eb656614db937bbc/skimage/feature/_cascade.pyx#L581-L598

mattip commented 4 years ago

PEP 578 provides C-level and Python-level hooks for this kind of thing. Maybe there could be an opt-in Cython mode for this?
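
For reference, a small sketch of the Python-level side of PEP 578 (Python 3.8+): `sys.addaudithook` registers a hook and `sys.audit` raises an event. The event name `record_api.call` and its arguments are made up for illustration. Nothing in Cython or NumPy raises such an event today, which is why this would have to be opt-in.

```python
import sys

seen = []


def audit_hook(event, args):
    # Hooks receive every audit event in the process, so filter by name.
    # "record_api." is a hypothetical prefix, not an existing convention.
    if event.startswith("record_api."):
        seen.append((event, args))


# Note: audit hooks cannot be removed once they have been added.
sys.addaudithook(audit_hook)

# Code that opts in would raise an event around each call it wants recorded.
# The C-level counterpart of sys.audit is PySys_Audit.
sys.audit("record_api.call", "numpy.add", 2)

print(seen)
```

The hook receives the event name and a tuple of the arguments passed to `sys.audit`, so an opt-in build would need to decide what to put in that tuple for each recorded call.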

saulshanabrook commented 4 years ago

> Maybe there could be an opt-in Cython mode for this?

That would definitely help... It would require an upstream change to Cython, right?

Another thought would be to have Cython build in such a way that it doesn't actually unroll the Python interpreter... For debugging purposes? Not sure how hard this would be.

mattip commented 4 years ago

> Cython build in such a way that it doesn't actually unroll the Python interpreter.

I think that would have a severe performance hit, but it is worth exploring these ideas with them.

saulshanabrook commented 4 years ago

> I think that would have a severe performance hit, but it is worth exploring these ideas with them.

Cool, well, that would be nice to explore down the road then. This whole thing is a super severe performance hit already! So I wouldn't worry about that for our use case, although of course you would only want to build in this mode for debugging or tracing like this.

jack-pappas commented 3 years ago

What about gathering the data using something like bpftrace / bcc? The PEP 578 audit hook @mattip mentioned is included in the static probes / tracepoints compiled into CPython (search for PyDTrace_AUDIT), so you should be able to get at it with bpftrace's usdt probe on Linux (or DTrace, if you're on a Mac).

The ustack function can be used to get the user-mode C call stack within a process; I think you'd then filter down to stacks containing calls to the NumPy C API. uprobe / uretprobe probes can instrument specific functions, so you can e.g. print out the arguments and return values of NumPy C API functions.
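
A rough sketch of how the uprobe idea could be driven from Python. It only builds the bpftrace command line and does not run it. The library path is a placeholder, the PID is arbitrary, and it is an assumption that `PyArray_NewFromDescr` (a real NumPy C API function) is visible as a symbol in the shared object being probed; the helper name `build_bpftrace_command` is hypothetical.

```python
def build_bpftrace_command(pid, library_path, function_name):
    """Build a bpftrace invocation that counts user stacks hitting a function."""
    program = (
        # Attach a uprobe to the function's entry in the given library.
        f"uprobe:{library_path}:{function_name} "
        # Only consider the process we are interested in.
        f"/pid == {pid}/ "
        # Count how often each distinct user-mode stack reaches the function.
        "{ @stacks[ustack] = count(); }"
    )
    return ["bpftrace", "-e", program]


cmd = build_bpftrace_command(
    pid=1234,  # placeholder PID of the traced Python process
    library_path="/path/to/_multiarray_umath.so",  # placeholder path
    function_name="PyArray_NewFromDescr",  # assumed to be a resolvable symbol
)
print(cmd[-1])
```

Running the resulting command needs root (or equivalent capabilities) on Linux, for example by passing `cmd` to `subprocess.run`. bpftrace prints the `@stacks` map when it exits.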

Additional references: