P403n1x87 / austin

Python frame stack sampler for CPython
https://pypi.org/project/austin-dist/
GNU General Public License v3.0

austinp hanging on profiling numpy/scipy #228

Closed: rachtsingh closed this 2 weeks ago

rachtsingh commented 3 months ago

Description

I'm unable to profile scipy.sparse.csr_matrix because the import seems to hang on a subprocess call (to lscpu) that checks for SVE support.
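For context, the traceback below shows numpy's check_support_sve calling subprocess.run(cmd, capture_output=True, text=True) with no timeout, so a child that never finishes blocks the import indefinitely. The sketch below reproduces that pattern with a timeout added. It assumes the check simply looks for the string "sve" in the lscpu output; this is an assumption, not numpy's exact code, and the function name check_support_sve_with_timeout and its timeout parameter are hypothetical.

```python
import subprocess
import sys


def check_support_sve_with_timeout(cmd=("lscpu",), timeout=5.0):
    """Return True if the output of `cmd` mentions SVE.

    Hypothetical variant of an lscpu-based SVE check: unlike a bare
    subprocess.run call, it gives up after `timeout` seconds.
    """
    try:
        output = subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary, or a child that did not finish in time
        # (subprocess.run kills it before re-raising): assume no SVE.
        return False
    return "sve" in output.stdout


if __name__ == "__main__":
    # Real check: False if lscpu is missing, lacks "sve", or takes over 5 s.
    print(check_support_sve_with_timeout())
    # Simulated stalled child: returns False after about 1 s instead of hanging.
    stalled = (sys.executable, "-c", "import time; time.sleep(60)")
    print(check_support_sve_with_timeout(stalled, timeout=1.0))
```

This only illustrates how a timeout bounds the wait. It does not explain why the lscpu child stalls under austinp in the first place.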

Steps to Reproduce

Here's a minimal reproducer:

$ cat minimal_test.py
from scipy.sparse import csr_matrix

and then ran this austinp command:

austinp -C -i 10000 -o test.prof python minimal_test.py

Expected behavior: It should return pretty much immediately.

Actual behavior: It hangs, and when I cancel it with Ctrl-C I get:

Parent process

🐍 Python version: 3.12.4

Child processes
^C
Statistics
⌛ Sampling duration : 49.68 s
⏱️  Frame sampling (min/avg/max) : 28892/35956/77817 μs
🐢 Long sampling rate : 706/706 (100.00 %) samples took longer than the sampling interval to collect
💀 Error rate : 0/706 (0.00 %) invalid samples
Traceback (most recent call last):
  File "/home/ubuntu/minimal_test.py", line 1, in <module>
    from scipy.sparse import csr_matrix
  File "/opt/pyenv/versions/prod/lib/python3.12/site-packages/scipy/sparse/__init__.py", line 294, in <module>
    from ._base import *
  File "/opt/pyenv/versions/prod/lib/python3.12/site-packages/scipy/sparse/_base.py", line 5, in <module>
    from scipy._lib._util import VisibleDeprecationWarning
  File "/opt/pyenv/versions/prod/lib/python3.12/site-packages/scipy/_lib/_util.py", line 18, in <module>
    from scipy._lib._array_api import array_namespace
  File "/opt/pyenv/versions/prod/lib/python3.12/site-packages/scipy/_lib/_array_api.py", line 15, in <module>
    from numpy.testing import assert_
  File "/opt/pyenv/versions/prod/lib/python3.12/site-packages/numpy/testing/__init__.py", line 11, in <module>
    from ._private.utils import *
  File "/opt/pyenv/versions/prod/lib/python3.12/site-packages/numpy/testing/_private/utils.py", line 1253, in <module>
    _SUPPORTS_SVE = check_support_sve()
                    ^^^^^^^^^^^^^^^^^^^
  File "/opt/pyenv/versions/prod/lib/python3.12/site-packages/numpy/testing/_private/utils.py", line 1247, in check_support_sve
    output = subprocess.run(cmd, capture_output=True, text=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pyenv/versions/3.12.4/lib/python3.12/subprocess.py", line 550, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pyenv/versions/3.12.4/lib/python3.12/subprocess.py", line 1209, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pyenv/versions/3.12.4/lib/python3.12/subprocess.py", line 2115, in _communicate
    ready = selector.select(timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pyenv/versions/3.12.4/lib/python3.12/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt
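The "Long sampling rate" line in the statistics above follows from the other figures. With -i 10000 the sampling interval is 10000 μs, but even the fastest sample took 28892 μs to collect, so every one of the 706 samples counts as long, giving 706/706 (100.00 %). The sketch below shows that calculation. It assumes, as the line's wording suggests, that a sample is "long" when its collection time exceeds the interval. The function name is hypothetical, and only the reported min/avg/max values are used as inputs.

```python
def long_sampling_rate(sample_times_us, interval_us):
    """Count samples whose collection time exceeds the sampling interval.

    Returns (long_count, total, percent), mirroring the "n/N (p %)" layout
    of Austin's statistics line.
    """
    total = len(sample_times_us)
    long_count = sum(1 for t in sample_times_us if t > interval_us)
    percent = 100.0 * long_count / total if total else 0.0
    return long_count, total, percent


# Reported min/avg/max collection times (μs) against the 10000 μs interval set by -i.
print(long_sampling_rate([28892, 35956, 77817], interval_us=10000))
# → (3, 3, 100.0)
```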

Reproduces how often: 100% of the time.
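To detect the hang mechanically instead of waiting and pressing Ctrl-C, a small harness can run the reproducer under a deadline. The sketch below assumes austinp is on PATH. The helper name finishes_within and the 60-second deadline are illustrative choices, not part of Austin.

```python
import shutil
import subprocess
import sys


def finishes_within(cmd, timeout):
    """Run `cmd` and return True if it exits within `timeout` seconds.

    Output is discarded so a full pipe cannot stall the child. On timeout,
    subprocess.run kills the direct child (here austinp) before re-raising;
    any process it launched, such as the profiled Python interpreter, may
    need to be cleaned up separately.
    """
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return True


if __name__ == "__main__":
    if shutil.which("austinp") is None:
        print("austinp not found on PATH")
    else:
        # Same options as the report; sys.executable stands in for `python`.
        cmd = ["austinp", "-C", "-i", "10000", "-o", "test.prof",
               sys.executable, "minimal_test.py"]
        ok = finishes_within(cmd, timeout=60)
        print("finished" if ok else "hung (killed after 60 s)")
```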

Versions

$ austinp --version
austinp 3.6.0
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.4 LTS
Release:        22.04
Codename:       jammy

P403n1x87 commented 1 month ago

@rachtsingh thanks for reporting this. Would you be able to determine where Austin gets stuck, by any chance? You might need to compile it from source with debug symbols to get something useful. If you could do that, we might be able to narrow this down a bit. In general, austinp is not very stable, and that seems to be an inherent consequence of the way libunwind is used.
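One common way to see where a stuck austinp process is blocked, not something suggested in this thread, is to attach gdb and print a backtrace of every thread. With debug symbols, the frames carry function names and source lines. The sketch below assumes gdb is installed and that the caller is permitted to ptrace the process (Ubuntu's default ptrace restrictions may require elevated privileges). The helpers gdb_backtrace_cmd and dump_native_stacks are hypothetical.

```python
import shutil
import subprocess


def gdb_backtrace_cmd(pid):
    """Build a gdb command line that attaches to `pid`, prints a backtrace
    of every thread, detaches, and exits (batch mode)."""
    return [
        "gdb",
        "-p", str(pid),
        "-batch",
        "-ex", "thread apply all bt",
        "-ex", "detach",
    ]


def dump_native_stacks(pid, timeout=60):
    """Return everything gdb printed for `pid`, or None if gdb is not installed."""
    if shutil.which("gdb") is None:
        return None
    result = subprocess.run(
        gdb_backtrace_cmd(pid), capture_output=True, text=True, timeout=timeout
    )
    # Backtraces go to stdout; warnings and attach errors usually go to stderr.
    return result.stdout + result.stderr
```

For example, call print(dump_native_stacks(pid)) with the PID of the hung austinp process (obtained, say, with pgrep austinp).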

P403n1x87 commented 2 weeks ago

I've been testing with the upcoming 3.7 release, and I cannot reproduce the behavior described in the issue. Running the script, I get the following wall-time flame graph:

[Screenshot, 2024-10-14 16:22:42: wall-time flame graph of minimal_test.py]

The system is Ubuntu 20.04 with Python 3.12.7, running in a Lima VM on an Intel MacBook Pro.

Note that many frames show up as <unknown>. These are most likely frames from native libraries that lack debug symbols.

rachtsingh commented 2 weeks ago

Hey, sorry for the slow response. After some upgrades to our production systems since then, I no longer see this issue. I didn't update austinp (still 3.6.0), so my guess is that either (a) it was some system instability that we fixed a while ago, or (b) it was something on numpy's side.

Going to close this. I appreciate you looking into it, and sorry for the runaround.

P403n1x87 commented 2 weeks ago

Many thanks for the update, @rachtsingh, much appreciated! Should this still be an issue, please feel free to reopen.