gaogaotiantian / viztracer

VizTracer is a low-overhead logging/debugging/profiling tool that can trace and visualize your python code execution.
https://viztracer.readthedocs.io/
Apache License 2.0

Cannot trace with log_func_args when importing torch #338

Open yihong0618 opened 1 year ago

yihong0618 commented 1 year ago

OS: tested on both Ubuntu 20.04 and macOS (M1)

cat foo.py

import torch

def bar():
    return 3

bar()

Running the following command makes VizTracer hang:

viztracer --log_func_args a.py

Checking the stack with py-spy shows the following:

It seems to loop in __getattr__ (torch/_ops.py:480):

....
....
....
    __getattr__ (torch/_ops.py:480)
    __getattr__ (torch/_ops.py:480)
    __getattr__ (torch/_ops.py:480)
    __getattr__ (torch/_ops.py:480)
    __getattr__ (torch/_ops.py:480)
    __getattr__ (torch/_ops.py:480)
    __getattr__ (torch/_ops.py:480)
    __getattr__ (torch/_ops.py:480)
    __getattr__ (torch/_ops.py:480)
    __getattr__ (torch/_ops.py:480)
    __getattr__ (torch/_ops.py:480)
    __getattr__ (torch/_ops.py:480)
    __getattr__ (torch/_ops.py:480)
    __getattr__ (torch/_ops.py:480)
    __getattr__ (torch/_ops.py:480)
    __getattr__ (torch/_ops.py:480)
    __getattr__ (torch/_ops.py:480)
    __getattr__ (torch/_ops.py:480)
    __getattr__ (torch/_ops.py:480)
    __getattr__ (torch/_ops.py:480)
    __getattr__ (torch/_ops.py:480)
    __getattr__ (torch/_ops.py:480)
    __repr__ (torch/_ops.py:434)
    __init__ (torch/_ops.py:422)
    __getattr__ (torch/_ops.py:581)
    <module> (torch/fx/node.py:35)
    _call_with_frames_removed (<frozen importlib._bootstrap>:241)
    exec_module (<frozen importlib._bootstrap_external>:883)
    _load_unlocked (<frozen importlib._bootstrap>:688)
    _find_and_load_unlocked (<frozen importlib._bootstrap>:1006)
    _find_and_load (<frozen importlib._bootstrap>:1027)
    <module> (torch/fx/graph.py:2)
    _call_with_frames_removed (<frozen importlib._bootstrap>:241)
    exec_module (<frozen importlib._bootstrap_external>:883)
    _load_unlocked (<frozen importlib._bootstrap>:688)
    _find_and_load_unlocked (<frozen importlib._bootstrap>:1006)
    _find_and_load (<frozen importlib._bootstrap>:1027)
    <module> (torch/fx/graph_module.py:8)
    _call_with_frames_removed (<frozen importlib._bootstrap>:241)
    exec_module (<frozen importlib._bootstrap_external>:883)
    _load_unlocked (<frozen importlib._bootstrap>:688)
    _find_and_load_unlocked (<frozen importlib._bootstrap>:1006)
    _find_and_load (<frozen importlib._bootstrap>:1027)
    <module> (torch/fx/__init__.py:83)
    _call_with_frames_removed (<frozen importlib._bootstrap>:241)
    exec_module (<frozen importlib._bootstrap_external>:883)
    _load_unlocked (<frozen importlib._bootstrap>:688)
    _find_and_load_unlocked (<frozen importlib._bootstrap>:1006)
    _find_and_load (<frozen importlib._bootstrap>:1027)
    <module> (torch/ao/quantization/utils.py:12)
    _call_with_frames_removed (<frozen importlib._bootstrap>:241)
    exec_module (<frozen importlib._bootstrap_external>:883)
    _load_unlocked (<frozen importlib._bootstrap>:688)
    _find_and_load_unlocked (<frozen importlib._bootstrap>:1006)
    _find_and_load (<frozen importlib._bootstrap>:1027)
    <module> (torch/ao/quantization/observer.py:15)
    _call_with_frames_removed (<frozen importlib._bootstrap>:241)
    exec_module (<frozen importlib._bootstrap_external>:883)
    _load_unlocked (<frozen importlib._bootstrap>:688)
    _find_and_load_unlocked (<frozen importlib._bootstrap>:1006)
    _find_and_load (<frozen importlib._bootstrap>:1027)
    <module> (torch/ao/quantization/fake_quantize.py:8)
    _call_with_frames_removed (<frozen importlib._bootstrap>:241)
    exec_module (<frozen importlib._bootstrap_external>:883)
    _load_unlocked (<frozen importlib._bootstrap>:688)
    _find_and_load_unlocked (<frozen importlib._bootstrap>:1006)
    _find_and_load (<frozen importlib._bootstrap>:1027)
    <module> (torch/ao/quantization/__init__.py:3)
    _call_with_frames_removed (<frozen importlib._bootstrap>:241)
    exec_module (<frozen importlib._bootstrap_external>:883)
    _load_unlocked (<frozen importlib._bootstrap>:688)
    _find_and_load_unlocked (<frozen importlib._bootstrap>:1006)
    _find_and_load (<frozen importlib._bootstrap>:1027)
    _call_with_frames_removed (<frozen importlib._bootstrap>:241)
    _find_and_load_unlocked (<frozen importlib._bootstrap>:992)
    _find_and_load (<frozen importlib._bootstrap>:1027)
    _find_and_load_unlocked (<frozen importlib._bootstrap>:992)
    _find_and_load (<frozen importlib._bootstrap>:1027)
    <module> (torch/quantization/quantize.py:10)
    _call_with_frames_removed (<frozen importlib._bootstrap>:241)
    exec_module (<frozen importlib._bootstrap_external>:883)
    _load_unlocked (<frozen importlib._bootstrap>:688)
    _find_and_load_unlocked (<frozen importlib._bootstrap>:1006)
    _find_and_load (<frozen importlib._bootstrap>:1027)
    <module> (torch/quantization/__init__.py:1)
    _call_with_frames_removed (<frozen importlib._bootstrap>:241)
    exec_module (<frozen importlib._bootstrap_external>:883)
    _load_unlocked (<frozen importlib._bootstrap>:688)
    _find_and_load_unlocked (<frozen importlib._bootstrap>:1006)
    _find_and_load (<frozen importlib._bootstrap>:1027)
    _call_with_frames_removed (<frozen importlib._bootstrap>:241)
    _handle_fromlist (<frozen importlib._bootstrap>:1078)
    <module> (torch/__init__.py:1288)
    _call_with_frames_removed (<frozen importlib._bootstrap>:241)
    exec_module (<frozen importlib._bootstrap_external>:883)
    _load_unlocked (<frozen importlib._bootstrap>:688)
    _find_and_load_unlocked (<frozen importlib._bootstrap>:1006)
    _find_and_load (<frozen importlib._bootstrap>:1027)
    <module> (a.py:2)
    run_code (viztracer/main.py:334)
    run_command (viztracer/main.py:417)
    run (viztracer/main.py:301)
    main (viztracer/main.py:566)
    <module> (viztracer:8)
yihong0618 commented 1 year ago

After some searching, the root cause seems to be PyObject_Repr(PyDict_GetItem(locals, name)): the repr() call together with __getattr__ causes the recursion. But I am not sure whether this needs a fix on the VizTracer side or the PyTorch side.

@gaogaotiantian
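A minimal sketch of the mechanism described above, reproduced without VizTracer or torch. Everything here is an assumption for illustration: LazyOp is a hypothetical class, not the actual code in torch/_ops.py, and try_repr is a hypothetical helper. The idea is that when a class's __repr__ and __getattr__ both read an attribute that only __init__ sets, calling repr() on an instance before __init__ has run (which is what an argument logger sees for self on entry to __init__) recurses until RecursionError.

```python
class LazyOp:
    """Hypothetical stand-in (not torch's code): both __repr__ and
    __getattr__ read self._name, which only exists after __init__ runs."""

    def __init__(self, name):
        self._name = name

    def __getattr__(self, attr):
        # Called only when normal lookup fails. On a half-built instance,
        # self._name is missing too, so this re-enters __getattr__("_name").
        return f"{self._name}.{attr}"

    def __repr__(self):
        return f"LazyOp({self._name})"


def try_repr(obj):
    """Return repr(obj), or a placeholder if repr() recurses without bound."""
    try:
        return repr(obj)
    except RecursionError:
        return "<RecursionError in repr>"


# A fully built object is fine.
print(try_repr(LazyOp("add")))  # LazyOp(add)

# Calling __new__ directly skips __init__, so this is what `self` looks like
# at the moment __init__ is entered, i.e. when an argument logger calls repr().
half_built = LazyOp.__new__(LazyOp)
print(try_repr(half_built))  # <RecursionError in repr>
```

In plain Python this ends in RecursionError rather than a hang; the sketch does not establish why VizTracer hangs instead.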

gaogaotiantian commented 1 year ago

It seems like using getattr on that object causes infinite recursion? If this can be reproduced without VizTracer, maybe it's not a VizTracer issue. Is there a note in torch saying that users should not do this?

yihong0618 commented 1 year ago

Not exactly. I searched and found no related issues on PyTorch, but in CPython I found https://github.com/python/cpython/issues/86075, which is similar but not the same thing. I will try a bare sys.settrace to test later.
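A bare sys.settrace test along these lines might look like the sketch below. It only roughly imitates --log_func_args (it calls repr() on each argument when a Python function is entered); it is not VizTracer's implementation. LazyOp and make_arg_logger are hypothetical names, and LazyOp stands in for a torch object with the same repr/__getattr__ pattern, so torch is not needed.

```python
import sys


class LazyOp:
    # Hypothetical stand-in (not torch code): __repr__ and __getattr__
    # both read self._name, which exists only after __init__ runs.
    def __init__(self, name):
        self._name = name

    def __getattr__(self, attr):
        return f"{self._name}.{attr}"

    def __repr__(self):
        return f"LazyOp({self._name})"


def make_arg_logger(log):
    """Build a sys.settrace hook that roughly imitates log_func_args:
    on each Python call event, repr() every argument of the new frame."""

    def tracer(frame, event, arg):
        if event == "call":
            code = frame.f_code
            # co_varnames starts with the positional and keyword-only argument names.
            nargs = code.co_argcount + code.co_kwonlyargcount
            for name in code.co_varnames[:nargs]:
                value = frame.f_locals.get(name)
                try:
                    text = repr(value)
                except RecursionError:
                    # Without this guard the error would escape the trace
                    # function, which unsets the tracer.
                    text = "<RecursionError in repr>"
                log.append((code.co_name, name, text))
        return None  # no per-line tracing needed for this experiment

    return tracer


log = []
sys.settrace(make_arg_logger(log))
try:
    op = LazyOp("add")  # __init__ is entered with a half-built self
finally:
    sys.settrace(None)

for entry in log:
    print(entry)
```

CPython does not invoke the trace function for calls made inside it, and the sketch catches RecursionError, so it is expected to finish rather than hang. The __init__ entry for self should be recorded as "<RecursionError in repr>" and name as "'add'".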