accel-sim / accel-sim-framework

This is the top-level repository for the Accel-Sim framework.
https://accel-sim.github.io

Tracing Tool Hangs #333

Closed syifan closed 2 months ago

syifan commented 2 months ago

I am trying to run the tracing tool, but the program hangs.

I am testing on an NVIDIA A100 PCIe 80GB. As directed in the instructions, I am running this command: LD_PRELOAD=./tracer_tool/tracer_tool.so ./nvbit_release/test-apps/vectoradd/vectoradd

Using GDB, I can see that the program hangs here: https://github.com/accel-sim/accel-sim-framework/blob/release/util/tracer_nvbit/tracer_tool/tracer_tool.cu#L673. I cannot trace further because the program does not seem to reach the channel_host.init method; it gets stuck in _cudaInitModule.

Is anyone else facing the same issue? Any suggestions on how to solve the problem?

JRPan commented 2 months ago

Are you using release or dev branch? Can you try dev?

syifan commented 2 months ago

I am using release. Let me try dev.

tgrogers commented 2 months ago

Junrui, can you see if you can reproduce this on release? We may need a push to release.

syifan commented 2 months ago

We see the same behavior on the dev branch. Please see the screenshot below. We added a print statement ("Work 1") just to help us identify where it stops. Should we modify any parameters?

[Screenshot 2024-09-06 at 2:46:08 PM]

JRPan commented 2 months ago

Your NVBit version is still 1.5.5; the latest dev uses 1.7.

Can you please try deleting the nvbit_release folder, running make clean, and trying again with dev?
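
For readers who want to check the version mismatch noted above without a screenshot, here is a minimal sketch that extracts the NVBit version from the startup banner. It assumes the banner line has the form shown in the tool's output ("NVBit (NVidia Binary Instrumentation Tool v1.7) Loaded"); nvbit_version and run.log are hypothetical names, not part of the framework.

```shell
# Sketch: read text on stdin and print the NVBit version from the banner.
# Assumption: the banner looks like
#   "NVBit (NVidia Binary Instrumentation Tool v1.7) Loaded"
# nvbit_version is a hypothetical helper, not an Accel-Sim or NVBit command.
nvbit_version() {
    sed -n 's/.*NVBit (NVidia Binary Instrumentation Tool v\([0-9][0-9.]*\)) Loaded.*/\1/p' \
        | head -n 1
}

# Possible usage (the run may hang, so bound it with a timeout and save the log):
#   timeout 60 env LD_PRELOAD=./tracer_tool/tracer_tool.so \
#       ./nvbit_release/test-apps/vectoradd/vectoradd > run.log 2>&1
#   nvbit_version < run.log
```

If this prints 1.5.5 after switching to dev, the old nvbit_release folder is probably still in use.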

syifan commented 2 months ago

OK. We have upgraded to the most recent version, but the result is still the same.

[Screenshot 2024-09-06 at 3:05:30 PM]

JRPan commented 2 months ago

We tried on a V100 and an A30, with CUDA 11.7 and CUDA 12.2. Both run fine.

pan251@tgrogers-gpu01:tracer_nvbit$ LD_PRELOAD=./tracer_tool/tracer_tool.so ./nvbit_release/test-apps/vectoradd/vectoradd
------------- NVBit (NVidia Binary Instrumentation Tool v1.7) Loaded --------------
NVBit core environment variables (mostly for nvbit-devs):
ACK_CTX_INIT_LIMITATION = 0 - if set, no warning will be printed for nvbit_at_ctx_init()
            NVDISASM = nvdisasm - override default nvdisasm found in PATH
            NOBANNER = 0 - if set, does not print this banner
       NO_EAGER_LOAD = 0 - eager module loading is turned on by NVBit to prevent potential NVBit tool deadlock, turn it off if you want to use the lazy module loading feature
---------------------------------------------------------------------------------
         INSTR_BEGIN = 0 - Beginning of the instruction interval where to apply instrumentation
           INSTR_END = 4294967295 - End of the instruction interval where to apply instrumentation
    EXCLUDE_PRED_OFF = 1 - Exclude predicated off instruction from count
      TRACE_LINEINFO = 0 - Include source code line info at the start of each traced line. The target binary must be compiled with -lineinfo or --generate-line-info
DYNAMIC_KERNEL_LIMIT_END = 0 - Limit of the number kernel to be printed, 0 means no limit
DYNAMIC_KERNEL_LIMIT_START = 0 - start to report kernel from this kernel id, 0 means starts from the beginning, i.e. first kernel
   ACTIVE_FROM_START = 1 - Start instruction tracing from start or wait for cuProfilerStart and cuProfilerStop. If set to 0, DYNAMIC_KERNEL_LIMIT options have no effect
        TOOL_VERBOSE = 0 - Enable verbosity inside the tool
       TOOL_COMPRESS = 1 - Enable traces compression
     TOOL_TRACE_CORE = 0 - write the core id in the traces
TERMINATE_UPON_LIMIT = 0 - Stop the process once the current kernel > DYNAMIC_KERNEL_LIMIT_END
USER_DEFINED_FOLDERS = 0 - Uses the user defined folder TRACES_FOLDER path environment
 TRACE_FILE_COMPRESS = 0 - Create xz-compressed tracefile
----------------------------------------------------------------------------------------------------
WARNING: Do not call CUDA memory allocation in nvbit_at_ctx_init(). It will cause deadlocks. Do them in nvbit_tool_init(). If you encounter deadlocks, remove CUDA API calls to debug.
Writing results to /home/tgrogers-raid/a/pan251/accel-sim-framework-dev/util/tracer_nvbit/traces//kernel-1.trace
Final sum = 100000.000000; sum/n = 1.000000 (should be ~1)

I'm not sure about the cause right now. What's your OS version, CUDA, and driver version? I'll try to reproduce it.

syifan commented 2 months ago

OK. The problem is now solved. The secret is to use the dev branch. But after switching to the dev branch, we need to delete the nvbit_release directory, run make clean, and run make again.

Thanks for your help!
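
The fix described in this thread can be sketched as a small shell helper. This is only an illustration under assumptions: it is run from the repository root, the tracer lives in util/tracer_nvbit (as in the paths above), and rebuild_tracer, run, and DRY_RUN are hypothetical names. The thread does not say how NVBit 1.7 is fetched again after the folder is deleted; if make does not do it, rerun the install step from the instructions first.

```shell
# Sketch: rebuild the tracer after switching to the dev branch.
# Assumptions: run from the repository root; names below are hypothetical.
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rebuild_tracer() {
    tracer_dir="${1:-util/tracer_nvbit}"
    run git checkout dev                      # switch to the dev branch
    run rm -rf "$tracer_dir/nvbit_release"    # drop the stale NVBit copy
    run make -C "$tracer_dir" clean           # remove old build products
    run make -C "$tracer_dir"                 # rebuild the tracer
}

# Possible usage:
#   DRY_RUN=1; rebuild_tracer    # preview the commands
#   DRY_RUN=0; rebuild_tracer    # actually run them
```

Previewing with DRY_RUN=1 first is a cautious choice because the helper deletes a directory.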