accel-sim / accel-sim-framework

This is the top-level repository for the Accel-Sim framework.
https://accel-sim.github.io

Error when generating traces #328

Closed. DestroyGPU closed this issue 1 week ago.

DestroyGPU commented 2 weeks ago
root@938bf3d6581e:/workspace/accel-sim-framework# ./util/tracer_nvbit/run_hw_trace.py -B rodinia_2.0-ft -D 1
Running backprop-rodinia-2.0-ft
------------- NVBit (NVidia Binary Instrumentation Tool v1.7) Loaded --------------
NVBit core environment variables (mostly for nvbit-devs):
ACK_CTX_INIT_LIMITATION = 0 - if set, no warning will be printed for nvbit_at_ctx_init()
            NVDISASM = nvdisasm - override default nvdisasm found in PATH
            NOBANNER = 0 - if set, does not print this banner
       NO_EAGER_LOAD = 0 - eager module loading is turned on by NVBit to prevent potential NVBit tool deadlock, turn it off if you want to use the lazy module loading feature
---------------------------------------------------------------------------------
         INSTR_BEGIN = 0 - Beginning of the instruction interval where to apply instrumentation
           INSTR_END = 4294967295 - End of the instruction interval where to apply instrumentation
    EXCLUDE_PRED_OFF = 1 - Exclude predicated off instruction from count
      TRACE_LINEINFO = 0 - Include source code line info at the start of each traced line. The target binary must be compiled with -lineinfo or --generate-line-info
DYNAMIC_KERNEL_LIMIT_END = 0 - Limit of the number kernel to be printed, 0 means no limit
DYNAMIC_KERNEL_LIMIT_START = 0 - start to report kernel from this kernel id, 0 means starts from the beginning, i.e. first kernel
   ACTIVE_FROM_START = 1 - Start instruction tracing from start or wait for cuProfilerStart and cuProfilerStop. If set to 0, DYNAMIC_KERNEL_LIMIT options have no effect
        TOOL_VERBOSE = 0 - Enable verbosity inside the tool
       TOOL_COMPRESS = 1 - Enable traces compression
     TOOL_TRACE_CORE = 0 - write the core id in the traces
TERMINATE_UPON_LIMIT = 0 - Stop the process once the current kernel > DYNAMIC_KERNEL_LIMIT_END
USER_DEFINED_FOLDERS = 0 - Uses the user defined folder TRACES_FOLDER path environment
 TRACE_FILE_COMPRESS = 0 - Create xz-compressed tracefile
----------------------------------------------------------------------------------------------------
Random number generator seed: 7
Input layer size : 4096
Starting training kernel
WARNING: Do not call CUDA memory allocation in nvbit_at_ctx_init(). It will cause deadlocks. Do them in nvbit_tool_init(). If you encounter deadlocks, remove CUDA API calls to debug.
Performing GPU computation
ASSERT FAIL: function.cpp:739:void Function::gen_new_code(std::unordered_map<std::__cxx11::basic_string<char>, Function*>&): FAIL !(nregs <= 24) MSG: instrumentation function should not use more than 24 registers!
Error invoking nvbit on /workspace/accel-sim-framework/hw_run/traces/device-1/11.8/backprop-rodinia-2.0-ft/4096___data_result_4096_txt

I followed the instructions in the README and got the above error when using NVBit to generate traces. Could you please help me with this? Thanks!

JRPan commented 2 weeks ago

What card is this?

DestroyGPU commented 2 weeks ago

A100 80G

DestroyGPU commented 2 weeks ago

Any solutions for this?

JRPan commented 2 weeks ago

I would assume this is because of some changes in CUDA 12+?

Can you try CUDA 11.x?

DestroyGPU commented 2 weeks ago

I am using CUDA 11.8

JRPan commented 2 weeks ago

@barnes88 Could you please take a look?

DestroyGPU commented 1 week ago

Figured it out. Closing the issue.

JRPan commented 1 week ago

Glad to hear that.

Would you mind sharing the solution? It may be helpful for other people in the future!

Thanks

DestroyGPU commented 1 week ago

This was caused by bc. I solved the issue by reinstalling it.
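
For anyone who lands here with the same assertion, here is a rough sketch of a sanity check. It assumes that "bc" means the command-line calculator and that GNU bc is the variant installed; the thread does not say how bc is involved in the build. `check_calc` is a hypothetical helper, not part of Accel-Sim or NVBit, and the reinstall hint assumes a Debian/Ubuntu system.

```shell
#!/bin/sh
# Hypothetical helper (not part of Accel-Sim or NVBit): report whether a
# calculator command exists and evaluates a version-style comparison.
check_calc() {
    tool="${1:-bc}"
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: not found"
        return 1
    fi
    # GNU bc prints 1 for a true relational expression and 0 for a false one.
    result=$(echo "11.8 >= 10.1" | "$tool" 2>/dev/null)
    if [ "$result" = "1" ]; then
        echo "$tool: ok"
        return 0
    fi
    echo "$tool: unexpected output '$result'"
    return 1
}

# Assumption: Debian/Ubuntu package manager; adjust for other distributions.
check_calc bc || echo "consider reinstalling bc, e.g. apt-get install --reinstall bc"
```

If the check reports a problem, reinstalling bc and then rebuilding the tracer before rerunning run_hw_trace.py would be the natural next step, though the thread only confirms the reinstall itself.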