accel-sim / accel-sim-framework

This is the top-level repository for the Accel-Sim framework.
https://accel-sim.github.io

Cutlass 3.0 segmentation fault #332

Closed: DestroyGPU closed this issue 1 month ago

DestroyGPU commented 2 months ago
root@625d74bba685:/workspace/accel-sim-framework# ./util/tracer_nvbit/run_hw_trace.py -B cutlass -D 0
Running cutlass_perf_test_k1
------------- NVBit (NVidia Binary Instrumentation Tool v1.7) Loaded --------------
NVBit core environment variables (mostly for nvbit-devs):
ACK_CTX_INIT_LIMITATION = 0 - if set, no warning will be printed for nvbit_at_ctx_init()
            NVDISASM = nvdisasm - override default nvdisasm found in PATH
            NOBANNER = 0 - if set, does not print this banner
       NO_EAGER_LOAD = 0 - eager module loading is turned on by NVBit to prevent potential NVBit tool deadlock, turn it off if you want to use the lazy module loading feature
---------------------------------------------------------------------------------
         INSTR_BEGIN = 0 - Beginning of the instruction interval where to apply instrumentation
           INSTR_END = 4294967295 - End of the instruction interval where to apply instrumentation
    EXCLUDE_PRED_OFF = 1 - Exclude predicated off instruction from count
      TRACE_LINEINFO = 0 - Include source code line info at the start of each traced line. The target binary must be compiled with -lineinfo or --generate-line-info
DYNAMIC_KERNEL_LIMIT_END = 0 - Limit of the number kernel to be printed, 0 means no limit
DYNAMIC_KERNEL_LIMIT_START = 0 - start to report kernel from this kernel id, 0 means starts from the beginning, i.e. first kernel
   ACTIVE_FROM_START = 1 - Start instruction tracing from start or wait for cuProfilerStart and cuProfilerStop. If set to 0, DYNAMIC_KERNEL_LIMIT options have no effect
        TOOL_VERBOSE = 0 - Enable verbosity inside the tool
       TOOL_COMPRESS = 1 - Enable traces compression
     TOOL_TRACE_CORE = 0 - write the core id in the traces
TERMINATE_UPON_LIMIT = 0 - Stop the process once the current kernel > DYNAMIC_KERNEL_LIMIT_END
USER_DEFINED_FOLDERS = 0 - Uses the user defined folder TRACES_FOLDER path environment
 TRACE_FILE_COMPRESS = 0 - Create xz-compressed tracefile
----------------------------------------------------------------------------------------------------
cudaGetExportTable: UUID = 0x6e 0x16 0x3f 0xbe 0xb9 0x58 0x44 0x4d 0x83 0x5c 0xe1 0x82 0xaf 0xf1 0x99 0x1e 
cudaGetExportTable: UUID = 0x35 0x77 0xf 0x1b 0x9 0x2e 0x3 0x48 0xa4 0x8e 0x5 0x6f 0xc4 0x23 0x96 0x8d 
cudaGetExportTable: UUID = 0xbf 0xdb 0x43 0x2d 0xbf 0x3c 0x5a 0x4a 0x94 0x5e 0xb3 0x40 0x29 0xe8 0x1e 0x75 
cudaGetExportTable: UUID = 0x21 0x31 0x8c 0x60 0x97 0x14 0x32 0x48 0x8c 0xa6 0x41 0xff 0x73 0x24 0xc8 0xf2 
cudaGetExportTable: UUID = 0x42 0xd8 0x5a 0x81 0x23 0xf6 0xcb 0x47 0x82 0x98 0xf6 0xe7 0x8a 0x3a 0xec 0xdc 
cudaGetExportTable: UUID = 0xb1 0x5 0x41 0xe1 0xf7 0xc7 0xc7 0x4a 0x9f 0x64 0xf2 0x23 0xbe 0x99 0xf1 0xe2 
cudaGetExportTable: UUID = 0x6b 0xd5 0xfb 0x6c 0x5b 0xf4 0xe7 0x4a 0x89 0x87 0xd9 0x39 0x12 0xfd 0x9d 0xf9 
cudaGetExportTable: UUID = 0xa6 0xb1 0xff 0x99 0xec 0xc4 0xc9 0x4f 0x92 0xf9 0x19 0x28 0x66 0x3d 0x55 0x85 
cudaGetExportTable: UUID = 0xf8 0x8c 0xc9 0x3e 0x53 0xfd 0x9e 0x46 0xba 0x59 0x1e 0x2b 0x87 0x3e 0xf 0x91 
run.sh: line 3: 66146 Segmentation fault      (core dumped) CUDA_INJECTION64_PATH=/workspace/accel-sim-framework/util/tracer_nvbit/tracer_tool/tracer_tool.so LD_PRELOAD=/workspace/accel-sim-framework/util/tracer_nvbit/tracer_tool/tracer_tool.so /workspace/accel-sim-framework/gpu-app-collection/src/..//bin/11.8/release/cutlass_perf_test_k1 --seed=2020 --dist=0 --m=2560 --n=16 --k=2560 --kernels=sgemm --iterations=5 --providers=cutlass
Error invoking nvbit on /workspace/accel-sim-framework/hw_run/traces/device-0/11.8/cutlass_perf_test_k1/__seed_2020___dist_0____m_2560___n_16___k_2560___kernels_sgemm____iterations_5___providers_cutlass

I got the above error after running ./util/tracer_nvbit/run_hw_trace.py -B cutlass -D 0 with CUDA 11.8 on an A100.

Could you please take a look at this?

Thanks in advance!

cesar-avalos3 commented 2 months ago

This usually happens if you run it with GPGPUsim instead of in-hardware. Could you check if GPGPUsim env variable is set?

DestroyGPU commented 2 months ago

Yes, I unset the environment variable, and now it runs! Thank you!

Writing results to /workspace/accel-sim-framework/hw_run/traces/device-0/11.8/cutlass_perf_test_k1/__seed_2020___dist_0____m_2560___n_16___k_2560___kernels_sgemm____iterations_5___providers_cutlass/traces//kernel-1.trace

By the way, how long will it take to finish generating the trace for cutlass? It has been running for several minutes and is still running. So far it has generated one trace file, which is 1.76 GB and is no longer growing, so I suspect it is stuck somewhere. Also, how long will it take to run the generated trace on the GPU simulator?

cesar-avalos3 commented 2 months ago

Judging by size alone, a 1.76 GB trace should be on the quicker side, though I haven't run cutlass lately. For comparison, I've had traces of around half a TB that took weeks or even months to simulate.

DestroyGPU commented 1 month ago

Thanks!

AnthonyMichaelTDM commented 1 month ago

> This usually happens if you run it with GPGPUsim instead of in-hardware. Could you check if GPGPUsim env variable is set?

Would it make sense for the tracer to detect if it's running with GPGPUsim during nvbit_at_init and exit early with an error message if it is?

AnthonyMichaelTDM commented 1 month ago

Something like:

#include <cstdlib>   // std::getenv, std::exit, EXIT_FAILURE
#include <iostream>  // std::cerr
...
void nvbit_at_init() {
  // Detect whether the user has sourced the gpgpu-sim environment. If so,
  // exit early with an error message, because the tool will not work with
  // gpgpu-sim.
  if (std::getenv("GPGPUSIM_SETUP_ENVIRONMENT_WAS_RUN")) {
    std::cerr << "Error: gpgpu-sim environment detected, this tool is not "
                 "compatible with gpgpu-sim"
              << std::endl;
    std::exit(EXIT_FAILURE);
  }

  ...
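
For anyone who wants to try the idea outside the tracer first, here is a minimal standalone sketch of the same check, split so that the detection can be exercised without terminating the process. The helper names (gpgpusim_env_detected, gpgpusim_error_message, exit_if_gpgpusim_env) are hypothetical and not part of NVBit or the Accel-Sim tracer. The variable name GPGPUSIM_SETUP_ENVIRONMENT_WAS_RUN is taken from the snippet above, and it is an assumption that this is the variable the GPGPU-Sim setup script exports.

```cpp
#include <cstdlib>   // std::getenv, std::exit, EXIT_FAILURE
#include <iostream>  // std::cerr
#include <string>

// Hypothetical helper: returns true when the given variable is present in the
// environment, even if its value is an empty string. The default name is the
// one used in the snippet above (assumed, not verified against GPGPU-Sim).
bool gpgpusim_env_detected(
    const char* var = "GPGPUSIM_SETUP_ENVIRONMENT_WAS_RUN") {
  return std::getenv(var) != nullptr;
}

// Hypothetical helper: builds the message shown to the user.
std::string gpgpusim_error_message() {
  return "Error: gpgpu-sim environment detected, this tool is not "
         "compatible with gpgpu-sim. Unset the environment variable "
         "or start a fresh shell, then rerun the tracer.";
}

// Hypothetical guard: prints the message and exits with a non-zero status
// when the environment looks like it was set up for GPGPU-Sim.
void exit_if_gpgpusim_env() {
  if (gpgpusim_env_detected()) {
    std::cerr << gpgpusim_error_message() << std::endl;
    std::exit(EXIT_FAILURE);
  }
}
```

In the tracer, the guard would presumably be called at the top of nvbit_at_init(), as in the snippet above. Keeping the detection in its own function makes it possible to test the check without killing the process.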