accel-sim / accel-sim-framework

This is the top-level repository for the Accel-Sim framework.

Trace generation error with a cuDNN example #170

Closed wangyuyue closed 1 year ago

wangyuyue commented 1 year ago

Dear developers, I ran into the error below when running this command:

glacier@node0:~/work/accel-sim-framework/util/tracer_nvbit$ LD_PRELOAD=./tracer_tool/ ~/work/cudnn-samples/src/cudnn_samples_v8/conv_sample/conv_sample -c2048 -h7 -w7 -k512 -r1 -s1 -pad_h0 -pad_w0 -u1 -v1

------------- NVBit (NVidia Binary Instrumentation Tool v1.5.3) Loaded --------------
NVBit core environment variables (mostly for nvbit-devs):
            NVDISASM = nvdisasm - override default nvdisasm found in PATH
            NOBANNER = 0 - if set, does not print this banner
         INSTR_BEGIN = 0 - Beginning of the instruction interval where to apply instrumentation
           INSTR_END = 4294967295 - End of the instruction interval where to apply instrumentation
    EXCLUDE_PRED_OFF = 1 - Exclude predicated off instruction from count
DYNAMIC_KERNEL_LIMIT_END = 0 - Limit of the number kernel to be printed, 0 means no limit
DYNAMIC_KERNEL_LIMIT_START = 0 - start to report kernel from this kernel id, 0 means starts from the beginning, i.e. first kernel
   ACTIVE_FROM_START = 1 - Start instruction tracing from start or wait for cuProfilerStart and cuProfilerStop. If set to 0, DYNAMIC_KERNEL_LIMIT options have no effect
        TOOL_VERBOSE = 0 - Enable verbosity inside the tool
       TOOL_COMPRESS = 1 - Enable traces compression
     TOOL_TRACE_CORE = 0 - write the core id in the traces
TERMINATE_UPON_LIMIT = 0 - Stop the process once the current kernel > DYNAMIC_KERNEL_LIMIT_END
USER_DEFINED_FOLDERS = 0 - Uses the user defined folder TRACES_FOLDER path environment
Executing: conv_sample -c2048 -h7 -w7 -k512 -r1 -s1 -pad_h0 -pad_w0 -u1 -v1
Using format CUDNN_TENSOR_NCHW (for INT8x4 and INT8x32 tests use CUDNN_TENSOR_NCHW_VECT_C)
Testing single precision
input dims are 1, 2048, 7, 7
filter dims are 512, 2048, 1, 1
output dims are 1, 512, 7, 7
padded input dims are 1, 2048, 7, 7
padded filter dims are 512, 2048, 1, 1
padded output dims are 1, 512, 7, 7
Testing conv
conv_sample: arch/gm10x_hal.cpp:181: void set_imm_relative_control_flow(uint64_t*, int64_t): Assertion `!IS_LARGER_THAN_24BIT(imm)' failed.

And this is the result without instrumentation:

glacier@node0:~/work/accel-sim-framework/util/tracer_nvbit$ ~/work/cudnn-samples/src/cudnn_samples_v8/conv_sample/conv_sample -c2048 -h7 -w7 -k512 -r1 -s1 -pad_h0 -pad_w0 -u1 -v1
Executing: conv_sample -c2048 -h7 -w7 -k512 -r1 -s1 -pad_h0 -pad_w0 -u1 -v1
Using format CUDNN_TENSOR_NCHW (for INT8x4 and INT8x32 tests use CUDNN_TENSOR_NCHW_VECT_C)
Testing single precision
input dims are 1, 2048, 7, 7
filter dims are 512, 2048, 1, 1
output dims are 1, 512, 7, 7
padded input dims are 1, 2048, 7, 7
padded filter dims are 512, 2048, 1, 1
padded output dims are 1, 512, 7, 7
Testing conv
^^^^ CUDA : elapsed = 0.507221 sec,  
Testing half precision (math in single precision)
input dims are 1, 2048, 7, 7
filter dims are 512, 2048, 1, 1
output dims are 1, 512, 7, 7
padded input dims are 1, 2048, 7, 7
padded filter dims are 512, 2048, 1, 1
padded output dims are 1, 512, 7, 7
Testing conv
^^^^ CUDA : elapsed = 0.000282049 sec,  
wangyuyue commented 1 year ago

The same error occurs for the NVIDIA DeepBench conv_bench-tencore benchmark:

glacier@node0:~/work/accel-sim-framework/util/tracer_nvbit$ LD_PRELOAD=./tracer_tool/ ../../conv_bench-tencore

------------- NVBit (NVidia Binary Instrumentation Tool v1.5.3) Loaded --------------
NVBit core environment variables (mostly for nvbit-devs):
            NVDISASM = nvdisasm - override default nvdisasm found in PATH
            NOBANNER = 0 - if set, does not print this banner
         INSTR_BEGIN = 0 - Beginning of the instruction interval where to apply instrumentation
           INSTR_END = 4294967295 - End of the instruction interval where to apply instrumentation
    EXCLUDE_PRED_OFF = 1 - Exclude predicated off instruction from count
DYNAMIC_KERNEL_LIMIT_END = 0 - Limit of the number kernel to be printed, 0 means no limit
DYNAMIC_KERNEL_LIMIT_START = 0 - start to report kernel from this kernel id, 0 means starts from the beginning, i.e. first kernel
   ACTIVE_FROM_START = 1 - Start instruction tracing from start or wait for cuProfilerStart and cuProfilerStop. If set to 0, DYNAMIC_KERNEL_LIMIT options have no effect
        TOOL_VERBOSE = 0 - Enable verbosity inside the tool
       TOOL_COMPRESS = 1 - Enable traces compression
     TOOL_TRACE_CORE = 0 - write the core id in the traces
TERMINATE_UPON_LIMIT = 0 - Stop the process once the current kernel > DYNAMIC_KERNEL_LIMIT_END
USER_DEFINED_FOLDERS = 0 - Uses the user defined folder TRACES_FOLDER path environment
                  Running training benchmark 
   w      h      c      n      k      f_w    f_h  pad_w  pad_h    stride_w  stride_h    precision  fwd_time (usec)  bwd_inputs_time (usec)  bwd_params_time (usec)  total_time (usec) pad_kerenels     fwd_algo 
conv_bench-tencore: arch/gm10x_hal.cpp:181: void set_imm_relative_control_flow(uint64_t*, int64_t): Assertion `!IS_LARGER_THAN_24BIT(imm)' failed.
wangyuyue commented 1 year ago

BTW, I found a related issue in the NVBit repo. It seems that this project needs to update its bundled NVBit to the newest version.

JRPan commented 1 year ago

The dev branch has the latest NVBit. Or you can manually upgrade it by modifying ... Does upgrading to the latest NVBit solve your problem?
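
For readers following along, here is a minimal sketch of the first option (switching an existing checkout to the dev branch and rebuilding the tracer). It is not taken from the thread: the remote name `origin`, the script name `install_nvbit.sh`, its location under `util/tracer_nvbit`, and the `run` and `update_tracer` helpers are all assumptions made for illustration. Setting `DRY_RUN=1` only prints the commands.

```shell
#!/bin/sh
# Sketch only: paths, remote name, and script name are assumptions.

run() {
  # Print the command; execute it only when DRY_RUN is not 1.
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

update_tracer() {
  repo="$1"                           # path to the accel-sim-framework checkout
  tracer="$repo/util/tracer_nvbit"    # assumed tracer directory
  run git -C "$repo" fetch origin || return 1
  run git -C "$repo" checkout dev || return 1
  # Assumed: install_nvbit.sh fetches NVBit, then make rebuilds the tracer tool.
  run sh -c "cd '$tracer' && ./install_nvbit.sh && make" || return 1
}
```

A dry run such as `DRY_RUN=1 update_tracer ~/work/accel-sim-framework` shows the three steps without touching the checkout; dropping `DRY_RUN=1` executes them.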

wangyuyue commented 1 year ago

> The dev branch has the latest NVBit. Or you can manually upgrade it by modifying ... Does upgrading to the latest NVBit solve your problem?

Yes, it solves it. Please consider updating the install_nvbit script and closing this issue. Thanks for your attention.

JRPan commented 1 year ago

Nice. We'll merge it in our next release.