generate trace error with a cuDNN example

wangyuyue commented 1 year ago

Dear developers, I ran into the error below when using this command:

glacier@node0:~/work/accel-sim-framework/util/tracer_nvbit$ LD_PRELOAD=./tracer_tool/tracer_tool.so ~/work/cudnn-samples/src/cudnn_samples_v8/conv_sample/conv_sample -c2048 -h7 -w7 -k512 -r1 -s1 -pad_h0 -pad_w0 -u1 -v1

------------- NVBit (NVidia Binary Instrumentation Tool v1.5.3) Loaded --------------
NVBit core environment variables (mostly for nvbit-devs):
            NVDISASM = nvdisasm - override default nvdisasm found in PATH
            NOBANNER = 0 - if set, does not print this banner
---------------------------------------------------------------------------------
         INSTR_BEGIN = 0 - Beginning of the instruction interval where to apply instrumentation
           INSTR_END = 4294967295 - End of the instruction interval where to apply instrumentation
    EXCLUDE_PRED_OFF = 1 - Exclude predicated off instruction from count
DYNAMIC_KERNEL_LIMIT_END = 0 - Limit of the number kernel to be printed, 0 means no limit
DYNAMIC_KERNEL_LIMIT_START = 0 - start to report kernel from this kernel id, 0 means starts from the beginning, i.e. first kernel
   ACTIVE_FROM_START = 1 - Start instruction tracing from start or wait for cuProfilerStart and cuProfilerStop. If set to 0, DYNAMIC_KERNEL_LIMIT options have no effect
        TOOL_VERBOSE = 0 - Enable verbosity inside the tool
       TOOL_COMPRESS = 1 - Enable traces compression
     TOOL_TRACE_CORE = 0 - write the core id in the traces
TERMINATE_UPON_LIMIT = 0 - Stop the process once the current kernel > DYNAMIC_KERNEL_LIMIT_END
USER_DEFINED_FOLDERS = 0 - Uses the user defined folder TRACES_FOLDER path environment
----------------------------------------------------------------------------------------------------
Executing: conv_sample -c2048 -h7 -w7 -k512 -r1 -s1 -pad_h0 -pad_w0 -u1 -v1
Using format CUDNN_TENSOR_NCHW (for INT8x4 and INT8x32 tests use CUDNN_TENSOR_NCHW_VECT_C)
Testing single precision
====USER DIMENSIONS====
input dims are 1, 2048, 7, 7
filter dims are 512, 2048, 1, 1
output dims are 1, 512, 7, 7
====PADDING DIMENSIONS====
padded input dims are 1, 2048, 7, 7
padded filter dims are 512, 2048, 1, 1
padded output dims are 1, 512, 7, 7
Testing conv
conv_sample: arch/gm10x_hal.cpp:181: void set_imm_relative_control_flow(uint64_t*, int64_t): Assertion `!IS_LARGER_THAN_24BIT(imm)' failed.
Aborted

And this is the result without instrumentation:

glacier@node0:~/work/accel-sim-framework/util/tracer_nvbit$ ~/work/cudnn-samples/src/cudnn_samples_v8/conv_sample/conv_sample -c2048 -h7 -w7 -k512 -r1 -s1 -pad_h0 -pad_w0 -u1 -v1
Executing: conv_sample -c2048 -h7 -w7 -k512 -r1 -s1 -pad_h0 -pad_w0 -u1 -v1
Using format CUDNN_TENSOR_NCHW (for INT8x4 and INT8x32 tests use CUDNN_TENSOR_NCHW_VECT_C)
Testing single precision
====USER DIMENSIONS====
input dims are 1, 2048, 7, 7
filter dims are 512, 2048, 1, 1
output dims are 1, 512, 7, 7
====PADDING DIMENSIONS====
padded input dims are 1, 2048, 7, 7
padded filter dims are 512, 2048, 1, 1
padded output dims are 1, 512, 7, 7
Testing conv
^^^^ CUDA : elapsed = 0.507221 sec,  
Test PASSED
Testing half precision (math in single precision)
====USER DIMENSIONS====
input dims are 1, 2048, 7, 7
filter dims are 512, 2048, 1, 1
output dims are 1, 512, 7, 7
====PADDING DIMENSIONS====
padded input dims are 1, 2048, 7, 7
padded filter dims are 512, 2048, 1, 1
padded output dims are 1, 512, 7, 7
Testing conv
^^^^ CUDA : elapsed = 0.000282049 sec,  
Test PASSED

wangyuyue commented 1 year ago

The same error occurs for the DeepBench NVIDIA conv_bench-tencore benchmark:

glacier@node0:~/work/accel-sim-framework/util/tracer_nvbit$ LD_PRELOAD=./tracer_tool/tracer_tool.so ../../conv_bench-tencore

------------- NVBit (NVidia Binary Instrumentation Tool v1.5.3) Loaded --------------
NVBit core environment variables (mostly for nvbit-devs):
            NVDISASM = nvdisasm - override default nvdisasm found in PATH
            NOBANNER = 0 - if set, does not print this banner
---------------------------------------------------------------------------------
         INSTR_BEGIN = 0 - Beginning of the instruction interval where to apply instrumentation
           INSTR_END = 4294967295 - End of the instruction interval where to apply instrumentation
    EXCLUDE_PRED_OFF = 1 - Exclude predicated off instruction from count
DYNAMIC_KERNEL_LIMIT_END = 0 - Limit of the number kernel to be printed, 0 means no limit
DYNAMIC_KERNEL_LIMIT_START = 0 - start to report kernel from this kernel id, 0 means starts from the beginning, i.e. first kernel
   ACTIVE_FROM_START = 1 - Start instruction tracing from start or wait for cuProfilerStart and cuProfilerStop. If set to 0, DYNAMIC_KERNEL_LIMIT options have no effect
        TOOL_VERBOSE = 0 - Enable verbosity inside the tool
       TOOL_COMPRESS = 1 - Enable traces compression
     TOOL_TRACE_CORE = 0 - write the core id in the traces
TERMINATE_UPON_LIMIT = 0 - Stop the process once the current kernel > DYNAMIC_KERNEL_LIMIT_END
USER_DEFINED_FOLDERS = 0 - Uses the user defined folder TRACES_FOLDER path environment
----------------------------------------------------------------------------------------------------
                  Running training benchmark 
                         Times
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   w      h      c      n      k      f_w    f_h  pad_w  pad_h    stride_w  stride_h    precision  fwd_time (usec)  bwd_inputs_time (usec)  bwd_params_time (usec)  total_time (usec) pad_kerenels     fwd_algo 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
conv_bench-tencore: arch/gm10x_hal.cpp:181: void set_imm_relative_control_flow(uint64_t*, int64_t): Assertion `!IS_LARGER_THAN_24BIT(imm)' failed.
Aborted

wangyuyue commented 1 year ago

BTW, I found a related issue in the NVBit repo. It seems that this project needs to sync its NVBit to the newest version.

JRPan commented 1 year ago

The dev branch has the latest NVBit. Or you can manually upgrade it by modifying install_nvbit.sh. Does upgrading to the latest NVBit solve your problem?
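
For anyone taking the manual route, here is a rough sketch of one way to bump the version. It assumes install_nvbit.sh embeds the NVBit version as a literal string (for example inside a download URL), which is an assumption about the script and not something confirmed in this thread. The helper name `bump_nvbit_version` is hypothetical, and the new version number is left for you to fill in.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of accel-sim): replace one literal version
# string with another inside an install script.
# Assumption: the script contains the old version verbatim, e.g. in a URL.
bump_nvbit_version() {
    local script="$1" old="$2" new="$3"
    # Escape dots so that "1.5.3" is matched literally rather than as a regex.
    local old_re="${old//./\\.}"
    # Keep a backup copy next to the script as <script>.bak.
    sed -i.bak "s/${old_re}/${new}/g" "$script"
}
```

Usage would look like `bump_nvbit_version ./install_nvbit.sh 1.5.3 <new-version>`, where 1.5.3 is the version shown in the banner above. After that, re-run the install script and rebuild the tracer tool so the new NVBit release is actually picked up.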

wangyuyue commented 1 year ago

Yes, that solves it. Please consider updating the install_nvbit.sh script and closing this issue. Thanks for your attention.

JRPan commented 1 year ago

Nice. We'll merge it in our next release.

accel-sim / accel-sim-framework

generate trace error with a cuDNN example #170