accel-sim / accel-sim-framework

This is the top-level repository for the Accel-Sim framework.
https://accel-sim.github.io

Non-matching CPU-GPU output for atax and gesummv in polybench #264

Open zqj2333 opened 7 months ago

zqj2333 commented 7 months ago

Hello,

When I use the tracer from the dev branch to generate traces for polybench-atax, the benchmark's CPU-GPU comparison reports 4095 non-matching outputs. When I run the same binary directly on the same GPU, without the tracer, it reports 0 non-matching outputs.

Log of the traced run:

------------- NVBit (NVidia Binary Instrumentation Tool v1.5.5) Loaded --------------
NVBit core environment variables (mostly for nvbit-devs):
            NVDISASM = nvdisasm - override default nvdisasm found in PATH
            NOBANNER = 0 - if set, does not print this banner
---------------------------------------------------------------------------------
         INSTR_BEGIN = 0 - Beginning of the instruction interval where to apply instrumentation
           INSTR_END = 4294967295 - End of the instruction interval where to apply instrumentation
    EXCLUDE_PRED_OFF = 1 - Exclude predicated off instruction from count
      TRACE_LINEINFO = 0 - Include source code line info at the start of each traced line. The target binary must be compiled with -lineinfo or --generate-line-info
DYNAMIC_KERNEL_LIMIT_END = 0 - Limit of the number kernel to be printed, 0 means no limit
DYNAMIC_KERNEL_LIMIT_START = 0 - start to report kernel from this kernel id, 0 means starts from the beginning, i.e. first kernel
   ACTIVE_FROM_START = 1 - Start instruction tracing from start or wait for cuProfilerStart and cuProfilerStop. If set to 0, DYNAMIC_KERNEL_LIMIT options have no effect
        TOOL_VERBOSE = 0 - Enable verbosity inside the tool
       TOOL_COMPRESS = 1 - Enable traces compression
     TOOL_TRACE_CORE = 0 - write the core id in the traces
TERMINATE_UPON_LIMIT = 0 - Stop the process once the current kernel > DYNAMIC_KERNEL_LIMIT_END
USER_DEFINED_FOLDERS = 0 - Uses the user defined folder TRACES_FOLDER path environment
----------------------------------------------------------------------------------------------------
setting device 0 with name NVIDIA RTX A6000
Writing results to /gpu_perf_model/accel-sim-framework/hw_run/traces/device-0/11.2/polybench-atax/NO_ARGS/traces//kernel-1.trace
Writing results to /gpu_perf_model/accel-sim-framework/hw_run/traces/device-0/11.2/polybench-atax/NO_ARGS/traces//kernel-2.trace
GPU Runtime: 5.762122s
CPU Runtime: 0.092334s
Non-Matching CPU-GPU Outputs Beyond Error Threshold of 0.50 Percent: 4095
Processing file /gpu_perf_model/accel-sim-framework/hw_run/traces/device-0/11.2/polybench-atax/NO_ARGS/traces/kernel-1.trace
Processing file /gpu_perf_model/accel-sim-framework/hw_run/traces/device-0/11.2/polybench-atax/NO_ARGS/traces/kernel-2.trace

Log of the direct run:

root@d0e87f6eed3d:/# /gpu_perf_model/accel-sim-framework/gpu-app-collection/src/..//bin/11.2/release/polybench-atax
setting device 0 with name NVIDIA RTX A6000
GPU Runtime: 0.002029s
CPU Runtime: 0.026450s
Non-Matching CPU-GPU Outputs Beyond Error Threshold of 0.50 Percent: 0
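
For context on what the "Non-Matching CPU-GPU Outputs" line counts: the benchmark compares each GPU output element with the CPU reference and counts the elements whose relative difference exceeds the threshold. The Python sketch below illustrates that kind of check. It is not the benchmark's source; the function names, the near-zero cutoff of 0.01, and the `SMALL_FLOAT_VAL` guard are assumptions for illustration, and only the 0.5 percent threshold comes from the log above.

```python
# Illustrative sketch of a percent-difference check (not the benchmark's code).
SMALL_FLOAT_VAL = 1e-8      # assumed guard against division by zero
THRESHOLD_PERCENT = 0.5     # the "0.50 Percent" threshold shown in the log


def percent_diff(ref, val):
    """Relative difference of val against ref, in percent."""
    # Assumed convention: two values that are both near zero count as equal.
    if abs(ref) < 0.01 and abs(val) < 0.01:
        return 0.0
    return 100.0 * abs(ref - val) / abs(ref + SMALL_FLOAT_VAL)


def count_mismatches(cpu_out, gpu_out, threshold=THRESHOLD_PERCENT):
    """Count elements whose percent difference exceeds the threshold."""
    return sum(
        1 for c, g in zip(cpu_out, gpu_out) if percent_diff(c, g) > threshold
    )


if __name__ == "__main__":
    cpu = [100.0, 200.0, 0.001]
    gpu = [100.4, 150.0, 0.002]
    print(count_mismatches(cpu, gpu))
```

Under a check of this kind, a count of 4095 means that 4095 output elements differ from the CPU reference by more than the threshold, which points to wrong GPU results in the traced run rather than to rounding noise.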

The same mismatch occurs with polybench-gesummv.

Has anyone else run into this problem? (Accel-Sim: dev branch, NVBit tracer: v1.5.5, CUDA: 11.2, GPU: NVIDIA RTX A6000)