NVlabs / NVBit


NVBit cannot work with LAMMPS? #116

Open luckyq opened 1 year ago

luckyq commented 1 year ago

Hi, All.

I am trying to use NVBit to collect some kernel traces.

But when I run LAMMPS with Accel-Sim's tracer_tool, I hit the following problem. (Other programs I have tested work fine.)

This is the command I ran:

CUDA_INJECTION64_PATH=/home/pwd/accel-sim-framework/util/tracer_nvbit/tracer_tool/tracer_tool.so TOOL_VERBOSE=1 DYNAMIC_KERNEL_LIMIT_START=40 /home/pwd/lammps-23Jun2022/build/lmp -sf gpu -pk gpu 1 newton yes neigh yes -in /home/pwd/lammps-23Jun2022/bench/in.example1
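A minimal sketch for rerunning a command like the one above under a time limit, with its output saved to a file. This can make it easier to tell whether a run has stopped producing output or is just slow. The function name `run_traced` and the variable `TRACER_SO` are hypothetical, not part of NVBit or Accel-Sim. The environment variables are the ones already used in the command, and `timeout` is assumed to be the GNU coreutils version.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NVBit or Accel-Sim):
# run a command under the tracer with a time limit and keep the log.
# TRACER_SO is an assumed variable name; point it at your tracer_tool.so.
run_traced() {
    local limit="$1" log="$2" status=0
    shift 2
    # Same environment variables as in the command above;
    # stdout and stderr both go to the log file.
    CUDA_INJECTION64_PATH="${TRACER_SO:-}" TOOL_VERBOSE=1 DYNAMIC_KERNEL_LIMIT_START=40 \
        timeout "$limit" "$@" >"$log" 2>&1 || status=$?
    # GNU timeout exits with status 124 when the time limit was reached.
    if [ "$status" -eq 124 ]; then
        echo "timed out after $limit; last log line:"
        tail -n 1 "$log"
    fi
    return "$status"
}
```

For example, with `TRACER_SO` set to the tracer_tool.so path, `run_traced 10m trace.log` followed by the `lmp` command line would stop the run after ten minutes and leave the output in `trace.log`.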

The problem is that it gets stuck at some instruction and never moves forward. The log follows.

------------- NVBit (NVidia Binary Instrumentation Tool v1.5.3) Loaded --------------
NVBit core environment variables (mostly for nvbit-devs):
NVDISASM = nvdisasm - override default nvdisasm found in PATH
NOBANNER = 0 - if set, does not print this banner

INSTR_BEGIN = 0 - Beginning of the instruction interval where to apply instrumentation
INSTR_END = 4294967295 - End of the instruction interval where to apply instrumentation
EXCLUDE_PRED_OFF = 1 - Exclude predicated off instruction from count

DYNAMIC_KERNEL_LIMIT_END = 0 - Limit of the number kernel to be printed, 0 means no limit
DYNAMIC_KERNEL_LIMIT_START = 40 - start to report kernel from this kernel id, 0 means starts from the beginning, i.e. first kernel
ACTIVE_FROM_START = 1 - Start instruction tracing from start or wait for cuProfilerStart and cuProfilerStop. If set to 0, DYNAMIC_KERNEL_LIMIT options have no effect
TOOL_VERBOSE = 1 - Enable verbosity inside the tool
TOOL_COMPRESS = 1 - Enable traces compression
TOOL_TRACE_CORE = 0 - write the core id in the traces
TERMINATE_UPON_LIMIT = 0 - Stop the process once the current kernel > DYNAMIC_KERNEL_LIMIT_END
USER_DEFINED_FOLDERS = 0 - Uses the user defined folder TRACES_FOLDER path environment

LAMMPS (23 Jun 2022 - Update 3)
using 1 OpenMP thread(s) per MPI task
------------- NVBit (NVidia Binary Instrumentation Tool v1.5.3) Loaded --------------
NVBit core environment variables (mostly for nvbit-devs):
NVDISASM = nvdisasm - override default nvdisasm found in PATH
NOBANNER = 0 - if set, does not print this banner

INSTR_BEGIN = 0 - Beginning of the instruction interval where to apply instrumentation
INSTR_END = 4294967295 - End of the instruction interval where to apply instrumentation
EXCLUDE_PRED_OFF = 1 - Exclude predicated off instruction from count

DYNAMIC_KERNEL_LIMIT_END = 0 - Limit of the number kernel to be printed, 0 means no limit
DYNAMIC_KERNEL_LIMIT_START = 40 - start to report kernel from this kernel id, 0 means starts from the beginning, i.e. first kernel
ACTIVE_FROM_START = 1 - Start instruction tracing from start or wait for cuProfilerStart and cuProfilerStop. If set to 0, DYNAMIC_KERNEL_LIMIT options have no effect
TOOL_VERBOSE = 1 - Enable verbosity inside the tool
TOOL_COMPRESS = 1 - Enable traces compression
TOOL_TRACE_CORE = 0 - write the core id in the traces
TERMINATE_UPON_LIMIT = 0 - Stop the process once the current kernel > DYNAMIC_KERNEL_LIMIT_END
USER_DEFINED_FOLDERS = 0 - Uses the user defined folder TRACES_FOLDER path environment

Inspecting function kernel_info at address 0x7f7084fa7e00
Instr 0 @ 0x0 (0) - IMAD.MOV.U32 R1, RZ, RZ, c[0x0][0x28] ; has_guard_pred = 0 opcode = IMAD.MOV.U32/IMAD memop = NONE load/store = 0/0 --op[0].type = REG is_neg/is_not/abs = 0/0/0 num = 1 prop = --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 255 prop = --op[2].type = REG is_neg/is_not/abs = 0/0/0 num = 255 prop = --op[3].type = CBANK is_neg/is_not/abs = 0/0/0 id = 0 has_imm_offset = 1 imm_offset = 40 has_reg_offset = 0 reg_offset = 0
Instr 1 @ 0x10 (16) - @!PT SHFL.IDX PT, RZ, RZ, RZ, RZ ; has_guard_pred = 1 guard_pred_num = 7 guard_pred_negated = 1 guard_pred_uniform = 0 opcode = SHFL.IDX/SHFL memop = NONE load/store = 0/1 is_extended = 0 size = 0 --op[0].type = PRED is_neg/is_not/abs = 0/0/0 num = 7 --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 255 prop = --op[2].type = REG is_neg/is_not/abs = 0/0/0 num = 255 prop = --op[3].type = REG is_neg/is_not/abs = 0/0/0 num = 255 prop = --op[4].type = REG is_neg/is_not/abs = 0/0/0 num = 255 prop =
Instr 2 @ 0x20 (32) - IMAD.MOV.U32 R2, RZ, RZ, c[0x0][0x160] ; has_guard_pred = 0 opcode = IMAD.MOV.U32/IMAD memop = NONE load/store = 0/0 --op[0].type = REG is_neg/is_not/abs = 0/0/0 num = 2 prop = --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 255 prop = --op[2].type = REG is_neg/is_not/abs = 0/0/0 num = 255 prop = --op[3].type = CBANK is_neg/is_not/abs = 0/0/0 id = 0 has_imm_offset = 1 imm_offset = 352 has_reg_offset = 0 reg_offset = 0
Instr 3 @ 0x30 (48) - MOV R5, 0x67 ; has_guard_pred = 0 opcode = MOV/MOV memop = NONE load/store = 0/0 --op[0].type = REG is_neg/is_not/abs = 0/0/0 num = 5 prop = --op[1].type = IMM_UINT64 is_neg/is_not/abs = 0/0/0 value = 0x67
Instr 4 @ 0x40 (64) - IMAD.MOV.U32 R7, RZ, RZ, 0x2bc ; has_guard_pred = 0 opcode = IMAD.MOV.U32/IMAD memop = NONE load/store = 0/0 --op[0].type = REG is_neg/is_not/abs = 0/0/0 num = 7 prop = --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 255 prop = --op[2].type = REG is_neg/is_not/abs = 0/0/0 num = 255 prop = --op[3].type = IMM_UINT64 is_neg/is_not/abs = 0/0/0 value = 0x2bc
Instr 5 @ 0x50 (80) - MOV R11, 0x1 ; has_guard_pred = 0 opcode = MOV/MOV memop = NONE load/store = 0/0 --op[0].type = REG is_neg/is_not/abs = 0/0/0 num = 11 prop = --op[1].type = IMM_UINT64 is_neg/is_not/abs = 0/0/0 value = 0x1
Instr 6 @ 0x60 (96) - IMAD.MOV.U32 R9, RZ, RZ, 0x20 ; has_guard_pred = 0 opcode = IMAD.MOV.U32/IMAD memop = NONE load/store = 0/0 --op[0].type = REG is_neg/is_not/abs = 0/0/0 num = 9 prop = --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 255 prop = --op[2].type = REG is_neg/is_not/abs = 0/0/0 num = 255 prop = --op[3].type = IMM_UINT64 is_neg/is_not/abs = 0/0/0 value = 0x20
Instr 7 @ 0x70 (112) - IADD3 R2, P0, R2, 0x4, RZ ; has_guard_pred = 0 opcode = IADD3/IADD3 memop = NONE load/store = 0/0 --op[0].type = REG is_neg/is_not/abs = 0/0/0 num = 2 prop = --op[1].type = PRED is_neg/is_not/abs = 0/0/0 num = 0 --op[2].type = REG is_neg/is_not/abs = 0/0/0 num = 2 prop = --op[3].type = IMM_UINT64 is_neg/is_not/abs = 0/0/0 value = 0x4 --op[4].type = REG is_neg/is_not/abs = 0/0/0 num = 255 prop =
Instr 8 @ 0x80 (128) - IMAD.MOV.U32 R13, RZ, RZ, 0x4 ; has_guard_pred = 0 opcode = IMAD.MOV.U32/IMAD memop = NONE load/store = 0/0 --op[0].type = REG is_neg/is_not/abs = 0/0/0 num = 13 prop = --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 255 prop = --op[2].type = REG is_neg/is_not/abs = 0/0/0 num = 255 prop = --op[3].type = IMM_UINT64 is_neg/is_not/abs = 0/0/0 value = 0x4
Instr 9 @ 0x90 (144) - MOV R15, 0x100 ; has_guard_pred = 0 opcode = MOV/MOV memop = NONE load/store = 0/0 --op[0].type = REG is_neg/is_not/abs = 0/0/0 num = 15 prop = --op[1].type = IMM_UINT64 is_neg/is_not/abs = 0/0/0 value = 0x100
Instr 10 @ 0xa0 (160) - IMAD.MOV.U32 R17, RZ, RZ, 0x2 ; has_guard_pred = 0 opcode = IMAD.MOV.U32/IMAD memop = NONE load/store = 0/0 --op[0].type = REG is_neg/is_not/abs = 0/0/0 num = 17 prop = --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 255 prop = --op[2].type = REG is_neg/is_not/abs = 0/0/0 num = 255 prop = --op[3].type = IMM_UINT64 is_neg/is_not/abs = 0/0/0 value = 0x2
Instr 11 @ 0xb0 (176) - IADD3.X R3, RZ, c[0x0][0x164], RZ, P0, !PT ; has_guard_pred = 0 opcode = IADD3.X/IADD3 memop = NONE load/store = 0/0 --op[0].type = REG is_neg/is_not/abs = 0/0/0 num = 3 prop = --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 255 prop = --op[2].type = CBANK is_neg/is_not/abs = 0/0/0 id = 0 has_imm_offset = 1 imm_offset = 356 has_reg_offset = 0 reg_offset = 0 --op[3].type = REG is_neg/is_not/abs = 0/0/0 num = 255 prop = --op[4].type = PRED is_neg/is_not/abs = 0/0/0 num = 0 --op[5].type = PRED is_neg/is_not/abs = 0/1/0 num = 7
Instr 12 @ 0xc0 (192) - MOV R19, 0x80 ; has_guard_pred = 0 opcode = MOV/MOV memop = NONE load/store = 0/0 --op[0].type = REG is_neg/is_not/abs = 0/0/0 num = 19 prop = --op[1].type = IMM_UINT64 is_neg/is_not/abs = 0/0/0 value = 0x80
Instr 13 @ 0xd0 (208) - MOV R21, 0xb ; has_guard_pred = 0 opcode = MOV/MOV memop = NONE load/store = 0/0 --op[0].type = REG is_neg/is_not/abs = 0/0/0 num = 21 prop = --op[1].type = IMM_UINT64 is_neg/is_not/abs = 0/0/0 value = 0xb
Instr 14 @ 0xe0 (224) - STG.E.SYS [R2], R5 ; has_guard_pred = 0 opcode = STG.E.SYS/STG memop = GLOBAL load/store = 0/1 is_extended = 1 size = 4 --op[0].type = MREF is_neg/is_not/abs = 0/0/0 has_ra = 1 ra_num = 2 ra_mod = 64 has_ur = 0 ur_num = 0 ur_mod = NO_MOD has_imm = 0 imm = 0 --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 5 prop =
Instr 15 @ 0xf0 (240) - STG.E.SYS [R2+-0x4], R7 ; has_guard_pred = 0 opcode = STG.E.SYS/STG memop = GLOBAL load/store = 0/1 is_extended = 1 size = 4 --op[0].type = MREF is_neg/is_not/abs = 0/0/0 has_ra = 1 ra_num = 2 ra_mod = 64 has_ur = 0 ur_num = 0 ur_mod = NO_MOD has_imm = 1 imm = -4 --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 7 prop =
Instr 16 @ 0x100 (256) - STG.E.SYS [R2+0x8], R9 ; has_guard_pred = 0 opcode = STG.E.SYS/STG memop = GLOBAL load/store = 0/1 is_extended = 1 size = 4 --op[0].type = MREF is_neg/is_not/abs = 0/0/0 has_ra = 1 ra_num = 2 ra_mod = 64 has_ur = 0 ur_num = 0 ur_mod = NO_MOD has_imm = 1 imm = 8 --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 9 prop =
Instr 17 @ 0x110 (272) - MOV R5, 0x8 ; has_guard_pred = 0 opcode = MOV/MOV memop = NONE load/store = 0/0 --op[0].type = REG is_neg/is_not/abs = 0/0/0 num = 5 prop = --op[1].type = IMM_UINT64 is_neg/is_not/abs = 0/0/0 value = 0x8
Instr 18 @ 0x120 (288) - STG.E.SYS [R2+0x4], R9 ; has_guard_pred = 0 opcode = STG.E.SYS/STG memop = GLOBAL load/store = 0/1 is_extended = 1 size = 4 --op[0].type = MREF is_neg/is_not/abs = 0/0/0 has_ra = 1 ra_num = 2 ra_mod = 64 has_ur = 0 ur_num = 0 ur_mod = NO_MOD has_imm = 1 imm = 4 --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 9 prop =
Instr 19 @ 0x130 (304) - IMAD.MOV.U32 R7, RZ, RZ, 0x40 ; has_guard_pred = 0 opcode = IMAD.MOV.U32/IMAD memop = NONE load/store = 0/0 --op[0].type = REG is_neg/is_not/abs = 0/0/0 num = 7 prop = --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 255 prop = --op[2].type = REG is_neg/is_not/abs = 0/0/0 num = 255 prop = --op[3].type = IMM_UINT64 is_neg/is_not/abs = 0/0/0 value = 0x40
Instr 20 @ 0x140 (320) - STG.E.SYS [R2+0x10], R11 ; has_guard_pred = 0 opcode = STG.E.SYS/STG memop = GLOBAL load/store = 0/1 is_extended = 1 size = 4 --op[0].type = MREF is_neg/is_not/abs = 0/0/0 has_ra = 1 ra_num = 2 ra_mod = 64 has_ur = 0 ur_num = 0 ur_mod = NO_MOD has_imm = 1 imm = 16 --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 11 prop =
Instr 21 @ 0x150 (336) - STG.E.SYS [R2+0xc], R11 ; has_guard_pred = 0 opcode = STG.E.SYS/STG memop = GLOBAL load/store = 0/1 is_extended = 1 size = 4 --op[0].type = MREF is_neg/is_not/abs = 0/0/0 has_ra = 1 ra_num = 2 ra_mod = 64 has_ur = 0 ur_num = 0 ur_mod = NO_MOD has_imm = 1 imm = 12 --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 11 prop =
Instr 22 @ 0x160 (352) - STG.E.SYS [R2+0x14], R13 ; has_guard_pred = 0 opcode = STG.E.SYS/STG memop = GLOBAL load/store = 0/1 is_extended = 1 size = 4 --op[0].type = MREF is_neg/is_not/abs = 0/0/0 has_ra = 1 ra_num = 2 ra_mod = 64 has_ur = 0 ur_num = 0 ur_mod = NO_MOD has_imm = 1 imm = 20 --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 13 prop =
Instr 23 @ 0x170 (368) - STG.E.SYS [R2+0x20], R15 ; has_guard_pred = 0 opcode = STG.E.SYS/STG memop = GLOBAL load/store = 0/1 is_extended = 1 size = 4 --op[0].type = MREF is_neg/is_not/abs = 0/0/0 has_ra = 1 ra_num = 2 ra_mod = 64 has_ur = 0 ur_num = 0 ur_mod = NO_MOD has_imm = 1 imm = 32 --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 15 prop =
Instr 24 @ 0x180 (384) - STG.E.SYS [R2+0x24], R15 ; has_guard_pred = 0 opcode = STG.E.SYS/STG memop = GLOBAL load/store = 0/1 is_extended = 1 size = 4 --op[0].type = MREF is_neg/is_not/abs = 0/0/0 has_ra = 1 ra_num = 2 ra_mod = 64 has_ur = 0 ur_num = 0 ur_mod = NO_MOD has_imm = 1 imm = 36 --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 15 prop =
Instr 25 @ 0x190 (400) - STG.E.SYS [R2+0x1c], R17 ; has_guard_pred = 0 opcode = STG.E.SYS/STG memop = GLOBAL load/store = 0/1 is_extended = 1 size = 4 --op[0].type = MREF is_neg/is_not/abs = 0/0/0 has_ra = 1 ra_num = 2 ra_mod = 64 has_ur = 0 ur_num = 0 ur_mod = NO_MOD has_imm = 1 imm = 28 --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 17 prop =
Instr 26 @ 0x1a0 (416) - STG.E.SYS [R2+0x28], R19 ; has_guard_pred = 0 opcode = STG.E.SYS/STG memop = GLOBAL load/store = 0/1 is_extended = 1 size = 4 --op[0].type = MREF is_neg/is_not/abs = 0/0/0 has_ra = 1 ra_num = 2 ra_mod = 64 has_ur = 0 ur_num = 0 ur_mod = NO_MOD has_imm = 1 imm = 40 --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 19 prop =
Instr 27 @ 0x1b0 (432) - STG.E.SYS [R2+0x30], R19 ; has_guard_pred = 0 opcode = STG.E.SYS/STG memop = GLOBAL load/store = 0/1 is_extended = 1 size = 4 --op[0].type = MREF is_neg/is_not/abs = 0/0/0 has_ra = 1 ra_num = 2 ra_mod = 64 has_ur = 0 ur_num = 0 ur_mod = NO_MOD has_imm = 1 imm = 48 --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 19 prop =
Instr 28 @ 0x1c0 (448) - STG.E.SYS [R2+0x38], R19 ; has_guard_pred = 0 opcode = STG.E.SYS/STG memop = GLOBAL load/store = 0/1 is_extended = 1 size = 4 --op[0].type = MREF is_neg/is_not/abs = 0/0/0 has_ra = 1 ra_num = 2 ra_mod = 64 has_ur = 0 ur_num = 0 ur_mod = NO_MOD has_imm = 1 imm = 56 --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 19 prop =
Instr 29 @ 0x1d0 (464) - STG.E.SYS [R2+0x40], R19 ; has_guard_pred = 0 opcode = STG.E.SYS/STG memop = GLOBAL load/store = 0/1 is_extended = 1 size = 4 --op[0].type = MREF is_neg/is_not/abs = 0/0/0 has_ra = 1 ra_num = 2 ra_mod = 64 has_ur = 0 ur_num = 0 ur_mod = NO_MOD has_imm = 1 imm = 64 --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 19 prop =
Instr 30 @ 0x1e0 (480) - STG.E.SYS [R2+0x2c], R7 ; has_guard_pred = 0 opcode = STG.E.SYS/STG memop = GLOBAL load/store = 0/1 is_extended = 1 size = 4 --op[0].type = MREF is_neg/is_not/abs = 0/0/0 has_ra = 1 ra_num = 2 ra_mod = 64 has_ur = 0 ur_num = 0 ur_mod = NO_MOD has_imm = 1 imm = 44 --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 7 prop =
Instr 31 @ 0x1f0 (496) - STG.E.SYS [R2+0x3c], R21 ; has_guard_pred = 0 opcode = STG.E.SYS/STG memop = GLOBAL load/store = 0/1 is_extended = 1 size = 4 --op[0].type = MREF is_neg/is_not/abs = 0/0/0 has_ra = 1 ra_num = 2 ra_mod = 64 has_ur = 0 ur_num = 0 ur_mod = NO_MOD has_imm = 1 imm = 60 --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 21 prop =
Instr 32 @ 0x200 (512) - STG.E.SYS [R2+0x18], R5 ; has_guard_pred = 0 opcode = STG.E.SYS/STG memop = GLOBAL load/store = 0/1 is_extended = 1 size = 4 --op[0].type = MREF is_neg/is_not/abs = 0/0/0 has_ra = 1 ra_num = 2 ra_mod = 64 has_ur = 0 ur_num = 0 ur_mod = NO_MOD has_imm = 1 imm = 24 --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 5 prop =
Instr 33 @ 0x210 (528) - STG.E.SYS [R2+0x34], R5 ; has_guard_pred = 0 opcode = STG.E.SYS/STG memop = GLOBAL load/store = 0/1 is_extended = 1 size = 4 --op[0].type = MREF is_neg/is_not/abs = 0/0/0 has_ra = 1 ra_num = 2 ra_mod = 64 has_ur = 0 ur_num = 0 ur_mod = NO_MOD has_imm = 1 imm = 52 --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 5 prop =
Instr 34 @ 0x220 (544) - STG.E.SYS [R2+0x44], R5 ; has_guard_pred = 0 opcode = STG.E.SYS/STG memop = GLOBAL load/store = 0/1 is_extended = 1 size = 4 --op[0].type = MREF is_neg/is_not/abs = 0/0/0 has_ra = 1 ra_num = 2 ra_mod = 64 has_ur = 0 ur_num = 0 ur_mod = NO_MOD has_imm = 1 imm = 68 --op[1].type = REG is_neg/is_not/abs = 0/0/0 num = 5 prop =
Instr 35 @ 0x230 (560) - EXIT ; has_guard_pred = 0 opcode = EXIT/EXIT memop = NONE load/store = 0/0
Instr 36 @ 0x240 (576) - BRA 0x240; has_guard_pred = 0 opcode = BRA/BRA memop = NONE load/store = 0/0 --op[0].type = IMM_UINT64 is_neg/is_not/abs = 0/0/0 value = 0x240
Instr 37 @ 0x250 (592) - NOP; has_guard_pred = 0 opcode = NOP/NOP memop = NONE load/store = 0/0
Instr 38 @ 0x260 (608) - NOP; has_guard_pred = 0 opcode = NOP/NOP memop = NONE load/store = 0/0
Instr 39 @ 0x270 (624) - NOP; has_guard_pred = 0 opcode = NOP/NOP memop = NONE load/store = 0/0
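
A verbose log like the one above is easier to read with a short summary of where it stops. The sketch below prints the last function the tool reported inspecting and the last instruction it printed. The function name `last_inspected` is hypothetical. It assumes the log was saved to a file and that its lines look like the `Inspecting function ...` and `Instr N @ 0x...` entries shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize where a TOOL_VERBOSE=1 log stops.
# Uses grep -o, so it works whether or not the log kept its line breaks.
last_inspected() {
    local log="$1"
    # Last function the tool reported inspecting
    # (first whitespace-delimited token of the name only).
    grep -oE 'Inspecting function [^ ]+' "$log" | tail -n 1 \
        | awk '{print "function:", $3}'
    # Index and byte offset of the last instruction printed in the log.
    grep -oE 'Instr [0-9]+ @ 0x[0-9a-f]+' "$log" | tail -n 1 \
        | awk '{print "instr:", $2, "offset:", $4}'
}
```

Run against the log above, this should report `kernel_info` and instruction 39 at offset 0x270, the trailing `NOP` after `EXIT`. That suggests the dump of that function finished before output stopped.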

mahmoodn commented 1 year ago

I would also like to know whether NVBit actually works with MPI.