accel-sim / accel-sim-framework

This is the top-level repository for the Accel-Sim framework.
https://accel-sim.github.io

How to use the visualizer AerialVision in Accel-Sim? #245

Closed ConvolutedDog closed 1 year ago

JRPan commented 1 year ago

Run the simulations with this option enabled: https://github.com/accel-sim/accel-sim-framework/blob/dev/util/job_launching/configs/define-standard-cfgs.yml#L138

Then you can open the resulting .gz file in AerialVision.

ConvolutedDog commented 1 year ago

@JRPan Thanks for your suggestion. I want to use the code in ptx-stats.cc to collect statistics for each individual SASS instruction.

However, Accel-Sim does not collect information about SASS instructions or write it to gpgpu_inst_stats.txt.

Can you suggest what modifications would be needed?

ConvolutedDog commented 1 year ago

@JRPan For example, consider this code in ptx-stats.cc:

void ptx_stats::ptx_file_line_stats_add_dram_traffic(unsigned pc,
                                                     unsigned dram_traffic) {
  const ptx_instruction *pInsn = gpgpu_ctx->pc_to_instruction(pc);
  if (pInsn != NULL)
    ptx_file_line_stats_tracker[ptx_file_line(pInsn->source_file(),
                                              pInsn->source_line())]
        .dram_traffic += dram_traffic;
}

Here the PTX instruction object pInsn is actually NULL, so I want to reuse this function and look up the SASS instruction by pc instead. However, I am not familiar with Accel-Sim's front-end code, so I don't know how to get the SASS instruction from the pc here.

Once I have the SASS instruction, I can update the statistics in ptx_file_line_stats_tracker.
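One way to picture the idea is a tracker keyed directly by pc rather than by (source file, source line). The following is only a minimal sketch under that assumption: `SassPcStats` and `SassStatsTracker` are hypothetical names invented for illustration and are not part of Accel-Sim or GPGPU-Sim, and the sketch does not show how the simulator would resolve a pc to a SASS instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical per-PC counters (not an Accel-Sim type). The dram_traffic
// field mirrors the counter updated in the function quoted above.
struct SassPcStats {
  std::uint64_t exec_count = 0;
  std::uint64_t dram_traffic = 0;
};

// Hypothetical tracker keyed by PC instead of (source file, source line).
class SassStatsTracker {
 public:
  // Count one execution of the instruction at this PC.
  void add_exec(unsigned pc) { stats_[pc].exec_count += 1; }

  // Accumulate DRAM traffic for the instruction at this PC.
  void add_dram_traffic(unsigned pc, unsigned dram_traffic) {
    stats_[pc].dram_traffic += dram_traffic;
  }

  // Return zeroed stats for a PC that was never recorded (no insertion).
  SassPcStats get(unsigned pc) const {
    auto it = stats_.find(pc);
    return it == stats_.end() ? SassPcStats{} : it->second;
  }

  // Number of distinct PCs recorded so far.
  std::size_t size() const { return stats_.size(); }

 private:
  std::map<unsigned, SassPcStats> stats_;
};
```

Because no ptx_instruction lookup is involved, a NULL pInsn would not block the update in this scheme. If PCs are not unique across kernels, the key would probably also need the kernel name or kernel id.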

ConvolutedDog commented 1 year ago

@JRPan I have completed the AerialVision modifications for SASS instructions and would like to ask another question. When NVBit traces the SASS instructions that the CUTLASS program executes on the GPU, it produces kernel-1.traceg, kernel-2.traceg, and kernel-3.traceg. All three .traceg files contain the same kernel (same kernel name and instructions) and are distinguished only by the kernel id in each file. In AerialVision, it can be seen that over the entire execution period each shader could execute these three kernels in parallel. So I would like to ask whether these three SASS trace files are actually executed in parallel on the real GPU.

kernel-1.traceg:

-kernel name = _ZN7cutlass4gemm16gemm_kernel_...ParamsE
-kernel id = 1
-grid dim = (1,1,1)
-block dim = (256,1,1)
-shmem = 16656
-nregs = 128
-binary version = 75
-cuda stream id = 0
-shmem base_addr = 0x00007fc8e0000000
-local mem base_addr = 0x00007fc8de000000
-nvbit version = 1.5.5
-accelsim tracer version = 4
-enable lineinfo = 1

#traces format = [line_num] PC mask dest_num [reg_dests] opcode src_num [reg_srcs] mem_width [adrrescompress?] [mem_addresses]

#BEGIN_TB

thread block = 0,0,0

warp = 0
insts = 5637
359 0000 ffffffff 1 R1 MOV 0 0 
99 0010 ffffffff 0 S2UR 0 0 
99 0020 ffffffff 0 ULDC 0 0 
......

kernel-2.traceg:

-kernel name = _ZN7cutlass4gemm16gemm_kernel_...ParamsE
-kernel id = 2
-grid dim = (1,1,1)
-block dim = (256,1,1)
-shmem = 16656
-nregs = 128
-binary version = 75
-cuda stream id = 0
-shmem base_addr = 0x00007fc8e0000000
-local mem base_addr = 0x00007fc8de000000
-nvbit version = 1.5.5
-accelsim tracer version = 4
-enable lineinfo = 1

#traces format = [line_num] PC mask dest_num [reg_dests] opcode src_num [reg_srcs] mem_width [adrrescompress?] [mem_addresses]

#BEGIN_TB

thread block = 0,0,0

warp = 0
insts = 5637
359 0000 ffffffff 1 R1 MOV 0 0 
99 0010 ffffffff 0 S2UR 0 0 
99 0020 ffffffff 0 ULDC 0 0 
......

kernel-3.traceg:

-kernel name = _ZN7cutlass4gemm16gemm_kernel_...ParamsE
-kernel id = 3
-grid dim = (1,1,1)
-block dim = (256,1,1)
-shmem = 16656
-nregs = 128
-binary version = 75
-cuda stream id = 0
-shmem base_addr = 0x00007fc8e0000000
-local mem base_addr = 0x00007fc8de000000
-nvbit version = 1.5.5
-accelsim tracer version = 4
-enable lineinfo = 1

#traces format = [line_num] PC mask dest_num [reg_dests] opcode src_num [reg_srcs] mem_width [adrrescompress?] [mem_addresses]

#BEGIN_TB

thread block = 0,0,0

warp = 0
insts = 5637
359 0000 ffffffff 1 R1 MOV 0 0 
99 0010 ffffffff 0 S2UR 0 0 
99 0020 ffffffff 0 ULDC 0 0 
......

AerialVision:

[Image: shaderInsn_vs_globalCycle_dydx_sgemm_64]

As can be seen, the three SASS trace files are executed on shader cores No. 0, No. 1, and No. 2, respectively. However, on real GPU hardware, the three shader cores can execute in parallel. So, is it because NVBit generates three separate files that Accel-Sim does not execute them in parallel?

JRPan commented 1 year ago

Sorry for the late reply. I didn't see your previous question.

Each .traceg file is one kernel. If there are multiple .traceg files with the same kernel name, the kernel was launched multiple times with different inputs. So they are different kernel launches, and I don't know whether these kernels actually run in parallel on the real GPU. But judging by the grid/block dims, they should be able to run in parallel.
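To check that several trace files are separate launches of one kernel, the header fields can be compared directly. This is a minimal sketch that assumes the "-key = value" header layout shown in the traces in this thread; `parse_traceg_header`, `field`, and `same_kernel_different_launch` are hypothetical helper names, not Accel-Sim functions.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

using Header = std::map<std::string, std::string>;

// Strip leading and trailing spaces, tabs, and carriage returns.
inline std::string trim(const std::string &s) {
  const auto b = s.find_first_not_of(" \t\r");
  if (b == std::string::npos) return "";
  const auto e = s.find_last_not_of(" \t\r");
  return s.substr(b, e - b + 1);
}

// Collect "-key = value" lines until the first line starting with '#'
// (assumed here to mark the end of the header, e.g. "#traces format").
inline Header parse_traceg_header(std::istream &in) {
  Header header;
  std::string line;
  while (std::getline(in, line)) {
    line = trim(line);
    if (line.empty()) continue;
    if (line[0] == '#') break;
    if (line[0] != '-') continue;
    const auto eq = line.find('=');
    if (eq == std::string::npos) continue;
    header[trim(line.substr(1, eq - 1))] = trim(line.substr(eq + 1));
  }
  return header;
}

// Look up a header field; returns an empty string if it is missing.
inline std::string field(const Header &h, const std::string &key) {
  const auto it = h.find(key);
  return it == h.end() ? std::string() : it->second;
}

// Same kernel name but different kernel id: separate launches of one kernel.
inline bool same_kernel_different_launch(const Header &a, const Header &b) {
  return field(a, "kernel name") == field(b, "kernel name") &&
         field(a, "kernel id") != field(b, "kernel id");
}
```

For a real file, the stream could be a `std::ifstream` opened on kernel-1.traceg; the same call then works on kernel-2.traceg and kernel-3.traceg for comparison.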

But if you want to run kernels concurrently: by default, kernels are executed serially. To enable concurrent execution, you need to add this to the config: https://github.com/accel-sim/accel-sim-framework/blob/dev/util/job_launching/configs/define-standard-cfgs.yml#L67

So run QV100-SASS-MULTI_KERNEL if you are using run_simulations.py. If you run the simulations yourself, you can add the line to the config file.

But what you see is expected. It's the same kernel launched three times, and by default there is no concurrency between kernel launches.

ConvolutedDog commented 1 year ago

@JRPan Thanks a lot. MULTI_KERNEL works well, so I will close this issue.