accel-sim / accel-sim-framework

This is the top-level repository for the Accel-Sim framework.
https://accel-sim.github.io

Deadlock running sass trace #341

Open Evane5cence opened 1 month ago

Evane5cence commented 1 month ago

Hi,

I slightly modified the kernel to implement a simple function and ran GEMM, but I encountered a deadlock. It only occurs when the matrix size is large (4096x4096). Is there any indication of why this might happen, or a common reason for it?

Thank you!!!

GPGPU-Sim uArch: ERROR ** deadlock detected: last writeback core 40 @ gpu_sim_cycle 28773 (+ gpu_tot_sim_cycle 4294867296) (71227 cycles ago)
GPGPU-Sim uArch: DEADLOCK  shader cores no longer committing instructions [core(# threads)]:
GPGPU-Sim uArch: DEADLOCK  0(128) 1(128) 2(128) 3(128) 4(128) 5(128) 6(128) 7(128)  + others ...
GPGPU-Sim uArch DEADLOCK:  memory partition 0 busy
GPGPU-Sim uArch DEADLOCK:  memory partition 1 busy
GPGPU-Sim uArch DEADLOCK:  memory partition 2 busy
GPGPU-Sim uArch DEADLOCK:  memory partition 3 busy
GPGPU-Sim uArch DEADLOCK:  memory partition 4 busy
GPGPU-Sim uArch DEADLOCK:  memory partition 5 busy
GPGPU-Sim uArch DEADLOCK:  memory partition 6 busy
GPGPU-Sim uArch DEADLOCK:  memory partition 7 busy
GPGPU-Sim uArch DEADLOCK:  memory partition 8 busy
GPGPU-Sim uArch DEADLOCK:  memory partition 9 busy
GPGPU-Sim uArch DEADLOCK:  memory partition 10 busy
GPGPU-Sim uArch DEADLOCK:  memory partition 11 busy
GPGPU-Sim uArch DEADLOCK:  memory partition 12 busy
GPGPU-Sim uArch DEADLOCK:  memory partition 13 busy
GPGPU-Sim uArch DEADLOCK:  memory partition 14 busy
GPGPU-Sim uArch DEADLOCK:  memory partition 15 busy
GPGPU-Sim uArch DEADLOCK:  iterconnect contains traffic
GPGPU-Sim uArch: ICNT:Display State: Under implementation

Re-run the simulator in gdb and use debug routines in .gdbinit to debug this
JRPan commented 1 month ago

I need more info. What did you change? Did you change the trace directly?

Evane5cence commented 1 month ago

Thanks for the reply! I did not change the trace. I changed the address of the mf (mem_fetch) using a certain formula.

JRPan commented 1 month ago

At what stage? At mf allocation?

Which function did you change?

Evane5cence commented 1 month ago

At the stage when the interconnect passes the mf to the L2 cache, I modify the address of the mf. When the L2 cache passes the mf back to the interconnect, I restore the original address.

JRPan commented 1 month ago

This sounds fine. Without seeing your code I cannot really help much.

But my guess is that the mf is being directed somewhere else, or that it gets merged at L2 but L1 is not notified on writeback.

My recommendation would be to modify the address at mf allocation and keep it that way; there is no need to restore it. If you just want to redirect mfs to different L2 banks, you can change the memory sub-partition hash function instead, without changing the address.

Evane5cence commented 1 month ago

Thanks! May I ask what the base_addr of L1/DRAM is? How can I retrieve it?

Evane5cence commented 1 month ago

I only see the shmem base_addr and the local mem base_addr.

JRPan commented 1 month ago

base_addr is only for local and shared memory. The actual address is base_addr + offset; you can think of it as an allocated array.

L1/DRAM accesses go to global memory, which uses global addresses. The address for each instruction's access can be found in the mem_access_t object.