accel-sim / accel-sim-framework

This is the top-level repository for the Accel-Sim framework.
https://accel-sim.github.io

Error when making L2 Cache writethrough #304

Closed shreyas42singh closed 3 months ago

shreyas42singh commented 4 months ago

When I try to make the L2 cache write-through, I get the following error when running the nn-rodinia-2.0-ft trace:

accel-sim.out: shader.h:245: void shd_warp_t::dec_store_req(): Assertion `m_stores_outstanding > 0' failed.

This configuration runs to completion:

-gpgpu_cache:il1     N:64:128:16,L:R:f:N:L,S:2:48,4 # shader L1 instruction cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq>} 
-gpgpu_cache:dl1     S:4:128:64,L:T:m:L:L,A:512:8,16:0,32 # per-shader L1 data cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq> | none}

-gpgpu_cache:dl2     S:32:128:24,L:B:m:L:P,A:192:4,32:0,32 # unified banked L2 data cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq>}

However, this one fails (the only change is the write policy of dl2, from B to T):

-gpgpu_cache:il1     N:64:128:16,L:R:f:N:L,S:2:48,4 # shader L1 instruction cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq>} 
-gpgpu_cache:dl1     S:4:128:64,L:T:m:L:L,A:512:8,16:0,32 # per-shader L1 data cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq> | none}

-gpgpu_cache:dl2     S:32:128:24,L:T:m:L:P,A:192:4,32:0,32 # unified banked L2 data cache config  {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,<mshr>:<N>:<merge>,<mq>}
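To make the difference between the two dl2 lines easy to see, here is a minimal sketch that extracts the <wr> field from a cache config string, following the {<nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>,...} layout given in the config comments. The helper names split and write_policy_field are hypothetical and are not part of GPGPU-Sim; the real parser in the simulator is not reproduced here.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split `s` on `delim`. Hypothetical helper, not part of GPGPU-Sim.
std::vector<std::string> split(const std::string& s, char delim) {
  std::vector<std::string> out;
  std::stringstream ss(s);
  std::string item;
  while (std::getline(ss, item, delim)) out.push_back(item);
  return out;
}

// Return the <wr> field of a cache config string, assuming the second
// comma-separated group is <rep>:<wr>:<alloc>:<wr_alloc>[:...] as the
// config comments describe. Returns "" if the string is too short.
std::string write_policy_field(const std::string& cfg) {
  std::vector<std::string> groups = split(cfg, ',');
  if (groups.size() < 2) return "";
  std::vector<std::string> policy = split(groups[1], ':');
  if (policy.size() < 2) return "";
  return policy[1];
}
```

Calling write_policy_field on the working dl2 string yields B (write-back) and on the failing one yields T (write-through); every other field of the two strings is identical.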

The command I am using to run the simulator is ./gpu-simulator/bin/release/accel-sim.out -trace ../accel-sim-framework/hw_run/rodinia_2.0-ft/11.0/nn-rodinia-2.0-ft/__data_filelist_4_3_30_90___data_filelist_4_3_30_90_result_txt/traces/kernelslist.g -config ./gpu-simulator/gpgpu-sim/configs/tested-cfgs/SM7_QV100/gpgpusim.config -config ./gpu-simulator/configs/tests/trace.config

shreyas42singh commented 4 months ago

I traced how m_stores_outstanding is updated for nn-rodinia-2.0-ft (outstanding is the value of m_stores_outstanding when store_ack and inc_store_req are called in shader.cc). With the write-through L2:

Increase stores in process_memory_access_queue_l1cache by 1 in warp 4 with outstanding 0
Increase stores in process_memory_access_queue_l1cache by 1 in warp 4 with outstanding 1
Reduce stores in ldst_unit::cycle for warp: 4 ************ outstanding = 2 **************
Reduce stores in ldst_unit::cycle for warp: 4 ************ outstanding = 1 **************
Increase stores in process_memory_access_queue_l1cache by 1 in warp 3 with outstanding 0
Increase stores in process_memory_access_queue_l1cache by 1 in warp 3 with outstanding 1
Reduce stores in ldst_unit::cycle for warp: 4 ************ outstanding = 0 **************

accel-sim.out: shader.h:246: void shd_warp_t::dec_store_req(): Assertion `m_stores_outstanding > 0' failed.

The assertion fires because the count is decremented for a warp that has no stores outstanding: warp 4 issued two stores and had already received two acknowledgements when a third one arrived.

With the original write-back L2, the trace looks like this:
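The counter behaviour in the trace above can be reproduced with a toy model. This is only a sketch: ToyWarp and count_spurious_acks are hypothetical names, and the decrement returns false instead of asserting (as shd_warp_t::dec_store_req() does in the simulator) so that the underflow can be counted rather than aborting.

```cpp
// Toy stand-in for a per-warp outstanding-store counter.
// Hypothetical type, not GPGPU-Sim's shd_warp_t.
struct ToyWarp {
  unsigned stores_outstanding = 0;

  void inc_store_req() { ++stores_outstanding; }

  // Returns false when an acknowledgement arrives with nothing
  // outstanding, which is the case the simulator's assertion catches.
  bool dec_store_req() {
    if (stores_outstanding == 0) return false;
    --stores_outstanding;
    return true;
  }
};

// Issue `issued` stores from one warp, then deliver `acks`
// acknowledgements to it. Returns how many acknowledgements arrived
// while the counter was already zero.
unsigned count_spurious_acks(unsigned issued, unsigned acks) {
  ToyWarp w;
  for (unsigned i = 0; i < issued; ++i) w.inc_store_req();
  unsigned spurious = 0;
  for (unsigned i = 0; i < acks; ++i) {
    if (!w.dec_store_req()) ++spurious;
  }
  return spurious;
}
```

With two stores and two acknowledgements the count is zero; with two stores and three acknowledgements, as for warp 4 in the trace, exactly one acknowledgement is spurious.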

Increase stores in process_memory_access_queue_l1cache by 1 in warp 4 with outstanding 0
Increase stores in process_memory_access_queue_l1cache by 1 in warp 4 with outstanding 1
Reduce stores in ldst_unit::cycle for warp: 4 ************ outstanding = 2 **************
Reduce stores in ldst_unit::cycle for warp: 4 ************ outstanding = 1 **************
Increase stores in process_memory_access_queue_l1cache by 1 in warp 3 with outstanding 0
Increase stores in process_memory_access_queue_l1cache by 1 in warp 3 with outstanding 1
Increase stores in process_memory_access_queue_l1cache by 1 in warp 10 with outstanding 0
Increase stores in process_memory_access_queue_l1cache by 1 in warp 10 with outstanding 1
Increase stores in process_memory_access_queue_l1cache by 1 in warp 6 with outstanding 0
Increase stores in process_memory_access_queue_l1cache by 1 in warp 6 with outstanding 1
In write ack of ldst_unit::cycle for warp: 3  ************ outstanding = 2 ************** 

... (and this continues) 

A write-through L2 cache also produces a different error:

accel-sim.out: dram.cc:247: void dram_t::push(mem_fetch*): Assertion `id == data->get_tlx_addr().chip` failed.

This happens for backprop, streamcluster, mem_bw and MaxFlops, while lud and hotspot seem to run to completion.

Does this mean there is a problem with how ACKs are generated or received?

JRPan commented 4 months ago

Probably a bug. I haven't fully thought this through, but the ACK is sent by L2 when L2 finishes writing the data. In WT, two replies are probably generated somewhere, so the ACK is duplicated.

In WB: L1 sends the write to L2 -> L2 has the data and sends a reply to L1. Total: 1 reply.
In WT: L1 sends the write to L2 -> L2 sends a reply to L1, then sends the write to DRAM -> DRAM has the data and sends a reply. Total: 2 replies.

Try making L2 send the reply only if it is write-back. https://github.com/accel-sim/gpgpu-sim_distribution/blob/dev/src/gpgpu-sim/l2cache.cc#L563-L566 Wrap these 3 lines inside an if that checks whether L2 is WB.

Let me know if it works.

shreyas42singh commented 4 months ago

Most of the accesses at the start are MISS so they don't enter those code blocks. Do you have any other suggestions of where to look?

JRPan commented 4 months ago

That is where L2 handles misses. The condition is status != RESERVATION_FAIL, which includes MISS.

shreyas42singh commented 4 months ago

Yup, that was my bad.

Adding the guard if (m_config->m_L2_config.get_write_policy() == WRITE_BACK) around https://github.com/accel-sim/gpgpu-sim_distribution/blob/dev/src/gpgpu-sim/l2cache.cc#L563-L566 seems to have worked.

Thanks a lot!
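For readers following along, the reasoning behind the guard can be sketched as a toy reply counter. It encodes only the hypothesis stated in this thread (L2 replies from its miss path, and under write-through the DRAM completion produces a second reply); it has not been checked against the actual l2cache.cc code, and WritePolicy and acks_for_one_store are hypothetical names.

```cpp
// Hypothetical policy enum for this sketch; the simulator uses its own
// constants (the thread mentions WRITE_BACK).
enum class WritePolicy { WriteBack, WriteThrough };

// Count the acknowledgements that reach L1 for a single store that is
// handled by L2's miss path, under the hypothesis from this thread.
// `guard_l2_reply` models the proposed fix: L2 replies from that path
// only when it is configured as write-back.
unsigned acks_for_one_store(WritePolicy policy, bool guard_l2_reply) {
  unsigned acks = 0;

  // L2 miss path: reply to L1 (unconditionally without the guard).
  if (!guard_l2_reply || policy == WritePolicy::WriteBack) ++acks;

  // Write-through only: the write is forwarded to DRAM, and its
  // completion is assumed to generate a reply of its own.
  if (policy == WritePolicy::WriteThrough) ++acks;

  return acks;
}
```

Without the guard, this model gives one acknowledgement for write-back and two for write-through, matching the duplicated ACK seen in the trace. With the guard, both policies give exactly one.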

JRPan commented 4 months ago

Glad it worked. Please file a PR if you are interested.