Closed shreyas42singh closed 5 months ago
Looking at the trace of how m_stores_outstanding is updated for nn-rodinia-2.0-ft (outstanding is the value of m_stores_outstanding when store_ack and inc_store_req are called in shader.cc):
Increase stores in process_memory_access_queue_l1cache by 1 in warp 4 with outstanding 0
Increase stores in process_memory_access_queue_l1cache by 1 in warp 4 with outstanding 1
Reduce stores in ldst_unit::cycle for warp: 4 ************ outstanding = 2 **************
Reduce stores in ldst_unit::cycle for warp: 4 ************ outstanding = 1 **************
Increase stores in process_memory_access_queue_l1cache by 1 in warp 3 with outstanding 0
Increase stores in process_memory_access_queue_l1cache by 1 in warp 3 with outstanding 1
Reduce stores in ldst_unit::cycle for warp: 4 ************ outstanding = 0 **************
accel-sim.out: shader.h:246: void shd_warp_t::dec_store_req(): Assertion `m_stores_outstanding > 0' failed.
The assertion fires because the count is being decremented for warp 4, which has no stores outstanding at that point; the two pending stores were issued by warp 3.
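To make the failure mode concrete, here is a minimal sketch that replays the event order from the trace above against a toy per-warp counter. It is not the simulator's code: WarpStoreCounter, Event, first_underflow and failing_trace are hypothetical names, and dec_store_req here returns false where the real shd_warp_t::dec_store_req would hit the assertion.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy stand-in for the per-warp store counter (assumption: not the real shd_warp_t).
struct WarpStoreCounter {
  unsigned outstanding = 0;
  void inc_store_req() { ++outstanding; }
  // Returns false where the simulator would assert m_stores_outstanding > 0.
  bool dec_store_req() {
    if (outstanding == 0) return false;
    --outstanding;
    return true;
  }
};

// One line of the trace: either a store being issued or a store ack arriving.
struct Event {
  bool is_ack;
  int warp;
};

// Returns the id of the first warp whose counter would underflow,
// or -1 if every ack matched a previously issued store.
int first_underflow(const std::vector<Event>& events) {
  std::map<int, WarpStoreCounter> warps;
  for (const Event& e : events) {
    assert(e.warp >= 0);  // warp ids are non-negative, so -1 is free as a sentinel
    WarpStoreCounter& w = warps[e.warp];
    if (!e.is_ack) {
      w.inc_store_req();
    } else if (!w.dec_store_req()) {
      return e.warp;
    }
  }
  return -1;
}

// Event order copied from the failing write-through trace.
std::vector<Event> failing_trace() {
  return {{false, 4}, {false, 4}, {true, 4}, {true, 4},
          {false, 3}, {false, 3}, {true, 4}};
}
```

In this model, first_underflow(failing_trace()) reports warp 4: it receives a third ack although it only issued two stores.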
For the original case where the L2 would have been write back we can see the following:
Increase stores in process_memory_access_queue_l1cache by 1 in warp 4 with outstanding 0
Increase stores in process_memory_access_queue_l1cache by 1 in warp 4 with outstanding 1
Reduce stores in ldst_unit::cycle for warp: 4 ************ outstanding = 2 **************
Reduce stores in ldst_unit::cycle for warp: 4 ************ outstanding = 1 **************
Increase stores in process_memory_access_queue_l1cache by 1 in warp 3 with outstanding 0
Increase stores in process_memory_access_queue_l1cache by 1 in warp 3 with outstanding 1
Increase stores in process_memory_access_queue_l1cache by 1 in warp 10 with outstanding 0
Increase stores in process_memory_access_queue_l1cache by 1 in warp 10 with outstanding 1
Increase stores in process_memory_access_queue_l1cache by 1 in warp 6 with outstanding 0
Increase stores in process_memory_access_queue_l1cache by 1 in warp 6 with outstanding 1
In write ack of ldst_unit::cycle for warp: 3 ************ outstanding = 2 **************
... (and this continues)
However, using a write-through L2 cache produces the error
accel-sim.out: dram.cc:247: void dram_t::push(mem_fetch*): Assertion `id == data->get_tlx_addr().chip` failed.
for backprop, streamcluster, mem_bw and MaxFlops while lud and hotspot seem to run to completion.
Does this mean there is some problem with ack generation and the reception of it?
Probably a bug. I haven't fully thought this through, but the ACK is sent by the L2 when the L2 finishes writing the data. In WT, two replies are probably generated somewhere, so the ACK is duplicated.
So in WB: L1 sends the write to L2 -> L2 has the data and sends a reply to L1. Total: 1 reply.
In WT: L1 sends the write to L2 -> L2 sends a reply to L1, then sends the write to DRAM -> DRAM has the data and sends a reply. Total: 2 replies.
Try making the L2 send the reply only if the L2 is write-back. https://github.com/accel-sim/gpgpu-sim_distribution/blob/dev/src/gpgpu-sim/l2cache.cc#L563-L566 Wrap these 3 lines inside an if that checks whether the L2 is WB.
Let me know if it works.
Most of the accesses at the start are MISS so they don't enter those code blocks. Do you have any other suggestions of where to look?
This is where the L2 handles misses. The condition is status != RESERVATION_FAIL, which includes MISS.
Yup that was my bad:
Adding the guard
if (m_config->m_L2_config.get_write_policy() == WRITE_BACK)
around
https://github.com/accel-sim/gpgpu-sim_distribution/blob/dev/src/gpgpu-sim/l2cache.cc#L563-L566
seems to have worked.
Thanks a lot!
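The WB/WT reply counting discussed above, and the effect of the guard, can be sketched as a toy model. This only encodes the hypothesis from this thread (one reply from the L2, plus one more after the write reaches DRAM when the L2 is write-through); it is not the code in l2cache.cc, and WritePolicy, acks_for_one_store, guard_l2_ack and check_guarded_model are hypothetical names.

```cpp
#include <cassert>

enum class WritePolicy { WriteBack, WriteThrough };

// Number of write ACKs that reach the L1 for a single store in this toy model.
// guard_l2_ack mirrors the fix: the L2 replies only when it is write-back.
int acks_for_one_store(WritePolicy l2_policy, bool guard_l2_ack) {
  int acks = 0;
  const bool l2_replies =
      !guard_l2_ack || l2_policy == WritePolicy::WriteBack;
  if (l2_replies) ++acks;  // reply generated when the L2 accepts the write
  if (l2_policy == WritePolicy::WriteThrough)
    ++acks;  // reply generated after the write has gone through to DRAM
  return acks;
}

// With the guard in place, each store is acknowledged exactly once,
// so the per-warp store counter stays balanced.
void check_guarded_model() {
  assert(acks_for_one_store(WritePolicy::WriteBack, true) == 1);
  assert(acks_for_one_store(WritePolicy::WriteThrough, true) == 1);
}
```

Without the guard, the model yields two ACKs per store under write-through, which matches the extra decrement seen in the failing trace.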
Glad it worked. Please file a PR if you are interested.
When I try making the L2 cache write-through, I get the error
accel-sim.out: shader.h:245: void shd_warp_t::dec_store_req(): Assertion m_stores_outstanding > 0 failed.
when I run the nn-rodinia-2.0-ft trace. Essentially, this configuration runs to completion
however, this fails
The command I am using to run the simulator is
./gpu-simulator/bin/release/accel-sim.out -trace ../accel-sim-framework/hw_run/rodinia_2.0-ft/11.0/nn-rodinia-2.0-ft/__data_filelist_4_3_30_90___data_filelist_4_3_30_90_result_txt/traces/kernelslist.g -config ./gpu-simulator/gpgpu-sim/configs/tested-cfgs/SM7_QV100/gpgpusim.config -config ./gpu-simulator/configs/tests/trace.config