OK. I found that the statistics are correct in this example.

```
7f65e1000000 -> read miss
7f65e1000020 -> sector miss
7f65e1000040 -> sector miss
7f65e1000060 -> sector miss
7f6607e00000 -> read miss
7f6607e00020 -> sector miss
7f6607e00040 -> sector miss
7f6607e00060 -> sector miss

7f65e1000000 -> L2 miss
7f65e1000020 -> L2 miss
7f65e1000040 -> L2 miss
7f65e1000060 -> L2 miss
7f6607e00000 -> L2 miss
7f6607e00020 -> L2 miss
7f6607e00040 -> L2 miss
7f6607e00060 -> L2 miss

7f6607e00000 -> write hit
7f6607e00020 -> write hit
7f6607e00040 -> write hit
7f6607e00060 -> write hit

7f6607e00000 -> L2 write hit
7f6607e00020 -> L2 write hit
7f6607e00040 -> L2 write hit
7f6607e00060 -> L2 write hit
```
So, although L1D misses are 8, L2 accesses are 8 (L1D misses) + 4 (write hits), which is 12. I will close the issue.
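To make the arithmetic explicit, here is a minimal standalone sketch (not GPGPU-Sim code) that tallies the trace above. The read/sector-miss breakdown comes straight from the listing; the one assumption is that L1D write hits are forwarded to L2 (write-through behavior), which the four "L2 write hit" lines suggest.

```cpp
// Tally of the trace above: why 8 L1D misses + 4 write hits = 12 L2 accesses.
#include <cstdio>

int main() {
    const int l1d_read_misses   = 2; // first 32B sector of each 128B line
    const int l1d_sector_misses = 6; // remaining three sectors per line
    const int l1d_write_hits    = 4; // writes to the four sectors of one line

    const int l1d_misses  = l1d_read_misses + l1d_sector_misses; // 8
    const int l2_accesses = l1d_misses + l1d_write_hits;         // 12 (write-through assumption)

    std::printf("L1D misses: %d, L2 accesses: %d\n", l1d_misses, l2_accesses);
    return 0;
}
```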
Following this topic, I have created a simple case and attached the trace file (1 warp, 16 instructions), the config files (1 SM), and the output file showing the L1D and L2 `printf`s I added to `gpu-cache.cc` and `l2cache.cc`. In this example, the L1D stats are wrong, so let's dig into that first.

As you can see in the output file, L1D misses are 8, but L2 accesses are 12. Also, counting the number of `access()` instances in the output shows that the 12 accesses to L2 are correct, but L1D accesses are in fact 24. The cycle numbers from an L1D miss to the corresponding L2 access are weird, too. According to the `printf`s, on an L2 miss a [gpu-cache.cc] line is printed first and then a [l2cache.cc] line; they should be in the same cycle.

test.zip
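For anyone repeating this count, here is a small sketch of how the tagged `printf` lines can be tallied per cache. The tags `[gpu-cache.cc]` and `[l2cache.cc]` match the output described above; the file name `output.txt` is a hypothetical stand-in for the attached output file.

```cpp
// Count the L1D- and L2-tagged printf lines in the simulator output.
#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::ifstream in("output.txt"); // hypothetical name for the attached output
    std::string line;
    int l1d = 0, l2 = 0;
    while (std::getline(in, line)) {
        if (line.find("[gpu-cache.cc]") != std::string::npos) ++l1d;
        if (line.find("[l2cache.cc]") != std::string::npos) ++l2;
    }
    // Per the post, this yields 24 L1D lines but only 12 L2 lines.
    std::cout << "L1D printf lines: " << l1d << "\n"
              << "L2  printf lines: " << l2 << "\n";
    return 0;
}
```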