main memory bw measurement -- don't use UNC_L3_LINES_IN_ANY

GoogleCodeExporter commented 9 years ago

Just found likwid (i.e. minutes ago) and I am very happy to see it.  I need to 
measure main memory traffic on a dual socket Nehalem system (i.e. across both 
sockets).  After investigating all the performance counters, it appears that 
some combination of the following is what I want:
UNC_QHL_REQUESTS.LOCAL_(READS|WRITES)
UNC_IMC_NORMAL_READS.ANY
UNC_IMC_WRITES_FULL.ANY

On to my question then -- in reading your wiki pages for Nehalem, I see that 
the performance group "MEM" uses UNC_L3_LINES_IN_ANY as part of its bandwidth 
measurement.  I believe that this counter will count the allocation of a cache 
line in any state, i.e., including modified or exclusive.  This means that if I 
assign a value without first reading it, you would incorrectly count this as 
part of the memory traffic.

This is relevant, in (e.g.), my particular case which is sparse matrix vector 
multiply (SpMV).  The inner loop of SpMV accumulates the dot product of one row 
in the matrix with the vector operand.  The accumulation is held in a processor 
register (ideally).  Once finished, the value is simply written into the 
destination vector, i.e., so as to avoid a write-miss.  Thus, L3 cache should 
allocate a line in the modified state without performing any DRAM access, thus 
defeating the use of UNC_L3_LINES_IN_ANY as a useful counter for memory traffic.
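
To make the access pattern concrete, here is a minimal sketch of the SpMV loop described above. It assumes CSR storage; the function name `spmv_csr` and the array names are illustrative only and not taken from any library. Python is used for readability, so the register and cache behavior discussed above applies to a compiled equivalent, not to this code as such.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows  # destination vector
    for i in range(n_rows):
        acc = 0.0  # running dot product for row i (ideally a register)
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        # One store per row; the loop never reads y[i] before writing it.
        y[i] = acc
    return y


# A = [[1, 0, 2],
#      [0, 3, 0]] in CSR form, multiplied by x = [1, 2, 3].
print(spmv_csr([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
# prints [7.0, 6.0]
```

The point of interest is the final store: each element of the destination vector is written exactly once and never loaded by the kernel.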

Finally, I must admit that I am at the moment simply searching for a good 
solution.  If the above is wrong, then please simply let me know :)

As a related question: can likwid access all of the uncore performance 
counters? (If yes, then you can add this to your brag sheet, as perf cannot do 
this AFAIK.)

Thank you,
Pete Stevenson

Original issue reported on code.google.com by etep.nos...@gmail.com on 5 Jun 2011 at 11:28

GoogleCodeExporter commented 9 years ago

After experimenting with likwid-perfctr, I see that you are not actually using 
UNC_L3_LINES_IN_ANY for the "MEM" group on Nehalem.  So my comment would be to 
fix the wiki page to reflect the actual status.

Curious to know -- what performance counters are used and why.
Thank you :)

Original comment by j...@hicampsystems.com on 6 Jun 2011 at 2:56


GoogleCodeExporter commented 9 years ago

Hi,

You can get documentation of the exact events used for the derived metrics 
with:

likwid-perfctr -g MEM -H

Just specify -g <GROUP> together with -H and you get the exact group setup. 
In the case of the MEM group this gives you:

Memory bandwidth [MBytes/s] = 
1.0E-06*(UNC_QMC_NORMAL_READS_ANY+UNC_QMC_WRITES_FULL_ANY)*64/time

Additionally, there is a metric for the remote traffic:

Remote BW [MBytes/s] = 
1.0E-06*(UNC_QHL_REQUESTS_REMOTE_READS+UNC_QHL_REQUESTS_REMOTE_WRITES)*64/time
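
As a worked illustration of the two formulas above, the following sketch applies the same arithmetic to raw event counts: each counted event is taken to move one 64-byte cache line, and the factor 1.0E-06 converts bytes to MBytes. The function name and the counter values are hypothetical and chosen only to show the calculation; they are not measured data and not part of likwid.

```python
CACHE_LINE_BYTES = 64  # bytes per counted event, as in the formulas above


def bandwidth_mbytes_per_s(event_counts, seconds, line_bytes=CACHE_LINE_BYTES):
    """Apply 1.0E-06 * (sum of event counts) * 64 / time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return 1.0e-06 * sum(event_counts) * line_bytes / seconds


# Hypothetical readings for a 2-second run (not measured data).
normal_reads = 1_000_000   # stands in for UNC_QMC_NORMAL_READS_ANY
full_writes = 500_000      # stands in for UNC_QMC_WRITES_FULL_ANY
print(bandwidth_mbytes_per_s([normal_reads, full_writes], seconds=2.0))
```

With these made-up numbers, 1.5 million events of 64 bytes over 2 seconds come to roughly 48 MBytes/s. The remote metric is computed the same way from the two UNC_QHL_REQUESTS_REMOTE_* counts.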

Thanks for reporting the wiki error; I will fix that.

You can get a list of all supported events with:

likwid-perfctr -e

If you encounter an error in a performance group or want to define your own 
metrics, you can easily do that: the groups are simple text files.

On Nehalem and Westmere, likwid-perfctr supports ALL Uncore events. Support for 
NehalemEX and SandyBridge Uncore is underway. Be careful with Uncore events: 
they are always counted per socket, not per core, which matters if you are 
measuring sequential applications.

Greetings,

Jan

Original comment by jan.trei...@gmail.com on 14 Jun 2011 at 11:33

Changed state: Invalid

GoogleCodeExporter commented 9 years ago

Original comment by jan.trei...@gmail.com on 14 Jun 2011 at 11:34

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

Original comment by jan.trei...@gmail.com on 29 Jul 2011 at 10:08

Changed state: Fixed
