dotnet / BenchmarkDotNet

Powerful .NET library for benchmarking
https://benchmarkdotnet.org
MIT License

Unable to read some hardware counters on ZEN2 CPU #1520

Open alexcovington opened 4 years ago

alexcovington commented 4 years ago

I'm having trouble reading the 'CacheMisses' and 'InstructionsRetired' counters whenever I run the dotnet/performance benchmark suite. If I run the following command from within the src\benchmarks\micro directory:

dotnet run -c Release -f netcoreapp5.0 --filter '*Bilinear*' --counters CacheMisses InstructionsRetired

I get the following error message:

// Validating benchmarks:
The counter CacheMisses is not available. Please make sure you are Windows 8+ without Hyper-V
The counter InstructionRetired is not available. Please make sure you are Windows 8+ without Hyper-V

I am able to read some counters after enabling IBS in my BIOS (technically the option wasn't called IBS; I had to disable SVM to get it to work). So the following command does work for me:

dotnet run -c Release -f netcoreapp5.0 --filter '*Bilinear*' --counters BranchMispredictions

If I profile a C# application outside of BenchmarkDotNet using AMD uProf, I can get cache miss and instructions retired statistics. I am also able to read all of the counters on my Skylake machine using the BenchmarkDotNet CLI without any issues.

Would appreciate any help that can be provided!

adamsitnik commented 4 years ago

Hi @alexcovington

BenchmarkDotNet uses TraceEvent, which internally uses ETW to get hardware counter information. I am afraid that there is an AMD-specific bug somewhere.

Could you please run the following command and share your output here?

tracelog.exe -profilesources Help

tracelog.exe might not be on your PATH, so you can run this command from the Visual Studio Developer Command Prompt.

alexcovington commented 4 years ago

Hi @adamsitnik, sorry for the wait on this. With COVID, I don't have physical access to this machine all the time, so I wasn't able to work on this till now.

Here's the output from tracelog running in admin Developer Command Prompt for VS 2019:

C:\Windows\System32>tracelog -profilesources Help
Id  Name                        Interval  Min      Max
--------------------------------------------------------------
  0 Timer                          10000  1221    1000000
  2 TotalIssues                    65536  4096 2147483647
  6 BranchInstructions             65536  4096 2147483647
  8 DcacheMisses                   65536  4096 2147483647
  9 IcacheMisses                   65536  4096 2147483647
 11 BranchMispredictions           65536  4096 2147483647
 13 FpInstructions                 65536  4096 2147483647
 20 IcacheIssues                   65536  4096 2147483647
 21 DcacheAccesses                 65536  4096 2147483647
 25 FPDispatchedFPUOps             65536  4096 2147483647
 26 FPDispatchedFPUOpsAddExcludeJunk      65536  4096 2147483647
 27 FPDispatchedFPUOpsMulExcludeJunk      65536  4096 2147483647
 28 FPDispatchedFPUOpsStoreExcludeJunk      65536  4096 2147483647
 29 FPDispatchedFPUOpsAddJunk      65536  4096 2147483647
 30 FPDispatchedFPUOpsMulJunk      65536  4096 2147483647
 31 FPDispatchedFPUOpsStoreJunk      65536  4096 2147483647
 32 FPCyclesNoFPUOpsRetired        65536  4096 2147483647
 33 FPDispathedFPUOpsWithFastFlag      65536  4096 2147483647
 34 LSSegmentRegisterLoad          65536  4096 2147483647
 35 LSSegmentRegisterLoadES        65536  4096 2147483647
 36 LSSegmentRegisterLoadCS        65536  4096 2147483647
 37 LSSegmentRegisterLoadSS        65536  4096 2147483647
 38 LSSegmentRegisterLoadDS        65536  4096 2147483647
 39 LSSegmentRegisterLoadFS        65536  4096 2147483647
 40 LSSegmentRegisterLoadGS        65536  4096 2147483647
 41 LSSegmentRegisterLoadHS        65536  4096 2147483647
 42 LSResyncBySelfModifyingCode      65536  4096 2147483647
 43 LSResyncBySnoop                65536  4096 2147483647
 44 LSBuffer2Full                  65536  4096 2147483647
 45 LSLockedOperation              65536  4096 2147483647
 46 LSLateCancelOperation          65536  4096 2147483647
 47 LSRetiredCFLUSH                65536  4096 2147483647
 48 LSRetiredCPUID                 65536  4096 2147483647
 49 DCAccess                       65536  4096 2147483647
 50 DCMiss                         65536  4096 2147483647
 51 DCRefillFromL2                 65536  4096 2147483647
 52 DCRefillFromL2Invalid          65536  4096 2147483647
 53 DCRefillFromL2Shared           65536  4096 2147483647
 54 DCRefillFromL2Exclusive        65536  4096 2147483647
 55 DCRefillFromL2Owner            65536  4096 2147483647
 56 DCRefillFromL2Modified         65536  4096 2147483647
 57 DCRefillFromSystem             65536  4096 2147483647
 58 DCRefillFromSystemInvalid      65536  4096 2147483647
 59 DCRefillFromSystemShared       65536  4096 2147483647
 60 DCRefillFromSystemExclusive      65536  4096 2147483647
 61 DCRefillFromSystemOwner        65536  4096 2147483647
 62 DCRefillFromSystemModified      65536  4096 2147483647
 63 DCRefillCopyBack               65536  4096 2147483647
 64 DCRefillCopyBackInvalid        65536  4096 2147483647
 65 DCRefillCopyBackShared         65536  4096 2147483647
 66 DCRefillCopyBackExclusive      65536  4096 2147483647
 67 DCRefillCopyBackOwner          65536  4096 2147483647
 68 DCRefillCopyBackModified       65536  4096 2147483647
 69 DCL1DTLBMissL2DTLBHit          65536  4096 2147483647
 70 DCL1DTLBMissL2DTLBMiss         65536  4096 2147483647
 71 DCMisalignedDataReference      65536  4096 2147483647
 72 DCLateCancelOfAnAccess         65536  4096 2147483647
 73 DCEarlyCancelOfAnAccess        65536  4096 2147483647
 74 DCOneBitECCError               65536  4096 2147483647
 75 DCOneBitECCErrorScrubberError      65536  4096 2147483647
 76 DCOneBitECCErrorPiggybackScrubberError      65536  4096 2147483647
 77 DCDispatchedPrefetchInstructions      65536  4096 2147483647
 78 DCDispatchedPrefetchInstructionsLoad      65536  4096 2147483647
 79 DCDispatchedPrefetchInstructionsStore      65536  4096 2147483647
 80 DCDispatchedPrefetchInstructionsNTA      65536  4096 2147483647
190 BUCleanToDirty                 65536  4096 2147483647
191 BUSharedToDirty                65536  4096 2147483647
 81 BUInternalL2Request            65536  4096 2147483647
 82 BUInternalL2RequestICFill      65536  4096 2147483647
 83 BUInternalL2RequestDCFill      65536  4096 2147483647
 84 BUInternalL2RequestTLBReload      65536  4096 2147483647
 85 BUInternalL2RequestTagSnoopRequest      65536  4096 2147483647
 86 BUInternalL2RequestCancelledRequest      65536  4096 2147483647
 87 BUFillRequestMissedInL2        65536  4096 2147483647
 88 BUFillRequestMissedInL2ICFill      65536  4096 2147483647
 89 BUFillRequestMissedInL2DCFill      65536  4096 2147483647
 90 BUFillRequestMissedInL2TLBLoad      65536  4096 2147483647
 91 BUFillIntoL2                   65536  4096 2147483647
 92 BUFillIntoL2DirtyL2Victim      65536  4096 2147483647
 93 BUFillIntoL2VictimFromL1       65536  4096 2147483647
 94 ICFetch                        65536  4096 2147483647
 95 ICMiss                         65536  4096 2147483647
 96 ICRefillFromL2                 65536  4096 2147483647
 97 ICRefillFromSystem             65536  4096 2147483647
 98 ICL1TLBMissL2TLBHit            65536  4096 2147483647
 99 ICL1TLBMissL2TLBMiss           65536  4096 2147483647
100 ICResyncBySnoop                65536  4096 2147483647
101 ICInstructionFetchStall        65536  4096 2147483647
102 ICReturnStackHit               65536  4096 2147483647
103 ICReturnStackOverflow          65536  4096 2147483647
104 FRRetiredx86Instructions       65536  4096 2147483647
105 FRRetireduops                  65536  4096 2147483647
106 FRRetiredBranches              65536  4096 2147483647
107 FRRetiredBranchesMispredicted      65536  4096 2147483647
108 FRRetiredTakenBranches         65536  4096 2147483647
109 FRRetiredTakenBranchesMispredicted      65536  4096 2147483647
110 FRRetiredFarControlTransfers      65536  4096 2147483647
111 FRRetiredResyncsNonControlTransferBranches      65536  4096 2147483647
112 FRRetiredNearReturns           65536  4096 2147483647
113 FRRetiredNearReturnsMispredicted      65536  4096 2147483647
114 FRRetiredTakenBranchesMispredictedByAddressMiscompare      65536  4096 2147483647
115 FRRetiredFPUInstructions       65536  4096 2147483647
116 FRRetiredFPUInstructionsx87      65536  4096 2147483647
117 FRRetiredFPUInstructionsMMXAnd3DNow      65536  4096 2147483647
118 FRRetiredFPUInstructionsPackedSSEAndSSE2      65536  4096 2147483647
119 FRRetiredFPUInstructionsScalarSSEAndSSE2      65536  4096 2147483647
120 FRRetiredFastpathDoubleOpInstructions      65536  4096 2147483647
121 FRRetiredFastpathDoubleOpInstructionsLowOpInPosition0      65536  4096 2147483647
122 FRRetiredFastpathDoubleOpInstructionsLowOpInPosition1      65536  4096 2147483647
123 FRRetiredFastpathDoubleOpInstructionsLowOpInPosition2      65536  4096 2147483647
124 FRInterruptsMaskedCycles       65536  4096 2147483647
125 FRInterruptsMaskedWhilePendingCycles      65536  4096 2147483647
126 FRTakenHardwareInterrupts      65536  4096 2147483647
127 FRNothingToDispatch            65536  4096 2147483647
128 FRDispatchStalls               65536  4096 2147483647
129 FRDispatchStallsFromBranchAbortToRetire      65536  4096 2147483647
130 FRDispatchStallsForSerialization      65536  4096 2147483647
131 FRDispatchStallsForSegmentLoad      65536  4096 2147483647
132 FRDispatchStallsWhenReorderBufferFull      65536  4096 2147483647
133 FRDispatchStallsWhenReservationStationsFull      65536  4096 2147483647
134 FRDispatchStallsWhenFPUFull      65536  4096 2147483647
135 FRDispatchStallsWhenLSFull      65536  4096 2147483647
136 FRDispatchStallsWhenWaitingForAllQuiet      65536  4096 2147483647
137 FRDispatchStallsWhenFarControlOrResyncBranchPending      65536  4096 2147483647
138 FRFPUExceptions                65536  4096 2147483647
139 FRFPUExceptionsx87ReclassMicroFaults      65536  4096 2147483647
140 FRFPUExceptionsSSERetypeMicroFaults      65536  4096 2147483647
141 FRFPUExceptionsSSEReclassMicroFaults      65536  4096 2147483647
142 FRFPUExceptionsSSEAndx87MicroTraps      65536  4096 2147483647
143 FRNumberOfBreakPointsForDR0      65536  4096 2147483647
144 FRNumberOfBreakPointsForDR1      65536  4096 2147483647
145 FRNumberOfBreakPointsForDR2      65536  4096 2147483647
146 FRNumberOfBreakPointsForDR3      65536  4096 2147483647
147 NBMemoryControllerPageAccessEvent      65536  4096 2147483647
148 NBMemoryControllerPageAccessEventPageHit      65536  4096 2147483647
149 NBMemoryControllerPageAccessEventPageMiss      65536  4096 2147483647
150 NBMemoryControllerPageAccessEventPageConflict      65536  4096 2147483647
151 NBMemoryControllerPageTableOverflow      65536  4096 2147483647
152 NBMemoryControllerDRAMCommandSlotsMissed      65536  4096 2147483647
153 NBMemoryControllerTurnAround      65536  4096 2147483647
154 NBMemoryControllerTurnAroundDIMM      65536  4096 2147483647
155 NBMemoryControllerTurnAroundReadToWrite      65536  4096 2147483647
156 NBMemoryControllerTurnAroundWriteToRead      65536  4096 2147483647
157 NBMemoryControllerBypassCounter      65536  4096 2147483647
158 NBMemoryControllerBypassCounterHighPriority      65536  4096 2147483647
159 NBMemoryControllerBypassCounterLowPriority      65536  4096 2147483647
160 NBMemoryControllerBypassCounterDRAMControllerInterface      65536  4096 2147483647
161 NBMemoryControllerBypassCounterDRAMControllerQueue      65536  4096 2147483647
162 NBSizedCommands                65536  4096 2147483647
163 NBSizedCommandsNonPostWrSzByte      65536  4096 2147483647
164 NBSizedCommandsNonPostWrSzDword      65536  4096 2147483647
165 NBSizedCommandsWrSzByte        65536  4096 2147483647
166 NBSizedCommandsWrSzDword       65536  4096 2147483647
167 NBSizedCommandsRdSzByte        65536  4096 2147483647
168 NBSizedCommandsRdSzDword       65536  4096 2147483647
169 NBSizedCommandsRdModWr         65536  4096 2147483647
170 NBProbeResult                  65536  4096 2147483647
171 NBProbeResultMiss              65536  4096 2147483647
172 NBProbeResultHit               65536  4096 2147483647
173 NBProbeResultHitDirtyWithoutMemoryCancel      65536  4096 2147483647
174 NBProbeResultHitDirtyWithMemoryCancel      65536  4096 2147483647
175 NBHyperTransportBus0Bandwidth      65536  4096 2147483647
176 NBHyperTransportBus0BandwidthCommandSent      65536  4096 2147483647
177 NBHyperTransportBus0BandwidthDataSent      65536  4096 2147483647
178 NBHyperTransportBus0BandwidthBufferReleaseSent      65536  4096 2147483647
179 NBHyperTransportBus0BandwidthNopSent      65536  4096 2147483647
180 NBHyperTransportBus1Bandwidth      65536  4096 2147483647
181 NBHyperTransportBus1BandwidthCommandSent      65536  4096 2147483647
182 NBHyperTransportBus1BandwidthDataSent      65536  4096 2147483647
183 NBHyperTransportBus1BandwidthBufferReleaseSent      65536  4096 2147483647
184 NBHyperTransportBus1BandwidthNopSent      65536  4096 2147483647
185 NBHyperTransportBus2Bandwidth      65536  4096 2147483647
186 NBHyperTransportBus2BandwidthCommandSent      65536  4096 2147483647
187 NBHyperTransportBus2BandwidthDataSent      65536  4096 2147483647
188 NBHyperTransportBus2BandwidthBufferReleaseSent      65536  4096 2147483647
189 NBHyperTransportBus2BandwidthNopSent      65536  4096 2147483647
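
A listing like the one above can be checked mechanically against the counter names a benchmark requests. The following is a minimal sketch in Python, assuming the column layout shown in this output (Id, Name, Interval, Min, Max); the helper names `parse_profile_sources` and `missing_counters` are hypothetical and are not part of BenchmarkDotNet or TraceEvent. Names are compared exactly, which may or may not match how BenchmarkDotNet or ETW treat casing.

```python
import re

# Matches data rows such as "  8 DcacheMisses   65536  4096 2147483647".
# The header, separator, and prompt lines do not match and are skipped.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def parse_profile_sources(text):
    """Return {name: id} for every data row of `tracelog -profilesources Help` output."""
    sources = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            sources[match.group(2)] = int(match.group(1))
    return sources


def missing_counters(requested, sources):
    """Return the requested names that are not listed (exact string comparison)."""
    return [name for name in requested if name not in sources]


# A few rows copied from the output above.
sample = """\
Id  Name                        Interval  Min      Max
--------------------------------------------------------------
  0 Timer                          10000  1221    1000000
  8 DcacheMisses                   65536  4096 2147483647
  9 IcacheMisses                   65536  4096 2147483647
 11 BranchMispredictions           65536  4096 2147483647
"""

sources = parse_profile_sources(sample)
requested = ["CacheMisses", "InstructionRetired", "BranchMispredictions"]
print(missing_counters(requested, sources))
# prints ['CacheMisses', 'InstructionRetired']
```

On this sample, the two counters from the error message are reported as missing while BranchMispredictions is found, which matches the behaviour described in the original report.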

Rekkonnect commented 3 years ago

With a Ryzen 7 1800X, I can report the same issue. It wouldn't recognize the CacheMisses perf counter because it's simply not listed as such; it's separated into DcacheMisses and IcacheMisses, and apparently that's exclusive to some AMD processors.

Having looked at the code itself, the error message does not account for the possibility that the perf counter is simply not supported by the CPU.

adamsitnik commented 3 years ago

I've recently ordered a PC with an AMD CPU (ThreadRipper :D) and I am supposed to get it before the 31st of March. When I do, I am going to make sure that Hardware Counters work as expected on AMD.

xoofx commented 3 years ago

A bit related to this PR/issue https://github.com/dotnet/BenchmarkDotNet/pull/1438#issuecomment-620573164

The problem seems to be in Windows: ETW is not correctly reporting events for AMD CPUs (reported upstream and to the Feedback Hub).

xoofx commented 3 years ago

After some experiments with ETW directly, I can see that DCRefillFromL2 is generating some numbers, so it could be used for L1 cache misses, while DCRefillFromSystem might be for L2 cache misses. I'm not 100% sure of that, and I'm still trying to figure out how to stabilize the numbers (they fluctuate quite a bit).

dn9090 commented 2 years ago

Has there been any progress on this issue? On my Ryzen 5 5600X machine I get the same error:

The counter CacheMisses is not available. Please make sure you are Windows 8+ without Hyper-V
The counter BranchMispredictions is not available. Please make sure you are Windows 8+ without Hyper-V

AndreyAkinshin commented 1 year ago

@alexcovington @Rekkonnect @xoofx @dn9090 do you still experience this problem with the latest version of BenchmarkDotNet?

I believe that this issue should be resolved since BenchmarkDotNet v0.13.2 thanks to #2030. I just checked AMD Ryzen 9 7950X, everything works fine: the hardware counters are properly reported.

alexcovington commented 1 year ago

@AndreyAkinshin Thanks for checking in.

I don't have access to the system I was using when I originally reported the issue, but I was able to successfully read CacheMisses and InstructionRetired on Zen 3 and Zen 4 systems using BDN v0.13.5.

This seems to be resolved now, I'll go ahead and close the issue. Thanks everyone for the fix :).

xoofx commented 1 year ago

I haven't checked since then, but if it has been fixed in Windows, it should be fixed for BDN. Thanks!

nietras commented 9 months ago

I am having trouble getting hardware counters working fully on an AMD Zen 3 5950X on Windows 10 using the latest BDN, 0.13.12. SVM is disabled in BIOS. Hyper-V is disabled. PerfView lists counters. The tracelog output is attached.

Only BranchMispredictions appears to be working. Others report as not available. Is that just how it is or is there something I can do to get these?

//    * The counter CacheMisses is not available. Please make sure you are Windows 8+ without Hyper-V
//    * The counter TotalCycles is not available. Please make sure you are Windows 8+ without Hyper-V
//    * The counter InstructionRetired is not available. Please make sure you are Windows 8+ without Hyper-V
//    * The counter LLCMisses is not available. Please make sure you are Windows 8+ without Hyper-V
//    * The counter BranchInstructionRetired is not available. Please make sure you are Windows 8+ without Hyper-V
//    * The counter BranchMispredictsRetired is not available. Please make sure you are Windows 8+ without Hyper-V

tracelog.txt

martindevans commented 4 months ago

I have the same issue with an AMD 7950X, Windows 10, BDN 0.13.12. tracelog output is the same as nietras.

adamsitnik commented 4 months ago

I am unable to reproduce the issue on Windows 11:

git clone https://github.com/dotnet/BenchmarkDotNet.git
cd .\BenchmarkDotNet\samples\BenchmarkDotNet.Samples\
dotnet run -c Release -f net8.0 --filter *IntroHardwareCounters*
BenchmarkDotNet v0.13.13-develop (2024-06-14), Windows 11 (10.0.22631.3737/23H2/2023Update/SunValley3)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores
.NET SDK 9.0.100-preview.6.24277.6
  [Host]     : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
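
| Method             | Mean      | Error    | StdDev   | BranchInstructions/Op | BranchMispredictions/Op |
|------------------- |----------:|---------:|---------:|----------------------:|------------------------:|
| SortedBranch       |  28.29 us | 0.099 us | 0.088 us |                80,266 |                      34 |
| UnsortedBranch     | 138.92 us | 1.463 us | 1.368 us |                68,450 |                  15,946 |
| SortedBranchless   |  18.84 us | 0.175 us | 0.164 us |                32,633 |                      17 |
| UnsortedBranchless |  18.71 us | 0.105 us | 0.093 us |                32,637 |                      17 |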

BDN gets the available counters here: https://github.com/dotnet/BenchmarkDotNet/blob/6a7244d76082f098a19785e4e3b0e0f269fed004/src/BenchmarkDotNet.Diagnostics.Windows/HardwareCounters.cs#L49C40-L49C64

Which is implemented in TraceEvent here by just calling some Windows sys-calls:

https://github.com/microsoft/perfview/blob/c179d832583a7f0cced717463ee68ca6587aab49/src/TraceEvent/TraceEventSession.cs#L3153-L3221

I suspect that the issue has been fixed in Windows 11, but not in 10. It would be great if somebody with Windows 10 could debug it.

martindevans commented 4 months ago

I've been digging into this a little, I'm not very familiar with the BDN source code though.

To recap the issue, if I use the CacheMisses hardware counter I will get an error like this:

The counter CacheMisses is not available. Please make sure you are Windows 8+ without Hyper-V

I'm experimenting with the IntroHardwareCounters sample, changing the config to [HardwareCounters(HardwareCounter.CacheMisses, HardwareCounter.BranchMispredictions)].

I found this mapping from the HardwareCounter enum to the underlying ID string. Changing { HardwareCounter.CacheMisses, "CacheMisses" } to { HardwareCounter.CacheMisses, "ICacheMisses" } seems to work. I get a final result like this:

| Method         | Mean     | Error    | StdDev   | BranchMispredictions/Op | IcacheMisses/Op |
|--------------- |---------:|---------:|---------:|------------------------:|----------------:|
| SortedBranch   | 41.29 us | 0.080 us | 0.071 us |                      12 |               1 |
| UnsortedBranch | 41.34 us | 0.099 us | 0.093 us |                      64 |               2 |

However if I change that string to "DCacheMisses" the cache column simply disappears!

| Method         | Mean     | Error    | StdDev   | BranchMispredictions/Op |
|--------------- |---------:|---------:|---------:|------------------------:|
| SortedBranch   | 41.40 us | 0.079 us | 0.070 us |                      12 |
| UnsortedBranch | 41.35 us | 0.118 us | 0.104 us |                      56 |

If I place a breakpoint in PreciseMachineCounter.OnSample, it gets hit with Name="BranchMispredictions", but there are no hits if I remove the BranchMispredictions diagnoser. So presumably those events are never happening. Does BDN auto-hide columns with no info? If not, under what circumstances would a column disappear?

Trying some other likely-sounding names from the tracelog: if I change it to "ICMiss" I get reasonable-looking results; if I change it to "DCMiss" the column disappears again. If I change it to "DCAccess" (trying to verify whether I can get any DCache-related results), I get this:

| Method         | Mean      | Error    | StdDev   | DCAccess/Op |
|--------------- |----------:|---------:|---------:|------------:|
| SortedBranch   |  44.38 ms | 0.246 ms | 0.230 ms | 133,142,027 |
| UnsortedBranch | 139.18 ms | 2.010 ms | 1.782 ms | 356,393,711 |

I don't really know where to go from here. Knowing when BDN auto hides columns might help me dig further.
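
The mapping edit described in this comment amounts to trying alternative profile source names for one logical counter. A minimal sketch of that idea in Python follows; the `CANDIDATES` table and the `resolve` helper are hypothetical illustrations, not BenchmarkDotNet code, and the candidate names are taken only from the tracelog listing in this thread. Treating DcacheMisses or IcacheMisses as a stand-in for CacheMisses is itself an assumption, since they measure different things.

```python
# Hypothetical fallback table: logical counter -> profile source names to try, in order.
# This is an illustrative assumption, not the mapping BenchmarkDotNet ships with.
CANDIDATES = {
    "CacheMisses": ["CacheMisses", "DcacheMisses", "IcacheMisses"],
    "BranchMispredictions": ["BranchMispredictions", "FRRetiredBranchesMispredicted"],
}


def resolve(counter, available):
    """Return the first candidate name present in `available`, or None if none is listed."""
    for name in CANDIDATES.get(counter, [counter]):
        if name in available:
            return name
    return None


# Names as they appear in the AMD listing earlier in the thread.
available = {"Timer", "DcacheMisses", "IcacheMisses", "BranchMispredictions", "DCAccess"}

print(resolve("CacheMisses", available))  # prints DcacheMisses
print(resolve("TotalCycles", available))  # prints None
```

Resolving by name alone is not sufficient, as the experiments above show: a source can be listed and still produce no samples, in which case the column would still be empty.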

AndreyAkinshin commented 4 months ago

@martindevans thanks for sharing your notes! BDN hides columns when there are no values to display. It seems that you have IcacheMisses on your machine, but don't have CacheMisses and DCacheMisses. Please run the code below and share the output:

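using System;
using Microsoft.Diagnostics.Tracing.Session; // namespace of TraceEventProfileSources
foreach (var counterName in TraceEventProfileSources.GetInfo().Keys)
    Console.WriteLine(counterName);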

The csproj should have the following package reference:

<PackageReference Include="Microsoft.Diagnostics.Tracing.TraceEvent" Version="3.1.8" />

martindevans commented 4 months ago

Here's the complete output:

Timer
TotalIssues
BranchInstructions
DcacheMisses
IcacheMisses
BranchMispredictions
FpInstructions
IcacheIssues
DcacheAccesses
FPDispatchedFPUOps
FPDispatchedFPUOpsAddExcludeJunk
FPDispatchedFPUOpsMulExcludeJunk
FPDispatchedFPUOpsStoreExcludeJunk
FPDispatchedFPUOpsAddJunk
FPDispatchedFPUOpsMulJunk
FPDispatchedFPUOpsStoreJunk
FPCyclesNoFPUOpsRetired
FPDispathedFPUOpsWithFastFlag
LSSegmentRegisterLoad
LSSegmentRegisterLoadES
LSSegmentRegisterLoadCS
LSSegmentRegisterLoadSS
LSSegmentRegisterLoadDS
LSSegmentRegisterLoadFS
LSSegmentRegisterLoadGS
LSSegmentRegisterLoadHS
LSResyncBySelfModifyingCode
LSResyncBySnoop
LSBuffer2Full
LSLockedOperation
LSLateCancelOperation
LSRetiredCFLUSH
LSRetiredCPUID
DCAccess
DCMiss
DCRefillFromL2
DCRefillFromL2Invalid
DCRefillFromL2Shared
DCRefillFromL2Exclusive
DCRefillFromL2Owner
DCRefillFromL2Modified
DCRefillFromSystem
DCRefillFromSystemInvalid
DCRefillFromSystemShared
DCRefillFromSystemExclusive
DCRefillFromSystemOwner
DCRefillFromSystemModified
DCRefillCopyBack
DCRefillCopyBackInvalid
DCRefillCopyBackShared
DCRefillCopyBackExclusive
DCRefillCopyBackOwner
DCRefillCopyBackModified
DCL1DTLBMissL2DTLBHit
DCL1DTLBMissL2DTLBMiss
DCMisalignedDataReference
DCLateCancelOfAnAccess
DCEarlyCancelOfAnAccess
DCOneBitECCError
DCOneBitECCErrorScrubberError
DCOneBitECCErrorPiggybackScrubberError
DCDispatchedPrefetchInstructions
DCDispatchedPrefetchInstructionsLoad
DCDispatchedPrefetchInstructionsStore
DCDispatchedPrefetchInstructionsNTA
BUCleanToDirty
BUSharedToDirty
BUInternalL2Request
BUInternalL2RequestICFill
BUInternalL2RequestDCFill
BUInternalL2RequestTLBReload
BUInternalL2RequestTagSnoopRequest
BUInternalL2RequestCancelledRequest
BUFillRequestMissedInL2
BUFillRequestMissedInL2ICFill
BUFillRequestMissedInL2DCFill
BUFillRequestMissedInL2TLBLoad
BUFillIntoL2
BUFillIntoL2DirtyL2Victim
BUFillIntoL2VictimFromL1
ICFetch
ICMiss
ICRefillFromL2
ICRefillFromSystem
ICL1TLBMissL2TLBHit
ICL1TLBMissL2TLBMiss
ICResyncBySnoop
ICInstructionFetchStall
ICReturnStackHit
ICReturnStackOverflow
FRRetiredx86Instructions
FRRetireduops
FRRetiredBranches
FRRetiredBranchesMispredicted
FRRetiredTakenBranches
FRRetiredTakenBranchesMispredicted
FRRetiredFarControlTransfers
FRRetiredResyncsNonControlTransferBranches
FRRetiredNearReturns
FRRetiredNearReturnsMispredicted
FRRetiredTakenBranchesMispredictedByAddressMiscompare
FRRetiredFPUInstructions
FRRetiredFPUInstructionsx87
FRRetiredFPUInstructionsMMXAnd3DNow
FRRetiredFPUInstructionsPackedSSEAndSSE2
FRRetiredFPUInstructionsScalarSSEAndSSE2
FRRetiredFastpathDoubleOpInstructions
FRRetiredFastpathDoubleOpInstructionsLowOpInPosition0
FRRetiredFastpathDoubleOpInstructionsLowOpInPosition1
FRRetiredFastpathDoubleOpInstructionsLowOpInPosition2
FRInterruptsMaskedCycles
FRInterruptsMaskedWhilePendingCycles
FRTakenHardwareInterrupts
FRNothingToDispatch
FRDispatchStalls
FRDispatchStallsFromBranchAbortToRetire
FRDispatchStallsForSerialization
FRDispatchStallsForSegmentLoad
FRDispatchStallsWhenReorderBufferFull
FRDispatchStallsWhenReservationStationsFull
FRDispatchStallsWhenFPUFull
FRDispatchStallsWhenLSFull
FRDispatchStallsWhenWaitingForAllQuiet
FRDispatchStallsWhenFarControlOrResyncBranchPending
FRFPUExceptions
FRFPUExceptionsx87ReclassMicroFaults
FRFPUExceptionsSSERetypeMicroFaults
FRFPUExceptionsSSEReclassMicroFaults
FRFPUExceptionsSSEAndx87MicroTraps
FRNumberOfBreakPointsForDR0
FRNumberOfBreakPointsForDR1
FRNumberOfBreakPointsForDR2
FRNumberOfBreakPointsForDR3
NBMemoryControllerPageAccessEvent
NBMemoryControllerPageAccessEventPageHit
NBMemoryControllerPageAccessEventPageMiss
NBMemoryControllerPageAccessEventPageConflict
NBMemoryControllerPageTableOverflow
NBMemoryControllerDRAMCommandSlotsMissed
NBMemoryControllerTurnAround
NBMemoryControllerTurnAroundDIMM
NBMemoryControllerTurnAroundReadToWrite
NBMemoryControllerTurnAroundWriteToRead
NBMemoryControllerBypassCounter
NBMemoryControllerBypassCounterHighPriority
NBMemoryControllerBypassCounterLowPriority
NBMemoryControllerBypassCounterDRAMControllerInterface
NBMemoryControllerBypassCounterDRAMControllerQueue
NBSizedCommands
NBSizedCommandsNonPostWrSzByte
NBSizedCommandsNonPostWrSzDword
NBSizedCommandsWrSzByte
NBSizedCommandsWrSzDword
NBSizedCommandsRdSzByte
NBSizedCommandsRdSzDword
NBSizedCommandsRdModWr
NBProbeResult
NBProbeResultMiss
NBProbeResultHit
NBProbeResultHitDirtyWithoutMemoryCancel
NBProbeResultHitDirtyWithMemoryCancel
NBHyperTransportBus0Bandwidth
NBHyperTransportBus0BandwidthCommandSent
NBHyperTransportBus0BandwidthDataSent
NBHyperTransportBus0BandwidthBufferReleaseSent
NBHyperTransportBus0BandwidthNopSent
NBHyperTransportBus1Bandwidth
NBHyperTransportBus1BandwidthCommandSent
NBHyperTransportBus1BandwidthDataSent
NBHyperTransportBus1BandwidthBufferReleaseSent
NBHyperTransportBus1BandwidthNopSent
NBHyperTransportBus2Bandwidth
NBHyperTransportBus2BandwidthCommandSent
NBHyperTransportBus2BandwidthDataSent
NBHyperTransportBus2BandwidthBufferReleaseSent
NBHyperTransportBus2BandwidthNopSent

I just noticed the spelling in that list is IcacheMisses and not ICacheMisses (note the case). I'm going to retry my tests from above with that spelling.

martindevans commented 4 months ago

No change when using the alternative casing. As before, IcacheMiss works and the DcacheMiss column doesn't appear.

AndreyAkinshin commented 4 months ago

I did some research and it seems that we can't safely assume the existence of any specific counters in advance. We should acknowledge the fact that each combination of Windows version and hardware may support an arbitrary set of hardware counters. Therefore, I suggest the following design:

  1. We should have an API to request any custom counter by its name.
  2. Having a predefined set of counters is handy, so we should preserve it, but the list of the counters should be extended.
  3. When a counter is not supported, we should print a warning with an explanation and a list of the counters available on the given machine (it would resolve the confusion around missing columns).

The first two points were implemented in #1438, but this PR is obsolete and has some merge conflicts. Probably the easiest option is to reuse the ideas of @xoofx, but implement them from scratch on the latest master and enhance them with the third point.
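
The third point of the design could look roughly like the following. This is a minimal sketch in Python under stated assumptions: `validate_counters` is a hypothetical helper, the warning wording is invented for illustration, and the near-match suggestion via `difflib` is an addition that the proposal does not mention.

```python
import difflib


def validate_counters(requested, available):
    """Yield one warning string per requested counter that is not in `available`."""
    names = sorted(available)
    for counter in requested:
        if counter in available:
            continue
        # Suggest similarly spelled names; this step is an assumption, not part of the proposal.
        close = difflib.get_close_matches(counter, names, n=3, cutoff=0.6)
        hint = f" Did you mean: {', '.join(close)}?" if close else ""
        yield (
            f"The counter {counter} is not available on this machine.{hint} "
            f"Available counters: {', '.join(names)}"
        )


# Names as they appear in the AMD listings in this thread.
available = {"Timer", "DcacheMisses", "IcacheMisses", "BranchMispredictions"}

for warning in validate_counters(["CacheMisses", "BranchMispredictions"], available):
    print(warning)
```

Here only CacheMisses produces a warning, and the message carries the machine's own counter list, so a user can see immediately that DcacheMisses and IcacheMisses exist instead.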

@timcassell @adamsitnik what do you think?