Open alexcovington opened 4 years ago
Hi @alexcovington
BenchmarkDotNet is using TraceEvent, which internally uses ETW to get hardware counter information. I am afraid that there is an AMD-specific bug somewhere.
Could you please run the following command and share your output here?
tracelog.exe -profilesources Help
tracelog.exe might not be present in your $PATH so you can use Visual Studio Command Prompt to run this command:
Hi @adamsitnik, sorry for the wait on this. With COVID, I don't have physical access to this machine all the time, so I wasn't able to work on this till now.
Here's the output from tracelog running in admin Developer Command Prompt for VS 2019:
C:\Windows\System32>tracelog -profilesources Help
Id Name Interval Min Max
--------------------------------------------------------------
0 Timer 10000 1221 1000000
2 TotalIssues 65536 4096 2147483647
6 BranchInstructions 65536 4096 2147483647
8 DcacheMisses 65536 4096 2147483647
9 IcacheMisses 65536 4096 2147483647
11 BranchMispredictions 65536 4096 2147483647
13 FpInstructions 65536 4096 2147483647
20 IcacheIssues 65536 4096 2147483647
21 DcacheAccesses 65536 4096 2147483647
25 FPDispatchedFPUOps 65536 4096 2147483647
26 FPDispatchedFPUOpsAddExcludeJunk 65536 4096 2147483647
27 FPDispatchedFPUOpsMulExcludeJunk 65536 4096 2147483647
28 FPDispatchedFPUOpsStoreExcludeJunk 65536 4096 2147483647
29 FPDispatchedFPUOpsAddJunk 65536 4096 2147483647
30 FPDispatchedFPUOpsMulJunk 65536 4096 2147483647
31 FPDispatchedFPUOpsStoreJunk 65536 4096 2147483647
32 FPCyclesNoFPUOpsRetired 65536 4096 2147483647
33 FPDispathedFPUOpsWithFastFlag 65536 4096 2147483647
34 LSSegmentRegisterLoad 65536 4096 2147483647
35 LSSegmentRegisterLoadES 65536 4096 2147483647
36 LSSegmentRegisterLoadCS 65536 4096 2147483647
37 LSSegmentRegisterLoadSS 65536 4096 2147483647
38 LSSegmentRegisterLoadDS 65536 4096 2147483647
39 LSSegmentRegisterLoadFS 65536 4096 2147483647
40 LSSegmentRegisterLoadGS 65536 4096 2147483647
41 LSSegmentRegisterLoadHS 65536 4096 2147483647
42 LSResyncBySelfModifyingCode 65536 4096 2147483647
43 LSResyncBySnoop 65536 4096 2147483647
44 LSBuffer2Full 65536 4096 2147483647
45 LSLockedOperation 65536 4096 2147483647
46 LSLateCancelOperation 65536 4096 2147483647
47 LSRetiredCFLUSH 65536 4096 2147483647
48 LSRetiredCPUID 65536 4096 2147483647
49 DCAccess 65536 4096 2147483647
50 DCMiss 65536 4096 2147483647
51 DCRefillFromL2 65536 4096 2147483647
52 DCRefillFromL2Invalid 65536 4096 2147483647
53 DCRefillFromL2Shared 65536 4096 2147483647
54 DCRefillFromL2Exclusive 65536 4096 2147483647
55 DCRefillFromL2Owner 65536 4096 2147483647
56 DCRefillFromL2Modified 65536 4096 2147483647
57 DCRefillFromSystem 65536 4096 2147483647
58 DCRefillFromSystemInvalid 65536 4096 2147483647
59 DCRefillFromSystemShared 65536 4096 2147483647
60 DCRefillFromSystemExclusive 65536 4096 2147483647
61 DCRefillFromSystemOwner 65536 4096 2147483647
62 DCRefillFromSystemModified 65536 4096 2147483647
63 DCRefillCopyBack 65536 4096 2147483647
64 DCRefillCopyBackInvalid 65536 4096 2147483647
65 DCRefillCopyBackShared 65536 4096 2147483647
66 DCRefillCopyBackExclusive 65536 4096 2147483647
67 DCRefillCopyBackOwner 65536 4096 2147483647
68 DCRefillCopyBackModified 65536 4096 2147483647
69 DCL1DTLBMissL2DTLBHit 65536 4096 2147483647
70 DCL1DTLBMissL2DTLBMiss 65536 4096 2147483647
71 DCMisalignedDataReference 65536 4096 2147483647
72 DCLateCancelOfAnAccess 65536 4096 2147483647
73 DCEarlyCancelOfAnAccess 65536 4096 2147483647
74 DCOneBitECCError 65536 4096 2147483647
75 DCOneBitECCErrorScrubberError 65536 4096 2147483647
76 DCOneBitECCErrorPiggybackScrubberError 65536 4096 2147483647
77 DCDispatchedPrefetchInstructions 65536 4096 2147483647
78 DCDispatchedPrefetchInstructionsLoad 65536 4096 2147483647
79 DCDispatchedPrefetchInstructionsStore 65536 4096 2147483647
80 DCDispatchedPrefetchInstructionsNTA 65536 4096 2147483647
190 BUCleanToDirty 65536 4096 2147483647
191 BUSharedToDirty 65536 4096 2147483647
81 BUInternalL2Request 65536 4096 2147483647
82 BUInternalL2RequestICFill 65536 4096 2147483647
83 BUInternalL2RequestDCFill 65536 4096 2147483647
84 BUInternalL2RequestTLBReload 65536 4096 2147483647
85 BUInternalL2RequestTagSnoopRequest 65536 4096 2147483647
86 BUInternalL2RequestCancelledRequest 65536 4096 2147483647
87 BUFillRequestMissedInL2 65536 4096 2147483647
88 BUFillRequestMissedInL2ICFill 65536 4096 2147483647
89 BUFillRequestMissedInL2DCFill 65536 4096 2147483647
90 BUFillRequestMissedInL2TLBLoad 65536 4096 2147483647
91 BUFillIntoL2 65536 4096 2147483647
92 BUFillIntoL2DirtyL2Victim 65536 4096 2147483647
93 BUFillIntoL2VictimFromL1 65536 4096 2147483647
94 ICFetch 65536 4096 2147483647
95 ICMiss 65536 4096 2147483647
96 ICRefillFromL2 65536 4096 2147483647
97 ICRefillFromSystem 65536 4096 2147483647
98 ICL1TLBMissL2TLBHit 65536 4096 2147483647
99 ICL1TLBMissL2TLBMiss 65536 4096 2147483647
100 ICResyncBySnoop 65536 4096 2147483647
101 ICInstructionFetchStall 65536 4096 2147483647
102 ICReturnStackHit 65536 4096 2147483647
103 ICReturnStackOverflow 65536 4096 2147483647
104 FRRetiredx86Instructions 65536 4096 2147483647
105 FRRetireduops 65536 4096 2147483647
106 FRRetiredBranches 65536 4096 2147483647
107 FRRetiredBranchesMispredicted 65536 4096 2147483647
108 FRRetiredTakenBranches 65536 4096 2147483647
109 FRRetiredTakenBranchesMispredicted 65536 4096 2147483647
110 FRRetiredFarControlTransfers 65536 4096 2147483647
111 FRRetiredResyncsNonControlTransferBranches 65536 4096 2147483647
112 FRRetiredNearReturns 65536 4096 2147483647
113 FRRetiredNearReturnsMispredicted 65536 4096 2147483647
114 FRRetiredTakenBranchesMispredictedByAddressMiscompare 65536 4096 2147483647
115 FRRetiredFPUInstructions 65536 4096 2147483647
116 FRRetiredFPUInstructionsx87 65536 4096 2147483647
117 FRRetiredFPUInstructionsMMXAnd3DNow 65536 4096 2147483647
118 FRRetiredFPUInstructionsPackedSSEAndSSE2 65536 4096 2147483647
119 FRRetiredFPUInstructionsScalarSSEAndSSE2 65536 4096 2147483647
120 FRRetiredFastpathDoubleOpInstructions 65536 4096 2147483647
121 FRRetiredFastpathDoubleOpInstructionsLowOpInPosition0 65536 4096 2147483647
122 FRRetiredFastpathDoubleOpInstructionsLowOpInPosition1 65536 4096 2147483647
123 FRRetiredFastpathDoubleOpInstructionsLowOpInPosition2 65536 4096 2147483647
124 FRInterruptsMaskedCycles 65536 4096 2147483647
125 FRInterruptsMaskedWhilePendingCycles 65536 4096 2147483647
126 FRTakenHardwareInterrupts 65536 4096 2147483647
127 FRNothingToDispatch 65536 4096 2147483647
128 FRDispatchStalls 65536 4096 2147483647
129 FRDispatchStallsFromBranchAbortToRetire 65536 4096 2147483647
130 FRDispatchStallsForSerialization 65536 4096 2147483647
131 FRDispatchStallsForSegmentLoad 65536 4096 2147483647
132 FRDispatchStallsWhenReorderBufferFull 65536 4096 2147483647
133 FRDispatchStallsWhenReservationStationsFull 65536 4096 2147483647
134 FRDispatchStallsWhenFPUFull 65536 4096 2147483647
135 FRDispatchStallsWhenLSFull 65536 4096 2147483647
136 FRDispatchStallsWhenWaitingForAllQuiet 65536 4096 2147483647
137 FRDispatchStallsWhenFarControlOrResyncBranchPending 65536 4096 2147483647
138 FRFPUExceptions 65536 4096 2147483647
139 FRFPUExceptionsx87ReclassMicroFaults 65536 4096 2147483647
140 FRFPUExceptionsSSERetypeMicroFaults 65536 4096 2147483647
141 FRFPUExceptionsSSEReclassMicroFaults 65536 4096 2147483647
142 FRFPUExceptionsSSEAndx87MicroTraps 65536 4096 2147483647
143 FRNumberOfBreakPointsForDR0 65536 4096 2147483647
144 FRNumberOfBreakPointsForDR1 65536 4096 2147483647
145 FRNumberOfBreakPointsForDR2 65536 4096 2147483647
146 FRNumberOfBreakPointsForDR3 65536 4096 2147483647
147 NBMemoryControllerPageAccessEvent 65536 4096 2147483647
148 NBMemoryControllerPageAccessEventPageHit 65536 4096 2147483647
149 NBMemoryControllerPageAccessEventPageMiss 65536 4096 2147483647
150 NBMemoryControllerPageAccessEventPageConflict 65536 4096 2147483647
151 NBMemoryControllerPageTableOverflow 65536 4096 2147483647
152 NBMemoryControllerDRAMCommandSlotsMissed 65536 4096 2147483647
153 NBMemoryControllerTurnAround 65536 4096 2147483647
154 NBMemoryControllerTurnAroundDIMM 65536 4096 2147483647
155 NBMemoryControllerTurnAroundReadToWrite 65536 4096 2147483647
156 NBMemoryControllerTurnAroundWriteToRead 65536 4096 2147483647
157 NBMemoryControllerBypassCounter 65536 4096 2147483647
158 NBMemoryControllerBypassCounterHighPriority 65536 4096 2147483647
159 NBMemoryControllerBypassCounterLowPriority 65536 4096 2147483647
160 NBMemoryControllerBypassCounterDRAMControllerInterface 65536 4096 2147483647
161 NBMemoryControllerBypassCounterDRAMControllerQueue 65536 4096 2147483647
162 NBSizedCommands 65536 4096 2147483647
163 NBSizedCommandsNonPostWrSzByte 65536 4096 2147483647
164 NBSizedCommandsNonPostWrSzDword 65536 4096 2147483647
165 NBSizedCommandsWrSzByte 65536 4096 2147483647
166 NBSizedCommandsWrSzDword 65536 4096 2147483647
167 NBSizedCommandsRdSzByte 65536 4096 2147483647
168 NBSizedCommandsRdSzDword 65536 4096 2147483647
169 NBSizedCommandsRdModWr 65536 4096 2147483647
170 NBProbeResult 65536 4096 2147483647
171 NBProbeResultMiss 65536 4096 2147483647
172 NBProbeResultHit 65536 4096 2147483647
173 NBProbeResultHitDirtyWithoutMemoryCancel 65536 4096 2147483647
174 NBProbeResultHitDirtyWithMemoryCancel 65536 4096 2147483647
175 NBHyperTransportBus0Bandwidth 65536 4096 2147483647
176 NBHyperTransportBus0BandwidthCommandSent 65536 4096 2147483647
177 NBHyperTransportBus0BandwidthDataSent 65536 4096 2147483647
178 NBHyperTransportBus0BandwidthBufferReleaseSent 65536 4096 2147483647
179 NBHyperTransportBus0BandwidthNopSent 65536 4096 2147483647
180 NBHyperTransportBus1Bandwidth 65536 4096 2147483647
181 NBHyperTransportBus1BandwidthCommandSent 65536 4096 2147483647
182 NBHyperTransportBus1BandwidthDataSent 65536 4096 2147483647
183 NBHyperTransportBus1BandwidthBufferReleaseSent 65536 4096 2147483647
184 NBHyperTransportBus1BandwidthNopSent 65536 4096 2147483647
185 NBHyperTransportBus2Bandwidth 65536 4096 2147483647
186 NBHyperTransportBus2BandwidthCommandSent 65536 4096 2147483647
187 NBHyperTransportBus2BandwidthDataSent 65536 4096 2147483647
188 NBHyperTransportBus2BandwidthBufferReleaseSent 65536 4096 2147483647
189 NBHyperTransportBus2BandwidthNopSent 65536 4096 2147483647
With a Ryzen 7 1800X, I can report the same issue. It doesn't recognize the CacheMisses perf counter because it's simply not listed under that name; it's split into DcacheMisses and IcacheMisses, and apparently that's exclusive to some AMD processors.
Having looked at the code itself, the error message does not account for the possibility that the perf counter is not supported by the CPU itself.
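To illustrate the kind of diagnostic that would have helped here, below is a minimal sketch that looks for similarly named counters when the requested one is missing. It assumes the available names were already collected (for example from the tracelog output above); `CounterCheck` and `FindSimilar` are hypothetical names and not part of BenchmarkDotNet or TraceEvent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not a BenchmarkDotNet API.
public static class CounterCheck
{
    // Returns the available counter names that contain the requested name,
    // ignoring case, in the order they were listed.
    public static IReadOnlyList<string> FindSimilar(string requested, IEnumerable<string> available)
    {
        return available
            .Where(name => name.IndexOf(requested, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }
}
```

With the names reported in this thread, asking for CacheMisses would surface DcacheMisses and IcacheMisses, which is more actionable than a generic "not available" message.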
I've recently ordered a PC with an AMD CPU (ThreadRipper :D) and I am supposed to get it before the 31st of March. When I do, I am going to make sure that Hardware Counters work as expected on AMD.
A bit related to this PR/issue https://github.com/dotnet/BenchmarkDotNet/pull/1438#issuecomment-620573164
The problem seems to be in Windows: ETW is not correctly reporting events for AMD CPUs (reported here and to the Feedback Hub here).
After some experiments with ETW directly, I can see that DCRefillFromL2 is generating some numbers, so it could be used for L1 cache misses, while DCRefillFromSystem might be for L2 cache misses. I'm not 100% sure of that; I'm still trying to figure out how to stabilize the numbers (they consistently fluctuate quite a bit).
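One possible contributor to the fluctuation is sampling granularity. The sketch below assumes that each recorded sample stands for roughly `interval` hardware events (the Interval column in the tracelog output above, 65536 by default); `SampledCounter` and `EventsPerOperation` are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical helper illustrating sample-based counting.
public static class SampledCounter
{
    // Estimates events per benchmark operation, assuming every sample
    // represents `interval` occurrences of the hardware event.
    public static double EventsPerOperation(long samples, long interval, long operations)
    {
        if (operations <= 0) throw new ArgumentOutOfRangeException(nameof(operations));
        return (double)samples * interval / operations;
    }
}
```

Under that assumption, a single extra sample shifts the estimate by `interval / operations`, so short runs with the default interval can swing noticeably from one measurement to the next.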
Has there been any progress on this issue? On my Ryzen 5 5600X machine I got the same error:
The counter CacheMisses is not available. Please make sure you are Windows 8+ without Hyper-V
The counter BranchMispredictions is not available. Please make sure you are Windows 8+ without Hyper-V
@alexcovington @Rekkonnect @xoofx @dn9090 do you still experience this problem with the latest version of BenchmarkDotNet?
I believe that this issue should be resolved since BenchmarkDotNet v0.13.2 thanks to #2030. I just checked AMD Ryzen 9 7950X, everything works fine: the hardware counters are properly reported.
@AndreyAkinshin Thanks for checking in.
I don't have access to the system I was using when I originally reported the issue, but I was able to successfully read CacheMisses and InstructionRetired on Zen 3 and Zen 4 systems using BDN v0.13.5.
This seems to be resolved now, I'll go ahead and close the issue. Thanks everyone for the fix :).
I haven't checked since then, but if it has been fixed in Windows, it should be fixed for BDN. Thanks!
I am having trouble getting hardware counters working fully on an AMD Zen 3 5950X on Windows 10 using the latest BDN 0.13.12. SVM is disabled in BIOS. Hyper-V is disabled. PerfView lists counters. And tracelog is attached.
Only BranchMispredictions appears to be working. The others are reported as not available. Is that just how it is, or is there something I can do to get these?
// * The counter CacheMisses is not available. Please make sure you are Windows 8+ without Hyper-V
// * The counter TotalCycles is not available. Please make sure you are Windows 8+ without Hyper-V
// * The counter InstructionRetired is not available. Please make sure you are Windows 8+ without Hyper-V
// * The counter LLCMisses is not available. Please make sure you are Windows 8+ without Hyper-V
// * The counter BranchInstructionRetired is not available. Please make sure you are Windows 8+ without Hyper-V
// * The counter BranchMispredictsRetired is not available. Please make sure you are Windows 8+ without Hyper-V
I have the same issue with an AMD 7950X, Windows 10, BDN 0.13.12. tracelog output is the same as nietras.
I am unable to reproduce the issue on Windows 11:
git clone https://github.com/dotnet/BenchmarkDotNet.git
cd .\BenchmarkDotNet\samples\BenchmarkDotNet.Samples\
dotnet run -c Release -f net8.0 --filter *IntroHardwareCounters*
BenchmarkDotNet v0.13.13-develop (2024-06-14), Windows 11 (10.0.22631.3737/23H2/2023Update/SunValley3)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores
.NET SDK 9.0.100-preview.6.24277.6
[Host] : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
DefaultJob : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
Method | Mean | Error | StdDev | BranchInstructions/Op | BranchMispredictions/Op |
---|---|---|---|---|---|
SortedBranch | 28.29 us | 0.099 us | 0.088 us | 80,266 | 34 |
UnsortedBranch | 138.92 us | 1.463 us | 1.368 us | 68,450 | 15,946 |
SortedBranchless | 18.84 us | 0.175 us | 0.164 us | 32,633 | 17 |
UnsortedBranchless | 18.71 us | 0.105 us | 0.093 us | 32,637 | 17 |
BDN gets the available counters here: https://github.com/dotnet/BenchmarkDotNet/blob/6a7244d76082f098a19785e4e3b0e0f269fed004/src/BenchmarkDotNet.Diagnostics.Windows/HardwareCounters.cs#L49C40-L49C64
Which is implemented in TraceEvent here by just calling some Windows sys-calls:
I suspect that the issue has been fixed in Windows 11, but not in 10. It would be great if somebody with Windows 10 could debug it.
I've been digging into this a little, I'm not very familiar with the BDN source code though.
To recap the issue, if I use the CacheMisses hardware counter I will get an error like this:
The counter CacheMisses is not available. Please make sure you are Windows 8+ without Hyper-V
I'm experimenting with the IntroHardwareCounters sample, changing the config to [HardwareCounters(HardwareCounter.CacheMisses, HardwareCounter.BranchMispredictions)].
I found this mapping from the HardwareCounter enum to the underlying ID string. Changing { HardwareCounter.CacheMisses, "CacheMisses" } to { HardwareCounter.CacheMisses, "ICacheMisses" } seems to work. I get a final result like this:
| Method | Mean | Error | StdDev | BranchMispredictions/Op | IcacheMisses/Op |
|--------------- |---------:|---------:|---------:|------------------------:|----------------:|
| SortedBranch | 41.29 us | 0.080 us | 0.071 us | 12 | 1 |
| UnsortedBranch | 41.34 us | 0.099 us | 0.093 us | 64 | 2 |
However, if I change that string to "DCacheMisses" the cache column simply disappears!
| Method | Mean | Error | StdDev | BranchMispredictions/Op |
|--------------- |---------:|---------:|---------:|------------------------:|
| SortedBranch | 41.40 us | 0.079 us | 0.070 us | 12 |
| UnsortedBranch | 41.35 us | 0.118 us | 0.104 us | 56 |
If I place a breakpoint in PreciseMachineCounter.OnSample it gets hit with Name="BranchMispredictions", but there are no hits if I remove the BranchMispredictions diagnoser. So presumably those events are never happening. Does BDN auto-hide columns with no info? If not, under what circumstances would a column disappear?
Trying some other likely sounding names from the tracelog: if I change it to "ICMiss" I get reasonable looking results; if I change it to "DCMiss" the column disappears again. If I change it to "DCAccess" (trying to verify whether I can get any DCache-related results), I get this:
| Method | Mean | Error | StdDev | DCAccess/Op |
|--------------- |----------:|---------:|---------:|------------:|
| SortedBranch | 44.38 ms | 0.246 ms | 0.230 ms | 133,142,027 |
| UnsortedBranch | 139.18 ms | 2.010 ms | 1.782 ms | 356,393,711 |
I don't really know where to go from here. Knowing when BDN auto hides columns might help me dig further.
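The manual renaming above amounts to trying several ETW names for one logical counter. A minimal sketch of that idea follows, assuming the set of available names is known up front; `CounterFallback`, `TryResolve`, and the candidate order are hypothetical and not how BenchmarkDotNet maps counters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not a BenchmarkDotNet API.
public static class CounterFallback
{
    // Candidate ETW names per logical counter, in order of preference.
    // The candidates and their order are assumptions for illustration.
    private static readonly Dictionary<string, string[]> Candidates =
        new Dictionary<string, string[]>
        {
            { "CacheMisses", new[] { "CacheMisses", "DcacheMisses", "IcacheMisses" } },
        };

    // Picks the first candidate that the machine lists; falls back to the
    // logical name itself when no candidates are registered for it.
    public static bool TryResolve(string logicalName, ISet<string> available, out string etwName)
    {
        var names = Candidates.TryGetValue(logicalName, out var list)
            ? list
            : new[] { logicalName };
        etwName = names.FirstOrDefault(available.Contains);
        return etwName != null;
    }
}
```

Note that a name being listed does not guarantee that samples arrive: in the experiments above the DCache variants were listed yet produced no column, so a resolver like this would still need a check that events were actually recorded.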
@martindevans thanks for sharing your notes! BDN hides columns when there are no values to display. It seems that you have IcacheMisses on your machine, but don't have CacheMisses and DCacheMisses. Please run the code below and share the output:
using System;
using Microsoft.Diagnostics.Tracing.Session;

foreach (var counterName in TraceEventProfileSources.GetInfo().Keys)
    Console.WriteLine(counterName);
The csproj should have the following package reference:
<PackageReference Include="Microsoft.Diagnostics.Tracing.TraceEvent" Version="3.1.8" />
Here's the complete output:
Timer
TotalIssues
BranchInstructions
DcacheMisses
IcacheMisses
BranchMispredictions
FpInstructions
IcacheIssues
DcacheAccesses
FPDispatchedFPUOps
FPDispatchedFPUOpsAddExcludeJunk
FPDispatchedFPUOpsMulExcludeJunk
FPDispatchedFPUOpsStoreExcludeJunk
FPDispatchedFPUOpsAddJunk
FPDispatchedFPUOpsMulJunk
FPDispatchedFPUOpsStoreJunk
FPCyclesNoFPUOpsRetired
FPDispathedFPUOpsWithFastFlag
LSSegmentRegisterLoad
LSSegmentRegisterLoadES
LSSegmentRegisterLoadCS
LSSegmentRegisterLoadSS
LSSegmentRegisterLoadDS
LSSegmentRegisterLoadFS
LSSegmentRegisterLoadGS
LSSegmentRegisterLoadHS
LSResyncBySelfModifyingCode
LSResyncBySnoop
LSBuffer2Full
LSLockedOperation
LSLateCancelOperation
LSRetiredCFLUSH
LSRetiredCPUID
DCAccess
DCMiss
DCRefillFromL2
DCRefillFromL2Invalid
DCRefillFromL2Shared
DCRefillFromL2Exclusive
DCRefillFromL2Owner
DCRefillFromL2Modified
DCRefillFromSystem
DCRefillFromSystemInvalid
DCRefillFromSystemShared
DCRefillFromSystemExclusive
DCRefillFromSystemOwner
DCRefillFromSystemModified
DCRefillCopyBack
DCRefillCopyBackInvalid
DCRefillCopyBackShared
DCRefillCopyBackExclusive
DCRefillCopyBackOwner
DCRefillCopyBackModified
DCL1DTLBMissL2DTLBHit
DCL1DTLBMissL2DTLBMiss
DCMisalignedDataReference
DCLateCancelOfAnAccess
DCEarlyCancelOfAnAccess
DCOneBitECCError
DCOneBitECCErrorScrubberError
DCOneBitECCErrorPiggybackScrubberError
DCDispatchedPrefetchInstructions
DCDispatchedPrefetchInstructionsLoad
DCDispatchedPrefetchInstructionsStore
DCDispatchedPrefetchInstructionsNTA
BUCleanToDirty
BUSharedToDirty
BUInternalL2Request
BUInternalL2RequestICFill
BUInternalL2RequestDCFill
BUInternalL2RequestTLBReload
BUInternalL2RequestTagSnoopRequest
BUInternalL2RequestCancelledRequest
BUFillRequestMissedInL2
BUFillRequestMissedInL2ICFill
BUFillRequestMissedInL2DCFill
BUFillRequestMissedInL2TLBLoad
BUFillIntoL2
BUFillIntoL2DirtyL2Victim
BUFillIntoL2VictimFromL1
ICFetch
ICMiss
ICRefillFromL2
ICRefillFromSystem
ICL1TLBMissL2TLBHit
ICL1TLBMissL2TLBMiss
ICResyncBySnoop
ICInstructionFetchStall
ICReturnStackHit
ICReturnStackOverflow
FRRetiredx86Instructions
FRRetireduops
FRRetiredBranches
FRRetiredBranchesMispredicted
FRRetiredTakenBranches
FRRetiredTakenBranchesMispredicted
FRRetiredFarControlTransfers
FRRetiredResyncsNonControlTransferBranches
FRRetiredNearReturns
FRRetiredNearReturnsMispredicted
FRRetiredTakenBranchesMispredictedByAddressMiscompare
FRRetiredFPUInstructions
FRRetiredFPUInstructionsx87
FRRetiredFPUInstructionsMMXAnd3DNow
FRRetiredFPUInstructionsPackedSSEAndSSE2
FRRetiredFPUInstructionsScalarSSEAndSSE2
FRRetiredFastpathDoubleOpInstructions
FRRetiredFastpathDoubleOpInstructionsLowOpInPosition0
FRRetiredFastpathDoubleOpInstructionsLowOpInPosition1
FRRetiredFastpathDoubleOpInstructionsLowOpInPosition2
FRInterruptsMaskedCycles
FRInterruptsMaskedWhilePendingCycles
FRTakenHardwareInterrupts
FRNothingToDispatch
FRDispatchStalls
FRDispatchStallsFromBranchAbortToRetire
FRDispatchStallsForSerialization
FRDispatchStallsForSegmentLoad
FRDispatchStallsWhenReorderBufferFull
FRDispatchStallsWhenReservationStationsFull
FRDispatchStallsWhenFPUFull
FRDispatchStallsWhenLSFull
FRDispatchStallsWhenWaitingForAllQuiet
FRDispatchStallsWhenFarControlOrResyncBranchPending
FRFPUExceptions
FRFPUExceptionsx87ReclassMicroFaults
FRFPUExceptionsSSERetypeMicroFaults
FRFPUExceptionsSSEReclassMicroFaults
FRFPUExceptionsSSEAndx87MicroTraps
FRNumberOfBreakPointsForDR0
FRNumberOfBreakPointsForDR1
FRNumberOfBreakPointsForDR2
FRNumberOfBreakPointsForDR3
NBMemoryControllerPageAccessEvent
NBMemoryControllerPageAccessEventPageHit
NBMemoryControllerPageAccessEventPageMiss
NBMemoryControllerPageAccessEventPageConflict
NBMemoryControllerPageTableOverflow
NBMemoryControllerDRAMCommandSlotsMissed
NBMemoryControllerTurnAround
NBMemoryControllerTurnAroundDIMM
NBMemoryControllerTurnAroundReadToWrite
NBMemoryControllerTurnAroundWriteToRead
NBMemoryControllerBypassCounter
NBMemoryControllerBypassCounterHighPriority
NBMemoryControllerBypassCounterLowPriority
NBMemoryControllerBypassCounterDRAMControllerInterface
NBMemoryControllerBypassCounterDRAMControllerQueue
NBSizedCommands
NBSizedCommandsNonPostWrSzByte
NBSizedCommandsNonPostWrSzDword
NBSizedCommandsWrSzByte
NBSizedCommandsWrSzDword
NBSizedCommandsRdSzByte
NBSizedCommandsRdSzDword
NBSizedCommandsRdModWr
NBProbeResult
NBProbeResultMiss
NBProbeResultHit
NBProbeResultHitDirtyWithoutMemoryCancel
NBProbeResultHitDirtyWithMemoryCancel
NBHyperTransportBus0Bandwidth
NBHyperTransportBus0BandwidthCommandSent
NBHyperTransportBus0BandwidthDataSent
NBHyperTransportBus0BandwidthBufferReleaseSent
NBHyperTransportBus0BandwidthNopSent
NBHyperTransportBus1Bandwidth
NBHyperTransportBus1BandwidthCommandSent
NBHyperTransportBus1BandwidthDataSent
NBHyperTransportBus1BandwidthBufferReleaseSent
NBHyperTransportBus1BandwidthNopSent
NBHyperTransportBus2Bandwidth
NBHyperTransportBus2BandwidthCommandSent
NBHyperTransportBus2BandwidthDataSent
NBHyperTransportBus2BandwidthBufferReleaseSent
NBHyperTransportBus2BandwidthNopSent
I just noticed the spelling in that list is IcacheMisses and not ICacheMisses (note the case). I'm going to retry my tests from above with that spelling.
No change when using the alternative casing. As before, IcacheMiss works and the DcacheMiss column doesn't appear.
I did some research and it seems that we can't safely assume the existence of any specific counters in advance. We should acknowledge the fact that each combination of Windows version and hardware may support an arbitrary set of hardware counters. Therefore, I suggest the following design:
The first two points were implemented in #1438, but this PR is obsolete and has some merge conflicts. Probably, the easiest option is to reuse the ideas of @xoofx, but implement them from scratch on the latest master and enhance them with the third point.
@timcassell @adamsitnik what do you think?
Having trouble reading 'CacheMisses' and 'InstructionsRetired' whenever I run the dotnet/performance benchmark suite. If I run the command from within the src\benchmarks\micro directory:
I get the following error message:
I am able to get some counters to read by enabling IBS in my BIOS (technically wasn't called IBS, I had to disable SVM to get it to work). So the following command does work for me:
If I profile a C# application outside of BenchmarkDotNet using AMD uProf, I can get cache miss and instructions retired statistics. I am also able to read all of the counters on my Skylake machine using the BenchmarkDotNet CLI without any issues.
Would appreciate any help that can be provided!