Open timcassell opened 3 months ago
It looks like I may have resolved the zero-alloc issue!
New results (using the same benchmark from #1543):
Method | Runtime | Allocated |
---|---|---|
Benchmark1 | .NET 6.0 | 2 B |
Benchmark2 | .NET 6.0 | 2 B |
Benchmark3 | .NET 6.0 | 2 B |
Benchmark4 | .NET 6.0 | 2 B |
Benchmark5 | .NET 6.0 | 2 B |
Benchmark1 | .NET 7.0 | - |
Benchmark2 | .NET 7.0 | - |
Benchmark3 | .NET 7.0 | - |
Benchmark4 | .NET 7.0 | - |
Benchmark5 | .NET 7.0 | - |
Benchmark1 | .NET 8.0 | - |
Benchmark2 | .NET 8.0 | - |
Benchmark3 | .NET 8.0 | - |
Benchmark4 | .NET 8.0 | - |
Benchmark5 | .NET 8.0 | - |
The resolution ended up being moving the GcStats
measurement to its own method, and removing GC.Collect()
from the allocation measurement. Doing only 1 of those still wasn't getting the expected results.
I was unable to repro the issue in a plain console application, so I'm not sure why GC.Collect()
interferes with the results (I would have reported a bug in the runtime repo if I could).
PTAL @adamsitnik
I added the thread sleep from #1543 in Core runtimes 3.0 to 6.0, and now I'm getting 0 measurements across the board:
Method | Runtime | Allocated |
---|---|---|
Benchmark1 | .NET 6.0 | - |
Benchmark2 | .NET 6.0 | - |
Benchmark3 | .NET 6.0 | - |
Benchmark4 | .NET 6.0 | - |
Benchmark5 | .NET 6.0 | - |
Benchmark1 | .NET 7.0 | - |
Benchmark2 | .NET 7.0 | - |
Benchmark3 | .NET 7.0 | - |
Benchmark4 | .NET 7.0 | - |
Benchmark5 | .NET 7.0 | - |
Benchmark1 | .NET 8.0 | - |
Benchmark2 | .NET 8.0 | - |
Benchmark3 | .NET 8.0 | - |
Benchmark4 | .NET 8.0 | - |
Benchmark5 | .NET 8.0 | - |
Benchmark1 | .NET Core 2.1 | - |
Benchmark2 | .NET Core 2.1 | - |
Benchmark3 | .NET Core 2.1 | - |
Benchmark4 | .NET Core 2.1 | - |
Benchmark5 | .NET Core 2.1 | - |
Benchmark1 | .NET Core 3.0 | - |
Benchmark2 | .NET Core 3.0 | - |
Benchmark3 | .NET Core 3.0 | - |
Benchmark4 | .NET Core 3.0 | - |
Benchmark5 | .NET Core 3.0 | - |
Hm, still a bit flaky on CI... I think if we can find why the GC.Collect
is causing allocations, we will finally solve the flaky measurements. Unfortunately, I don't know how to debug it. I tried using EtwProfiler, but I get an error when I try to open the results (and I'm not even sure if that's the right tool for this). Does EtwProfiler only work with Full Framework?
Okay, I tested again in a plain console app. I copied the GcStats
struct and ran the measurement exactly the same using the copy/pasted struct from 2 different assemblies (with the GC.Collect()
included in the measurement).
```cs public static void Main() { var bench = new Bench(); var action = bench.Benchmark1; Warmup(action); Console.WriteLine($"allocatedBytesFromSelf {MeasureAllocatedBytesFromSelf(action)}"); Console.WriteLine($"allocatedBytesFromBDN {MeasureAllocatedBytesFromBDN(action)}"); } [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.AggressiveOptimization)] static void Warmup(Action action) { for (int i = 0; i < 100; i++) { action(); } } static long? MeasureAllocatedBytesFromSelf(Action action) { // Warmup measurement. DeadCodeEliminationHelper.KeepAliveWithoutBoxing(ClassLibrary1.GcStats.ReadInitial()); DeadCodeEliminationHelper.KeepAliveWithoutBoxing(ClassLibrary1.GcStats.ReadFinal()); Thread.Sleep(500); GC.Collect(); GC.WaitForPendingFinalizers(); GC.Collect(); return MeasureSelf(action).GetTotalAllocatedBytes(false); } [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.AggressiveOptimization)] static ClassLibrary1.GcStats MeasureSelf(Action action) { var initial = ClassLibrary1.GcStats.ReadInitial(); action(); var final = ClassLibrary1.GcStats.ReadFinal(); return final - initial; } static long? MeasureAllocatedBytesFromBDN(Action action) { // Warmup measurement. DeadCodeEliminationHelper.KeepAliveWithoutBoxing(BenchmarkDotNet.Engines.GcStats.ReadInitial()); DeadCodeEliminationHelper.KeepAliveWithoutBoxing(BenchmarkDotNet.Engines.GcStats.ReadFinal()); Thread.Sleep(500); GC.Collect(); GC.WaitForPendingFinalizers(); GC.Collect(); return MeasureBDN(action).GetTotalAllocatedBytes(false); } [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.AggressiveOptimization)] static GcStats MeasureBDN(Action action) { var initial = BenchmarkDotNet.Engines.GcStats.ReadInitial(); action(); var final = BenchmarkDotNet.Engines.GcStats.ReadFinal(); return final - initial; } ```
This prints
allocatedBytesFromSelf 0
allocatedBytesFromBDN 336
There must be something else in the BenchmarkDotNet
assembly causing those allocations. But I can't find what it is.
I can only repro in net8.0. net48 measures 0
for both.
[Edit] I even tried Visual Studio's Performance Profiler, and it's not showing me anything. 😞
[Edit2] I didn't copy RuntimeInformation
before. I also copied that and then I saw the same behavior. Still not sure how that's causing allocations, though.
Update: I found the culprit and opened an issue in dotnet/runtime. https://github.com/dotnet/runtime/issues/101536
I disabled the flaky tests and left a todo to re-enable them when that issue is fixed.
This is as accurate as we can make the memory diagnoser from our side. I think this should be good to merge now, unless you have any further feedback @adamsitnik.
Ok, I re-enabled the tests and added a finalizer thread blocker to make the tests less flaky (nice idea @jkotas). This does not have any effect on real world measurements, though.
Wow, even with tiered jit disabled and finalizer thread blocked, the CI is still flaky.
2024-04-28T00:50:55.0136465Z Failed AllocationQuantumIsNotAnIssueForNetCore21Plus(toolchain: .NET 8.0) [9 s]
2024-04-28T00:50:55.0137358Z Error Message:
2024-04-28T00:50:55.0137773Z Assert.Equal() Failure: Values differ
2024-04-28T00:50:55.0138293Z Expected: 88
2024-04-28T00:50:55.0138634Z Actual: 176
I don't know what causes that behavior, but I just moved the MemoryDiagnoserTests to ManualRunning to stabilize the CI.
Wow, even with tiered jit disabled and finalizer thread blocked, the CI is still flaky.
My guess is that the next problem are the event source background threads.
My guess is that the next problem are the event source background threads.
Can you provide more details? I couldn't find information about it from google.
EventSource and EventPipe create background threads that may allocate. These background threads should be only doing work if there is something listening to the events. It turns out that there is pretty much always something listening to .NET runtime events in typical machine configs these days.
You mentioned disabling that in the other issue. How can we do that?
Try to add this to your .csproj:
<ItemGroup>
<RuntimeHostConfigurationOption Include="System.Diagnostics.Tracing.EventSource.IsSupported" Value="false" />
</ItemGroup>
Well, that didn't seem to help. The test failed exactly the same.
Does the test harness create any child processes to run the actual tests? If yes, does this setting propagate to the child processes?
Does the test harness create any child processes to run the actual tests? If yes, does this setting propagate to the child processes?
Yes and yes, RuntimeHostConfigurationOption
is one of the settings we explicitly copy.
I also verified it by adding this before the allocation measurement
var switchExists = AppContext.TryGetSwitch("System.Diagnostics.Tracing.EventSource.IsSupported", out var isEnabled);
Host.WriteLine($"// EventSource switchExists: {switchExists}, isEnabled: {isEnabled}");
and it printed // EventSource switchExists: True, isEnabled: False
.
@adamsitnik Do you have any more feedback here?
This doesn't fix the zero-alloc measurement, but it does fix a bug from the threading diagnoser interfering with the results. The other changes give us the highest confidence that any errant measurements are not our fault. The rest is up to the runtime.Fixes #1542