dotnet / BenchmarkDotNet

Powerful .NET library for benchmarking
https://benchmarkdotnet.org
MIT License
10.24k stars 952 forks source link

Improve memory diagnoser accuracy #2562

Open timcassell opened 3 months ago

timcassell commented 3 months ago

This doesn't fix the zero-alloc measurement, but it does fix a bug from the threading diagnoser interfering with the results. The other changes give us the highest confidence that any errant measurements are not our fault. The rest is up to the runtime.

Fixes #1542

timcassell commented 3 months ago

It looks like I may have resolved the zero-alloc issue!

New results (using the same benchmark from #1543):

Method Runtime Allocated
Benchmark1 .NET 6.0 2 B
Benchmark2 .NET 6.0 2 B
Benchmark3 .NET 6.0 2 B
Benchmark4 .NET 6.0 2 B
Benchmark5 .NET 6.0 2 B
Benchmark1 .NET 7.0 -
Benchmark2 .NET 7.0 -
Benchmark3 .NET 7.0 -
Benchmark4 .NET 7.0 -
Benchmark5 .NET 7.0 -
Benchmark1 .NET 8.0 -
Benchmark2 .NET 8.0 -
Benchmark3 .NET 8.0 -
Benchmark4 .NET 8.0 -
Benchmark5 .NET 8.0 -

The resolution ended up being moving the GcStats measurement to its own method, and removing GC.Collect() from the allocation measurement. Doing only 1 of those still wasn't getting the expected results.

I was unable to repro the issue in a plain console application, so I'm not sure why GC.Collect() interferes with the results (I would have reported a bug in the runtime repo if I could).

PTAL @adamsitnik

timcassell commented 2 months ago

I added the thread sleep from #1543 in Core runtimes 3.0 to 6.0, and now I'm getting 0 measurements across the board:

Method Runtime Allocated
Benchmark1 .NET 6.0 -
Benchmark2 .NET 6.0 -
Benchmark3 .NET 6.0 -
Benchmark4 .NET 6.0 -
Benchmark5 .NET 6.0 -
Benchmark1 .NET 7.0 -
Benchmark2 .NET 7.0 -
Benchmark3 .NET 7.0 -
Benchmark4 .NET 7.0 -
Benchmark5 .NET 7.0 -
Benchmark1 .NET 8.0 -
Benchmark2 .NET 8.0 -
Benchmark3 .NET 8.0 -
Benchmark4 .NET 8.0 -
Benchmark5 .NET 8.0 -
Benchmark1 .NET Core 2.1 -
Benchmark2 .NET Core 2.1 -
Benchmark3 .NET Core 2.1 -
Benchmark4 .NET Core 2.1 -
Benchmark5 .NET Core 2.1 -
Benchmark1 .NET Core 3.0 -
Benchmark2 .NET Core 3.0 -
Benchmark3 .NET Core 3.0 -
Benchmark4 .NET Core 3.0 -
Benchmark5 .NET Core 3.0 -
timcassell commented 2 months ago

Hm, still a bit flaky on CI... I think if we can find why the GC.Collect is causing allocations, we will finally solve the flaky measurements. Unfortunately, I don't know how to debug it. I tried using EtwProfiler, but I get an error when I try to open the results (and I'm not even sure if that's the right tool for this). Does EtwProfiler only work with Full Framework?

image

timcassell commented 2 months ago

Okay, I tested again in a plain console app. I copied the GcStats struct and ran the measurement exactly the same using the copy/pasted struct from 2 different assemblies (with the GC.Collect() included in the measurement).

Code

```cs public static void Main() { var bench = new Bench(); var action = bench.Benchmark1; Warmup(action); Console.WriteLine($"allocatedBytesFromSelf {MeasureAllocatedBytesFromSelf(action)}"); Console.WriteLine($"allocatedBytesFromBDN {MeasureAllocatedBytesFromBDN(action)}"); } [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.AggressiveOptimization)] static void Warmup(Action action) { for (int i = 0; i < 100; i++) { action(); } } static long? MeasureAllocatedBytesFromSelf(Action action) { // Warmup measurement. DeadCodeEliminationHelper.KeepAliveWithoutBoxing(ClassLibrary1.GcStats.ReadInitial()); DeadCodeEliminationHelper.KeepAliveWithoutBoxing(ClassLibrary1.GcStats.ReadFinal()); Thread.Sleep(500); GC.Collect(); GC.WaitForPendingFinalizers(); GC.Collect(); return MeasureSelf(action).GetTotalAllocatedBytes(false); } [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.AggressiveOptimization)] static ClassLibrary1.GcStats MeasureSelf(Action action) { var initial = ClassLibrary1.GcStats.ReadInitial(); action(); var final = ClassLibrary1.GcStats.ReadFinal(); return final - initial; } static long? MeasureAllocatedBytesFromBDN(Action action) { // Warmup measurement. DeadCodeEliminationHelper.KeepAliveWithoutBoxing(BenchmarkDotNet.Engines.GcStats.ReadInitial()); DeadCodeEliminationHelper.KeepAliveWithoutBoxing(BenchmarkDotNet.Engines.GcStats.ReadFinal()); Thread.Sleep(500); GC.Collect(); GC.WaitForPendingFinalizers(); GC.Collect(); return MeasureBDN(action).GetTotalAllocatedBytes(false); } [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.AggressiveOptimization)] static GcStats MeasureBDN(Action action) { var initial = BenchmarkDotNet.Engines.GcStats.ReadInitial(); action(); var final = BenchmarkDotNet.Engines.GcStats.ReadFinal(); return final - initial; } ```

This prints

allocatedBytesFromSelf 0
allocatedBytesFromBDN 336

There must be something else in the BenchmarkDotNet assembly causing those allocations. But I can't find what it is.

I can only repro in net8.0. net48 measures 0 for both.

[Edit] I even tried Visual Studio's Performance Profiler, and it's not showing me anything. 😞

[Edit2] I didn't copy RuntimeInformation before. I also copied that and then I saw the same behavior. Still not sure how that's causing allocations, though.

timcassell commented 2 months ago

Update: I found the culprit and opened an issue in dotnet/runtime. https://github.com/dotnet/runtime/issues/101536

timcassell commented 2 months ago

I disabled the flaky tests and left a todo to re-enable them when that issue is fixed.

This is as accurate as we can make the memory diagnoser from our side. I think this should be good to merge now, unless you have any further feedback @adamsitnik.

timcassell commented 2 months ago

Ok, I re-enabled the tests and added a finalizer thread blocker to make the tests less flaky (nice idea @jkotas). This does not have any effect on real world measurements, though.

timcassell commented 2 months ago

Wow, even with tiered jit disabled and finalizer thread blocked, the CI is still flaky.

2024-04-28T00:50:55.0136465Z   Failed AllocationQuantumIsNotAnIssueForNetCore21Plus(toolchain: .NET 8.0) [9 s]
2024-04-28T00:50:55.0137358Z   Error Message:
2024-04-28T00:50:55.0137773Z    Assert.Equal() Failure: Values differ
2024-04-28T00:50:55.0138293Z Expected: 88
2024-04-28T00:50:55.0138634Z Actual:   176

I don't know what causes that behavior, but I just moved the MemoryDiagnoserTests to ManualRunning to stabilize the CI.

jkotas commented 2 months ago

Wow, even with tiered jit disabled and finalizer thread blocked, the CI is still flaky.

My guess is that the next problem are the event source background threads.

timcassell commented 2 months ago

My guess is that the next problem are the event source background threads.

Can you provide more details? I couldn't find information about it from google.

jkotas commented 2 months ago

EventSource and EventPipe create background threads that may allocate. These background threads should be only doing work if there is something listening to the events. It turns out that there is pretty much always something listening to .NET runtime events in typical machine configs these days.

timcassell commented 2 months ago

You mentioned disabling that in the other issue. How can we do that?

jkotas commented 2 months ago

Try to add this to your .csproj:

  <ItemGroup>
    <RuntimeHostConfigurationOption Include="System.Diagnostics.Tracing.EventSource.IsSupported" Value="false" />
  </ItemGroup>
timcassell commented 2 months ago

Well, that didn't seem to help. The test failed exactly the same.

jkotas commented 2 months ago

Does the test harness create any child processes to run the actual tests? If yes, does this setting propagate to the child processes?

timcassell commented 2 months ago

Does the test harness create any child processes to run the actual tests? If yes, does this setting propagate to the child processes?

Yes and yes, RuntimeHostConfigurationOption is one of the settings we explicitly copy.

I also verified it by adding this before the allocation measurement

var switchExists = AppContext.TryGetSwitch("System.Diagnostics.Tracing.EventSource.IsSupported", out var isEnabled);
Host.WriteLine($"// EventSource switchExists: {switchExists}, isEnabled: {isEnabled}");

and it printed // EventSource switchExists: True, isEnabled: False.

timcassell commented 1 month ago

@adamsitnik Do you have any more feedback here?