Unexpected allocations reported in CPU-bound code

ronbrogan commented 4 years ago

I'm trying to validate that my code doesn't allocate anything, so I have some unit tests that assert on the summary produced by running the benchmarks.

However, a non-deterministic number of these test runs fail, because allocations are sometimes reported for a given benchmark and sometimes not.

In the real-world application, my method under test takes about 20 ms per operation (CPU-bound, no allocations), so my repro uses a dummy loop to simulate the work.

The benchmarks all invoke the same method; there are several of them only to illustrate that the same code can yield different allocation results.

Source for the repro is here: https://gist.github.com/ronbrogan/bd53bddd76cfb878eef0ae0a683434df

My only theory right now is that the minimum allocation size is leaking into the measured allocations, but I still don't understand why that couldn't be avoided.

In the repro I call GC.GetAllocatedBytesForCurrentThread before and after running my method, and the two values are identical. Is this an issue with MemoryDiagnoser, user error, or is it simply unreasonable to assert that a given benchmark makes zero allocations?

adamsitnik commented 4 years ago

Hi @ronbrogan

Big thanks for a very detailed bug report with a simple repro case!

I was able to reproduce it on .NET Core 3.1. .NET Core 2.1 and .NET 5.0 are free of this bug. I will dig deeper and get back to you.

adamsitnik commented 4 years ago

OK, this is most probably a side effect of the Tiered JIT, which allocates something on another thread.

To test this, I set the following environment variable: COMPlus_TieredCompilation=0
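A minimal sketch of that experiment from a POSIX shell, assuming the benchmarks are launched from the same shell so the benchmark process inherits the variable. The `dotnet run -c Release` command in the comment is illustrative and not taken from the repro. Note that disabling tiering changes how the code is JIT-compiled, so this is a diagnostic step rather than a fix.

```shell
#!/bin/sh
# Turn off tiered compilation for every .NET process started from this shell
# (COMPlus_ is the prefix .NET Core 3.1 uses for runtime configuration knobs).
export COMPlus_TieredCompilation=0

# Echo the value to confirm it is set, then launch the benchmarks from this
# same shell, e.g. `dotnet run -c Release` (illustrative command).
echo "COMPlus_TieredCompilation=$COMPlus_TieredCompilation"
```

If the sporadic allocations disappear with the variable set and come back without it, the tiering background work is the likely source.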

adamsitnik commented 4 years ago

I've confirmed that it's the Tiered JIT background thread:

[Image]

Turnerj commented 4 years ago

I've noticed something like this happening randomly in my benchmarks for a while too, and thought it was something weird in my code. I moved some allocation-heavy code over to ArrayPool, and it was sometimes reported as allocating a small number of bytes and sometimes not at all. Like the OP's, my code is otherwise CPU-bound.

~Having quickly skimmed the thread in your PR, @adamsitnik, is one potential "quick fix" simply to benchmark with the latest .NET 5? (I'm currently using 3.1, but in my case I can switch to .NET 5 RC2 easily enough.)~ Edit: I misread your earlier comment and thought you wrote that 3.1, 2.1 and 5.0 all had the bug.

Out of curiosity, why would the JIT allocate during the diagnoser run? I would have thought the workload and overhead jitting runs would have done everything, including any allocations they needed. Is it just that tiered JIT compilation can happen outside the time BDN dedicates to jitting? (I don't know how that logic works under the hood in BDN, so I'm probably missing something obvious.)

Turnerj commented 4 years ago

I just got around to running my benchmark on .NET 5, and it still seems to allocate for me.

Code base: https://github.com/Turnerj/LevenshteinBenchmarks/tree/2475940db8c4c6f7727c20d5a3ba20a200e77e5c
The specific implementation that shouldn't allocate: https://github.com/Turnerj/LevenshteinBenchmarks/blob/2475940db8c4c6f7727c20d5a3ba20a200e77e5c/Implementations/03_ArrayPool.cs

Run the "ArrayPool" benchmark to see the results. My use of ArrayPool is well below the 1,048,576-item limit, so I can't see where the allocations would come from other than something wrong in the diagnoser or in the runtime itself.

timcassell commented 3 years ago

See issue https://github.com/dotnet/runtime/issues/45446

Although I'm measuring something different there (total bytes rather than allocations), I think the problem is the same. In my tests, both .NET Core 3.1 and .NET 5.0 are affected (5.0 is worse).

dotnet/BenchmarkDotNet issue #1542: Unexpected allocations reported in CPU-bound code