Open AndyAyersMS opened 1 year ago
@AndyAyersMS Thanks for the bug report! I can confirm that it is a severe issue that affects all the benchmarks that return structs (except IntPtr
, UIntPtr
). The minimal repro:
public struct MyStruct
{
public int Value;
}
public class Benchmarks
{
[Benchmark]
public MyStruct Foo() => new MyStruct();
}
However, it's not clear to me how to properly fix this problem. Below are some of my thoughts on this.
Firstly, we have two primary ways to consume the return value of the workload method:
consumer.Consume(workloadDelegate().Value); // Workload:1
consumer.Consume(workloadDelegate()); // Workload:2
The first one (Workload:1
) is the current default. We don't use the second one (Workload:2
) right now because it has object wrapping as a side-effect (Consume
can natively consume only primitive types, IntPtr
, UIntPtr
, object
).
Secondly, we have several ways to define the overhead delegate:
// Overhead:1
private System.Int32 __Overhead()
{
return default(System.Int32);
}
// Overhead:2
private MyStruct __Overhead()
{
return new MyStruct();
}
// Overhead:3
private MyStruct value;
private System.Int32 __Overhead()
{
return value;
}
Here are some comments about all of these options:
Overhead:1
It is the current default: it always prefers a type that can be natively consumed by Consumer
. Unfortunately, it doesn't match either Workload:1
(which has an additional .Value
accessor) or Workload:2
.Overhead:2
It matches Workload:2
, but it has a call of the MyStruct
constructor as a side effect. If Workload
doesn't create a new struct (e.g., it reads a value from a field), such an Overhead
implementation is incorrect.Overhead:3
It matches Workload:2
, but it has a field reading as a side effect. If Workload
doesn't read a field (e.g., it creates a new MyStruct
instance), such an Overhead
implementation is incorrect.Since we don't know in advance how the given benchmark obtains the struct value, it's quite hard to provide a proper Overhead
implementation that matches the Workload
implementation. Therefore, BenchmarkDotNet uses Overhead:1
to reduce the risk of mismatched implementation. Unfortunately, it leads to other issues that are presented in https://github.com/dotnet/runtime/issues/86033
At the moment, I have only one idea of how to provide a proper baseline for benchmarks that return structs:
public struct OverheadStruct
{
public int Value;
}
private OverheadStruct __Overhead()
{
return new OverheadStruct();
}
Since OverheadStruct
contains a single field, it should provide a nice baseline for struct creation (at least for structs that contain at least one field).
overheadDelegate
and overheadDelegate
we pass a field value to a consumer (whenever field consuming is applicable): public delegate OverheadStruct OverheadDelegate();
private void OverheadActionUnroll(System.Int64 invokeCount)
{
for (System.Int64 i = 0; i < invokeCount; i++)
{
consumer.Consume(overheadDelegate().Value);
public delegate System.Numerics.Vector3 WorkloadDelegate();
private void WorkloadActionUnroll(System.Int64 invokeCount)
{
for (System.Int64 i = 0; i < invokeCount; i++)
{
consumer.Consume(overheadDelegate().X);
I don't feel like it is a perfect solution, but it should (seemingly) provide a more accurate baseline for benchmarks that return structs.
@AndyAyersMS @adamsitnik What do you think?
@AndreyAkinshin What about adding a Consumer<T>
type to consume the struct as a whole? (Obviously would not work with ref struct
, but neither does the current consumer.)
it has object wrapping as a side-effect
Also, if you use Consume<T>(in T value)
, it does not box the value (nullables fixed in #2191).
@timcassell With such an approach, Consume
will spend some time copying the fields of the given value to the holder struct instance. It would be another unpleasant side effect that has to be reproduced in Overhead
.
@timcassell With such an approach,
Consume
will spend some time copying the fields of the given value to the holder struct instance. It would be another unpleasant side effect that has to be reproduced inOverhead
.
I would assume that any consume code would be replicated in overhead regardless, no?
Anyway, I was thinking of simplifying it to this.
public unsafe void Consume<T>(in T value)
=> ptrHolder = (IntPtr) Unsafe.AsPointer(ref Unsafe.AsRef(in value));
Then it won't matter how large the struct is, it's only grabbing its reference. What do you think? (current implementation passes it to DeadCodeEliminationHelper.KeepAliveWithoutBoxingReadonly
)
And I think the overhead method can be changed to this:
private MyStruct __Overhead()
{
Unsafe.SkipInit(out MyStruct value);
return value;
}
That matches the workload type, and avoids the cost of field reading and constructor call and zero-initializing.
Then it won't matter how large the struct is, it's only grabbing its reference.
Sounds good.
That matches the workload type, and avoids the cost of field reading and constructor call and zero-initializing.
It's a great idea, I like it!
Do you want to send a PR?
Do you want to send a PR?
Sure thing.
Why is a consumer even needed here? Isn't it enough to just do a NoInline method call?
Another thought is to not optimize these methods, that way the return value can be unconsumed but presumably would always still be produced. But it would make overhead higher, which is probably less desirable.
Why is a consumer even needed here? Isn't it enough to just do a NoInline method call?
I thought the same thing. There was a short discussion about that in #2173.
If these don't match then benchmark times will be either under or over estimated.
Context https://github.com/dotnet/runtime/issues/86033#issuecomment-1546306621
This is the "good" case where the jit isn't messing up the workload invoke codegen (which just makes the problem worse).
May be restricted to certain return types like Vector; I haven't seen this elsewhere.