ilexp opened 6 years ago
Also, I'm not sure the current arithmetic implementation is the optimal one. There is quite a big difference in the IL being generated. Take for example the Vector3 + operator:
Source:
public static Vector3 operator +(Vector3 left, Vector3 right)
{
    left.X += right.X;
    left.Y += right.Y;
    left.Z += right.Z;
    return left;
}
IL:
.method public hidebysig static specialname valuetype Duality.Vector3
op_Addition(
valuetype Duality.Vector3 left,
valuetype Duality.Vector3 right
) cil managed
{
.maxstack 8
// [502 4 - 502 22]
IL_0000: ldarga.s left
IL_0002: ldflda float32 Duality.Vector3::X
IL_0007: dup
IL_0008: ldind.r4
IL_0009: ldarg.1 // right
IL_000a: ldfld float32 Duality.Vector3::X
IL_000f: add
IL_0010: stind.r4
// [503 4 - 503 22]
IL_0011: ldarga.s left
IL_0013: ldflda float32 Duality.Vector3::Y
IL_0018: dup
IL_0019: ldind.r4
IL_001a: ldarg.1 // right
IL_001b: ldfld float32 Duality.Vector3::Y
IL_0020: add
IL_0021: stind.r4
// [504 4 - 504 22]
IL_0022: ldarga.s left
IL_0024: ldflda float32 Duality.Vector3::Z
IL_0029: dup
IL_002a: ldind.r4
IL_002b: ldarg.1 // right
IL_002c: ldfld float32 Duality.Vector3::Z
IL_0031: add
IL_0032: stind.r4
// [505 4 - 505 16]
IL_0033: ldarg.0 // left
IL_0034: ret
} // end of method Vector3::op_Addition
Assembly:
push ebp
mov ebp,esp
push edi
push esi
sub esp,0Ch
lea edx,[ebp-14h]
fld dword ptr [ebp+14h]
fstp dword ptr [edx]
fld dword ptr [ebp+18h]
fstp dword ptr [edx+4]
fld dword ptr [ebp+1Ch]
fstp dword ptr [edx+8]
lea eax,[ebp-14h]
fld dword ptr [ebp+8]
fadd dword ptr [eax]
fstp dword ptr [eax]
lea eax,[ebp-10h]
fld dword ptr [ebp+0Ch]
fadd dword ptr [eax]
fstp dword ptr [eax]
lea eax,[ebp-0Ch]
fld dword ptr [ebp+10h]
fadd dword ptr [eax]
fstp dword ptr [eax]
mov edi,ecx
lea esi,[ebp-14h]
movq xmm0,mmword ptr [esi]
movq mmword ptr [edi],xmm0
add esi,8
add edi,8
movs dword ptr es:[edi],dword ptr [esi]
lea esp,[ebp-8]
pop esi
pop edi
pop ebp
ret 18h
And compare it to this:
Source:
public static Vector3 operator +(Vector3 left, Vector3 right)
{
    return new Vector3(left.X + right.X, left.Y + right.Y, left.Z + right.Z);
}
IL:
.method public hidebysig static specialname valuetype Duality.Vector3
op_Addition(
valuetype Duality.Vector3 left,
valuetype Duality.Vector3 right
) cil managed
{
.maxstack 8
// [502 4 - 502 76]
IL_0000: ldarg.0 // left
IL_0001: ldfld float32 Duality.Vector3::X
IL_0006: ldarg.1 // right
IL_0007: ldfld float32 Duality.Vector3::X
IL_000c: add
IL_000d: ldarg.0 // left
IL_000e: ldfld float32 Duality.Vector3::Y
IL_0013: ldarg.1 // right
IL_0014: ldfld float32 Duality.Vector3::Y
IL_0019: add
IL_001a: ldarg.0 // left
IL_001b: ldfld float32 Duality.Vector3::Z
IL_0020: ldarg.1 // right
IL_0021: ldfld float32 Duality.Vector3::Z
IL_0026: add
IL_0027: newobj instance void Duality.Vector3::.ctor(float32, float32, float32)
IL_002c: ret
} // end of method Vector3::op_Addition
Assembly:
push ebp
mov ebp,esp
push eax
fld dword ptr [ebp+14h]
fadd dword ptr [ebp+8]
fstp dword ptr [ebp-4]
fld dword ptr [ebp-4]
fld dword ptr [ebp+18h]
fadd dword ptr [ebp+0Ch]
fstp dword ptr [ebp-4]
fld dword ptr [ebp-4]
fld dword ptr [ebp+1Ch]
fadd dword ptr [ebp+10h]
fstp dword ptr [ebp-4]
fld dword ptr [ebp-4]
fxch st(2)
fstp dword ptr [ecx]
fstp dword ptr [ecx+4]
fstp dword ptr [ecx+8]
mov esp,ebp
pop ebp
ret 18h
BenchmarkDotNet also shows that creating a new vector is a lot faster than modifying the vector. Creating a small test project in Duality with a stopwatch also confirms this. Might be worth looking into, and if you want to use the in keyword (readonly ref, C# 7.2) you would have to do this anyway.
Can you provide the C# code for each of the IL code snippets?
To be really sure how this ends up on the CPU, we'll also need to compare the JITted assembly code, since a lot of optimizations only happen in the JIT stage.
BenchmarkDotNet also shows that creating a new vector is a lot faster than modifying the vector.
What do you mean by creating a new vector vs. modifying one? Using the in keyword shouldn't change a lot in that regard. The result value is still a newly created vector?
Basically it's this:
The current version (vector is the vector you get through the parameters)
vector.X = someCalculatedValue1;
vector.Y = someCalculatedValue2;
// etc.
return vector;
vs this:
return new Vector(someCalculatedValue1, someCalculatedValue2, etc)
The last one is also compatible with the in keyword, as you do not modify the vector you get in through the parameters (i.e. you can add the in keyword to the parameter without having to change code). It also seems that, at least in IL, it's a lot shorter, and measuring it shows it's faster. Haven't checked the JITted code.
EDIT: added the source to my previous post
The last one is also compatible with the in keyword as you do not modify the vector you get in through the parameters (as in you can add the in keyword to the parameter without having to change code)
Ah, sure. For the in keyword, that aspect of the source will have to change, yes. It could still be either a temporary vector assigned element by element, or a return + constructor combo, though. The latter might be faster, though I would have assumed that it shouldn't make a difference after all optimizations fired.
Might be better memory locality, since it doesn't have to go back to that vector between the calculations? I don't know really, I just noticed this when I looked at the IL and decided to measure it. Could be a good idea to try to reproduce it, to see if that difference is there on your machine as well?
I don't know really, I just noticed this when I looked at the IL and decided to measure it. Could be a good idea to try to reproduce it, to see if that difference is there on your machine as well?
Good call. Not sure I'll get around to benchmark this myself, but maybe you can just post the C# file with your benchmarking code and the full BenchmarkDotNet results for reference? Should be enough of a statement for when we get back to this issue later.
I have added the assembly to the source and IL post. Here is the benchmark code:
using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Duality;

class Program
{
    static void Main(string[] args)
    {
        BenchmarkRunner.Run<VectorOperationBench>();
        Console.ReadKey();
    }
}

public class VectorOperationBench
{
    private Vector3 left = new Vector3(4, 7, 8);
    private Vector3 right = new Vector3(1, 2, 3);

    [Benchmark]
    public Vector3 OldAdd()
    {
        return this.left + this.right;
    }

    [Benchmark]
    public Vector3 NewAdd()
    {
        return new Vector3(this.left.X + this.right.X, this.left.Y + this.right.Y, this.left.Z + this.right.Z);
    }
}
These are the results I get on an i7-4790K:
On my laptop it's slightly slower in both cases (slower CPU), but overall the same result.
Okay, that really looks like there's an obvious perf gain from restructuring the operators that way. Very nice find 👍 If you're up for it, this could be a worthwhile PR against master until we get around to dealing with this issue / the in keyword stuff.
So first, to be really sure this wasn't just some exception to the rule, I added more benchmarks and measured the current performance:
Then I changed those operators and benchmarked them again:
So except for the division, which is more or less the same, it's a very big improvement. I wonder why the compiler doesn't optimize this.
Also, I fixed some trivial code duplication (for instance, vec * float and float * vec now use the same code instead of separate implementations that do the same thing). Will do a PR once I've done all the operators.
Also, I fixed some trivial code duplication (for instance, vec * float and float * vec now use the same code instead of separate implementations that do the same thing). Will do a PR once I've done all the operators.
Are you 100% sure that you can still multiply both ways, e.g. 2.5f * vec and vec * 2.5f, even when defining only one of them? Please check before including this. I'm not sure the order of operands is reversed implicitly by the compiler here; honestly, I would expect it not to be.
I didn't remove them; I changed the reversed variant to use the logic of the non-reversed variant, in order to reduce code duplication.
I didn't remove them; I changed the reversed variant to use the logic of the non-reversed variant, in order to reduce code duplication.
Ah, makes sense. If this is at the cost of performance, I would value performance over avoiding code duplication in this case, otherwise good call.
Ah, makes sense. If this is at the cost of performance, I would value performance over avoiding code duplication in this case, otherwise good call.
No, I measured it and the performance is the same. The JIT seems to inline them, so there is no difference in performance.
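If it helps to picture the deduplication, it presumably looks roughly like this (a sketch based on the description above, not the actual PR code):

```csharp
// Primary implementation: vector * scalar.
public static Vector3 operator *(Vector3 left, float scale)
{
    return new Vector3(left.X * scale, left.Y * scale, left.Z * scale);
}

// The reversed variant simply forwards to the primary one. Both
// overloads still exist, so 2.5f * vec and vec * 2.5f both compile;
// the JIT inlines the forwarding call, so there is no overhead.
public static Vector3 operator *(float scale, Vector3 left)
{
    return left * scale;
}
```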
Now that my PR is merged, aside from the much better performance when using these operators, adding the in keyword is really trivial and will require no other code changes. Current:
public static Vector3 operator -(Vector3 left, Vector3 right)
{
    return new Vector3(
        left.X - right.X,
        left.Y - right.Y,
        left.Z - right.Z);
}
New:
public static Vector3 operator -(in Vector3 left, in Vector3 right)
{
    return new Vector3(
        left.X - right.X,
        left.Y - right.Y,
        left.Z - right.Z);
}
This issue can be done as soon as we switch to C# 7.2. Do remember to benchmark these to see if it improves performance. There might be a chance it actually reduces performance for smaller vectors!
There are cases where defensive copies might occur when using the in keyword, which can reduce performance. Making the struct readonly will prevent this.
https://blogs.msdn.microsoft.com/seteplia/2018/03/07/the-in-modifier-and-the-readonly-structs-in-c/
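To make the defensive-copy behavior concrete, here is a small self-contained sketch (the struct and method names are made up for illustration):

```csharp
using System;

public struct MutableVec
{
    public float X;

    // Not marked readonly: as far as the compiler knows,
    // this method could mutate the struct.
    public float Length() { return Math.Abs(this.X); }
}

public readonly struct ReadonlyVec
{
    public readonly float X;
    public ReadonlyVec(float x) { this.X = x; }

    // On a readonly struct, no instance member can mutate state.
    public float Length() { return Math.Abs(this.X); }
}

public static class Demo
{
    public static float ViaIn(in MutableVec v)
    {
        // "v" is passed by readonly reference, but since MutableVec is
        // not readonly, the compiler emits a hidden defensive copy of
        // "v" before calling Length(), to guarantee it stays unmodified.
        return v.Length();
    }

    public static float ViaInReadonly(in ReadonlyVec v)
    {
        // No defensive copy here: the readonly struct guarantees
        // Length() cannot mutate "v".
        return v.Length();
    }
}
```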
However, since the X, Y etc. fields are part of the public API, making the struct readonly is basically a breaking change.
As a further enhancement, in C# 8.0 one can now add readonly to individual properties and methods to denote that they do not change state: https://docs.microsoft.com/en-us/dotnet/csharp/whats-new/csharp-8#readonly-members (in C# 7.3 this was only possible at the struct level). This will enable the compiler to skip more defensive copies.
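A minimal sketch of what such a C# 8.0 readonly member looks like (simplified, hypothetical struct; not the actual Duality type):

```csharp
public struct Vec3
{
    public float X, Y, Z;

    public Vec3(float x, float y, float z) { this.X = x; this.Y = y; this.Z = z; }

    // C# 8.0 readonly member: the fields stay mutable, but the compiler
    // now knows this getter does not change state, so calling it through
    // an "in" parameter needs no defensive copy.
    public readonly float LengthSquared
    {
        get { return this.X * this.X + this.Y * this.Y + this.Z * this.Z; }
    }
}
```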
Now that Duality is using C# 7.3, this issue can get picked up at any time.
Making vector or matrix types readonly would be quite a usability downgrade, since that would prevent field assignments. We can still avoid performance regressions, though, if we use the new readonly members that C# 8.0 offers to flag all non-mutating properties and methods as readonly.
With that in mind, I think it makes sense to defer this until we're done with #698, right up to C# 8.0.
Making vector or matrix types readonly would be quite a usability downgrade, since that would prevent field assignments
Agreed. One could also consider adding methods like WithX(float value) that return a new vector with a different value for X. Basically the same as assigning the property would do, but this would make it clearer when defensive copies happen, and we could mark the vector itself as readonly.
So this:
var vector = new Vector3(1, 2, 3);
vector.X = 6;
vector.Z = 6;
// Do something with vector
Changes to:
var vector = new Vector3(1, 2, 3);
var modifiedVector = vector.WithX(6)
    .WithZ(6);
// Do something with modifiedVector
The meaning is slightly different, but this does allow you to mark the entire vector as readonly.
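For illustration, such With* helpers could be sketched like this (hypothetical; not part of the current Duality API):

```csharp
public readonly struct Vector3
{
    public readonly float X, Y, Z;

    public Vector3(float x, float y, float z) { this.X = x; this.Y = y; this.Z = z; }

    // Each With* method returns a copy with one component replaced,
    // leaving the original value untouched. This keeps the struct
    // fully readonly while still allowing per-component "updates".
    public Vector3 WithX(float value) { return new Vector3(value, this.Y, this.Z); }
    public Vector3 WithY(float value) { return new Vector3(this.X, value, this.Z); }
    public Vector3 WithZ(float value) { return new Vector3(this.X, this.Y, value); }
}
```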
I acknowledge the technical possibility, but that just reads terrible and wrong 😄
(Also, ref / out support, additional copies and potentially other stuff I'm not even thinking about - just give me plain old data vectors)
I acknowledge the technical possibility, but that just reads terrible and wrong
Hmm, I don't think it's that bad though. However, another reason not to implement my suggestion (yup, I now disagree with myself :P) might be this: https://referencesource.microsoft.com/#System.Numerics/System/Numerics/Vector3_Intrinsics.cs,3e75f0b7820a3bf5
There might be a chance that we would want to switch to these vectors at some point in the future, since the System.Numerics vectors are a lot faster due to SIMD optimizations. Better to try and match that public API if possible.
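For comparison, the System.Numerics.Vector3 type linked above already follows the return-a-new-value pattern discussed in this thread:

```csharp
using System;
using System.Numerics;

class Demo
{
    static void Main()
    {
        // System.Numerics.Vector3 also exposes public X/Y/Z fields and
        // returns a new value from its operators; the JIT maps these
        // operators to SIMD instructions where the hardware supports it.
        var a = new Vector3(4f, 7f, 8f);
        var b = new Vector3(1f, 2f, 3f);
        Vector3 sum = a + b; // component-wise addition
        Console.WriteLine(sum);
    }
}
```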
Yeah, I was eyeballing that possibility as well - definitely something to keep in mind.
Summary
As mentioned in issue #598, the new in parameter keyword is allowed on operators and can be used with literal values, so operator implementations can supersede the previous static by-ref operator alternatives.
Analysis
- Make sure that using the in keyword implicitly with non-referencable values (like literals or property return values) has no negative impact on performance. Read the docs, and if that's not 100% clear, check out the generated IL and x86 code in a sample project.
- Remove the ref methods from Vector2/3/4, Quaternion and Matrix3/4 if they have an operator equivalent.
- Use the in keyword on parameters provided to the operators in question.