dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Suboptimal code and possible loss of precision in `Stopwatch.GetElapsedTime(long, long)` #109685

Open MineCake147E opened 3 weeks ago

MineCake147E commented 3 weeks ago

Description

`Stopwatch.GetElapsedTime` currently converts the timestamp difference from `Stopwatch` ticks to `TimeSpan` ticks using a double-precision floating-point multiplication.

https://github.com/dotnet/runtime/blob/5db0ce0fa2e206da664498044af233a044f9aeb7/src/libraries/System.Private.CoreLib/src/System/Diagnostics/Stopwatch.cs#L27
https://github.com/dotnet/runtime/blob/5db0ce0fa2e206da664498044af233a044f9aeb7/src/libraries/System.Private.CoreLib/src/System/Diagnostics/Stopwatch.cs#L133-L134

This may result in codegen that looks like this:

vzeroupper
sub rdx,rcx
vxorps    xmm0,xmm0,xmm0
vcvtsi2sd xmm0,xmm0,rdx
vmulsd    xmm0,xmm0,[7FFF0A2CABC0]
vfixupimmsd xmm0,xmm0,[7FFF0A2CABD0],0
vcmpgepd  k1,xmm0,[7FFF0A2CABE0]
vcvttsd2si rax,xmm0
vpbroadcastq xmm0,rax
vpblendmq xmm0{k1},xmm0,[7FFF0A2CABF0]
vmovq     rax,xmm0
ret
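For reference, here is a minimal sketch of the conversion the linked lines perform, paraphrased rather than copied: a cached `double` factor (`TimeSpan.TicksPerSecond / Stopwatch.Frequency`) is multiplied by the timestamp delta and the product is truncated back to `long`. The class name `DoubleConversionSketch` is an illustrative assumption. In the listing above, `vcvtsi2sd`, `vmulsd`, and `vcvttsd2si` appear to correspond to these three steps, while the extra `vfixupimmsd`/`vcmpgepd`/`vpblendmq` instructions appear to implement the saturating double-to-long conversion semantics that .NET 9 uses on x64 (NaN maps to 0, out-of-range values clamp).

```csharp
using System;
using System.Diagnostics;

// Illustrative paraphrase of the double-based conversion, not the runtime source.
public static class DoubleConversionSketch
{
    // TimeSpan ticks per Stopwatch tick, computed once at type initialization.
    private static readonly double s_tickFrequency =
        (double)TimeSpan.TicksPerSecond / Stopwatch.Frequency;

    public static TimeSpan GetElapsedTime(long startingTimestamp, long endingTimestamp)
    {
        long delta = endingTimestamp - startingTimestamp; // sub
        double scaled = delta * s_tickFrequency;          // vcvtsi2sd + vmulsd
        return new TimeSpan((long)scaled);                // truncating, saturating cast back to long
    }
}
```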

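The sequence of double-precision floating-point instructions needed for the conversion could be replaced with one of the integer-only approaches measured in the benchmarks below: multiplication by an integer ratio (`IntegerMultiply`), division by a constant (`IntegerConstantDivision`), or multiplication by a 64.64 fixed-point ratio (`IntegerFraction`).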
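A minimal sketch of how these integer approaches might be selected from the timer frequency, assuming the frequency is known when the converter is built. `IntegerTickConverter` is a hypothetical helper, not a .NET API. Frequencies that are multiples of `TimeSpan.TicksPerSecond` use a division (as in `IntegerConstantDivision`); otherwise the ratio is split into a whole part (exact on its own when the frequency divides `TicksPerSecond`, as in `IntegerMultiply`) and a 64.64 fixed-point fraction applied through `Math.BigMul` (as in `IntegerFraction`). The fraction is rounded up and the sign is handled separately, so the result truncates toward zero like the existing cast; it should be exact while |delta| stays below roughly 2^64 / frequency and at most one tick high beyond that. Overflow for extreme deltas is not handled, and `UInt128` requires .NET 7 or later.

```csharp
using System;

// Hypothetical helper, not a .NET API: integer-only Stopwatch-tick to TimeSpan-tick conversion.
public readonly struct IntegerTickConverter
{
    private readonly ulong _divisor;    // nonzero only when frequency is a multiple of TicksPerSecond
    private readonly ulong _multiplier; // whole part of TicksPerSecond / frequency
    private readonly ulong _fraction;   // fractional part scaled by 2^64, rounded up

    public IntegerTickConverter(long frequency)
    {
        if (frequency <= 0)
            throw new ArgumentOutOfRangeException(nameof(frequency));

        _divisor = 0;
        _multiplier = 0;
        _fraction = 0;

        ulong t = (ulong)TimeSpan.TicksPerSecond;
        ulong f = (ulong)frequency;

        if (f % t == 0)
        {
            // e.g. 1 GHz: one TimeSpan tick per 100 timer ticks (cf. IntegerConstantDivision)
            _divisor = f / t;
        }
        else
        {
            // Whole part; exact by itself when f divides t (cf. IntegerMultiply)
            _multiplier = t / f;
            // ceil(remainder * 2^64 / f): a 64.64 fixed-point fraction (cf. IntegerFraction)
            ulong remainder = t % f;
            _fraction = (ulong)((((UInt128)remainder << 64) + (f - 1)) / f);
        }
    }

    public long ToTimeSpanTicks(long delta)
    {
        // Convert the magnitude and reapply the sign so the result truncates toward zero.
        bool negative = delta < 0;
        ulong magnitude = negative ? 0UL - (ulong)delta : (ulong)delta;

        ulong ticks = _divisor != 0
            ? magnitude / _divisor
            : magnitude * _multiplier + Math.BigMul(magnitude, _fraction, out _); // high 64 bits

        return negative ? -(long)ticks : (long)ticks;
    }
}
```

For example, a 3 MHz timer gives a whole part of 3 and a fraction of ceil(2^64 / 3), so a delta of 3,000,000 timer ticks converts to exactly 10,000,000 `TimeSpan` ticks.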

Replacing the floating-point path could improve not only performance but also precision for long durations (assuming a change in results is acceptable). Because of the `long` to `double` conversion, once the absolute value of the timestamp difference exceeds $2^{53}$ Stopwatch ticks (about 28.5 years at a 10 MHz `Stopwatch.Frequency`), its low bits are unnecessarily rounded. Even if this rounding doesn't matter for almost all uses of this API (almost nobody measures more than a decade with it anyway), the performance improvement in the common case is worth pursuing.
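The $2^{53}$ threshold can be checked directly: above it, a `long` does not always survive a round trip through `double`. This minimal sketch (the `DoublePrecisionCheck` name is hypothetical) tests whether a tick count converts back unchanged and computes how many seconds of elapsed time a given frequency needs to reach the limit.

```csharp
using System;

// Hypothetical helper illustrating the long -> double precision limit.
public static class DoublePrecisionCheck
{
    // 2^53: every integer up to this magnitude is exactly representable as a double.
    public const long ExactLimit = 1L << 53;

    // True if the value is unchanged after a long -> double -> long round trip.
    public static bool RoundTrips(long ticks) => (long)(double)ticks == ticks;

    // Seconds of elapsed time at which a delta measured at `frequency` Hz reaches 2^53 ticks.
    public static double SecondsUntilLimit(long frequency) => (double)ExactLimit / frequency;
}
```

For instance, `RoundTrips(DoublePrecisionCheck.ExactLimit + 1)` is false because $2^{53} + 1$ rounds to $2^{53}$. At 10 MHz the limit is about 28.5 years; by the same arithmetic, at 1 GHz it is about 104 days.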

Configuration

BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.4317/23H2/2023Update/SunValley3)
Intel Xeon w5-2455X, 1 CPU, 24 logical and 12 physical cores
.NET SDK 9.0.100-rc.2.24474.11
  [Host]     : .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  DefaultJob : .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Regression?

Unknown

Data

| Method                  | Mean      | Error     | StdDev    | Ratio | RatioSD | Code Size |
|------------------------ |----------:|----------:|----------:|------:|--------:|----------:|
| IntegerAddLatency       | 0.2528 ns | 0.0007 ns | 0.0006 ns |  1.00 |    0.00 |      81 B |
| Current                 | 7.3734 ns | 0.0592 ns | 0.0554 ns | 29.17 |    0.22 |   1,263 B |
| NoConversion            | 0.5060 ns | 0.0018 ns | 0.0014 ns |  2.00 |    0.01 |     279 B |
| IntegerMultiply         | 1.2673 ns | 0.0050 ns | 0.0042 ns |  5.01 |    0.02 |     353 B |
| IntegerConstantDivision | 2.1699 ns | 0.0054 ns | 0.0048 ns |  8.58 |    0.03 |     711 B |
| IntegerFraction         | 1.9243 ns | 0.0052 ns | 0.0049 ns |  7.61 |    0.03 |   1,265 B |
Benchmark Code

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;

namespace BenchmarkPlayground
{
    [SimpleJob(runtimeMoniker: RuntimeMoniker.HostProcess)]
    [DisassemblyDiagnoser(maxDepth: int.MaxValue)]
    public class GetElapsedTimeBenchmarks
    {
        const int OperationsPerInvoke = 1 << 20;

        [GlobalSetup]
        public void Setup()
        {
            // Fill l0, l1, l2 with random values; l1 is forced odd (and thus nonzero).
            Span<long> k = [0, 0, 0];
            RandomNumberGenerator.Fill(MemoryMarshal.AsBytes(k));
            l0 = k[0];
            l1 = k[1] | 1;
            l2 = k[2];
        }

        long l0, l1 = 1, l2;

        // Baseline: a chain of dependent 64-bit additions.
        [SkipLocalsInit]
        [Benchmark(Baseline = true, OperationsPerInvoke = OperationsPerInvoke)]
        public long IntegerAddLatency()
        {
            var v0 = l0;
            var v1 = l1;
            var v2 = l2;
            for (int i = 0; i < OperationsPerInvoke; i += 16)
            {
                v0 += v1; v0 += v1; v0 += v1; v0 += v1;
                v0 += v1; v0 += v1; v0 += v1; v0 += v1;
                v0 += v1; v0 += v1; v0 += v1; v0 += v1;
                v0 += v1; v0 += v1; v0 += v1; v0 += v1;
                v1 += v2;
            }
            return v0;
        }

        // Models the current path: long -> double, multiply by a double constant, truncate to long.
        [SkipLocalsInit]
        [Benchmark(OperationsPerInvoke = OperationsPerInvoke)]
        public long Current()
        {
            var v0 = l0;
            var v1 = l1;
            var v2 = l2;
            var k = v2;
            const double M = double.Pi;
            for (int i = 0; i < OperationsPerInvoke; i += 16)
            {
                k += (long)((k - v0) * M); v0 += v1;
                k += (long)((k - v0) * M); v0 += v1;
                k += (long)((k - v0) * M); v0 += v1;
                k += (long)((k - v0) * M); v0 += v1;
                k += (long)((k - v0) * M); v0 += v1;
                k += (long)((k - v0) * M); v0 += v1;
                k += (long)((k - v0) * M); v0 += v1;
                k += (long)((k - v0) * M); v0 += v1;
                k += (long)((k - v0) * M); v0 += v1;
                k += (long)((k - v0) * M); v0 += v1;
                k += (long)((k - v0) * M); v0 += v1;
                k += (long)((k - v0) * M); v0 += v1;
                k += (long)((k - v0) * M); v0 += v1;
                k += (long)((k - v0) * M); v0 += v1;
                k += (long)((k - v0) * M); v0 += v1;
                k += (long)((k - v0) * M); v0 += v1;
                v1 += v2;
            }
            return k;
        }

        // Subtraction only, as if no unit conversion were needed.
        [SkipLocalsInit]
        [Benchmark(OperationsPerInvoke = OperationsPerInvoke)]
        public long NoConversion()
        {
            var v0 = l0;
            var v1 = l1;
            var v2 = l2;
            var k = v2;
            for (int i = 0; i < OperationsPerInvoke; i += 16)
            {
                k += k - v0; v0 += v1;
                k += k - v0; v0 += v1;
                k += k - v0; v0 += v1;
                k += k - v0; v0 += v1;
                k += k - v0; v0 += v1;
                k += k - v0; v0 += v1;
                k += k - v0; v0 += v1;
                k += k - v0; v0 += v1;
                k += k - v0; v0 += v1;
                k += k - v0; v0 += v1;
                k += k - v0; v0 += v1;
                k += k - v0; v0 += v1;
                k += k - v0; v0 += v1;
                k += k - v0; v0 += v1;
                k += k - v0; v0 += v1;
                k += k - v0; v0 += v1;
                v1 += v2;
            }
            return k;
        }

        // Multiplication by a 64-bit integer constant.
        [SkipLocalsInit]
        [Benchmark(OperationsPerInvoke = OperationsPerInvoke)]
        public long IntegerMultiply()
        {
            var v0 = l0;
            var v1 = l1;
            var v2 = l2;
            var k = v2;
            const long M = 2611923443488327891;
            for (int i = 0; i < OperationsPerInvoke; i += 16)
            {
                k += (k - v0) * M; v0 += v1;
                k += (k - v0) * M; v0 += v1;
                k += (k - v0) * M; v0 += v1;
                k += (k - v0) * M; v0 += v1;
                k += (k - v0) * M; v0 += v1;
                k += (k - v0) * M; v0 += v1;
                k += (k - v0) * M; v0 += v1;
                k += (k - v0) * M; v0 += v1;
                k += (k - v0) * M; v0 += v1;
                k += (k - v0) * M; v0 += v1;
                k += (k - v0) * M; v0 += v1;
                k += (k - v0) * M; v0 += v1;
                k += (k - v0) * M; v0 += v1;
                k += (k - v0) * M; v0 += v1;
                k += (k - v0) * M; v0 += v1;
                k += (k - v0) * M; v0 += v1;
                v1 += v2;
            }
            return k;
        }

        // Division by a constant (the JIT compiles it to a multiply-high plus shifts).
        [SkipLocalsInit]
        [Benchmark(OperationsPerInvoke = OperationsPerInvoke)]
        public long IntegerConstantDivision()
        {
            var v0 = l0;
            var v1 = l1;
            var v2 = l2;
            var k = v2;
            const long M = 445;
            for (int i = 0; i < OperationsPerInvoke; i += 16)
            {
                k += (k - v0) / M; v0 += v1;
                k += (k - v0) / M; v0 += v1;
                k += (k - v0) / M; v0 += v1;
                k += (k - v0) / M; v0 += v1;
                k += (k - v0) / M; v0 += v1;
                k += (k - v0) / M; v0 += v1;
                k += (k - v0) / M; v0 += v1;
                k += (k - v0) / M; v0 += v1;
                k += (k - v0) / M; v0 += v1;
                k += (k - v0) / M; v0 += v1;
                k += (k - v0) / M; v0 += v1;
                k += (k - v0) / M; v0 += v1;
                k += (k - v0) / M; v0 += v1;
                k += (k - v0) / M; v0 += v1;
                k += (k - v0) / M; v0 += v1;
                k += (k - v0) / M; v0 += v1;
                v1 += v2;
            }
            return k;
        }

        // 64.64 fixed-point multiply: integer part M plus the high 64 bits of diff * Y.
        [SkipLocalsInit]
        [Benchmark(OperationsPerInvoke = OperationsPerInvoke)]
        public long IntegerFraction()
        {
            var v0 = l0;
            var v1 = l1;
            var v2 = l2;
            var k = v2;
            const long M = 2611923443488327891;
            const long Y = 55478262137326323;
            for (int i = 0; i < OperationsPerInvoke; i += 16)
            {
                var diff = k - v0; k += diff * M + Math.BigMul(diff, Y, out _); v0 += v1;
                diff = k - v0; k += diff * M + Math.BigMul(diff, Y, out _); v0 += v1;
                diff = k - v0; k += diff * M + Math.BigMul(diff, Y, out _); v0 += v1;
                diff = k - v0; k += diff * M + Math.BigMul(diff, Y, out _); v0 += v1;
                diff = k - v0; k += diff * M + Math.BigMul(diff, Y, out _); v0 += v1;
                diff = k - v0; k += diff * M + Math.BigMul(diff, Y, out _); v0 += v1;
                diff = k - v0; k += diff * M + Math.BigMul(diff, Y, out _); v0 += v1;
                diff = k - v0; k += diff * M + Math.BigMul(diff, Y, out _); v0 += v1;
                diff = k - v0; k += diff * M + Math.BigMul(diff, Y, out _); v0 += v1;
                diff = k - v0; k += diff * M + Math.BigMul(diff, Y, out _); v0 += v1;
                diff = k - v0; k += diff * M + Math.BigMul(diff, Y, out _); v0 += v1;
                diff = k - v0; k += diff * M + Math.BigMul(diff, Y, out _); v0 += v1;
                diff = k - v0; k += diff * M + Math.BigMul(diff, Y, out _); v0 += v1;
                diff = k - v0; k += diff * M + Math.BigMul(diff, Y, out _); v0 += v1;
                diff = k - v0; k += diff * M + Math.BigMul(diff, Y, out _); v0 += v1;
                diff = k - v0; k += diff * M + Math.BigMul(diff, Y, out _); v0 += v1;
                v1 += v2;
            }
            return k;
        }
    }
}
```
Benchmark Disassembly ## .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI ```assembly ; BenchmarkPlayground.GetElapsedTimeBenchmarks.IntegerAddLatency() mov rax,[rcx+8] mov rdx,[rcx+10] mov rcx,[rcx+18] xor r8d,r8d nop M00_L00: add rax,rdx add rax,rdx add rax,rdx add rax,rdx add rax,rdx add rax,rdx add rax,rdx add rax,rdx add rax,rdx add rax,rdx add rax,rdx add rax,rdx add rax,rdx add rax,rdx add rax,rdx add rax,rdx add rdx,rcx add r8d,10 cmp r8d,100000 jl short M00_L00 ret ; Total bytes of code 81 ``` ## .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI ```assembly ; BenchmarkPlayground.GetElapsedTimeBenchmarks.Current() mov rax,[rcx+8] mov rdx,[rcx+10] mov rcx,[rcx+18] mov r8,rcx xor r10d,r10d vmovsd xmm0,qword ptr [7FF7A8B0B140] M00_L00: mov r9,r8 sub r9,rax vxorps xmm1,xmm1,xmm1 vcvtsi2sd xmm1,xmm1,r9 vmulsd xmm1,xmm1,xmm0 vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0 vcmpgepd k1,xmm1,[7FF7A8B0B160] vcvttsd2si r9,xmm1 vpbroadcastq xmm1,r9 vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170] vmovq r9,xmm1 add r8,r9 add rax,rdx mov r9,r8 sub r9,rax vxorps xmm1,xmm1,xmm1 vcvtsi2sd xmm1,xmm1,r9 vmulsd xmm1,xmm1,xmm0 vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0 vcmpgepd k1,xmm1,[7FF7A8B0B160] vcvttsd2si r9,xmm1 vpbroadcastq xmm1,r9 vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170] vmovq r9,xmm1 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax vxorps xmm1,xmm1,xmm1 vcvtsi2sd xmm1,xmm1,r9 vmulsd xmm1,xmm1,xmm0 vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0 vcmpgepd k1,xmm1,[7FF7A8B0B160] vcvttsd2si r9,xmm1 vpbroadcastq xmm1,r9 vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170] vmovq r9,xmm1 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax vxorps xmm1,xmm1,xmm1 vcvtsi2sd xmm1,xmm1,r9 vmulsd xmm1,xmm1,xmm0 vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0 vcmpgepd k1,xmm1,[7FF7A8B0B160] vcvttsd2si r9,xmm1 vpbroadcastq xmm1,r9 vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170] vmovq r9,xmm1 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax vxorps xmm1,xmm1,xmm1 vcvtsi2sd xmm1,xmm1,r9 vmulsd xmm1,xmm1,xmm0 
vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0 vcmpgepd k1,xmm1,[7FF7A8B0B160] vcvttsd2si r9,xmm1 vpbroadcastq xmm1,r9 vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170] vmovq r9,xmm1 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax vxorps xmm1,xmm1,xmm1 vcvtsi2sd xmm1,xmm1,r9 vmulsd xmm1,xmm1,xmm0 vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0 vcmpgepd k1,xmm1,[7FF7A8B0B160] vcvttsd2si r9,xmm1 vpbroadcastq xmm1,r9 vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170] vmovq r9,xmm1 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax vxorps xmm1,xmm1,xmm1 vcvtsi2sd xmm1,xmm1,r9 vmulsd xmm1,xmm1,xmm0 vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0 vcmpgepd k1,xmm1,[7FF7A8B0B160] vcvttsd2si r9,xmm1 vpbroadcastq xmm1,r9 vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170] vmovq r9,xmm1 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax vxorps xmm1,xmm1,xmm1 vcvtsi2sd xmm1,xmm1,r9 vmulsd xmm1,xmm1,xmm0 vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0 vcmpgepd k1,xmm1,[7FF7A8B0B160] vcvttsd2si r9,xmm1 vpbroadcastq xmm1,r9 vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170] vmovq r9,xmm1 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax vxorps xmm1,xmm1,xmm1 vcvtsi2sd xmm1,xmm1,r9 vmulsd xmm1,xmm1,xmm0 vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0 vcmpgepd k1,xmm1,[7FF7A8B0B160] vcvttsd2si r9,xmm1 vpbroadcastq xmm1,r9 vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170] vmovq r9,xmm1 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax vxorps xmm1,xmm1,xmm1 vcvtsi2sd xmm1,xmm1,r9 vmulsd xmm1,xmm1,xmm0 vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0 vcmpgepd k1,xmm1,[7FF7A8B0B160] vcvttsd2si r9,xmm1 vpbroadcastq xmm1,r9 vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170] vmovq r9,xmm1 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax vxorps xmm1,xmm1,xmm1 vcvtsi2sd xmm1,xmm1,r9 vmulsd xmm1,xmm1,xmm0 vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0 vcmpgepd k1,xmm1,[7FF7A8B0B160] vcvttsd2si r9,xmm1 vpbroadcastq xmm1,r9 vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170] vmovq r9,xmm1 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax vxorps xmm1,xmm1,xmm1 vcvtsi2sd xmm1,xmm1,r9 vmulsd xmm1,xmm1,xmm0 vfixupimmsd 
xmm1,xmm1,[7FF7A8B0B150],0 vcmpgepd k1,xmm1,[7FF7A8B0B160] vcvttsd2si r9,xmm1 vpbroadcastq xmm1,r9 vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170] vmovq r9,xmm1 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax vxorps xmm1,xmm1,xmm1 vcvtsi2sd xmm1,xmm1,r9 vmulsd xmm1,xmm1,xmm0 vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0 vcmpgepd k1,xmm1,[7FF7A8B0B160] vcvttsd2si r9,xmm1 vpbroadcastq xmm1,r9 vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170] vmovq r9,xmm1 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax vxorps xmm1,xmm1,xmm1 vcvtsi2sd xmm1,xmm1,r9 vmulsd xmm1,xmm1,xmm0 vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0 vcmpgepd k1,xmm1,[7FF7A8B0B160] vcvttsd2si r9,xmm1 vpbroadcastq xmm1,r9 vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170] vmovq r9,xmm1 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax vxorps xmm1,xmm1,xmm1 vcvtsi2sd xmm1,xmm1,r9 vmulsd xmm1,xmm1,xmm0 vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0 vcmpgepd k1,xmm1,[7FF7A8B0B160] vcvttsd2si r9,xmm1 vpbroadcastq xmm1,r9 vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170] vmovq r9,xmm1 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax vxorps xmm1,xmm1,xmm1 vcvtsi2sd xmm1,xmm1,r9 vmulsd xmm1,xmm1,xmm0 vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0 vcmpgepd k1,xmm1,[7FF7A8B0B160] vcvttsd2si r9,xmm1 vpbroadcastq xmm1,r9 vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170] vmovq r9,xmm1 add r9,r8 mov r8,r9 add rax,rdx add rdx,rcx add r10d,10 cmp r10d,100000 jl near ptr M00_L00 mov rax,r8 ret ; Total bytes of code 1263 ``` ## .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI ```assembly ; BenchmarkPlayground.GetElapsedTimeBenchmarks.NoConversion() mov rax,[rcx+8] mov rdx,[rcx+10] mov rcx,[rcx+18] mov r8,rcx xor r10d,r10d M00_L00: mov r9,r8 sub r9,rax add r8,r9 add rax,rdx mov r9,r8 sub r9,rax add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax add r9,r8 mov r8,r9 
add rax,rdx mov r9,r8 sub r9,rax add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax add r9,r8 mov r8,r9 add rax,rdx add rdx,rcx add r10d,10 cmp r10d,100000 jl near ptr M00_L00 mov rax,r8 ret ; Total bytes of code 279 ``` ## .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI ```assembly ; BenchmarkPlayground.GetElapsedTimeBenchmarks.IntegerMultiply() mov rax,[rcx+8] mov rdx,[rcx+10] mov rcx,[rcx+18] mov r8,rcx xor r10d,r10d M00_L00: mov r9,r8 sub r9,rax mov r11,243F6A8885A308D3 imul r9,r11 add r8,r9 add rax,rdx mov r9,r8 sub r9,rax imul r9,r11 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax imul r9,r11 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax imul r9,r11 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax imul r9,r11 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax imul r9,r11 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax imul r9,r11 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax imul r9,r11 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax imul r9,r11 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax imul r9,r11 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax imul r9,r11 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax imul r9,r11 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax imul r9,r11 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax imul r9,r11 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax imul r9,r11 add r9,r8 mov r8,r9 add rax,rdx mov r9,r8 sub r9,rax imul r9,r11 add r9,r8 mov r8,r9 add rax,rdx add rdx,rcx add r10d,10 cmp r10d,100000 jl near ptr M00_L00 mov rax,r8 ret ; Total bytes of code 353 ``` ## .NET 9.0.0 
(9.0.24.47305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI ```assembly ; BenchmarkPlayground.GetElapsedTimeBenchmarks.IntegerConstantDivision() mov r8,[rcx+8] mov r10,[rcx+10] mov rcx,[rcx+18] mov r9,rcx xor r11d,r11d M00_L00: mov rdx,r9 sub rdx,r8 mov rax,49A2CDF358049A2D imul rdx mov rax,rdx shr rax,3F sar rdx,7 add rax,rdx add r9,rax add r8,r10 mov rdx,r9 sub rdx,r8 mov rax,49A2CDF358049A2D imul rdx mov rax,rdx shr rax,3F sar rdx,7 add rax,rdx add rax,r9 mov r9,rax add r8,r10 mov rdx,r9 sub rdx,r8 mov rax,49A2CDF358049A2D imul rdx mov rax,rdx shr rax,3F sar rdx,7 add rax,rdx add rax,r9 mov r9,rax add r8,r10 mov rdx,r9 sub rdx,r8 mov rax,49A2CDF358049A2D imul rdx mov rax,rdx shr rax,3F sar rdx,7 add rax,rdx add rax,r9 mov r9,rax add r8,r10 mov rdx,r9 sub rdx,r8 mov rax,49A2CDF358049A2D imul rdx mov rax,rdx shr rax,3F sar rdx,7 add rax,rdx add rax,r9 mov r9,rax add r8,r10 mov rdx,r9 sub rdx,r8 mov rax,49A2CDF358049A2D imul rdx mov rax,rdx shr rax,3F sar rdx,7 add rax,rdx add rax,r9 mov r9,rax add r8,r10 mov rdx,r9 sub rdx,r8 mov rax,49A2CDF358049A2D imul rdx mov rax,rdx shr rax,3F sar rdx,7 add rax,rdx add rax,r9 mov r9,rax add r8,r10 mov rdx,r9 sub rdx,r8 mov rax,49A2CDF358049A2D imul rdx mov rax,rdx shr rax,3F sar rdx,7 add rax,rdx add rax,r9 mov r9,rax add r8,r10 mov rdx,r9 sub rdx,r8 mov rax,49A2CDF358049A2D imul rdx mov rax,rdx shr rax,3F sar rdx,7 add rax,rdx add rax,r9 mov r9,rax add r8,r10 mov rdx,r9 sub rdx,r8 mov rax,49A2CDF358049A2D imul rdx mov rax,rdx shr rax,3F sar rdx,7 add rax,rdx add rax,r9 mov r9,rax add r8,r10 mov rdx,r9 sub rdx,r8 mov rax,49A2CDF358049A2D imul rdx mov rax,rdx shr rax,3F sar rdx,7 add rax,rdx add rax,r9 mov r9,rax add r8,r10 mov rdx,r9 sub rdx,r8 mov rax,49A2CDF358049A2D imul rdx mov rax,rdx shr rax,3F sar rdx,7 add rax,rdx add rax,r9 mov r9,rax add r8,r10 mov rdx,r9 sub rdx,r8 mov rax,49A2CDF358049A2D imul rdx mov rax,rdx shr rax,3F sar rdx,7 add rax,rdx add rax,r9 mov r9,rax add r8,r10 mov rdx,r9 sub rdx,r8 mov rax,49A2CDF358049A2D 
imul rdx mov rax,rdx shr rax,3F sar rdx,7 add rax,rdx add rax,r9 mov r9,rax add r8,r10 mov rdx,r9 sub rdx,r8 mov rax,49A2CDF358049A2D imul rdx mov rax,rdx shr rax,3F sar rdx,7 add rax,rdx add rax,r9 mov r9,rax add r8,r10 mov rdx,r9 sub rdx,r8 mov rax,49A2CDF358049A2D imul rdx mov rax,rdx shr rax,3F sar rdx,7 add rax,rdx add rax,r9 mov r9,rax add r8,r10 add r10,rcx add r11d,10 cmp r11d,100000 jl near ptr M00_L00 mov rax,r9 ret ; Total bytes of code 711 ``` ## .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI ```assembly ; BenchmarkPlayground.GetElapsedTimeBenchmarks.IntegerFraction() push rsi push rbx sub rsp,88 mov rax,[rcx+8] mov r8,[rcx+10] mov rcx,[rcx+18] mov r10,rcx xor r9d,r9d M00_L00: mov rdx,r10 sub rdx,rax mov r11,0C5192F7B738AF3 lea rbx,[rsp+80] mulx r11,rsi,r11 mov [rbx],rsi mov rbx,243F6A8885A308D3 imul rbx,rdx add r10,rbx sar rdx,3F mov rbx,0C5192F7B738AF3 and rdx,rbx sub r11,rdx add r10,r11 add rax,r8 mov rdx,r10 sub rdx,rax lea r11,[rsp+78] mulx rbx,rsi,rbx mov [r11],rsi mov r11,243F6A8885A308D3 imul r11,rdx add r11,r10 mov r10,rdx sar r10,3F mov rdx,0C5192F7B738AF3 and rdx,r10 sub rbx,rdx lea r10,[r11+rbx] add rax,r8 mov rdx,r10 sub rdx,rax mov r11,0C5192F7B738AF3 lea rbx,[rsp+70] mulx r11,rsi,r11 mov [rbx],rsi mov rbx,243F6A8885A308D3 imul rbx,rdx add rbx,r10 mov r10,rdx sar r10,3F mov rdx,0C5192F7B738AF3 and rdx,r10 sub r11,rdx lea r10,[rbx+r11] add rax,r8 mov rdx,r10 sub rdx,rax mov r11,0C5192F7B738AF3 lea rbx,[rsp+68] mulx r11,rsi,r11 mov [rbx],rsi mov rbx,243F6A8885A308D3 imul rbx,rdx add rbx,r10 mov r10,rdx sar r10,3F mov rdx,0C5192F7B738AF3 and rdx,r10 sub r11,rdx lea r10,[rbx+r11] add rax,r8 mov rdx,r10 sub rdx,rax mov r11,0C5192F7B738AF3 lea rbx,[rsp+60] mulx r11,rsi,r11 mov [rbx],rsi mov rbx,243F6A8885A308D3 imul rbx,rdx add rbx,r10 mov r10,rdx sar r10,3F mov rdx,0C5192F7B738AF3 and rdx,r10 sub r11,rdx lea r10,[rbx+r11] add rax,r8 mov rdx,r10 sub rdx,rax mov r11,0C5192F7B738AF3 lea rbx,[rsp+58] mulx r11,rsi,r11 mov [rbx],rsi 
mov rbx,243F6A8885A308D3 imul rbx,rdx add rbx,r10 mov r10,rdx sar r10,3F mov rdx,0C5192F7B738AF3 and rdx,r10 sub r11,rdx lea r10,[rbx+r11] add rax,r8 mov rdx,r10 sub rdx,rax mov r11,0C5192F7B738AF3 lea rbx,[rsp+50] mulx r11,rsi,r11 mov [rbx],rsi mov rbx,243F6A8885A308D3 imul rbx,rdx add rbx,r10 mov r10,rdx sar r10,3F mov rdx,0C5192F7B738AF3 and rdx,r10 sub r11,rdx lea r10,[rbx+r11] add rax,r8 mov rdx,r10 sub rdx,rax mov r11,0C5192F7B738AF3 lea rbx,[rsp+48] mulx r11,rsi,r11 mov [rbx],rsi mov rbx,243F6A8885A308D3 imul rbx,rdx add rbx,r10 mov r10,rdx sar r10,3F mov rdx,0C5192F7B738AF3 and rdx,r10 sub r11,rdx lea r10,[rbx+r11] add rax,r8 mov rdx,r10 sub rdx,rax mov r11,0C5192F7B738AF3 lea rbx,[rsp+40] mulx r11,rsi,r11 mov [rbx],rsi mov rbx,243F6A8885A308D3 imul rbx,rdx add rbx,r10 mov r10,rdx sar r10,3F mov rdx,0C5192F7B738AF3 and rdx,r10 sub r11,rdx lea r10,[rbx+r11] add rax,r8 mov rdx,r10 sub rdx,rax mov r11,0C5192F7B738AF3 lea rbx,[rsp+38] mulx r11,rsi,r11 mov [rbx],rsi mov rbx,243F6A8885A308D3 imul rbx,rdx add rbx,r10 mov r10,rdx sar r10,3F mov rdx,0C5192F7B738AF3 and rdx,r10 sub r11,rdx lea r10,[rbx+r11] add rax,r8 mov rdx,r10 sub rdx,rax mov r11,0C5192F7B738AF3 lea rbx,[rsp+30] mulx r11,rsi,r11 mov [rbx],rsi mov rbx,243F6A8885A308D3 imul rbx,rdx add rbx,r10 mov r10,rdx sar r10,3F mov rdx,0C5192F7B738AF3 and rdx,r10 sub r11,rdx lea r10,[rbx+r11] add rax,r8 mov rdx,r10 sub rdx,rax mov r11,0C5192F7B738AF3 lea rbx,[rsp+28] mulx r11,rsi,r11 mov [rbx],rsi mov rbx,243F6A8885A308D3 imul rbx,rdx add rbx,r10 mov r10,rdx sar r10,3F mov rdx,0C5192F7B738AF3 and rdx,r10 sub r11,rdx lea r10,[rbx+r11] add rax,r8 mov rdx,r10 sub rdx,rax mov r11,0C5192F7B738AF3 lea rbx,[rsp+20] mulx r11,rsi,r11 mov [rbx],rsi mov rbx,243F6A8885A308D3 imul rbx,rdx add rbx,r10 mov r10,rdx sar r10,3F mov rdx,0C5192F7B738AF3 and rdx,r10 sub r11,rdx lea r10,[rbx+r11] add rax,r8 mov rdx,r10 sub rdx,rax mov r11,0C5192F7B738AF3 lea rbx,[rsp+18] mulx r11,rsi,r11 mov [rbx],rsi mov rbx,243F6A8885A308D3 imul 
rbx,rdx add rbx,r10 mov r10,rdx sar r10,3F mov rdx,0C5192F7B738AF3 and rdx,r10 sub r11,rdx lea r10,[rbx+r11] add rax,r8 mov rdx,r10 sub rdx,rax mov r11,0C5192F7B738AF3 lea rbx,[rsp+10] mulx r11,rsi,r11 mov [rbx],rsi mov rbx,243F6A8885A308D3 imul rbx,rdx add rbx,r10 mov r10,rdx sar r10,3F mov rdx,0C5192F7B738AF3 and rdx,r10 sub r11,rdx lea r10,[rbx+r11] add rax,r8 mov rdx,r10 sub rdx,rax mov r11,0C5192F7B738AF3 lea rbx,[rsp+8] mulx r11,rsi,r11 mov [rbx],rsi mov rbx,243F6A8885A308D3 imul rbx,rdx add rbx,r10 mov r10,rdx sar r10,3F mov rdx,0C5192F7B738AF3 and rdx,r10 sub r11,rdx lea r10,[rbx+r11] add rax,r8 add r8,rcx add r9d,10 cmp r9d,100000 jl near ptr M00_L00 mov rax,r10 add rsp,88 pop rbx pop rsi ret ; Total bytes of code 1265 ```

Analysis

dotnet-policy-service[bot] commented 3 weeks ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

dotnet-policy-service[bot] commented 3 weeks ago

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

jkotas commented 3 weeks ago

> the performance improvements for trivial cases are worth doing.

Your micro-benchmark numbers show an improvement of less than 1 nanosecond. I do not think a sub-nanosecond improvement is worth the added complexity for this API.

AlgorithmsAreCool commented 3 weeks ago

I have used this API in tight measurement loops before. Although the gains are under 1 ns, they are proportionally significant relative to the baseline.

While I'm sure it can differ between processors, isn't `Stopwatch.Frequency == TimeSpan.TicksPerSecond` a common case (on x86/x64, anyway)?

What about adding a check for whether they are equal and special-casing that scenario to use plain subtraction? The JIT should eliminate the check, since it promotes static readonly fields to constants. It seems like very little extra complexity for what could be a latency-sensitive API.

EDIT: It looks like the ARM generic timer typically runs at a fixed 1 GHz; perhaps that could also be special-cased as a common case?
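A minimal sketch of the special-casing suggested above, assuming the general path keeps the existing double multiplication. `SpecialCasedElapsed` is a hypothetical name, not runtime code. `Stopwatch.Frequency` is a `static readonly` field, so a tiering JIT may fold both comparisons to constants and keep only the live branch; whether that happens under ahead-of-time compilation is a separate question.

```csharp
using System;
using System.Diagnostics;

// Hypothetical sketch of special-casing common timer frequencies.
public static class SpecialCasedElapsed
{
    private static readonly double s_tickFrequency =
        (double)TimeSpan.TicksPerSecond / Stopwatch.Frequency;

    public static TimeSpan GetElapsedTime(long startingTimestamp, long endingTimestamp)
    {
        long delta = endingTimestamp - startingTimestamp;

        // 10 MHz: Stopwatch ticks and TimeSpan ticks already have the same unit.
        if (Stopwatch.Frequency == TimeSpan.TicksPerSecond)
            return new TimeSpan(delta);

        // 1 GHz: 100 timer ticks per 100 ns TimeSpan tick; division truncates toward zero.
        if (Stopwatch.Frequency == 1_000_000_000)
            return new TimeSpan(delta / 100);

        // Any other frequency: fall back to the double-based conversion.
        return new TimeSpan((long)(delta * s_tickFrequency));
    }
}
```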

jkotas commented 3 weeks ago

> It should be eliminated by the JIT due to the static readonly promotion to const.

It would not be eliminated for AOT, so the proposed change would be an improvement for JIT and a regression for AOT (in some cases at least).

AlgorithmsAreCool commented 3 weeks ago

> I would not be eliminated for AOT...

I should certainly hope not!

MineCake147E commented 3 weeks ago

I accidentally measured reciprocal throughput instead of latency. I have updated the benchmark results; the benchmarks now measure latency.

KalleOlaviNiemitalo commented 3 weeks ago

If you change Stopwatch.GetElapsedTime, please consider changing TimeProvider.GetElapsedTime as well. https://github.com/dotnet/runtime/blob/81cabf2857a01351e5ab578947c7403a5b128ad1/src/libraries/Common/src/System/TimeProvider.cs#L117