Closed neon-sunset closed 1 year ago
With all previous comments resolved, is there anything else I need to address?
I am getting somewhat weird results with dynamic PGO enabled.
without PGO I see roughly
======== TryGet NonBlocking object->string 1M Ops/sec:
54.74 109.6 153.1 219.7 274.8 319.4 375.3 426.2 481.0 532.8 584.7 634.4 674.1 720.6 762.9 752.2 761.7 777.2 778.8 791.8 798.6 807.6 813.6 809.4 . . .
with PGO I see
======== TryGet NonBlocking object->string 1M Ops/sec:
38.65 35.29 25.69 22.49 21.86 20.64 20.78 20.05 641.2 715.4 777.6 833.0 898.6 948.4 968.4 999.3 1016 1038 1050 1067 1074 1084 1077 1086 1081
I.E. It looks like we get a huge improvement in the throughput from the PGO, but it takes tens of seconds and millions of hashtable Get operations before the benefits kick in. Before that happens the perf is pretty poor.
I am inclined to keep PGO enabled in the benchmark project, since it results in a better perf and thus should be the desired mode for which we optimize. Besides it is just the default, which is easy to change and is often changed during perf investigations anyways, but I wonder if what I see is a normal behavior.
CC: @egorbo - is it normal for PGO effects to be so delayed?
With all previous comments resolved, is there anything else I need to address?
@neon-sunset I do not see anything else that needs changing. Let me run a few tests/benchmarks, to be sure all is ok, and I will merge if nothing too surprising comes up.
CC: @EgorBo - is it normal for PGO effects to be so delayed?
Can you try again with DOTNET_TC_CallCountingDelayMs=0
?
Can you try again with DOTNET_TC_CallCountingDelayMs=0?
does not seem to have any effect
C:\Program Files\Microsoft Visual Studio\2022\Preview>C:\NonBlocking\NonBlocking\BenchmarksCore\bin\Release\net7.0\BenchmarksCore.exe
Timer: High Resolution
======== TryGet NonBlocking object->string 1M Ops/sec:
39.13 31.35 21.25 19.81 19.79 19.37 18.32 18.13 636.9 708.0 766.0 839.0 895.3 ^C
C:\Program Files\Microsoft Visual Studio\2022\Preview>
C:\Program Files\Microsoft Visual Studio\2022\Preview>
C:\Program Files\Microsoft Visual Studio\2022\Preview>
C:\Program Files\Microsoft Visual Studio\2022\Preview>set DOTNET_TC_CallCountingDelayMs=0
C:\Program Files\Microsoft Visual Studio\2022\Preview>C:\NonBlocking\NonBlocking\BenchmarksCore\bin\Release\net7.0\BenchmarksCore.exe
Timer: High Resolution
======== TryGet NonBlocking object->string 1M Ops/sec:
39.80 31.26 22.84 20.78 19.52 18.84 18.08 17.51 656.6 703.4 793.1 ^C
C:\Program Files\Microsoft Visual Studio\2022\Preview>
Can you try again with DOTNET_TC_CallCountingDelayMs=0?
does not seem to have any effect
Ouch, but I see it's net7, right? We've landed many improvements for startup in net8 - can you try with at least net8.0 preview 5 (or later) ?
can you try with at least net8.0 preview 5 (or later) ?
Right. 8.0 does not have the issue!
And it seems the overall performance is a bit better too. (these numbers are the throughput for different number of threads and the higher the better)
C:\Program Files\Microsoft Visual Studio\2022\Preview>C:\NonBlocking\NonBlocking\BenchmarksCore\bin\Release\net7.0\BenchmarksCore.exe
Timer: High Resolution
======== TryGet NonBlocking object->string 1M Ops/sec:
40.86 32.82 21.85 20.42 20.21 19.41 19.27 18.99 670.8 738.8 818.4 883.7 920.5 970.4 988.5 1002 1022 1041 1044 1056 1066 ^C
C:\Program Files\Microsoft Visual Studio\2022\Preview>
C:\Program Files\Microsoft Visual Studio\2022\Preview>C:\NonBlocking\NonBlocking\BenchmarksCore\bin\Release\net8.0\BenchmarksCore.exe
Timer: High Resolution
======== TryGet NonBlocking object->string 1M Ops/sec:
59.96 124.0 179.0 222.0 291.9 351.1 411.2 464.6 737.8 815.8 897.6 975.6 1045 1077 1126 1124 1122 1130 1151 1170 1177 1184 ^C
By the time we reach
_keyComparer.Equals(key.Value, entryKey.Value)
inTryClaimSlotForPut/Copy
whereentryKey
isref Box<TKey>
, the reference might have been changed, leading to catching null reference exception or observing wrong value. Solve this by passing initially dereferencedentryKeyValue
instead, as was probably intended in the first place. Fix this for number primitive implementations too.Additionally:
Fixes https://github.com/neon-sunset/fast-cache/issues/52