Open performanceautofiler[bot] opened 4 years ago
Before loop alignment changes:
Method | Size | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|---|
ImmutableDictionary | 512 | 17.33 us | 0.063 us | 0.056 us | 17.34 us | 17.21 us | 17.43 us | - | - | - | - |
ImmutableDictionary | 512 | 17.61 us | 0.151 us | 0.141 us | 17.58 us | 17.35 us | 17.84 us | - | - | - | - |
ImmutableDictionary | 512 | 17.58 us | 0.121 us | 0.107 us | 17.57 us | 17.42 us | 17.81 us | - | - | - | - |
After loop alignment changes:
Method | Size | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|---|
ImmutableDictionary | 512 | 18.56 us | 0.159 us | 0.141 us | 18.52 us | 18.38 us | 18.84 us | - | - | - | - |
ImmutableDictionary | 512 | 18.49 us | 0.240 us | 0.224 us | 18.50 us | 18.19 us | 19.03 us | - | - | - | - |
ImmutableDictionary | 512 | 18.44 us | 0.117 us | 0.104 us | 18.45 us | 18.30 us | 18.66 us | - | - | - | - |
The regression might be coming from the extra padding we added in TryGetValue().
cc: @adamsitnik , @AndyAyersMS
Do we have an explanation for the regression we see here?
As for padding: for dictionaries, we don't expect lookups to iterate much, since iterating means there are hash collisions. So it's certainly possible that the cost of the padding (especially such a large amount as we see here) matters.
This might be a good case for the "minimal number of bundles" experiment, presumably without padding the loop would still fit in two bundles.
I already had "minimum number of bundles" on, except that I was missing a check (needed <= instead of <) that decides if padding helps the loop or not. With that, we see slightly better performance.
Method | Size | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|---|
ImmutableDictionary | 512 | 18.06 us | 0.203 us | 0.190 us | 18.07 us | 17.78 us | 18.46 us | - | - | - | - |
ImmutableDictionary | 512 | 17.92 us | 0.038 us | 0.029 us | 17.92 us | 17.86 us | 17.95 us | - | - | - | - |
ImmutableDictionary | 512 | 18.02 us | 0.221 us | 0.207 us | 17.94 us | 17.77 us | 18.43 us | - | - | - | - |
ImmutableDictionary | 512 | 18.09 us | 0.198 us | 0.176 us | 18.04 us | 17.89 us | 18.45 us | - | - | - | - |
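To illustrate why the comparison has to be inclusive, here is a minimal Python sketch of the arithmetic. It is not the JIT's actual code: it assumes 32-byte chunks (the "bundles" above), and the function names and example sizes are made up for illustration.

```python
CHUNK = 32  # assumed alignment boundary in bytes (hypothetical constant name)


def chunks_spanned(offset_in_chunk: int, loop_size: int) -> int:
    """Number of 32-byte chunks touched by a loop that starts at the given
    offset (0..31) inside a chunk and is loop_size bytes long."""
    return (offset_in_chunk + loop_size + CHUNK - 1) // CHUNK


def padding_helps(offset_in_chunk: int, loop_size: int) -> bool:
    """True only if moving the loop to a chunk boundary reduces the number
    of chunks it spans."""
    min_chunks = (loop_size + CHUNK - 1) // CHUNK
    # Bytes left over in the minimum number of chunks once the loop is placed.
    slack = min_chunks * CHUNK - loop_size
    # Inclusive comparison: when the offset equals the slack, the loop ends
    # exactly on a boundary and already fits in the minimum number of chunks.
    return not (offset_in_chunk <= slack)


def padding_helps_strict(offset_in_chunk: int, loop_size: int) -> bool:
    """Same check with '<' instead of '<=', for comparison only."""
    min_chunks = (loop_size + CHUNK - 1) // CHUNK
    slack = min_chunks * CHUNK - loop_size
    return not (offset_in_chunk < slack)
```

For a hypothetical 56-byte loop starting at offset 8, the loop ends exactly at byte 64 and already spans only two chunks, so `padding_helps(8, 56)` is false, while the strict variant would still ask for padding.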
Throughout the benchmark run, I logged the places where we inserted larger alignment padding, since that might be causing some of the regression. We can talk offline, but for now I'm just dumping the places that added alignment.
I wonder if you're seeing the impact of the "branch splitting or ending at 32 byte boundary" issue in some of these (e.g. dotnet/runtime#13795). For instance, this jump now ends at 0x...A0 and so presumably gets penalized.
00007ffb`9844e39e EBE8 jmp SHORT G_M1624_IG03
Is that something you can track?
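As a rough way to track this, the check can be sketched in Python as below. This is an illustration only, not the JIT's logic: the helper names are hypothetical, and the 32-byte boundary is taken from the issue title quoted above. The address and the two-byte encoding (EB E8) come from the disassembly line.

```python
BOUNDARY = 32  # boundary size in bytes, as described in the linked issue


def crosses_boundary(start: int, length: int) -> bool:
    """True if the instruction's first and last bytes are in different
    32-byte blocks."""
    return start // BOUNDARY != (start + length - 1) // BOUNDARY


def ends_at_boundary(start: int, length: int) -> bool:
    """True if the byte right after the instruction starts a new 32-byte
    block, i.e. the instruction ends exactly on the boundary."""
    return (start + length) % BOUNDARY == 0


def is_penalized(start: int, length: int) -> bool:
    """Combined condition: the branch either splits across or ends at a
    32-byte boundary."""
    return crosses_boundary(start, length) or ends_at_boundary(start, length)


# The jmp from the listing: 2 bytes (EB E8) starting at 0x7ffb9844e39e.
jmp_start = 0x7FFB9844E39E
jmp_length = 2
jmp_end = jmp_start + jmp_length  # 0x7ffb9844e3a0
```

For this jump, `crosses_boundary` is false but `ends_at_boundary` is true, which matches the observation that it ends at 0x...A0.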
Thanks for pointing me to it. I tried adding a check for JCC, but that leads the logic to add the extra padding I showed earlier. So I think we need to evaluate whether avoiding a split branch is worth the extra padding. Do you know of any way to measure that apart from experimenting?
Run Information
Regressions in System.Collections.ContainsKeyFalse<Int32, Int32>
Historical Data in Reporting System
Repro