dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.26k stars 4.73k forks source link

List of performance regressions caused by switching to ICU #40942

Open adamsitnik opened 4 years ago

adamsitnik commented 4 years ago

Before 5.0, we were using ICU only on Unix systems. In 5.0 we have decided to use it on Windows by default as well.

This is something that we have done in order to have the same behavior of string-related globalization APIs on every OS.

However, this particular change has affected the performance characteristics of many frequently used methods. Some of them have regressed, some have improved.

Recently we have reported a lot of 5.0 regressions related to that. Since we have done this on purpose and we are most probably not going to revert the switch, I am opening this issue to track the list of all known regressions. When the list becomes complete, we are most probably going to update the 5.0 release docs and close the issue and label it as wont fix.

Please feel free to edit the list.

Known changes:

cc @danmosemsft @tarekgh @billwert @DrewScoggins @GrabYourPitchforks @jkotas @safern

ghost commented 4 years ago

Tagging subscribers to this area: @tarekgh, @safern, @krwq See info in area-owners.md if you want to be subscribed.

adamsitnik commented 4 years ago

System.Memory.ReadOnlySpan.IndexOfString 5 regressions, 5 improvements

Method input value comparisonType 3.1 Mean 5.0 ICU 5.0 NLS
IndexOfString AAAAA5AAAA 5 InvariantCulture 79.07 ns 35.41 ns 72.05 ns
IndexOfString AAAAAAAAAAAA(...)AAAAAAAAAAAA [1000] X Ordinal 39.72 ns 38.89 ns 38.25 ns
IndexOfString AAAAAAAAAAAA(...)AAAAAAAAAAAA [100] x InvariantCultureIgnoreCase 300.45 ns 214.39 ns 294.59 ns
IndexOfString AAAAAAAAAAAA(...)AAAAAAAAAAAA [100] x OrdinalIgnoreCase 245.94 ns 802.33 ns 240.42 ns
IndexOfString ABCDE c InvariantCultureIgnoreCase 71.81 ns 30.26 ns 63.83 ns
IndexOfString Hello Worldb(...)allylong!xyz [186] w OrdinalIgnoreCase 45.80 ns 102.63 ns 40.30 ns
IndexOfString Hello Worldb(...)allylong!xyz [187] ~ Ordinal 28.52 ns 22.28 ns 21.03 ns
IndexOfString Hello Worldbb(...)bbbbbbbbbbba! [47] y Ordinal 24.15 ns 14.09 ns 13.57 ns
IndexOfString More Test's Tests OrdinalIgnoreCase 56.38 ns 108.93 ns 50.48 ns
IndexOfString StrIng string OrdinalIgnoreCase 38.79 ns 55.01 ns 33.86 ns
IndexOfString foobardzsdzs rddzs InvariantCulture 101.46 ns 44.93 ns 94.90 ns
IndexOfString string1 string2 InvariantCulture 91.45 ns 37.14 ns 80.95 ns
IndexOfString ? ? InvariantCulture 108.70 ns 4,069.11 ns 105.36 ns
IndexOfString ????????????(...)???????????? [100] ? Ordinal 25.08 ns 16.88 ns 15.48 ns
IndexOfString ????????????(...)???????????? [1000] x Ordinal 39.70 ns 39.19 ns 37.99 ns

Comment: InvariantCulture and InvariantCultureIgnoreCase have improved, while OrdinalIgnoreCase has regressed. We could implement OrdinalIgnoreCase path in managed code if it ever becomes a problem

danmoseley commented 4 years ago

Thank you @adamsitnik good idea I'm glad you created this.

tarekgh commented 4 years ago

I have this PR opened for addressing all ordinal operations https://github.com/dotnet/runtime/pull/40910

tarekgh commented 4 years ago

@adamsitnik are you going to close all other bugs complaining about ICU perf against this one?

adamsitnik commented 4 years ago

are you going to close all other bugs complaining about ICU perf against this one?

That's my plan. I will try to do my best to do it for as many as I can.

tarekgh commented 4 years ago

Also, I want to be clear about the expectation here. I don't think we can fix all perf here as we are limited by calling ICU. we can look at how we can improve it but I am not expecting to get the perf to the point where we used to call NLS. So, it will be good to decide which items in this list is a blocker. The only one was the Ordinal cases which I am addressing in the attached PR. I am not aware of any other blocking scenario. We'll look more of course on other scenarios anyway but I am not sure how much we can do before 5.0 release.

adamsitnik commented 4 years ago

System.Globalization.Tests.StringHash 2 regressions, 4 improvements

Method Count Options 3.1 Mean 5.0 ICU Mean 5.0 NLS Mean
GetHashCode 128 (, IgnoreCase) 1,806.07 ns 1,806.0 ns 1,806.10 ns
GetHashCode 128 (, None) 1,823.38 ns 2,284.0 ns 1,818.16 ns
GetHashCode 128 (en-US, IgnoreCase) 1,831.84 ns 1,816.0 ns 1,790.90 ns
GetHashCode 128 (en-US, None) 1,820.91 ns 2,296.8 ns 1,815.97 ns
GetHashCode 128 (en-US, Ordinal) 99.33 ns 100.0 ns 99.61 ns
GetHashCode 128 (en-US, OrdinalIgnoreCase) 123.93 ns 122.6 ns 121.49 ns
GetHashCode 131072 (, IgnoreCase) 2,207,798.50 ns 1,773,953.8 ns 2,193,202.55 ns
GetHashCode 131072 (, None) 2,209,727.34 ns 1,893,550.0 ns 2,187,722.71 ns
GetHashCode 131072 (en-US, IgnoreCase) 2,196,075.39 ns 1,781,730.1 ns 2,190,508.15 ns
GetHashCode 131072 (en-US, None) 2,214,228.87 ns 1,899,934.3 ns 2,185,224.74 ns
GetHashCode 131072 (en-US, Ordinal) 96,294.65 ns 96,513.7 ns 96,551.91 ns
GetHashCode 131072 (en-US, OrdinalIgnoreCase) 113,393.49 ns 112,940.4 ns 116,436.80 ns

Comment: there are two regressions for Count=128 but four improvements for Count=131072

adamsitnik commented 4 years ago

System.Globalization.Tests.StringEquality: 8 regressions, 14 improvements

Method Count Options 3.1 Mean 5.0 ICU Mean 5.0 NLS Mean
Compare_Same 1024 (, IgnoreCase) 996.299 ns 575.414 ns 997.39 ns
Compare_Same_Upper 1024 (, IgnoreCase) 3,373.889 ns 6,798.371 ns 3,356.57 ns
Compare_DifferentFirstChar 1024 (, IgnoreCase) 43.730 ns 37.378 ns 42.73 ns
Compare_Same 1024 (, None) 1,003.408 ns 572.528 ns 1,003.25 ns
Compare_Same_Upper 1024 (, None) 4,123.875 ns 6,747.893 ns 4,033.30 ns
Compare_DifferentFirstChar 1024 (, None) 41.883 ns 34.904 ns 43.04 ns
Compare_Same 1024 (en-US, IgnoreCase) 992.625 ns 564.239 ns 1,012.86 ns
Compare_Same_Upper 1024 (en-US, IgnoreCase) 3,334.161 ns 6,789.031 ns 3,336.49 ns
Compare_DifferentFirstChar 1024 (en-US, IgnoreCase) 41.821 ns 36.251 ns 41.92 ns
Compare_Same 1024 (en-US, IgnoreNonSpace) 1,667.106 ns 564.521 ns 1,671.78 ns
Compare_Same_Upper 1024 (en-US, IgnoreNonSpace) 9,547.973 ns 2,103.148 ns 9,599.69 ns
Compare_DifferentFirstChar 1024 (en-US, IgnoreNonSpace) 66.929 ns 35.786 ns 59.21 ns
Compare_Same 1024 (en-US, IgnoreSymbols) 1,676.295 ns 569.468 ns 1,679.39 ns
Compare_Same_Upper 1024 (en-US, IgnoreSymbols) 9,100.891 ns 11,605.391 ns 9,075.81 ns
Compare_DifferentFirstChar 1024 (en-US, IgnoreSymbols) 8,219.648 ns 21,916.551 ns 8,311.14 ns
Compare_Same 1024 (en-US, None) 988.500 ns 572.970 ns 1,005.15 ns
Compare_Same_Upper 1024 (en-US, None) 4,040.998 ns 6,740.818 ns 4,007.38 ns
Compare_DifferentFirstChar 1024 (en-US, None) 41.931 ns 35.349 ns 42.87 ns
Compare_Same 1024 (en-US, Ordinal) 82.818 ns 67.527 ns 68.47 ns
Compare_Same_Upper 1024 (en-US, Ordinal) 13.907 ns 14.796 ns 14.82 ns
Compare_DifferentFirstChar 1024 (en-US, Ordinal) 5.432 ns 9.946 ns 10.27 ns
Compare_Same 1024 (en-US, OrdinalIgnoreCase) 815.592 ns 822.866 ns 834.45 ns
Compare_Same_Upper 1024 (en-US, OrdinalIgnoreCase) 1,225.463 ns 1,218.922 ns 1,253.77 ns
Compare_DifferentFirstChar 1024 (en-US, OrdinalIgnoreCase) 10.517 ns 11.977 ns 11.30 ns
Compare_Same 1024 (pl-PL, None) 21,378.839 ns 572.036 ns 21,617.36 ns
Compare_Same_Upper 1024 (pl-PL, None) 22,136.295 ns 17,518.080 ns 22,375.84 ns
Compare_DifferentFirstChar 1024 (pl-PL, None) 61.794 ns 37.698 ns 62.93 ns

Comment: ICU seems to be faster when inputs are the same or the first character is different but slows down when we are comparing lowercase and uppercase versions of the same words

adamsitnik commented 4 years ago

System.Globalization.Tests.StringSearch: 30 regressions, 32 improvements

Method Options 3.1 Mean 5.0 ICU Mean 5.0 NLS Mean
IsPrefix_FirstHalf (, IgnoreCase, False) 279.556 ns 204.122 ns 270.881 ns
IsPrefix_DifferentFirstChar (, IgnoreCase, False) 58.887 ns 18.055 ns 53.431 ns
IsSuffix_SecondHalf (, IgnoreCase, False) 264.358 ns 191.353 ns 262.566 ns
IsSuffix_DifferentLastChar (, IgnoreCase, False) 61.441 ns 18.846 ns 56.044 ns
IndexOf_Word_NotFound (, IgnoreCase, False) 802.657 ns 628.962 ns 798.926 ns
LastIndexOf_Word_NotFound (, IgnoreCase, False) 975.995 ns 620.031 ns 965.195 ns
IsPrefix_FirstHalf (, IgnoreCase, True) 1,012.810 ns 3,677.834 ns 1,019.808 ns
IsPrefix_DifferentFirstChar (, IgnoreCase, True) 72.985 ns 853.547 ns 72.039 ns
IsSuffix_SecondHalf (, IgnoreCase, True) 2,552.799 ns 7,467.664 ns 2,575.991 ns
IsSuffix_DifferentLastChar (, IgnoreCase, True) 5,376.895 ns 1,288.707 ns 5,329.497 ns
IndexOf_Word_NotFound (, IgnoreCase, True) 3,931.968 ns 11,275.663 ns 3,831.631 ns
LastIndexOf_Word_NotFound (, IgnoreCase, True) 3,142.703 ns 17,463.707 ns 3,042.517 ns
IsPrefix_FirstHalf (, None, False) 211.472 ns 173.105 ns 201.325 ns
IsPrefix_DifferentFirstChar (, None, False) 56.298 ns 17.402 ns 53.396 ns
IsSuffix_SecondHalf (, None, False) 182.540 ns 174.536 ns 179.486 ns
IsSuffix_DifferentLastChar (, None, False) 58.275 ns 18.166 ns 55.200 ns
IndexOf_Word_NotFound (, None, False) 699.751 ns 500.508 ns 678.148 ns
LastIndexOf_Word_NotFound (, None, False) 850.232 ns 502.581 ns 845.351 ns
IsPrefix_FirstHalf (, None, True) 1,052.320 ns 3,636.952 ns 1,021.712 ns
IsPrefix_DifferentFirstChar (, None, True) 73.568 ns 848.922 ns 72.912 ns
IsSuffix_SecondHalf (, None, True) 2,595.612 ns 7,554.091 ns 2,540.565 ns
IsSuffix_DifferentLastChar (, None, True) 5,306.733 ns 1,317.628 ns 5,348.958 ns
IndexOf_Word_NotFound (, None, True) 3,935.667 ns 11,177.835 ns 3,803.958 ns
LastIndexOf_Word_NotFound (, None, True) 3,083.463 ns 16,993.198 ns 3,029.761 ns
IsPrefix_FirstHalf (en-US, IgnoreCase, False) 270.134 ns 205.387 ns 273.552 ns
IsPrefix_DifferentFirstChar (en-US, IgnoreCase, False) 57.763 ns 18.110 ns 54.154 ns
IsSuffix_SecondHalf (en-US, IgnoreCase, False) 258.893 ns 190.355 ns 260.361 ns
IsSuffix_DifferentLastChar (en-US, IgnoreCase, False) 58.943 ns 19.272 ns 56.319 ns
IndexOf_Word_NotFound (en-US, IgnoreCase, False) 807.236 ns 622.869 ns 798.647 ns
LastIndexOf_Word_NotFound (en-US, IgnoreCase, False) 975.083 ns 623.089 ns 955.210 ns
IsPrefix_FirstHalf (en-US, IgnoreCase, True) 1,018.976 ns 3,666.344 ns 1,026.411 ns
IsPrefix_DifferentFirstChar (en-US, IgnoreCase, True) 76.465 ns 887.811 ns 72.117 ns
IsSuffix_SecondHalf (en-US, IgnoreCase, True) 2,563.798 ns 7,529.526 ns 2,565.718 ns
IsSuffix_DifferentLastChar (en-US, IgnoreCase, True) 5,305.949 ns 1,304.493 ns 5,346.831 ns
IndexOf_Word_NotFound (en-US, IgnoreCase, True) 3,892.140 ns 11,776.383 ns 3,926.784 ns
LastIndexOf_Word_NotFound (en-US, IgnoreCase, True) 3,090.867 ns 17,430.155 ns 3,072.910 ns
IsPrefix_FirstHalf (en-US, IgnoreNonSpace, False) 1,020.808 ns 178.017 ns 1,017.612 ns
IsPrefix_DifferentFirstChar (en-US, IgnoreNonSpace, False) 67.336 ns 18.067 ns 65.861 ns
IsSuffix_SecondHalf (en-US, IgnoreNonSpace, False) 2,552.693 ns 178.425 ns 2,544.768 ns
IsSuffix_DifferentLastChar (en-US, IgnoreNonSpace, False) 4,921.510 ns 18.199 ns 4,858.353 ns
IndexOf_Word_NotFound (en-US, IgnoreNonSpace, False) 3,906.457 ns 506.498 ns 3,792.084 ns
LastIndexOf_Word_NotFound (en-US, IgnoreNonSpace, False) 3,076.625 ns 507.002 ns 3,048.544 ns
IsPrefix_FirstHalf (en-US, IgnoreSymbols, False) 1,022.075 ns 16,888.924 ns 1,020.346 ns
IsPrefix_DifferentFirstChar (en-US, IgnoreSymbols, False) 70.421 ns 29,267.090 ns 66.428 ns
IsSuffix_SecondHalf (en-US, IgnoreSymbols, False) 3,437.616 ns 19,397.341 ns 3,411.783 ns
IsSuffix_DifferentLastChar (en-US, IgnoreSymbols, False) 5,308.903 ns 34,515.843 ns 5,333.271 ns
IndexOf_Word_NotFound (en-US, IgnoreSymbols, False) 3,310.485 ns 11,414.777 ns 3,285.185 ns
LastIndexOf_Word_NotFound (en-US, IgnoreSymbols, False) 3,468.180 ns 16,640.286 ns 3,393.434 ns
IsPrefix_FirstHalf (en-US, None, False) 205.249 ns 176.915 ns 200.000 ns
IsPrefix_DifferentFirstChar (en-US, None, False) 59.057 ns 17.779 ns 53.980 ns
IsSuffix_SecondHalf (en-US, None, False) 184.039 ns 177.072 ns 181.258 ns
IsSuffix_DifferentLastChar (en-US, None, False) 59.873 ns 18.232 ns 55.897 ns
IndexOf_Word_NotFound (en-US, None, False) 688.002 ns 504.985 ns 681.283 ns
LastIndexOf_Word_NotFound (en-US, None, False) 841.277 ns 506.640 ns 859.611 ns
IsPrefix_FirstHalf (en-US, None, True) 1,026.332 ns 3,676.392 ns 1,129.406 ns
IsPrefix_DifferentFirstChar (en-US, None, True) 72.958 ns 860.499 ns 73.269 ns
IsSuffix_SecondHalf (en-US, None, True) 2,555.954 ns 7,625.414 ns 2,556.183 ns
IsSuffix_DifferentLastChar (en-US, None, True) 5,320.978 ns 1,349.628 ns 5,373.210 ns
IndexOf_Word_NotFound (en-US, None, True) 3,783.346 ns 11,472.314 ns 3,884.931 ns
LastIndexOf_Word_NotFound (en-US, None, True) 3,080.818 ns 17,512.610 ns 3,077.301 ns
IsPrefix_FirstHalf (en-US, Ordinal, False) 12.480 ns 14.469 ns 12.584 ns
IsPrefix_DifferentFirstChar (en-US, Ordinal, False) 6.229 ns 8.496 ns 8.925 ns
IsSuffix_SecondHalf (en-US, Ordinal, False) 14.058 ns 12.034 ns 14.248 ns
IsSuffix_DifferentLastChar (en-US, Ordinal, False) 19.645 ns 17.446 ns 16.225 ns
IndexOf_Word_NotFound (en-US, Ordinal, False) 39.451 ns 37.728 ns 39.207 ns
LastIndexOf_Word_NotFound (en-US, Ordinal, False) 137.367 ns 97.667 ns 98.362 ns
IsPrefix_FirstHalf (en-US, OrdinalIgnoreCase, False) 73.307 ns 82.618 ns 83.575 ns
IsPrefix_DifferentFirstChar (en-US, OrdinalIgnoreCase, False) 9.844 ns 9.585 ns 9.731 ns
IsSuffix_SecondHalf (en-US, OrdinalIgnoreCase, False) 109.128 ns 83.181 ns 80.728 ns
IsSuffix_DifferentLastChar (en-US, OrdinalIgnoreCase, False) 203.785 ns 149.656 ns 141.373 ns
IndexOf_Word_NotFound (en-US, OrdinalIgnoreCase, False) 701.811 ns 3,125.640 ns 705.609 ns
LastIndexOf_Word_NotFound (en-US, OrdinalIgnoreCase, False) 617.823 ns 3,161.417 ns 621.380 ns
IsPrefix_FirstHalf (pl-PL, None, False) 2,594.625 ns 5,704.863 ns 2,558.898 ns
IsPrefix_DifferentFirstChar (pl-PL, None, False) 101.082 ns 857.463 ns 98.650 ns
IsSuffix_SecondHalf (pl-PL, None, False) 8,633.560 ns 8,435.046 ns 8,586.963 ns
IsSuffix_DifferentLastChar (pl-PL, None, False) 19,423.848 ns 1,216.271 ns 17,376.208 ns
IndexOf_Word_NotFound (pl-PL, None, False) 9,534.684 ns 14,329.673 ns 9,455.980 ns
LastIndexOf_Word_NotFound (pl-PL, None, False) 8,689.332 ns 18,014.702 ns 8,657.101 ns
adamsitnik commented 4 years ago

System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse: 1 big regression for ja culture, 5 improvements

Method culturestring 3.1 Mean 5.0 ICU 5.0 NLS
ToStringHebrewIsrael ? 540.1 ns 463.3 ns 518.3 ns
ToString 254.0 ns 248.9 ns 245.7 ns
Parse 466.9 ns 423.7 ns 424.7 ns
ToString da 247.5 ns 234.0 ns 241.1 ns
Parse da 551.2 ns 449.3 ns 515.6 ns
ToString fr 251.5 ns 246.4 ns 244.3 ns
Parse fr 466.5 ns 424.7 ns 440.2 ns
ToString ja 258.1 ns 243.5 ns 245.4 ns
Parse ja 482.1 ns 5,078.9 ns 456.9 ns

Comment: this is a known ICU issue: https://github.com/dotnet/runtime/issues/31273

adamsitnik commented 4 years ago

are you going to close all other bugs complaining about ICU perf against this one?

@tarekgh I've gone through all System.Memory and System.Globalization issues with performance tag and updated the list. It should be complete now.

danmoseley commented 4 years ago

@symbai I see you've thumbs-down. Could you share your concerns?

Symbai commented 4 years ago

@danmosemsft Its not about adamsitnik work of collecting the regressions which I find very helpful. Its about the fact that switching to ICU has introduced MAJOR regressions (+4,500ns, ~ 4x slower) while the "improvements" are usually around ~2ns which I bet is only some noise and running them a couple of times will show there are no improvements at all.

I'm not into ICU and I dont know why this change was made at all but seeing all the performance improvements in .NET Core the last versions and now such a big regression on hot code path that is literally being used anywhere, makes me really wonder what in god's name can be that much useful about ICU that it worth THIS regression. I've thumbs-down because I disagree on this ICU change and I disagree on statements which say that "we might not fix some of these regressions". Especially without telling people that switching to .NET 5 will make their code much more slower without a benefit (there might be a benefit in ICU, but I bet it won't affect most people... unlike this performance regression).

tarekgh commented 4 years ago

@Symbai you have the option to switch back to use NLS if you want to.

ICU is the future direction in general for the .NET and Windows too. ICU will give the opportunity to have a consistency between OS's and OS versions. The benefit of using ICU is really worth it. ICU will give opportunity to the apps to customize the globalization behavior too.

danmoseley commented 4 years ago

Also note that this is already what is used on Linux and Mac, and increasingly used within Windows OS itself. So we are aligning with the industry here. If and where it is slow - we all benefit from making it faster. The .NET team have contributed bug reports and performance improvements to libicu in the past, and I expect we will do so again.

krwq commented 4 years ago

As a side note, ICU is open source and we can contribute to make hot paths faster

iSazonov commented 4 years ago

I am happy to see cross-platform consistency based on ICU but it would be sad to find slowing down on hot paths, specially in PowerShell whose Engine is very sensitive to string operation performance. I hope MSFT team continues to invest in the area in next milestone. As side notice, we could have an extension package with managed implementation for applications which are performance critical and has not memory limitations (can utilize more large and fast tables). Of course, if other ways are not more effective.

tarekgh commented 4 years ago

@iSazonov we already fixed all ordinal[IgnoreCase] operations perf across all functionality). as pointed before, you still have the option to switch back using NLS if needed and you continue work on Windows as used to.

As side notice, we could have an extension package with managed implementation for applications which are performance critical and has not memory limitations (can utilize more large and fast tables). Of course, if other ways are not more effective.

I assume you are talking about PowerShell? of course we'll not do that in .NET as we avoid carrying any globalization data and implementing all functionalities like collations wouldn't be trivial.

by the way, Powershell already running on Linux for awhile with ICU, why you are concerned now?

iSazonov commented 4 years ago

we already fixed all ordinal[IgnoreCase] operations perf across all functionality)

I see. Many thanks!

you still have the option to switch back using NLS

I think it makes no sense because because more and more users work in a heterogeneous environment and with ICU there is less chance of getting different results.

by the way, Powershell already running on Linux for awhile with ICU, why you are concerned now?

Questions like why the performance/behavior is different on Windows and on Unix is very inconvenient. Questions like this sometimes appear in PowerShell repo (not related to the topic). With moving to ICU the likelihood of such questions decreases and this is great. PowerShell is highly dependent on OrdinalIgnoreCase and any improvements here have a positive impact on it. This is my only concern.

Thanks again for your great work!

tarekgh commented 4 years ago

PowerShell is highly dependent on OrdinalIgnoreCase and any improvements here have a positive impact on it. This is my only concern.

Thanks for the info, it is very helpful. could you please try the latest .NET builds which include the Ordinal perf improvements and let's know if you see the differences now? let me know if you need help with that.

DrewScoggins commented 4 years ago

Just wanted to update this thread with the auto-filed results showing big wins (70%) in the System.Memory.ReadOnlySpan benchmarks that regressed.

https://github.com/DrewScoggins/performance-2/issues/1392

DrewScoggins commented 4 years ago

Also seeing wins for System.Globalization.Tests.StringSearch in some of the IndexOf OrdinalIgnoreCase benchmarks. We are seeing that across x64, x86 Windows and x64 Ubuntu.

https://github.com/DrewScoggins/performance-2/issues/1375

iSazonov commented 4 years ago

Thanks for the info, it is very helpful. could you please try the latest .NET builds which include the Ordinal perf improvements and let's know if you see the differences now? let me know if you need help with that.

A day before PowerShell MSFT team tried to move to .Net 5.0 Preview8 but without success. I guess I can do some measurements only after we move to RC1.

/cc @SteveL-MSFT @daxian-dbw for information.

danmoseley commented 4 years ago

Just curious @tarekgh, why do we still see significant differences between Windows (with your improvement) and Ubuntu? eg https://github.com/DrewScoggins/performance-2/issues/1283 --- about 240-260 ns https://github.com/DrewScoggins/performance-2/issues/1334 -- back down from 160ns to 120ns

tarekgh commented 4 years ago

@danmosemsft looking https://github.com/DrewScoggins/performance-2/issues/1283 I am seeing the ordinal ignore casing cases is improved. other none ordinal cases I am not expecting to improve with my change. am I reading the data correctly?

danmoseley commented 4 years ago

@tarekgh your change is good. I was just curious why Linux and Windows perf were still significantly different but I realized I compared the wrong rows.

tarekgh commented 4 years ago

@danmosemsft I was not really sure if I am looking at or interpreting the perf data correctly. that is why I was asking. Thanks for clarifying.

GSPP commented 4 years ago

Would it be in the cards to contribute to ICU so that this operation is not by-design slow? I imagine that many projects relying on ICU would like a fast IndexOf that does require caching a searcher object.

iSazonov commented 4 years ago

@tarekgh I made simple test for PowerShell string comparisons (in Russian locale). I see a huge regression after 5.0-Preview.7 but it is still significant better than 3.1.6.

I hope .Net and PowerShell teams will discuss this and make more reliable tests. /cc @SteveL-MSFT @daxian-dbw

PowerShell version Duration, sec .Net version
PowerShell-7.0.3 13.8 3.1.6
PowerShell-7.1.0-Preview.5 5.5 5.0-Preview.6
PowerShell-7.1.0-Preview.6 5.5 5.0-Preview.7
PowerShell-7.1.0-Preview.7 8.7 5.0-Preview.8

PowerShell test script:

$value = (Get-Date).ToString() # Russian locale
$array1 = @()

for ($i=0; $i -lt 5000; $i++) {
    $array1 += $value
}

$value = (Get-Date).ToString()
$array2 = @()

for ($i=0; $i -lt 5000; $i++) {
    $array2 += $value
}

Measure-Command { Compare-Object -ReferenceObject $array1 -DifferenceObject $array2 }
tarekgh commented 4 years ago

@iSazonov are you running on Linux or Windows? On Windows it is expected you see some regression with the Linguistic scenarios because switching to use ICU there. As mentioned before you still have the option to switch back to use NLS if needed. We have addressed all ordinal scenarios though. Also, there were a lot of work going on during these previews but at least we still significantly faster than 3.1 which is a good news.

iSazonov commented 4 years ago

@tarekgh It was Windows only test.

tarekgh commented 4 years ago

I am surprised now this is faster than 3.1 then. are you sure of your results? could you send me a couple of string cases you used in the comparison using the Russian culture?

iSazonov commented 4 years ago

I am surprised now this is faster than 3.1 then. are you sure of your results?

I measured PowerShell not .Net - I mean perhaps there were other optimizations in PowerShell and .Net that could affect the result. We could use PerfView to investigate in depth.

Sample string from my tests:

14 сентября 2020 г. 23:25:22

The second test line only differs in seconds.

GrabYourPitchforks commented 4 years ago

@GSPP I don't think we have extensively studied what contributions we can make here. Regardless, ICU and NLS have different operational philosophies when it comes to this, and ICU's consumers are absolutely used to caching the search object. The canonical scenario for linguistic searching in ICU is for a UI-based application. You open a browser window or word document, enter your search term, then see all matches highlighted in the document and use Next / Previous to iterate through them.

In our case, we case about non-linguistic (ordinal) searching. If this were to be contributed back to ICU it would be in the form of a brand new API. Furthermore, the type of ordinal comparison we're using here (conversion to uppercase) is different than Unicode's own recommendations (conversion to case-fold). All of this is to say that I'm not hopeful of a specialty API like the one .NET is using making its way through.

tarekgh commented 4 years ago

@GrabYourPitchforks @GSPP just to let you know, I am looking at the linguistic IndexOf scenario and experimenting some changes that may help in the perf (around internally caching some ICU objects too). no promise yet as I am still in the middle of looking at that.

Also, we are following up with Windows team as they are trying to do some perf enhancement on ICU too. It is another win situation.

Last, as ICU is open source project, we have contributed some changes before which means it is possible we'll contribute more if it is really required to enhance .NET scenarios. but in general this something we may look at for 6.0 version and beyond.

tarekgh commented 4 years ago

@iSazonov I tried your scenario (using .NET without PS) and I am seeing the similar results as you have reported it. This is very interesting. I am going to look more on the details to understand what is going on.

tarekgh commented 4 years ago

I have collected the ETW benchmark data and now I am seeing the logical results. I am seeing 3.1 is faster (as expected). dotnet benchmark somehow is reporting wrong numbers or there is something causing noises there. here is the ETW data:

3.1

Name                                                            Inc %         Inc   Exc %    Exc
 system.private.corelib.il!CompareInfo.CompareString             95.7      42,349     2.0    893
+ kernel32!CompareStringExStub                                   92.6      40,960     0.5    202
|+ kernelbase!SortCompareString                                  92.1      40,754     2.2    962
|+ ntoskrnl!?                                                     0.0           4     0.0      3
+ coreclr!?                                                       0.6         253     0.6    253
+ system.private.corelib.il!CompareInfo.GetNativeCompareFlags     0.5         219     0.5    219
+ ?!?                                                             0.0          14     0.0     14
+ ntoskrnl!?                                                      0.0          10     0.0      8

5.0

Name                                                        Inc %         Inc   Exc %      Exc
 System.Private.CoreLib.il!CompareInfo.Compare               91.0      66,116     2.6    1,888
+ System.Private.CoreLib.il!CompareInfo.Compare              88.3      64,159     3.3    2,405
|+ System.Private.CoreLib.il!CompareInfo.IcuCompareString    84.8      61,633     5.4    3,941
||+ coreclr!GlobalizationNative_CompareString                74.7      54,287     5.0    3,666
|||+ icu!ucol_strcoll                                        66.5      48,311     4.9    3,530
|||+ ntdll!?                                                  1.6       1,188     1.6    1,182
|||+ coreclr!GetCollatorFromSortHandle                        1.5       1,105     1.5    1,081
|||+ ntoskrnl!?                                               0.0          16     0.0        8
|||+ nvlddmkm!?                                               0.0           1     0.0        1
||+ coreclr!JIT_InitPInvokeFrame                              4.6       3,370     4.6    3,353
||+ ntoskrnl!?                                                0.0          24     0.0        9
||+ coreclr!JIT_PInvokeBegin                                  0.0           7     0.0        7
||+ coreclr!JIT_PInvokeEnd                                    0.0           4     0.0        4

And here is the results (which I believe is wrongly reported from dotnet benchmark) from exact same test.

DefaultJob : .NET Core 5.0.0 (CoreCLR 5.0.20.45114, CoreFX 5.0.20.45114), X64 RyuJIT

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
StringCompare 141.8 ns 3.52 ns 10.39 ns - - - -

DefaultJob : .NET Core 3.1.7 (CoreCLR 4.700.20.36602, CoreFX 4.700.20.37001), X64 RyuJIT

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
StringCompare 705.3 ns 14.09 ns 35.86 ns - - - -

and here is the method code I used:

        private static CultureInfo s_culture = CultureInfo.GetCultureInfo("ru-RU");;
        private static string s_string1 = "14 сентября 2020 г. 23:25:22";
        private static string s_string2 = "14 сентября 2020 г. 23:25:24";

        [Benchmark] public int StringCompare() => String.Compare(s_string1, s_string2, s_culture, CompareOptions.None);

@adamsitnik @DrewScoggins do you have any guess why the perf numbers I am getting from dotnet benchmark tool reported that way which not matching what ETW data reported from the exact same test?

tarekgh commented 4 years ago

@DrewScoggins I have merged today the other optimization work which targeting string search operations (IndexOf/LastIndexOf/IsPrefix/IsSuffix/StartsWith/EndsWith). could you please watch the perf results after running my changes and update this issue? Thanks a lot.

DrewScoggins commented 4 years ago

Yes, I will keep an eye out for these to see if we are seeing any improvement.

adamsitnik commented 4 years ago

@tarekgh I've run the benchmark that you have provided and got similar results that confirm that 5.0 using ICU is faster in this particular case.

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1082 (1909/November2018Update/19H2)
Intel Xeon CPU E5-1650 v4 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-rc.1.20452.10
  [Host]     : .NET Core 3.1.7 (CoreCLR 4.700.20.36602, CoreFX 4.700.20.37001), X64 RyuJIT
  Job-KYODGR : .NET Core 3.1.7 (CoreCLR 4.700.20.36602, CoreFX 4.700.20.37001), X64 RyuJIT
  Job-PPECTW : .NET Core 5.0.0 (CoreCLR 5.0.20.45114, CoreFX 5.0.20.45114), X64 RyuJIT
Method Job Runtime Toolchain Mean Error StdDev Median Ratio
StringCompare Job-KYODGR .NET Core 3.1 netcoreapp3.1 377.30 ns 4.741 ns 4.435 ns 376.14 ns 1.00
StringCompare Job-PPECTW .NET Core 5.0 netcoreapp5.0 68.72 ns 1.342 ns 3.055 ns 67.42 ns 0.19

To compare apples to apples I've set the invocation count (the number of benchmark invocations per iteration) to the same number and filtered the ETW trace file to the last benchmark iteration (description of the mentioned filtering):

--invocationCount 2097152 --profiler ETW

For 3.1 a single iteration (2097152 invocations) takes 791ms:

obraz

For 5.0 a single iteration (2097152 invocations) takes 147ms:

obraz

tarekgh commented 4 years ago

@adamsitnik the scenario you mentioned is the string compare which is different than string search. I was expecting my recent changes affect the string search and not the string compare. maybe I am missing something here.

adamsitnik commented 4 years ago

the scenario you mentioned

@tarekgh I have used the scenario that you have provided (but it was a long time ago - on the 14th of September)

maybe I am missing something here.

most probably in this particular scenario ICU is simply faster

tarekgh commented 4 years ago

I have used the scenario that you have provided (but it was a long time ago - on the 14th of September)

What I mentioned in Sept 14th was using 3.1 on Windows which is using NLS and not ICU. I am expecting 3.1 still be faster.

adamsitnik commented 4 years ago

I am expecting 3.1 still be faster.

So I've verified that 5.0 is faster for this particular case.

DrewScoggins commented 4 years ago

After the most recent check in (https://github.com/dotnet/runtime/pull/43065) that Tarek made, we are seeing some good improvements in the lab.

Run Information

Architecture x64
OS Windows 10.0.18362
Changes diff

Regressions in System.Globalization.Tests.StringSearch

Benchmark Baseline Test Test/Base Modality Baseline Outlier Baseline ETL Comapre ETL
[IndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options%3a%20(en-US%2c%20None%2c%20True)).html>) 13.06 μs 9.15 μs 0.70 True
[LastIndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options%3a%20(%2c%20None%2c%20True)).html>) 18.87 μs 15.01 μs 0.80 True
[LastIndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options%3a%20(pl-PL%2c%20None%2c%20False)).html>) 20.55 μs 16.44 μs 0.80 True
[IsSuffix_SecondHalf](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.IsSuffix_SecondHalf(Options%3a%20(en-US%2c%20IgnoreSymbols%2c%20False)).html>) 22.05 μs 17.57 μs 0.80 False
[IsPrefix_FirstHalf](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.IsPrefix_FirstHalf(Options%3a%20(en-US%2c%20IgnoreSymbols%2c%20False)).html>) 19.68 μs 15.59 μs 0.79 False
[IndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options%3a%20(en-US%2c%20IgnoreSymbols%2c%20False)).html>) 12.96 μs 9.05 μs 0.70 True
[IsPrefix_DifferentFirstChar](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.IsPrefix_DifferentFirstChar(Options%3a%20(en-US%2c%20IgnoreSymbols%2c%20False)).html>) 34.04 μs 29.81 μs 0.88 True
[IndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options%3a%20(pl-PL%2c%20None%2c%20False)).html>) 16.56 μs 12.95 μs 0.78 True
[IsSuffix_DifferentLastChar](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.IsSuffix_DifferentLastChar(Options%3a%20(en-US%2c%20IgnoreSymbols%2c%20False)).html>) 39.49 μs 34.97 μs 0.89 True
[IndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options%3a%20(%2c%20IgnoreCase%2c%20True)).html>) 13.19 μs 9.30 μs 0.70 True
[LastIndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options%3a%20(en-US%2c%20IgnoreCase%2c%20True)).html>) 19.46 μs 15.26 μs 0.78 True
[IndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options%3a%20(%2c%20None%2c%20True)).html>) 12.83 μs 9.16 μs 0.71 False
[IndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options%3a%20(en-US%2c%20IgnoreCase%2c%20True)).html>) 13.19 μs 9.27 μs 0.70 True
[LastIndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options%3a%20(en-US%2c%20None%2c%20True)).html>) 19.20 μs 14.99 μs 0.78 False
[LastIndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options%3a%20(%2c%20IgnoreCase%2c%20True)).html>) 19.41 μs 15.28 μs 0.79 False
[LastIndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options%3a%20(en-US%2c%20IgnoreSymbols%2c%20False)).html>) 18.57 μs 14.01 μs 0.75 True

graph graph graph graph graph graph graph graph graph graph graph graph graph graph graph graph Historical Data in Reporting System

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'System.Globalization.Tests.StringSearch*'
### Histogram #### System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, None, True)) ```log [ 8961.400 ; 9663.790) | @@@@@@@@@@@@@@ [ 9663.790 ; 10366.179) | [10366.179 ; 11068.568) | [11068.568 ; 11770.957) | [11770.957 ; 12473.347) | [12473.347 ; 12726.514) | [12726.514 ; 13428.903) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [13428.903 ; 14213.285) | @@@ ``` #### System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (, None, True)) ```log [14612.596 ; 15349.017) | @@@@@@@@@@@@@@ [15349.017 ; 16056.794) | [16056.794 ; 16764.571) | [16764.571 ; 17472.348) | [17472.348 ; 18217.629) | [18217.629 ; 18831.044) | @ [18831.044 ; 19538.820) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [19538.820 ; 20122.311) | @@@ ``` #### System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (pl-PL, None, False)) ```log [16251.652 ; 16961.645) | @@@@@@@@@@@@@@ [16961.645 ; 17671.638) | [17671.638 ; 18381.630) | [18381.630 ; 19091.623) | [19091.623 ; 19801.616) | [19801.616 ; 20174.942) | [20174.942 ; 20884.934) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [20884.934 ; 21604.903) | @@@ ``` #### System.Globalization.Tests.StringSearch.IsSuffix_SecondHalf(Options: (en-US, IgnoreSymbols, False)) ```log [17072.752 ; 17964.482) | @@@@@@@@@@@@@@ [17964.482 ; 18779.784) | [18779.784 ; 19595.085) | [19595.085 ; 20410.387) | [20410.387 ; 21514.014) | [21514.014 ; 22329.316) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [22329.316 ; 23330.116) | @@@@@@@@@@ ``` #### System.Globalization.Tests.StringSearch.IsPrefix_FirstHalf(Options: (en-US, IgnoreSymbols, False)) ```log [15229.897 ; 15975.829) | @@@@@@@@@@@@@@ [15975.829 ; 16721.761) | [16721.761 ; 17467.693) | [17467.693 ; 18213.625) | [18213.625 ; 18959.557) | [18959.557 ; 19302.824) | [19302.824 ; 20048.756) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [20048.756 ; 20855.951) | @@@@@ ``` #### System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, IgnoreSymbols, False)) ```log [ 8797.707 ; 9499.196) | @@@@@@@@@@@@@@ [ 9499.196 ; 10200.684) | [10200.684 ; 10902.172) | [10902.172 ; 11603.661) | [11603.661 ; 12305.149) | [12305.149 ; 12684.041) | [12684.041 ; 13385.530) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [13385.530 ; 14220.333) | @@ ``` #### System.Globalization.Tests.StringSearch.IsPrefix_DifferentFirstChar(Options: (en-US, IgnoreSymbols, False)) ```log [29477.421 ; 30235.312) | @@@@@@@@@@@@@@ [30235.312 ; 30993.203) | [30993.203 ; 31751.095) | [31751.095 ; 32508.986) | [32508.986 ; 33769.724) | [33769.724 ; 35223.347) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ ``` #### System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (pl-PL, None, False)) ```log [12609.085 ; 13265.894) | @@@@@@@@@@@@@ [13265.894 ; 13768.769) | @ [13768.769 ; 14425.579) | [14425.579 ; 15082.389) | [15082.389 ; 15739.199) | [15739.199 ; 16287.555) | [16287.555 ; 16944.365) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [16944.365 ; 17648.441) | @@ ``` #### System.Globalization.Tests.StringSearch.IsSuffix_DifferentLastChar(Options: (en-US, IgnoreSymbols, False)) ```log [34563.746 ; 35375.219) | @@@@@@@@@@@@@@ [35375.219 ; 36186.692) | [36186.692 ; 36998.165) | [36998.165 ; 37809.638) | [37809.638 ; 39015.124) | @ [39015.124 ; 39826.597) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [39826.597 ; 40796.807) | @@@@@ ``` #### System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (, IgnoreCase, True)) ```log [ 8890.906 ; 9625.004) | @@@@@@@@@@@@@@ [ 9625.004 ; 10333.195) | [10333.195 ; 11041.386) | [11041.386 ; 11749.577) | [11749.577 ; 12457.769) | [12457.769 ; 13023.837) | @ [13023.837 ; 13732.028) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [13732.028 ; 14434.832) | @@@@ ``` #### System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, IgnoreCase, True)) ```log [14805.514 ; 15554.325) | @@@@@@@@@@@@@@ [15554.325 ; 16303.136) | [16303.136 ; 17051.948) | [17051.948 ; 17800.759) | [17800.759 ; 18662.288) | [18662.288 ; 19285.851) | @@ [19285.851 ; 20034.662) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [20034.662 ; 20599.685) | @@@@ ``` #### System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (, None, True)) ```log [ 9024.075 ; 9674.928) | @@@@@@@@@@@@@@ [ 9674.928 ; 10325.780) | [10325.780 ; 10976.632) | [10976.632 ; 11627.484) | [11627.484 ; 12278.336) | [12278.336 ; 12637.952) | [12637.952 ; 13288.804) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [13288.804 ; 13899.140) | @@ ``` #### System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, IgnoreCase, True)) ```log [ 8882.220 ; 9650.252) | @@@@@@@@@@@@@@ [ 9650.252 ; 10352.069) | [10352.069 ; 11053.887) | [11053.887 ; 11755.704) | [11755.704 ; 12457.521) | [12457.521 ; 12812.048) | [12812.048 ; 13513.865) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [13513.865 ; 14286.340) | @@@@@ ``` #### System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, None, True)) ```log [14587.344 ; 15366.223) | @@@@@@@@@@@@@@ [15366.223 ; 16127.006) | [16127.006 ; 16887.789) | [16887.789 ; 17648.572) | [17648.572 ; 18409.355) | [18409.355 ; 18787.408) | [18787.408 ; 19548.191) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [19548.191 ; 20346.411) | @@@@@@ ``` #### System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (, IgnoreCase, True)) ```log [14863.073 ; 15658.737) | @@@@@@@@@@@@@@ [15658.737 ; 16397.914) | [16397.914 ; 17137.092) | [17137.092 ; 17876.269) | [17876.269 ; 18615.446) | [18615.446 ; 18981.118) | [18981.118 ; 19720.295) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [19720.295 ; 20512.543) | @@@@ ``` #### System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, IgnoreSymbols, False)) ```log [13785.427 ; 14537.175) | @@@@@@@@@@@@@@ [14537.175 ; 15288.923) | [15288.923 ; 16040.671) | [16040.671 ; 16792.419) | [16792.419 ; 17544.167) | [17544.167 ; 18045.402) | [18045.402 ; 18964.029) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ [18964.029 ; 19744.608) | @@ ``` ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
tarekgh commented 4 years ago

thanks @DrewScoggins for the updates.

adamsitnik commented 3 years ago

With #43065 that provided some really nice improvements, we are getting even closer to contributing to ICU itself (as we run out of ideas on our side).

While reading the ICU source code I got the impression that it was not written with performance in mind (very few hacks, tricks, and comments explaining them). But I assumed that the code is not doing that because ICU is using PGO that does this for ICU.

But what if ICU is not using PGO? Perhaps this could be our first perf contribution?

I've searched ICU repo and bug tracker for "PGO" and did not find anything: https://github.com/unicode-org/icu/search?q=PGO https://unicode-org.atlassian.net/issues/?jql=text%20~%20%22PGO%22

Just to give you some numbers: when Chromium started using Microsoft's PGO they got +15% improvement for "new tab page load time" and "startup time": https://blog.chromium.org/2016/10/making-chrome-on-windows-faster-with-pgo.html

@tarekgh @GrabYourPitchforks what are your thoughts on this?

danmoseley commented 3 years ago

cc @brianrob

tarekgh commented 3 years ago

For who don't know what is PGO, it is Profile-guided optimization.