List of performance regressions caused by switching to ICU

adamsitnik commented 4 years ago

Before 5.0, we were using ICU only on Unix systems. In 5.0 we have decided to use it on Windows by default as well.

This is something that we have done in order to have the same behavior of string-related globalization APIs on every OS.

However, this particular change has affected the performance characteristics of many frequently used methods. Some of them have regressed, some have improved.

Recently we have reported a lot of 5.0 regressions related to that. Since we have done this on purpose and we are most probably not going to revert the switch, I am opening this issue to track the list of all known regressions. When the list becomes complete, we are most probably going to update the 5.0 release docs, close the issue, and label it as won't fix.

Please feel free to edit the list.

Known changes:

System.Memory.ReadOnlySpan.IndexOfString: 5 regressions, 5 improvements
System.Globalization.Tests.StringHash: 2 regressions, 4 improvements
System.Globalization.Tests.StringEquality: 8 regressions, 14 improvements
System.Globalization.Tests.StringSearch: 30 regressions, 32 improvements
System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse: 1 regression, 5 improvements

cc @danmosemsft @tarekgh @billwert @DrewScoggins @GrabYourPitchforks @jkotas @safern

ghost commented 4 years ago

Tagging subscribers to this area: @tarekgh, @safern, @krwq See info in area-owners.md if you want to be subscribed.

adamsitnik commented 4 years ago

System.Memory.ReadOnlySpan.IndexOfString 5 regressions, 5 improvements

Method	input	value	comparisonType	3.1 Mean	5.0 ICU	5.0 NLS
IndexOfString	AAAAA5AAAA	5	InvariantCulture	79.07 ns	35.41 ns	72.05 ns
IndexOfString	AAAAAAAAAAAA(...)AAAAAAAAAAAA [1000]	X	Ordinal	39.72 ns	38.89 ns	38.25 ns
IndexOfString	AAAAAAAAAAAA(...)AAAAAAAAAAAA [100]	x	InvariantCultureIgnoreCase	300.45 ns	214.39 ns	294.59 ns
IndexOfString	AAAAAAAAAAAA(...)AAAAAAAAAAAA [100]	x	OrdinalIgnoreCase	245.94 ns	802.33 ns	240.42 ns
IndexOfString	ABCDE	c	InvariantCultureIgnoreCase	71.81 ns	30.26 ns	63.83 ns
IndexOfString	Hello Worldb(...)allylong!xyz [186]	w	OrdinalIgnoreCase	45.80 ns	102.63 ns	40.30 ns
IndexOfString	Hello Worldb(...)allylong!xyz [187]	~	Ordinal	28.52 ns	22.28 ns	21.03 ns
IndexOfString	Hello Worldbb(...)bbbbbbbbbbba! [47]	y	Ordinal	24.15 ns	14.09 ns	13.57 ns
IndexOfString	More Test's	Tests	OrdinalIgnoreCase	56.38 ns	108.93 ns	50.48 ns
IndexOfString	StrIng	string	OrdinalIgnoreCase	38.79 ns	55.01 ns	33.86 ns
IndexOfString	foobardzsdzs	rddzs	InvariantCulture	101.46 ns	44.93 ns	94.90 ns
IndexOfString	string1	string2	InvariantCulture	91.45 ns	37.14 ns	80.95 ns
IndexOfString	?	?	InvariantCulture	108.70 ns	4,069.11 ns	105.36 ns
IndexOfString	????????????(...)???????????? [100]	?	Ordinal	25.08 ns	16.88 ns	15.48 ns
IndexOfString	????????????(...)???????????? [1000]	x	Ordinal	39.70 ns	39.19 ns	37.99 ns

Comment: InvariantCulture and InvariantCultureIgnoreCase have improved, while OrdinalIgnoreCase has regressed. We could implement OrdinalIgnoreCase path in managed code if it ever becomes a problem
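To make the managed-code idea above concrete, here is a minimal sketch of an ordinal, case-insensitive IndexOf written purely in C#. It is an illustration under stated assumptions, not the runtime's implementation: the names `ManagedOrdinalSearch` and `IndexOfOrdinalIgnoreCase` are hypothetical, it compares one UTF-16 code unit at a time via `char.ToUpperInvariant`, and it makes no attempt to handle surrogate pairs or to be fast.

```csharp
using System;

public static class ManagedOrdinalSearch
{
    // Hypothetical helper: naive O(n*m) scan that compares upper-cased UTF-16 code units.
    // Assumption: simple per-char upper-casing is an acceptable stand-in for OrdinalIgnoreCase.
    public static int IndexOfOrdinalIgnoreCase(ReadOnlySpan<char> source, ReadOnlySpan<char> value)
    {
        if (value.IsEmpty)
            return 0; // an empty value is found at the start, as with string.IndexOf

        for (int i = 0; i <= source.Length - value.Length; i++)
        {
            int j = 0;
            while (j < value.Length &&
                   char.ToUpperInvariant(source[i + j]) == char.ToUpperInvariant(value[j]))
            {
                j++;
            }

            if (j == value.Length)
                return i; // every code unit of value matched at offset i
        }

        return -1; // not found
    }
}
```

With inputs taken from the table above, `ManagedOrdinalSearch.IndexOfOrdinalIgnoreCase("StrIng", "string")` finds a match at index 0, while searching "More Test's" for "Tests" reports no match because of the apostrophe.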

danmoseley commented 4 years ago

Thank you @adamsitnik good idea I'm glad you created this.

tarekgh commented 4 years ago

I have this PR opened for addressing all ordinal operations https://github.com/dotnet/runtime/pull/40910

tarekgh commented 4 years ago

@adamsitnik are you going to close all other bugs complaining about ICU perf against this one?

adamsitnik commented 4 years ago

are you going to close all other bugs complaining about ICU perf against this one?

That's my plan. I will try to do my best to do it for as many as I can.

tarekgh commented 4 years ago

Also, I want to be clear about the expectation here. I don't think we can fix all perf here as we are limited by calling ICU. We can look at how we can improve it, but I am not expecting to get the perf to the point where we used to call NLS. So, it will be good to decide which items in this list are blockers. The only one was the Ordinal cases, which I am addressing in the attached PR. I am not aware of any other blocking scenario. We'll of course look more at other scenarios anyway, but I am not sure how much we can do before the 5.0 release.

adamsitnik commented 4 years ago

System.Globalization.Tests.StringHash 2 regressions, 4 improvements

Method	Count	Options	3.1 Mean	5.0 ICU Mean	5.0 NLS Mean
GetHashCode	128	(, IgnoreCase)	1,806.07 ns	1,806.0 ns	1,806.10 ns
GetHashCode	128	(, None)	1,823.38 ns	2,284.0 ns	1,818.16 ns
GetHashCode	128	(en-US, IgnoreCase)	1,831.84 ns	1,816.0 ns	1,790.90 ns
GetHashCode	128	(en-US, None)	1,820.91 ns	2,296.8 ns	1,815.97 ns
GetHashCode	128	(en-US, Ordinal)	99.33 ns	100.0 ns	99.61 ns
GetHashCode	128	(en-US, OrdinalIgnoreCase)	123.93 ns	122.6 ns	121.49 ns
GetHashCode	131072	(, IgnoreCase)	2,207,798.50 ns	1,773,953.8 ns	2,193,202.55 ns
GetHashCode	131072	(, None)	2,209,727.34 ns	1,893,550.0 ns	2,187,722.71 ns
GetHashCode	131072	(en-US, IgnoreCase)	2,196,075.39 ns	1,781,730.1 ns	2,190,508.15 ns
GetHashCode	131072	(en-US, None)	2,214,228.87 ns	1,899,934.3 ns	2,185,224.74 ns
GetHashCode	131072	(en-US, Ordinal)	96,294.65 ns	96,513.7 ns	96,551.91 ns
GetHashCode	131072	(en-US, OrdinalIgnoreCase)	113,393.49 ns	112,940.4 ns	116,436.80 ns

Comment: there are two regressions for Count=128 but four improvements for Count=131072

adamsitnik commented 4 years ago

System.Globalization.Tests.StringEquality: 8 regressions, 14 improvements

Method	Count	Options	3.1 Mean	5.0 ICU Mean	5.0 NLS Mean
Compare_Same	1024	(, IgnoreCase)	996.299 ns	575.414 ns	997.39 ns
Compare_Same_Upper	1024	(, IgnoreCase)	3,373.889 ns	6,798.371 ns	3,356.57 ns
Compare_DifferentFirstChar	1024	(, IgnoreCase)	43.730 ns	37.378 ns	42.73 ns
Compare_Same	1024	(, None)	1,003.408 ns	572.528 ns	1,003.25 ns
Compare_Same_Upper	1024	(, None)	4,123.875 ns	6,747.893 ns	4,033.30 ns
Compare_DifferentFirstChar	1024	(, None)	41.883 ns	34.904 ns	43.04 ns
Compare_Same	1024	(en-US, IgnoreCase)	992.625 ns	564.239 ns	1,012.86 ns
Compare_Same_Upper	1024	(en-US, IgnoreCase)	3,334.161 ns	6,789.031 ns	3,336.49 ns
Compare_DifferentFirstChar	1024	(en-US, IgnoreCase)	41.821 ns	36.251 ns	41.92 ns
Compare_Same	1024	(en-US, IgnoreNonSpace)	1,667.106 ns	564.521 ns	1,671.78 ns
Compare_Same_Upper	1024	(en-US, IgnoreNonSpace)	9,547.973 ns	2,103.148 ns	9,599.69 ns
Compare_DifferentFirstChar	1024	(en-US, IgnoreNonSpace)	66.929 ns	35.786 ns	59.21 ns
Compare_Same	1024	(en-US, IgnoreSymbols)	1,676.295 ns	569.468 ns	1,679.39 ns
Compare_Same_Upper	1024	(en-US, IgnoreSymbols)	9,100.891 ns	11,605.391 ns	9,075.81 ns
Compare_DifferentFirstChar	1024	(en-US, IgnoreSymbols)	8,219.648 ns	21,916.551 ns	8,311.14 ns
Compare_Same	1024	(en-US, None)	988.500 ns	572.970 ns	1,005.15 ns
Compare_Same_Upper	1024	(en-US, None)	4,040.998 ns	6,740.818 ns	4,007.38 ns
Compare_DifferentFirstChar	1024	(en-US, None)	41.931 ns	35.349 ns	42.87 ns
Compare_Same	1024	(en-US, Ordinal)	82.818 ns	67.527 ns	68.47 ns
Compare_Same_Upper	1024	(en-US, Ordinal)	13.907 ns	14.796 ns	14.82 ns
Compare_DifferentFirstChar	1024	(en-US, Ordinal)	5.432 ns	9.946 ns	10.27 ns
Compare_Same	1024	(en-US, OrdinalIgnoreCase)	815.592 ns	822.866 ns	834.45 ns
Compare_Same_Upper	1024	(en-US, OrdinalIgnoreCase)	1,225.463 ns	1,218.922 ns	1,253.77 ns
Compare_DifferentFirstChar	1024	(en-US, OrdinalIgnoreCase)	10.517 ns	11.977 ns	11.30 ns
Compare_Same	1024	(pl-PL, None)	21,378.839 ns	572.036 ns	21,617.36 ns
Compare_Same_Upper	1024	(pl-PL, None)	22,136.295 ns	17,518.080 ns	22,375.84 ns
Compare_DifferentFirstChar	1024	(pl-PL, None)	61.794 ns	37.698 ns	62.93 ns

Comment: ICU seems to be faster when inputs are the same or the first character is different but slows down when we are comparing lowercase and uppercase versions of the same words
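The three benchmark shapes named in the table can be sketched roughly as follows. This is inferred from the benchmark names only, not copied from the benchmark source: the class `CompareShapes` and the way the second operand is built are assumptions, and `value` is assumed to be non-empty.

```csharp
using System;
using System.Globalization;

public static class CompareShapes
{
    // Equal content, but two distinct string instances.
    public static int Same(string value, CultureInfo culture, CompareOptions options)
        => culture.CompareInfo.Compare(value, new string(value.ToCharArray()), options);

    // Same words, with one side upper-cased using the culture's casing rules.
    public static int SameUpper(string value, CultureInfo culture, CompareOptions options)
        => culture.CompareInfo.Compare(value, value.ToUpper(culture), options);

    // Inputs that already differ at index 0 (assumes value is non-empty).
    public static int DifferentFirstChar(string value, CultureInfo culture, CompareOptions options)
        => culture.CompareInfo.Compare(value, (char)(value[0] + 1) + value.Substring(1), options);
}
```

For example, `CompareShapes.SameUpper("hello world", CultureInfo.InvariantCulture, CompareOptions.IgnoreCase)` compares as equal, which means the collator has to walk the whole input rather than bail out early; that may be part of why this shape is the slow one under ICU.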

adamsitnik commented 4 years ago

System.Globalization.Tests.StringSearch: 30 regressions, 32 improvements

Method	Options	3.1 Mean	5.0 ICU Mean	5.0 NLS Mean
IsPrefix_FirstHalf	(, IgnoreCase, False)	279.556 ns	204.122 ns	270.881 ns
IsPrefix_DifferentFirstChar	(, IgnoreCase, False)	58.887 ns	18.055 ns	53.431 ns
IsSuffix_SecondHalf	(, IgnoreCase, False)	264.358 ns	191.353 ns	262.566 ns
IsSuffix_DifferentLastChar	(, IgnoreCase, False)	61.441 ns	18.846 ns	56.044 ns
IndexOf_Word_NotFound	(, IgnoreCase, False)	802.657 ns	628.962 ns	798.926 ns
LastIndexOf_Word_NotFound	(, IgnoreCase, False)	975.995 ns	620.031 ns	965.195 ns
IsPrefix_FirstHalf	(, IgnoreCase, True)	1,012.810 ns	3,677.834 ns	1,019.808 ns
IsPrefix_DifferentFirstChar	(, IgnoreCase, True)	72.985 ns	853.547 ns	72.039 ns
IsSuffix_SecondHalf	(, IgnoreCase, True)	2,552.799 ns	7,467.664 ns	2,575.991 ns
IsSuffix_DifferentLastChar	(, IgnoreCase, True)	5,376.895 ns	1,288.707 ns	5,329.497 ns
IndexOf_Word_NotFound	(, IgnoreCase, True)	3,931.968 ns	11,275.663 ns	3,831.631 ns
LastIndexOf_Word_NotFound	(, IgnoreCase, True)	3,142.703 ns	17,463.707 ns	3,042.517 ns
IsPrefix_FirstHalf	(, None, False)	211.472 ns	173.105 ns	201.325 ns
IsPrefix_DifferentFirstChar	(, None, False)	56.298 ns	17.402 ns	53.396 ns
IsSuffix_SecondHalf	(, None, False)	182.540 ns	174.536 ns	179.486 ns
IsSuffix_DifferentLastChar	(, None, False)	58.275 ns	18.166 ns	55.200 ns
IndexOf_Word_NotFound	(, None, False)	699.751 ns	500.508 ns	678.148 ns
LastIndexOf_Word_NotFound	(, None, False)	850.232 ns	502.581 ns	845.351 ns
IsPrefix_FirstHalf	(, None, True)	1,052.320 ns	3,636.952 ns	1,021.712 ns
IsPrefix_DifferentFirstChar	(, None, True)	73.568 ns	848.922 ns	72.912 ns
IsSuffix_SecondHalf	(, None, True)	2,595.612 ns	7,554.091 ns	2,540.565 ns
IsSuffix_DifferentLastChar	(, None, True)	5,306.733 ns	1,317.628 ns	5,348.958 ns
IndexOf_Word_NotFound	(, None, True)	3,935.667 ns	11,177.835 ns	3,803.958 ns
LastIndexOf_Word_NotFound	(, None, True)	3,083.463 ns	16,993.198 ns	3,029.761 ns
IsPrefix_FirstHalf	(en-US, IgnoreCase, False)	270.134 ns	205.387 ns	273.552 ns
IsPrefix_DifferentFirstChar	(en-US, IgnoreCase, False)	57.763 ns	18.110 ns	54.154 ns
IsSuffix_SecondHalf	(en-US, IgnoreCase, False)	258.893 ns	190.355 ns	260.361 ns
IsSuffix_DifferentLastChar	(en-US, IgnoreCase, False)	58.943 ns	19.272 ns	56.319 ns
IndexOf_Word_NotFound	(en-US, IgnoreCase, False)	807.236 ns	622.869 ns	798.647 ns
LastIndexOf_Word_NotFound	(en-US, IgnoreCase, False)	975.083 ns	623.089 ns	955.210 ns
IsPrefix_FirstHalf	(en-US, IgnoreCase, True)	1,018.976 ns	3,666.344 ns	1,026.411 ns
IsPrefix_DifferentFirstChar	(en-US, IgnoreCase, True)	76.465 ns	887.811 ns	72.117 ns
IsSuffix_SecondHalf	(en-US, IgnoreCase, True)	2,563.798 ns	7,529.526 ns	2,565.718 ns
IsSuffix_DifferentLastChar	(en-US, IgnoreCase, True)	5,305.949 ns	1,304.493 ns	5,346.831 ns
IndexOf_Word_NotFound	(en-US, IgnoreCase, True)	3,892.140 ns	11,776.383 ns	3,926.784 ns
LastIndexOf_Word_NotFound	(en-US, IgnoreCase, True)	3,090.867 ns	17,430.155 ns	3,072.910 ns
IsPrefix_FirstHalf	(en-US, IgnoreNonSpace, False)	1,020.808 ns	178.017 ns	1,017.612 ns
IsPrefix_DifferentFirstChar	(en-US, IgnoreNonSpace, False)	67.336 ns	18.067 ns	65.861 ns
IsSuffix_SecondHalf	(en-US, IgnoreNonSpace, False)	2,552.693 ns	178.425 ns	2,544.768 ns
IsSuffix_DifferentLastChar	(en-US, IgnoreNonSpace, False)	4,921.510 ns	18.199 ns	4,858.353 ns
IndexOf_Word_NotFound	(en-US, IgnoreNonSpace, False)	3,906.457 ns	506.498 ns	3,792.084 ns
LastIndexOf_Word_NotFound	(en-US, IgnoreNonSpace, False)	3,076.625 ns	507.002 ns	3,048.544 ns
IsPrefix_FirstHalf	(en-US, IgnoreSymbols, False)	1,022.075 ns	16,888.924 ns	1,020.346 ns
IsPrefix_DifferentFirstChar	(en-US, IgnoreSymbols, False)	70.421 ns	29,267.090 ns	66.428 ns
IsSuffix_SecondHalf	(en-US, IgnoreSymbols, False)	3,437.616 ns	19,397.341 ns	3,411.783 ns
IsSuffix_DifferentLastChar	(en-US, IgnoreSymbols, False)	5,308.903 ns	34,515.843 ns	5,333.271 ns
IndexOf_Word_NotFound	(en-US, IgnoreSymbols, False)	3,310.485 ns	11,414.777 ns	3,285.185 ns
LastIndexOf_Word_NotFound	(en-US, IgnoreSymbols, False)	3,468.180 ns	16,640.286 ns	3,393.434 ns
IsPrefix_FirstHalf	(en-US, None, False)	205.249 ns	176.915 ns	200.000 ns
IsPrefix_DifferentFirstChar	(en-US, None, False)	59.057 ns	17.779 ns	53.980 ns
IsSuffix_SecondHalf	(en-US, None, False)	184.039 ns	177.072 ns	181.258 ns
IsSuffix_DifferentLastChar	(en-US, None, False)	59.873 ns	18.232 ns	55.897 ns
IndexOf_Word_NotFound	(en-US, None, False)	688.002 ns	504.985 ns	681.283 ns
LastIndexOf_Word_NotFound	(en-US, None, False)	841.277 ns	506.640 ns	859.611 ns
IsPrefix_FirstHalf	(en-US, None, True)	1,026.332 ns	3,676.392 ns	1,129.406 ns
IsPrefix_DifferentFirstChar	(en-US, None, True)	72.958 ns	860.499 ns	73.269 ns
IsSuffix_SecondHalf	(en-US, None, True)	2,555.954 ns	7,625.414 ns	2,556.183 ns
IsSuffix_DifferentLastChar	(en-US, None, True)	5,320.978 ns	1,349.628 ns	5,373.210 ns
IndexOf_Word_NotFound	(en-US, None, True)	3,783.346 ns	11,472.314 ns	3,884.931 ns
LastIndexOf_Word_NotFound	(en-US, None, True)	3,080.818 ns	17,512.610 ns	3,077.301 ns
IsPrefix_FirstHalf	(en-US, Ordinal, False)	12.480 ns	14.469 ns	12.584 ns
IsPrefix_DifferentFirstChar	(en-US, Ordinal, False)	6.229 ns	8.496 ns	8.925 ns
IsSuffix_SecondHalf	(en-US, Ordinal, False)	14.058 ns	12.034 ns	14.248 ns
IsSuffix_DifferentLastChar	(en-US, Ordinal, False)	19.645 ns	17.446 ns	16.225 ns
IndexOf_Word_NotFound	(en-US, Ordinal, False)	39.451 ns	37.728 ns	39.207 ns
LastIndexOf_Word_NotFound	(en-US, Ordinal, False)	137.367 ns	97.667 ns	98.362 ns
IsPrefix_FirstHalf	(en-US, OrdinalIgnoreCase, False)	73.307 ns	82.618 ns	83.575 ns
IsPrefix_DifferentFirstChar	(en-US, OrdinalIgnoreCase, False)	9.844 ns	9.585 ns	9.731 ns
IsSuffix_SecondHalf	(en-US, OrdinalIgnoreCase, False)	109.128 ns	83.181 ns	80.728 ns
IsSuffix_DifferentLastChar	(en-US, OrdinalIgnoreCase, False)	203.785 ns	149.656 ns	141.373 ns
IndexOf_Word_NotFound	(en-US, OrdinalIgnoreCase, False)	701.811 ns	3,125.640 ns	705.609 ns
LastIndexOf_Word_NotFound	(en-US, OrdinalIgnoreCase, False)	617.823 ns	3,161.417 ns	621.380 ns
IsPrefix_FirstHalf	(pl-PL, None, False)	2,594.625 ns	5,704.863 ns	2,558.898 ns
IsPrefix_DifferentFirstChar	(pl-PL, None, False)	101.082 ns	857.463 ns	98.650 ns
IsSuffix_SecondHalf	(pl-PL, None, False)	8,633.560 ns	8,435.046 ns	8,586.963 ns
IsSuffix_DifferentLastChar	(pl-PL, None, False)	19,423.848 ns	1,216.271 ns	17,376.208 ns
IndexOf_Word_NotFound	(pl-PL, None, False)	9,534.684 ns	14,329.673 ns	9,455.980 ns
LastIndexOf_Word_NotFound	(pl-PL, None, False)	8,689.332 ns	18,014.702 ns	8,657.101 ns

adamsitnik commented 4 years ago

System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse: 1 big regression for ja culture, 5 improvements

Method	culturestring	3.1 Mean	5.0 ICU	5.0 NLS
ToStringHebrewIsrael	?	540.1 ns	463.3 ns	518.3 ns
ToString		254.0 ns	248.9 ns	245.7 ns
Parse		466.9 ns	423.7 ns	424.7 ns
ToString	da	247.5 ns	234.0 ns	241.1 ns
Parse	da	551.2 ns	449.3 ns	515.6 ns
ToString	fr	251.5 ns	246.4 ns	244.3 ns
Parse	fr	466.5 ns	424.7 ns	440.2 ns
ToString	ja	258.1 ns	243.5 ns	245.4 ns
Parse	ja	482.1 ns	5,078.9 ns	456.9 ns

Comment: this is a known ICU issue: https://github.com/dotnet/runtime/issues/31273

adamsitnik commented 4 years ago

are you going to close all other bugs complaining about ICU perf against this one?

@tarekgh I've gone through all System.Memory and System.Globalization issues with performance tag and updated the list. It should be complete now.

danmoseley commented 4 years ago

@symbai I see you've thumbs-down. Could you share your concerns?

Symbai commented 4 years ago

@danmosemsft It's not about adamsitnik's work of collecting the regressions, which I find very helpful. It's about the fact that switching to ICU has introduced MAJOR regressions (+4,500ns, ~4x slower) while the "improvements" are usually around ~2ns, which I bet is only noise; running them a couple of times will show there are no improvements at all.

I'm not into ICU and I don't know why this change was made at all, but seeing all the performance improvements in .NET Core over the last versions and now such a big regression on a hot code path that is used literally everywhere makes me really wonder what in god's name can be so useful about ICU that it is worth THIS regression. I've thumbs-downed because I disagree with this ICU change and I disagree with statements which say that "we might not fix some of these regressions". Especially without telling people that switching to .NET 5 will make their code much slower without a benefit (there might be a benefit in ICU, but I bet it won't affect most people... unlike this performance regression).

tarekgh commented 4 years ago

@Symbai you have the option to switch back to use NLS if you want to.

ICU is the future direction in general for .NET and for Windows too. ICU gives us the opportunity to have consistency between OSes and OS versions. The benefit of using ICU is really worth it. ICU also gives apps the opportunity to customize the globalization behavior.
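For reference, a minimal sketch of the NLS opt-out mentioned above, assuming the documented `System.Globalization.UseNls` runtime setting placed in the app's `runtimeconfig.json` (it only has an effect on Windows). The same switch can also be supplied through the `DOTNET_SYSTEM_GLOBALIZATION_USENLS` environment variable; check the .NET globalization documentation for your version before relying on either.

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.Globalization.UseNls": true
    }
  }
}
```

Opting out restores the 3.1-style NLS behavior and performance profile on Windows, at the cost of the cross-platform consistency described above.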

danmoseley commented 4 years ago

Also note that this is already what is used on Linux and Mac, and increasingly used within Windows OS itself. So we are aligning with the industry here. If and where it is slow - we all benefit from making it faster. The .NET team have contributed bug reports and performance improvements to libicu in the past, and I expect we will do so again.

krwq commented 4 years ago

As a side note, ICU is open source and we can contribute to make hot paths faster

iSazonov commented 4 years ago

I am happy to see cross-platform consistency based on ICU, but it would be sad to find slowdowns on hot paths, especially in PowerShell, whose Engine is very sensitive to string operation performance. I hope the MSFT team continues to invest in the area in the next milestone. As a side note, we could have an extension package with a managed implementation for applications which are performance critical and have no memory limitations (so they can use larger and faster tables). Of course, if other ways are not more effective.

tarekgh commented 4 years ago

@iSazonov we already fixed all ordinal[IgnoreCase] operations perf across all functionality. As pointed out before, you still have the option to switch back to NLS if needed and continue working on Windows as you used to.

As a side note, we could have an extension package with a managed implementation for applications which are performance critical and have no memory limitations (so they can use larger and faster tables). Of course, if other ways are not more effective.

I assume you are talking about PowerShell? of course we'll not do that in .NET as we avoid carrying any globalization data and implementing all functionalities like collations wouldn't be trivial.

By the way, PowerShell has already been running on Linux with ICU for a while, so why are you concerned now?

iSazonov commented 4 years ago

we already fixed all ordinal[IgnoreCase] operations perf across all functionality

I see. Many thanks!

you still have the option to switch back using NLS

I think it makes no sense because more and more users work in a heterogeneous environment, and with ICU there is less chance of getting different results.

By the way, PowerShell has already been running on Linux with ICU for a while, so why are you concerned now?

Questions like why the performance/behavior is different on Windows and on Unix are very inconvenient. Questions like this sometimes appear in the PowerShell repo (not related to the topic). With the move to ICU the likelihood of such questions decreases, and this is great. PowerShell is highly dependent on OrdinalIgnoreCase and any improvements here have a positive impact on it. This is my only concern.

Thanks again for your great work!

tarekgh commented 4 years ago

PowerShell is highly dependent on OrdinalIgnoreCase and any improvements here have a positive impact on it. This is my only concern.

Thanks for the info, it is very helpful. Could you please try the latest .NET builds, which include the Ordinal perf improvements, and let us know if you see the differences now? Let me know if you need help with that.

DrewScoggins commented 4 years ago

Just wanted to update this thread with the auto-filed results showing big wins (70%) in the System.Memory.ReadOnlySpan benchmarks that regressed.

https://github.com/DrewScoggins/performance-2/issues/1392

DrewScoggins commented 4 years ago

Also seeing wins for System.Globalization.Tests.StringSearch in some of the IndexOf OrdinalIgnoreCase benchmarks. We are seeing that across x64, x86 Windows and x64 Ubuntu.

https://github.com/DrewScoggins/performance-2/issues/1375

iSazonov commented 4 years ago

Thanks for the info, it is very helpful. Could you please try the latest .NET builds, which include the Ordinal perf improvements, and let us know if you see the differences now? Let me know if you need help with that.

A day before PowerShell MSFT team tried to move to .Net 5.0 Preview8 but without success. I guess I can do some measurements only after we move to RC1.

/cc @SteveL-MSFT @daxian-dbw for information.

danmoseley commented 4 years ago

Just curious @tarekgh, why do we still see significant differences between Windows (with your improvement) and Ubuntu? eg https://github.com/DrewScoggins/performance-2/issues/1283 --- about 240-260 ns https://github.com/DrewScoggins/performance-2/issues/1334 -- back down from 160ns to 120ns

tarekgh commented 4 years ago

@danmosemsft looking at https://github.com/DrewScoggins/performance-2/issues/1283 I am seeing that the ordinal ignore-case cases have improved. I am not expecting the other, non-ordinal cases to improve with my change. Am I reading the data correctly?

danmoseley commented 4 years ago

@tarekgh your change is good. I was just curious why Linux and Windows perf were still significantly different but I realized I compared the wrong rows.

tarekgh commented 4 years ago

@danmosemsft I was not really sure if I am looking at or interpreting the perf data correctly. that is why I was asking. Thanks for clarifying.

GSPP commented 4 years ago

Would it be in the cards to contribute to ICU so that this operation is not slow by design? I imagine that many projects relying on ICU would like a fast IndexOf that does not require caching a searcher object.

iSazonov commented 4 years ago

@tarekgh I made a simple test for PowerShell string comparisons (in Russian locale). I see a huge regression after 5.0-Preview.7, but it is still significantly better than 3.1.6.

I hope .Net and PowerShell teams will discuss this and make more reliable tests. /cc @SteveL-MSFT @daxian-dbw

PowerShell version	Duration, sec	.Net version
PowerShell-7.0.3	13.8	3.1.6
PowerShell-7.1.0-Preview.5	5.5	5.0-Preview.6
PowerShell-7.1.0-Preview.6	5.5	5.0-Preview.7
PowerShell-7.1.0-Preview.7	8.7	5.0-Preview.8

PowerShell test script:

$value = (Get-Date).ToString() # Russian locale
$array1 = @()

for ($i=0; $i -lt 5000; $i++) {
    $array1 += $value
}

$value = (Get-Date).ToString()
$array2 = @()

for ($i=0; $i -lt 5000; $i++) {
    $array2 += $value
}

Measure-Command { Compare-Object -ReferenceObject $array1 -DifferenceObject $array2 }

tarekgh commented 4 years ago

@iSazonov are you running on Linux or Windows? On Windows it is expected that you see some regression with the linguistic scenarios because of the switch to ICU there. As mentioned before, you still have the option to switch back to NLS if needed. We have addressed all ordinal scenarios though. Also, there was a lot of work going on during these previews, but at least we are still significantly faster than 3.1, which is good news.

iSazonov commented 4 years ago

@tarekgh It was Windows only test.

tarekgh commented 4 years ago

I am surprised now this is faster than 3.1 then. are you sure of your results? could you send me a couple of string cases you used in the comparison using the Russian culture?

iSazonov commented 4 years ago

I am surprised now this is faster than 3.1 then. are you sure of your results?

I measured PowerShell not .Net - I mean perhaps there were other optimizations in PowerShell and .Net that could affect the result. We could use PerfView to investigate in depth.

Sample string from my tests:

14 сентября 2020 г. 23:25:22

The second test line only differs in seconds.

GrabYourPitchforks commented 4 years ago

@GSPP I don't think we have extensively studied what contributions we can make here. Regardless, ICU and NLS have different operational philosophies when it comes to this, and ICU's consumers are absolutely used to caching the search object. The canonical scenario for linguistic searching in ICU is for a UI-based application. You open a browser window or word document, enter your search term, then see all matches highlighted in the document and use Next / Previous to iterate through them.

In our case, we care about non-linguistic (ordinal) searching. If this were to be contributed back to ICU it would be in the form of a brand new API. Furthermore, the type of ordinal comparison we're using here (conversion to uppercase) is different from Unicode's own recommendations (conversion to case-fold). All of this is to say that I'm not hopeful of a specialty API like the one .NET is using making its way through.

tarekgh commented 4 years ago

@GrabYourPitchforks @GSPP just to let you know, I am looking at the linguistic IndexOf scenario and experimenting some changes that may help in the perf (around internally caching some ICU objects too). no promise yet as I am still in the middle of looking at that.

Also, we are following up with Windows team as they are trying to do some perf enhancement on ICU too. It is another win situation.

Last, as ICU is an open source project, we have contributed some changes before, which means it is possible we'll contribute more if it is really required to enhance .NET scenarios. But in general this is something we may look at for the 6.0 version and beyond.

tarekgh commented 4 years ago

@iSazonov I tried your scenario (using .NET without PS) and I am seeing results similar to what you reported. This is very interesting. I am going to look more at the details to understand what is going on.

tarekgh commented 4 years ago

I have collected the ETW benchmark data and now I am seeing logical results: 3.1 is faster (as expected). The dotnet benchmark tool is somehow reporting wrong numbers, or something is causing noise there. Here is the ETW data:

3.1

Name                                                            Inc %         Inc   Exc %    Exc
 system.private.corelib.il!CompareInfo.CompareString             95.7      42,349     2.0    893
+ kernel32!CompareStringExStub                                   92.6      40,960     0.5    202
|+ kernelbase!SortCompareString                                  92.1      40,754     2.2    962
|+ ntoskrnl!?                                                     0.0           4     0.0      3
+ coreclr!?                                                       0.6         253     0.6    253
+ system.private.corelib.il!CompareInfo.GetNativeCompareFlags     0.5         219     0.5    219
+ ?!?                                                             0.0          14     0.0     14
+ ntoskrnl!?                                                      0.0          10     0.0      8

5.0

Name                                                        Inc %         Inc   Exc %      Exc
 System.Private.CoreLib.il!CompareInfo.Compare               91.0      66,116     2.6    1,888
+ System.Private.CoreLib.il!CompareInfo.Compare              88.3      64,159     3.3    2,405
|+ System.Private.CoreLib.il!CompareInfo.IcuCompareString    84.8      61,633     5.4    3,941
||+ coreclr!GlobalizationNative_CompareString                74.7      54,287     5.0    3,666
|||+ icu!ucol_strcoll                                        66.5      48,311     4.9    3,530
|||+ ntdll!?                                                  1.6       1,188     1.6    1,182
|||+ coreclr!GetCollatorFromSortHandle                        1.5       1,105     1.5    1,081
|||+ ntoskrnl!?                                               0.0          16     0.0        8
|||+ nvlddmkm!?                                               0.0           1     0.0        1
||+ coreclr!JIT_InitPInvokeFrame                              4.6       3,370     4.6    3,353
||+ ntoskrnl!?                                                0.0          24     0.0        9
||+ coreclr!JIT_PInvokeBegin                                  0.0           7     0.0        7
||+ coreclr!JIT_PInvokeEnd                                    0.0           4     0.0        4

And here are the results (which I believe are wrongly reported by dotnet benchmark) from the exact same test.

DefaultJob : .NET Core 5.0.0 (CoreCLR 5.0.20.45114, CoreFX 5.0.20.45114), X64 RyuJIT

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
StringCompare	141.8 ns	3.52 ns	10.39 ns	-	-	-	-

DefaultJob : .NET Core 3.1.7 (CoreCLR 4.700.20.36602, CoreFX 4.700.20.37001), X64 RyuJIT

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
StringCompare	705.3 ns	14.09 ns	35.86 ns	-	-	-	-

and here is the method code I used:

        private static CultureInfo s_culture = CultureInfo.GetCultureInfo("ru-RU");
        private static string s_string1 = "14 сентября 2020 г. 23:25:22";
        private static string s_string2 = "14 сентября 2020 г. 23:25:24";

        [Benchmark] public int StringCompare() => String.Compare(s_string1, s_string2, s_culture, CompareOptions.None);

@adamsitnik @DrewScoggins do you have any guess why the perf numbers I am getting from the dotnet benchmark tool do not match what the ETW data reports for the exact same test?
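One crude way to cross-check numbers like these without either tool is a plain `Stopwatch` loop over the same call. The sketch below is only that: `CompareTiming` and `MeasureNanosecondsPerCall` are hypothetical names, the warm-up count is arbitrary, and a hand-rolled loop is far less rigorous than BenchmarkDotNet, so treat the result as a sanity check on the order of magnitude rather than a measurement.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;

public static class CompareTiming
{
    private static readonly CultureInfo s_culture = CultureInfo.GetCultureInfo("ru-RU");
    private const string String1 = "14 сентября 2020 г. 23:25:22";
    private const string String2 = "14 сентября 2020 г. 23:25:24";

    // Returns the average wall-clock nanoseconds per String.Compare call.
    public static double MeasureNanosecondsPerCall(int iterations)
    {
        int sink = 0;

        // Arbitrary warm-up so the JIT has compiled the call path before timing starts.
        for (int i = 0; i < 10_000; i++)
            sink += string.Compare(String1, String2, s_culture, CompareOptions.None);

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            sink += string.Compare(String1, String2, s_culture, CompareOptions.None);
        stopwatch.Stop();

        GC.KeepAlive(sink); // keep the accumulated results observable
        return stopwatch.Elapsed.TotalMilliseconds * 1_000_000.0 / iterations;
    }
}
```

Running `CompareTiming.MeasureNanosecondsPerCall(2_000_000)` on both runtimes should at least show which side of the 3.1 versus 5.0 comparison is faster on a given machine.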

tarekgh commented 4 years ago

@DrewScoggins today I merged the other optimization work, which targets string search operations (IndexOf/LastIndexOf/IsPrefix/IsSuffix/StartsWith/EndsWith). Could you please watch the perf results after my changes run and update this issue? Thanks a lot.

DrewScoggins commented 4 years ago

Yes, I will keep an eye out for these to see if we are seeing any improvement.

adamsitnik commented 4 years ago

@tarekgh I've run the benchmark that you have provided and got similar results that confirm that 5.0 using ICU is faster in this particular case.

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1082 (1909/November2018Update/19H2)
Intel Xeon CPU E5-1650 v4 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-rc.1.20452.10
  [Host]     : .NET Core 3.1.7 (CoreCLR 4.700.20.36602, CoreFX 4.700.20.37001), X64 RyuJIT
  Job-KYODGR : .NET Core 3.1.7 (CoreCLR 4.700.20.36602, CoreFX 4.700.20.37001), X64 RyuJIT
  Job-PPECTW : .NET Core 5.0.0 (CoreCLR 5.0.20.45114, CoreFX 5.0.20.45114), X64 RyuJIT

Method	Job	Runtime	Toolchain	Mean	Error	StdDev	Median	Ratio
StringCompare	Job-KYODGR	.NET Core 3.1	netcoreapp3.1	377.30 ns	4.741 ns	4.435 ns	376.14 ns	1.00
StringCompare	Job-PPECTW	.NET Core 5.0	netcoreapp5.0	68.72 ns	1.342 ns	3.055 ns	67.42 ns	0.19

To compare apples to apples I've set the invocation count (the number of benchmark invocations per iteration) to the same number and filtered the ETW trace file to the last benchmark iteration (description of the mentioned filtering):

--invocationCount 2097152 --profiler ETW

For 3.1 a single iteration (2097152 invocations) takes 791ms:

[image]

For 5.0 a single iteration (2097152 invocations) takes 147ms:

[image]
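As a quick consistency check, the iteration times above can be converted back to a per-invocation cost and compared with the BenchmarkDotNet means. The helper below is a hypothetical sketch (`IterationMath` is not part of any tool); it only divides the iteration time by the invocation count.

```csharp
using System;

public static class IterationMath
{
    // Converts one iteration's wall-clock time into nanoseconds per benchmark invocation.
    public static double NanosecondsPerInvocation(double iterationMilliseconds, long invocationCount)
        => iterationMilliseconds * 1_000_000.0 / invocationCount;
}
```

With the figures quoted here, 791 ms over 2097152 invocations works out to roughly 377.2 ns and 147 ms to roughly 70.1 ns, which lines up with the reported means of 377.30 ns and 68.72 ns and with the 0.19 ratio.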

tarekgh commented 4 years ago

@adamsitnik the scenario you mentioned is string compare, which is different from string search. I was expecting my recent changes to affect string search and not string compare. Maybe I am missing something here.

adamsitnik commented 4 years ago

the scenario you mentioned

@tarekgh I have used the scenario that you have provided (but it was a long time ago - on the 14th of September)

maybe I am missing something here.

most probably in this particular scenario ICU is simply faster

tarekgh commented 4 years ago

I have used the scenario that you have provided (but it was a long time ago - on the 14th of September)

What I mentioned on Sept 14th was using 3.1 on Windows, which uses NLS and not ICU. I am expecting 3.1 to still be faster.

adamsitnik commented 4 years ago

I am expecting 3.1 to still be faster.

So I've verified that 5.0 is faster for this particular case.

DrewScoggins commented 4 years ago

After the most recent check-in (https://github.com/dotnet/runtime/pull/43065) that Tarek made, we are seeing some good improvements in the lab.

Run Information

Architecture	x64
OS	Windows 10.0.18362
Changes	diff

Improvements in System.Globalization.Tests.StringSearch

Benchmark	Baseline	Test	Test/Base	Baseline Outlier
[IndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options%3a%20(en-US%2c%20None%2c%20True)).html>)	13.06 μs	9.15 μs	0.70	True
[LastIndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options%3a%20(%2c%20None%2c%20True)).html>)	18.87 μs	15.01 μs	0.80	True
[LastIndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options%3a%20(pl-PL%2c%20None%2c%20False)).html>)	20.55 μs	16.44 μs	0.80	True
[IsSuffix_SecondHalf](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.IsSuffix_SecondHalf(Options%3a%20(en-US%2c%20IgnoreSymbols%2c%20False)).html>)	22.05 μs	17.57 μs	0.80	False
[IsPrefix_FirstHalf](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.IsPrefix_FirstHalf(Options%3a%20(en-US%2c%20IgnoreSymbols%2c%20False)).html>)	19.68 μs	15.59 μs	0.79	False
[IndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options%3a%20(en-US%2c%20IgnoreSymbols%2c%20False)).html>)	12.96 μs	9.05 μs	0.70	True
[IsPrefix_DifferentFirstChar](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.IsPrefix_DifferentFirstChar(Options%3a%20(en-US%2c%20IgnoreSymbols%2c%20False)).html>)	34.04 μs	29.81 μs	0.88	True
[IndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options%3a%20(pl-PL%2c%20None%2c%20False)).html>)	16.56 μs	12.95 μs	0.78	True
[IsSuffix_DifferentLastChar](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.IsSuffix_DifferentLastChar(Options%3a%20(en-US%2c%20IgnoreSymbols%2c%20False)).html>)	39.49 μs	34.97 μs	0.89	True
[IndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options%3a%20(%2c%20IgnoreCase%2c%20True)).html>)	13.19 μs	9.30 μs	0.70	True
[LastIndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options%3a%20(en-US%2c%20IgnoreCase%2c%20True)).html>)	19.46 μs	15.26 μs	0.78	True
[IndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options%3a%20(%2c%20None%2c%20True)).html>)	12.83 μs	9.16 μs	0.71	False
[IndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options%3a%20(en-US%2c%20IgnoreCase%2c%20True)).html>)	13.19 μs	9.27 μs	0.70	True
[LastIndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options%3a%20(en-US%2c%20None%2c%20True)).html>)	19.20 μs	14.99 μs	0.78	False
[LastIndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options%3a%20(%2c%20IgnoreCase%2c%20True)).html>)	19.41 μs	15.28 μs	0.79	False
[LastIndexOf_Word_NotFound](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/master_x64_Windows 10.0.18362/System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options%3a%20(en-US%2c%20IgnoreSymbols%2c%20False)).html>)	18.57 μs	14.01 μs	0.75	True
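To make the Test/Base column easier to read, the sketch below recomputes it for three rows of the table and expresses it as a percentage of time saved. The helper name `summarize` is hypothetical and the inputs are the rounded values shown in the table, so a result can differ from the lab's figure in the last digit. A ratio below 1 means the test build is faster than the baseline.

```python
# (benchmark, baseline in μs, test in μs) copied from three rows of the table above
rows = [
    ("IndexOf_Word_NotFound (en-US, None, True)", 13.06, 9.15),
    ("IsPrefix_DifferentFirstChar (en-US, IgnoreSymbols, False)", 34.04, 29.81),
    ("LastIndexOf_Word_NotFound (en-US, IgnoreSymbols, False)", 18.57, 14.01),
]


def summarize(baseline_us: float, test_us: float) -> tuple[float, float]:
    """Return (Test/Base ratio, percent of time saved), both rounded."""
    ratio = test_us / baseline_us
    return round(ratio, 2), round((1 - ratio) * 100, 1)


for name, base, test in rows:
    ratio, saved = summarize(base, test)
    print(f"{name}: Test/Base = {ratio:.2f}, {saved:.1f}% less time")
```

For these rows the recomputed ratios are 0.70, 0.88 and 0.75, matching the table, which corresponds to roughly 30%, 12% and 25% less time per operation.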

[graphs omitted]

Historical Data in Reporting System

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'System.Globalization.Tests.StringSearch*'

### Histogram

#### System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, None, True))

```log
[ 8961.400 ; 9663.790) | @@@@@@@@@@@@@@
[ 9663.790 ; 10366.179) |
[10366.179 ; 11068.568) |
[11068.568 ; 11770.957) |
[11770.957 ; 12473.347) |
[12473.347 ; 12726.514) |
[12726.514 ; 13428.903) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[13428.903 ; 14213.285) | @@@
```

#### System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (, None, True))

```log
[14612.596 ; 15349.017) | @@@@@@@@@@@@@@
[15349.017 ; 16056.794) |
[16056.794 ; 16764.571) |
[16764.571 ; 17472.348) |
[17472.348 ; 18217.629) |
[18217.629 ; 18831.044) | @
[18831.044 ; 19538.820) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[19538.820 ; 20122.311) | @@@
```

#### System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (pl-PL, None, False))

```log
[16251.652 ; 16961.645) | @@@@@@@@@@@@@@
[16961.645 ; 17671.638) |
[17671.638 ; 18381.630) |
[18381.630 ; 19091.623) |
[19091.623 ; 19801.616) |
[19801.616 ; 20174.942) |
[20174.942 ; 20884.934) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[20884.934 ; 21604.903) | @@@
```

#### System.Globalization.Tests.StringSearch.IsSuffix_SecondHalf(Options: (en-US, IgnoreSymbols, False))

```log
[17072.752 ; 17964.482) | @@@@@@@@@@@@@@
[17964.482 ; 18779.784) |
[18779.784 ; 19595.085) |
[19595.085 ; 20410.387) |
[20410.387 ; 21514.014) |
[21514.014 ; 22329.316) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[22329.316 ; 23330.116) | @@@@@@@@@@
```

#### System.Globalization.Tests.StringSearch.IsPrefix_FirstHalf(Options: (en-US, IgnoreSymbols, False))

```log
[15229.897 ; 15975.829) | @@@@@@@@@@@@@@
[15975.829 ; 16721.761) |
[16721.761 ; 17467.693) |
[17467.693 ; 18213.625) |
[18213.625 ; 18959.557) |
[18959.557 ; 19302.824) |
[19302.824 ; 20048.756) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[20048.756 ; 20855.951) | @@@@@
```

#### System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, IgnoreSymbols, False))

```log
[ 8797.707 ; 9499.196) | @@@@@@@@@@@@@@
[ 9499.196 ; 10200.684) |
[10200.684 ; 10902.172) |
[10902.172 ; 11603.661) |
[11603.661 ; 12305.149) |
[12305.149 ; 12684.041) |
[12684.041 ; 13385.530) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[13385.530 ; 14220.333) | @@
```

#### System.Globalization.Tests.StringSearch.IsPrefix_DifferentFirstChar(Options: (en-US, IgnoreSymbols, False))

```log
[29477.421 ; 30235.312) | @@@@@@@@@@@@@@
[30235.312 ; 30993.203) |
[30993.203 ; 31751.095) |
[31751.095 ; 32508.986) |
[32508.986 ; 33769.724) |
[33769.724 ; 35223.347) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
```

#### System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (pl-PL, None, False))

```log
[12609.085 ; 13265.894) | @@@@@@@@@@@@@
[13265.894 ; 13768.769) | @
[13768.769 ; 14425.579) |
[14425.579 ; 15082.389) |
[15082.389 ; 15739.199) |
[15739.199 ; 16287.555) |
[16287.555 ; 16944.365) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[16944.365 ; 17648.441) | @@
```

#### System.Globalization.Tests.StringSearch.IsSuffix_DifferentLastChar(Options: (en-US, IgnoreSymbols, False))

```log
[34563.746 ; 35375.219) | @@@@@@@@@@@@@@
[35375.219 ; 36186.692) |
[36186.692 ; 36998.165) |
[36998.165 ; 37809.638) |
[37809.638 ; 39015.124) | @
[39015.124 ; 39826.597) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[39826.597 ; 40796.807) | @@@@@
```

#### System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (, IgnoreCase, True))

```log
[ 8890.906 ; 9625.004) | @@@@@@@@@@@@@@
[ 9625.004 ; 10333.195) |
[10333.195 ; 11041.386) |
[11041.386 ; 11749.577) |
[11749.577 ; 12457.769) |
[12457.769 ; 13023.837) | @
[13023.837 ; 13732.028) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[13732.028 ; 14434.832) | @@@@
```

#### System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, IgnoreCase, True))

```log
[14805.514 ; 15554.325) | @@@@@@@@@@@@@@
[15554.325 ; 16303.136) |
[16303.136 ; 17051.948) |
[17051.948 ; 17800.759) |
[17800.759 ; 18662.288) |
[18662.288 ; 19285.851) | @@
[19285.851 ; 20034.662) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[20034.662 ; 20599.685) | @@@@
```

#### System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (, None, True))

```log
[ 9024.075 ; 9674.928) | @@@@@@@@@@@@@@
[ 9674.928 ; 10325.780) |
[10325.780 ; 10976.632) |
[10976.632 ; 11627.484) |
[11627.484 ; 12278.336) |
[12278.336 ; 12637.952) |
[12637.952 ; 13288.804) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[13288.804 ; 13899.140) | @@
```

#### System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, IgnoreCase, True))

```log
[ 8882.220 ; 9650.252) | @@@@@@@@@@@@@@
[ 9650.252 ; 10352.069) |
[10352.069 ; 11053.887) |
[11053.887 ; 11755.704) |
[11755.704 ; 12457.521) |
[12457.521 ; 12812.048) |
[12812.048 ; 13513.865) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[13513.865 ; 14286.340) | @@@@@
```

#### System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, None, True))

```log
[14587.344 ; 15366.223) | @@@@@@@@@@@@@@
[15366.223 ; 16127.006) |
[16127.006 ; 16887.789) |
[16887.789 ; 17648.572) |
[17648.572 ; 18409.355) |
[18409.355 ; 18787.408) |
[18787.408 ; 19548.191) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[19548.191 ; 20346.411) | @@@@@@
```

#### System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (, IgnoreCase, True))

```log
[14863.073 ; 15658.737) | @@@@@@@@@@@@@@
[15658.737 ; 16397.914) |
[16397.914 ; 17137.092) |
[17137.092 ; 17876.269) |
[17876.269 ; 18615.446) |
[18615.446 ; 18981.118) |
[18981.118 ; 19720.295) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[19720.295 ; 20512.543) | @@@@
```

#### System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, IgnoreSymbols, False))

```log
[13785.427 ; 14537.175) | @@@@@@@@@@@@@@
[14537.175 ; 15288.923) |
[15288.923 ; 16040.671) |
[16040.671 ; 16792.419) |
[16792.419 ; 17544.167) |
[17544.167 ; 18045.402) |
[18045.402 ; 18964.029) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[18964.029 ; 19744.608) | @@
```

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)

[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
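Each histogram in the report has two separate groups of non-empty bins. One sits near the baseline value from the table and one near the test value, which is what a step change looks like. The sketch below illustrates this by parsing one histogram and grouping adjacent non-empty bins. It assumes the bin edges are in nanoseconds (they line up with the μs values in the table). The names `parse` and `clusters` are hypothetical, and the longest bar is truncated to 40 characters, so bar lengths here are illustrative only.

```python
import re

# Histogram for IndexOf_Word_NotFound(Options: (pl-PL, None, False)).
# The longest bar is truncated to 40 characters; only the bin edges are exact.
HISTOGRAM = """\
[12609.085 ; 13265.894) | @@@@@@@@@@@@@
[13265.894 ; 13768.769) | @
[13768.769 ; 14425.579) |
[14425.579 ; 15082.389) |
[15082.389 ; 15739.199) |
[15739.199 ; 16287.555) |
[16287.555 ; 16944.365) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[16944.365 ; 17648.441) | @@
"""

# Matches "[lower ; upper) | bar" and captures both edges and the bar.
BIN = re.compile(r"\[[ \t]*([\d.]+)[ \t]*;[ \t]*([\d.]+)\)[ \t]*\|[ \t]*(@*)")


def parse(text):
    """Return (lower_ns, upper_ns, bar_length) for every bin."""
    return [(float(lo), float(hi), len(bar)) for lo, hi, bar in BIN.findall(text)]


def clusters(bins):
    """Group consecutive non-empty bins into (lower_ns, upper_ns) ranges."""
    found, current = [], None
    for lo, hi, count in bins:
        if count == 0:
            if current is not None:
                found.append(tuple(current))
                current = None
        elif current is None:
            current = [lo, hi]
        else:
            current[1] = hi  # extend the open cluster to this bin's upper edge
    if current is not None:
        found.append(tuple(current))
    return found


for lower, upper in clusters(parse(HISTOGRAM)):
    print(f"cluster: {lower / 1000:.2f} μs to {upper / 1000:.2f} μs")
```

For this benchmark the two clusters cover roughly 12.6 to 13.8 μs and 16.3 to 17.6 μs, which bracket the test (12.95 μs) and baseline (16.56 μs) values reported in the table.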

tarekgh commented 4 years ago

thanks @DrewScoggins for the updates.

adamsitnik commented 3 years ago

With #43065, which provided some really nice improvements, we are getting even closer to contributing to ICU itself (as we are running out of ideas on our side).

While reading the ICU source code I got the impression that it was not written with performance in mind (very few hacks, tricks, and comments explaining them). I assumed the code avoids such manual optimizations because ICU relies on PGO to do that work instead.

But what if ICU is not using PGO? Perhaps this could be our first perf contribution?

I've searched the ICU repo and bug tracker for "PGO" and did not find anything: https://github.com/unicode-org/icu/search?q=PGO https://unicode-org.atlassian.net/issues/?jql=text%20~%20%22PGO%22

Just to give you some numbers: when Chromium started using Microsoft's PGO they got +15% improvement for "new tab page load time" and "startup time": https://blog.chromium.org/2016/10/making-chrome-on-windows-faster-with-pgo.html

@tarekgh @GrabYourPitchforks what are your thoughts on this?

danmoseley commented 3 years ago

cc @brianrob

tarekgh commented 3 years ago

For those who don't know what PGO is: it stands for profile-guided optimization.
