dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.99k stars 4.66k forks source link

GetIndexOfFirstNonAsciiByte_Vector path in Ascii.Utility.cs is never exercised for AdvSimd/SSE4.1 #89924

Open neon-sunset opened 1 year ago

neon-sunset commented 1 year ago

Description

It appears that GetIndexOfFirstNonAsciiByte in https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs#L104 does not exercise the GetIndexOfFirstNonAsciiByte_Vector path despite it having explicit support for AdvSimd and SSE4.1.

Is this intended? Should SSE4.1 and AdvSimd targets be allowed to take _Vector instead?

Regression?

No

Data

Preliminary benchmark shows that on osx-arm64 apple-m1 the time to traverse 1_000_000 bytes is 28.71us with current _Intrinsified path and is improved by ~40% to 20.12us if we relax the check and allow calling GetIndexOfFirstNonAsciiByte_Vector for Vector128.IsHardwareAccelerated.

EgorBo commented 1 year ago

Looks like a regression (or leftover) from https://github.com/dotnet/runtime/pull/88532 cc @anthonycanino

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-text-encoding See info in area-owners.md if you want to be subscribed.

Issue Details
### Description It appears that `GetIndexOfFirstNonAsciiByte` in https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs#L104 does not exercise the `GetIndexOfFirstNonAsciiByte_Vector` path despite it having explicit support for `AdvSimd` and `SSE4.1`. Is this intended? Should SSE4.1 and AdvSimd targets be allowed to take `_Vector` instead? ### Regression? No. ### Data Preliminary benchmark shows that on `osx-arm64` `apple-m1` the time to traverse 1_000_000 bytes is 28.71us with current `_Intrinsified` path and is improved by ~40% to `20.12us` if we relax the check and allow calling `GetIndexOfFirstNonAsciiByte_Vector` for `Vector128.IsHardwareAccelerated`.
Author: neon-sunset
Assignees: -
Labels: `area-System.Text.Encoding`, `tenet-performance`, `untriaged`
Milestone: -
stephentoub commented 1 year ago

Regression? Yes, vs .NET 7

@neon-sunset, @tannergooding pointed out to me offline it was using the _Intrinsified path in .NET 7: https://github.com/dotnet/runtime/blob/a6dbb800a47735bde43187350fd3aff4071c7f9c/src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs#L81-L91

Did you measure it to be a regression from .NET 7, and if so, can you share the benchmark and your resulting numbers?

Or are you just saying it's not as good in .NET 8 as it could be but it's not a regression?

EDIT: Tanner just commented on the PR with similar feedback: https://github.com/dotnet/runtime/pull/90527#discussion_r1293751532

neon-sunset commented 1 year ago

Or are you just saying it's not as good in .NET 8 as it could be but it's not a regression?

Yes, it appears to not be a regression because I mistakenly assumed the _Vector path was present in .NET 7. I have updated the description.