Closed stephentoub closed 2 years ago
Tagging subscribers to this area: @GrabYourPitchForks, @dotnet/area-system-memory See info in area-owners.md if you want to be subscribed.
Author: | stephentoub |
---|---|
Assignees: | - |
Labels: | `area-System.Memory`, `tenet-performance`, `untriaged` |
Milestone: | 7.0.0 |
a loop around an IndexOf for the first character in the pattern and then a SequenceEqual at that position for that pattern
It's similar for byte
, so while optimizing the char
-case we should try to look at byte
too.
Algorithm uri doesn't work for me @stephentoub
Algorithm uri doesn't work for me @stephentoub
Not sure why; it works ok for me.
@En3Tho
I'm guessing it's because the site is not available on HTTPS. Some browsers are doing trials of automatic HTTPS redirects. You can try a Web Archive copy of the page: https://web.archive.org/web/20210809014600/http://0x80.pl/articles/simd-strfind.html
Prototype: https://github.com/EgorBo/runtime-1/commit/2770de1141aa8b2f1329dc8d3ce10a18d0591ce3
Tested. Needs benchmarks and ARM64 support.
Performance is noticeably better in the worst case for the current algorithm, e.g. in this text it hits multiple words starting with 'a'
till it finds the correct one while Wojciech's one easily avoids calling SequenceEqual
for each of them and doesn't break SIMD-loop.
Today
MemoryExtensions.IndexOf(..., ReadOnlySpan<char>)
is implemented essentially as a loop around an IndexOf for the first character in the pattern and then a SequenceEqual at that position for that pattern: https://github.com/dotnet/runtime/blob/c742923991c68808d55a9f7cfcd7796644fea44f/src/libraries/System.Private.CoreLib/src/System/SpanHelpers.Char.cs#L17-L57 This is good but not great. In particular it ends up breaking out of the vectorized search loop potentially much more frequently than is necessary and paying the startup overhead for the SequenceEqual each time.There are better options. For example, the simple algorithm described in Algorithm 1: Generic SIMD searches concurrently for two characters rather than just one, which makes it much less likely to break out of the vectorized loop to early (this is what the Rust sliceslice crate uses; it's also what the Rust memchr crate uses (which is what's now used by its regex engine), though that memchr doesn't just use the first and last characters but instead picks two characters expected to have the lowest frequency of occurrence).
We could also special-case various sizes. The aforementioned memchr, for example, uses Rabin-Karp for small input strings and Two-Way for larger search strings.
cc: @tannergooding, @GrabYourPitchforks