dotnet / docfx

Static site generator for .NET API documentation.
https://dotnet.github.io/docfx/
MIT License
3.94k stars 839 forks source link

Optimize CountWordInText #10050

Closed lahma closed 4 days ago

lahma commented 5 days ago

image

filzrev commented 4 days ago

It seems that it can reduce extra memory allocation that caused by string.Split and string.Trim()
by using Span-based API.

Currently it seems there is no SplitAny API that can be used for this purpose. (See: https://github.com/dotnet/runtime/issues/934)
Instead. it can be implemented by simple loop logics.

Example of Benchmark results

Method Mean Error StdDev Ratio RatioSD Gen0 Gen1 Allocated Alloc Ratio
Default 163.41 ms 1.960 ms 1.833 ms 1.00 0.00 101250.0000 32250.0000 430256464 B 1.000
Optimized1(string) 132.51 ms 2.489 ms 2.329 ms 0.81 0.02 61500.0000 500.0000 257720518 B 0.599
Optimized2(SearchValue) 133.73 ms 2.538 ms 2.607 ms 0.82 0.02 61500.0000 500.0000 257720518 B 0.599
Optimized3(Span-based) 80.74 ms 1.560 ms 1.602 ms 0.49 0.01 - - 313 B 0.000

Benchmark code (LINQPad)

CountwordBenchmark.linq ``` BenchmarkDotNet.Attributes System.Globalization System.Runtime.CompilerServices System.Threading.Tasks #load "BenchmarkDotNet" using System.Diagnostics; using System.Globalization; using System.Runtime.CompilerServices; void Main() { RunBenchmark(); } private const int IterationCount = 1000; private string text = ""; [GlobalSetup] public void BenchmarkSetup() { text = File.ReadAllText(@"C:\temp\docfx\docs\_site\docs\markdown.html"); } [Benchmark(Baseline = true)] public void Default() { foreach (var i in Enumerable.Range(1, IterationCount)) WordCounter_Default.CountWordInText(text); } [Benchmark] public void Optimized1() { foreach (var i in Enumerable.Range(1, IterationCount)) WordCounter_Optimized2.CountWordInText(text); } [Benchmark] public void Optimized2() { foreach (var i in Enumerable.Range(1, IterationCount)) WordCounter_Optimized2.CountWordInText(text); } [Benchmark] public void Optimized3() { foreach (var i in Enumerable.Range(1, IterationCount)) WordCounter_Optimized3.CountWordInText(text); } public static class WordCounter_Default { public static int CountWordInText(string text) { if (string.IsNullOrWhiteSpace(text)) { return 0; } string specialChars = ".?!;:,()[]"; char[] delimiterChars = [' ', '\t', '\n']; string[] wordList = text.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries); return wordList.Count(s => !s.Trim().All(specialChars.Contains)); } } public static class WordCounter_Optimized1 { private static readonly string SpecialChars = ".?!;:,()[]"; private static readonly char[] DelimiterChars = [' ', '\t', '\n']; public static int CountWordInText(string text) { if (string.IsNullOrWhiteSpace(text)) { return 0; } string[] wordList = text.Split(DelimiterChars, StringSplitOptions.RemoveEmptyEntries); return wordList.Count(s => !s.Trim().All(static c => SpecialChars.Contains(c))); } } public static class WordCounter_Optimized2 { private static readonly System.Buffers.SearchValues SpecialChars = System.Buffers.SearchValues.Create(".?!;:,()[]"); private static readonly char[] DelimiterChars = [' ', '\t', '\n']; public static int CountWordInText(string text) { if (string.IsNullOrWhiteSpace(text)) { return 0; } string[] wordList = text.Split(DelimiterChars, StringSplitOptions.RemoveEmptyEntries); return wordList.Count(static s => !s.Trim().All(static c => SpecialChars.Contains(c))); } } public static class WordCounter_Optimized3 { public static int CountWordInText(ReadOnlySpan text) { var count = 0; var isWordProcessing = false; for (int i = 0; i < text.Length; ++i) { var ch = text[i]; switch (ch) { // Word separator chars case '\t': // \u0009 case '\n': // \u000A case '\r': // \u000D case ' ': // \u0020 IncrementCountIfProcessingWord(ref count, ref isWordProcessing); continue; // Skip special chars case '!': // \u0021 case '(': // \u0028 case ')': // \u0029 case ',': // \u002C case '.': // \u002E case ':': // \u003A case ';': // \u003B case '?': // \u003F case '[': // \u005B case ']': // \u005D continue; default: isWordProcessing = true; continue; } } // Handle remaining processing word. IncrementCountIfProcessingWord(ref count, ref isWordProcessing); return count; // Increment word count if processing word. [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)] static void IncrementCountIfProcessingWord(ref int count, ref bool isProcessingWord) { if (isProcessingWord) { isProcessingWord = false; ++count; } } } } ```
lahma commented 3 days ago

That implementation @filzrev did is probably worth a PR, great work with benchmarks and the implementation! I was afraid to change the "readable" behavior of the current implementation as I wasn't sure about test coverage.

A separate benchmark project could also be a good addition to the docfx solution to host experiments.