Open ufcpp opened 6 years ago
While there's no question that switch
needs improvement, further and improved analysis of your specific case is likely needed.
For a start, you should ensure that the benchmarks are representative for your scenario. I doubt that random numbers reflect actual Unicode text.
Also, the binary search with array version is pretty poor, there are numerous improvements you could make there. Just a few examples:
lower + (upper - lower) / 2
can be (lower + upper) / 2
since there's no chance that addition will overflow in this particular case/ 2
into >> 1
. Or use uint
instead of int
. Signed division by 2 is more expensive than unsigned division.var r = ranges[middle];
should use ref
to avoid a value type copyI don't know if doing that is enough to catch up BinaryIf
. BinarySearch would be preferable to BinaryIf as it requires much less code and branches.
And it's rather interesting that BinaryIf is faster than Switch. The classic switch does use binary search but it also uses switch tables here and there. The fact that BinaryIf is faster would imply that the classic switch choice of using tables isn't that great. Though it all depends on the value distribution...
Interesting.
I am not sure that a lot can be done to help case uint i0 when 1536 <= i0 && i0 <= 1541:
- to morph that into binary search with range checks compiler would need to prove that it is indeed a range check and that comparisons are not producing or observing sideeffects and that ranges in different cases are not overlapping (if they are, then lexical order of cases matters), etc..
It would be much easier to emit efficient case 1536..1541:
since such syntax would be more precise about the intent and overlapping ranges could be just disallowed.
BTW, compiler should be able to detect "contiguous range of values" cases and emit them as range checks, so at least in theory the flat switch, even though unwieldy in source, should result in IL that employs range checks. I wonder if that really happens in this scenario...
FYI @gafter @khyperia
@mikedn
I've tried the suggested improvements: https://github.com/ufcpp/GraphemeSplitter/commit/4159c73f626e74ea5e27cf7d384a01de71a525b1
Method | Mean | Error | StdDev |
---|---|---|---|
ByBinarySearch | 3.349 ms | 0.0152 ms | 0.0135 ms |
BySwitch | 3.597 ms | 0.0136 ms | 0.0114 ms |
ByBinaryIf | 2.386 ms | 0.0124 ms | 0.0116 ms |
GetByBinarySearch
gets 20% faster than before and BySwitch
, but even then ByBinaryIf
is the fastest. The binary-search is faster than the classic switch because of range-based branch.
The benchmark data is not ideal but I don't want so much precision.
Ported from https://github.com/dotnet/csharplang/issues/198#issuecomment-340677919
TL; DR
I want the following
switch
to be optimized toO(log N)
.Background
I recently implemented a Unicode grapheme breaking algorithm. In this library, a
switch
statement generated from GraphemeBreakProperty table has about 20 thousand lines. Cascadingif
statement andcase
-when
versions of that is much shorter but 25 times slower than theswitch
.Measurement
I measured performance of:
if
switch
with expandedcase
sswitch
withwhen
clauses(My PC is Core i7-4790 3.60GHz.)
A result:
FYR
limit number
Measurement results with limiting number:
20 lines:
50 lines:
100 lines:
200 lines:
500 lines:
1000 lines:
executable size
I also measured executable sizes.
build script
Debug (
/o-
) build:Release (
/o+
) build:compilation time
In addition, it requires 50 seconds to compile the 20 thousand-line switch-case.