Optimization for range-based switch

ufcpp commented 6 years ago

Ported from https://github.com/dotnet/csharplang/issues/198#issuecomment-340677919

TL; DR

I want the following switch to be optimized to O(log N).

switch (codePoint)
{
    case uint i0 when 1536 <= i0 && i0 <= 1541:
    case 1757:
    case 1807:
    case 2274:
    case 3406:
    case 69821:
    case uint i1 when 70082 <= i1 && i1 <= 70083:

Background

I recently implemented a Unicode grapheme breaking algorithm. In this library, a switch statement generated from GraphemeBreakProperty table has about 20 thousand lines. Cascading if statement and case-when versions of that is much shorter but 25 times slower than the switch.

// expanded cases (fast, but ugly and too long to compile)
            switch (codePoint)
            {
                case 1536:
                case 1537:
                case 1538:
                case 1539:
                case 1540:
                case 1541:
                // ...

// with case-when clause (simple, but much slower)
            switch (codePoint)
            {
                case uint i0 when 1536 <= i0 && i0 <= 1541:
                // ...

// with range (I want, proposed as "slicing"?)
            switch (codePoint)
            {
                case 1536..1541:
                // ...

Measurement

I measured performance of:

(My PC is Core i7-4790 3.60GHz.)

A result:

Method	Mean	Error	StdDev
ByBinarySearch	4.642 ms	0.0468 ms	0.0415 ms
BySwitch	3.825 ms	0.0473 ms	0.0442 ms
BySwitchWhen	115.310 ms	2.7970 ms	2.9928 ms
ByIf	142.658 ms	1.3744 ms	1.1477 ms
ByBinaryIf	2.502 ms	0.0147 ms	0.0137 ms

FYR

limit number

Measurement results with limiting number:

20 lines:

Method	Mean	Error	StdDev
ByBinarySearch	2.363 ms	0.0266 ms	0.0222 ms
BySwitch	1.262 ms	0.0105 ms	0.0098 ms
BySwitchWhen	1.809 ms	0.0106 ms	0.0099 ms
ByIf	1.807 ms	0.0070 ms	0.0062 ms
ByBinaryIf	1.398 ms	0.0075 ms	0.0070 ms

50 lines:

Method	Mean	Error	StdDev
ByBinarySearch	3.030 ms	0.0224 ms	0.0210 ms
BySwitch	1.333 ms	0.0151 ms	0.0141 ms
BySwitchWhen	3.417 ms	0.0283 ms	0.0265 ms
ByIf	3.763 ms	0.0204 ms	0.0191 ms
ByBinaryIf	1.916 ms	0.0109 ms	0.0091 ms

100 lines:

Method	Mean	Error	StdDev
ByBinarySearch	3.216 ms	0.0135 ms	0.0120 ms
BySwitch	1.416 ms	0.0089 ms	0.0083 ms
BySwitchWhen	5.030 ms	0.0454 ms	0.0425 ms
ByIf	6.065 ms	0.0530 ms	0.0496 ms
ByBinaryIf	1.960 ms	0.0158 ms	0.0148 ms

200 lines:

Method	Mean	Error	StdDev
ByBinarySearch	3.446 ms	0.0186 ms	0.0174 ms
BySwitch	1.483 ms	0.0228 ms	0.0214 ms
BySwitchWhen	10.176 ms	0.0878 ms	0.0733 ms
ByIf	12.209 ms	0.1345 ms	0.1258 ms
ByBinaryIf	2.029 ms	0.0228 ms	0.0202 ms

500 lines:

Method	Mean	Error	StdDev
ByBinarySearch	3.636 ms	0.0256 ms	0.0240 ms
BySwitch	1.518 ms	0.0080 ms	0.0075 ms
BySwitchWhen	30.827 ms	0.1879 ms	0.1757 ms
ByIf	33.477 ms	0.2768 ms	0.2590 ms
ByBinaryIf	2.088 ms	0.0136 ms	0.0127 ms

1000 lines:

Method	Mean	Error	StdDev
ByBinarySearch	3.944 ms	0.0433 ms	0.0405 ms
BySwitch	3.535 ms	0.0393 ms	0.0368 ms
BySwitchWhen	36.672 ms	0.2641 ms	0.2205 ms
ByIf	69.189 ms	0.2750 ms	0.2572 ms
ByBinaryIf	2.316 ms	0.0217 ms	0.0203 ms

executable size

I also measured executable sizes.

build script

Debug (/o-) build:

Name	Size (bytes)
BinarySearchIf.dll	74240
LinearIf.dll	54272
Switch.dll	140800
SwitchWhen.dll	48128

Release (/o+) build:

Name	Size (bytes)
BinarySearchIf.dll	31232
LinearIf.dll	24064
Switch.dll	140800
SwitchWhen.dll	38912

compilation time

In addition, it requires 50 seconds to compile the 20 thousand-line switch-case.

mikedn commented 6 years ago

While there's no question that switch needs improvement, further and improved analysis of your specific case is likely needed.

For a start, you should ensure that the benchmarks are representative for your scenario. I doubt that random numbers reflect actual Unicode text.

Also, the binary search with array version is pretty poor, there are numerous improvements you could make there. Just a few examples:

lower + (upper - lower) / 2 can be (lower + upper) / 2 since there's no chance that addition will overflow in this particular case
even better, change / 2 into >> 1. Or use uint instead of int. Signed division by 2 is more expensive than unsigned division.
var r = ranges[middle]; should use ref to avoid a value type copy
maybe use pointers to avoid an array bounds check
maybe use a better "middle" for the start

I don't know if doing that is enough to catch up BinaryIf. BinarySearch would be preferable to BinaryIf as it requires much less code and branches.

And it's rather interesting that BinaryIf is faster than Switch. The classic switch does use binary search but it also uses switch tables here and there. The fact that BinaryIf is faster would imply that the classic switch choice of using tables isn't that great. Though it all depends on the value distribution...

VSadov commented 6 years ago

Interesting.

I am not sure that a lot can be done to help case uint i0 when 1536 <= i0 && i0 <= 1541: - to morph that into binary search with range checks compiler would need to prove that it is indeed a range check and that comparisons are not producing or observing sideeffects and that ranges in different cases are not overlapping (if they are, then lexical order of cases matters), etc..

It would be much easier to emit efficient case 1536..1541: since such syntax would be more precise about the intent and overlapping ranges could be just disallowed.

BTW, compiler should be able to detect "contiguous range of values" cases and emit them as range checks, so at least in theory the flat switch, even though unwieldy in source, should result in IL that employs range checks. I wonder if that really happens in this scenario...

jcouv commented 6 years ago

FYI @gafter @khyperia

ufcpp commented 6 years ago

@mikedn

I've tried the suggested improvements: https://github.com/ufcpp/GraphemeSplitter/commit/4159c73f626e74ea5e27cf7d384a01de71a525b1

Method	Mean	Error	StdDev
ByBinarySearch	3.349 ms	0.0152 ms	0.0135 ms
BySwitch	3.597 ms	0.0136 ms	0.0114 ms
ByBinaryIf	2.386 ms	0.0124 ms	0.0116 ms

GetByBinarySearch gets 20% faster than before and BySwitch, but even then ByBinaryIf is the fastest. The binary-search is faster than the classic switch because of range-based branch.

The benchmark data is not ideal but I don't want so much precision.

dotnet / roslyn