corvus-dotnet / Corvus.Globbing

A zero allocation globbing library
Apache License 2.0
16 stars 1 forks source link

Would making `GlobToken` smaller help? #7

Open idg10 opened 2 years ago

idg10 commented 2 years ago

The GlobToken struct currently has three int-sized properties, meaning that on a 64-bit system, alignment rules mean it will typically be the size of 4 ints, i.e. 16 bytes.

This is not massive, of course, but in tight loops that execute while churning through large quantities of data, byte shaving can start to have a measurable effect. There are a couple of reasons for this. One is simply that the number of cache lines required to maintain the working state of the inner loop goes down, meaning that if cache evictions of important state occur (CPUs don't necessarily maintain a MRU list for cache lines, so badly-chosen discards do happen) the amount of work it takes to recover goes down. Another is that if data structures get smaller, the JIT compiler can sometimes get more clever using registers, or even with SIMD code generation.

There are currently only 9 distinct values for GlobTokenType, making the 32 bits of space it currently gets in this structure seem like overengineering. And the use of int properties means that we support glob pattern lengths of up to 2 billion characters, with individual segments also able to be 2 billion characters long.

One relatively simple change we could try would be to flip the source of truth for Length and End. Currently End is the source of truth, and Length is calculated, but we could reverse this, and then make Length a ushort. This would limit individual segments to 65535 characters, while still allowing the whole pattern to be as long as strings can get. Combining this with a change to make GlobTokenType use ushort as its backing type would mean that GlobToken would now be only 64 bits in size. This would enable it to fit inside a single CPU register, which might open up more optimization options to the JIT, and would in any event half the size of the tokenized pattern.

I have no idea whether this would make the slightest difference to anything. It might even make things worse, by forcing the use of 16-bit arithmetic. But it would be worth an experiment to see if it does have any impact

Note to benchmark this effectively, it might be necessary to run some tests with much larger volumes of input than we are currently using. We want to know what happens when ploughing through enough data to start causing cache evictions.

mwadams commented 2 years ago

It is defintely true that it gets worse if you make Length a non-calculated value (i.e. caching it in the GlobToken is more expensive than a smaller GlobToken and calculating it). And you are right - we are in the realm of alignment issues, 16 bit arithmetic and caches.

Am I right in thinking it is a 64k L1 cache still? So our 10k strings may all be small enough to fit in the cache? We might want to ensure we are testing both the all-in-cache and guaranteed-not-to-fit-in-cache cases?

HowardvanRooijen commented 2 years ago

Is it worth optimising for the claims use case? As this is our primary reason for building this library?

idg10 commented 2 years ago

Cache sizes vary a lot. My i9-9900K reports:

L1: 512KB L2: 2.0MB L3: 16.0MB

But the thing I think we need to be careful of is having a benchmarking setup that repeatedly loops around parsing the same thing over and over again, meaning that the cache is probably not seeing many evictions at all. Our real-life use case could well look a lot more like:

  1. do loads of stuff that goes nowhere near this globbing code, and which touches data other than the text that will be searched for glob matches
  2. do some work that causes the text being inspected to end up in the cache
  3. look for a handful of glob matches
  4. repeat

This broader context is interesting because step 1 will tend to pepper the cache, or, depending on cache sizes and the workload you've got, it may even completely evict everything glob-related from the cache. This could have very different performance characteristics from the "run the same glob over the same input millions of times" case that benchmarks can end up measuring.

(I've even seen weird examples where the "recalculate vs cache" tradeoff for an example involving square roots started off faster if you cached, but then for certain sizes of workload was actually faster if you recomputed every time, but then for larger workloads still, caching again looked better!)

This of course it what makes it hard to pick the right microbenchmarks... Ultimately, it is the performance in situ (e.g., @HowardvanRooijen 's suggested Claims scenario) that matters.

But for now I'm just thinking of trying it out to see what happens.

idg10 commented 2 years ago

I did a preliminary experiment. This is a bit rough because I wasn't diligent about ensuring nothing else was happening on the machine at the time. I was looking to get a rough picture at this stage.

I tried adjusting GlobToken so that it was 8 bytes long, and here's what I saw:

It's possible that in some cases this change actually slows things down a bit. The p?th/(...)].txt [21] pattern against a single match looked about 2% slower in the BaselineRegexCompileAndMatchTrueBenchmarks. However, the confidence intervals overlap, so this could just be experimental noise, particularly since testing the same pattern in the BaselineRegexCompileAndMatchFalseBenchmarks looks faster by about the same amount (again, with overlapping confidence intervals). It's plausible that it would be slower in some cases. The change here improves efficiency of memory usage in the GlobToken array, but it does so at the cost of requiring some additional computational work. (Specifically, the CPU will need to perform some 16-bit to 32-bit extensions before performing some of the arithmetic.) So you might expect that when looking in isolation at an example that uses very little memory, you'd take the CPU cost hit while apparently gaining nothing from the more efficient memory use.

The expected gains from this would come in two ways:

  1. better performance when measured in isolation against data volumes large enough to cause cache spills
  2. reduced collateral damage

That last one is the most subtle, the one that is hardest to capture in microbenchmarks, but which is likely in practice to be the single biggest effect in this case. The issue here is that when code runs, it will inevitably have some sort of effect on the CPU cache. The first time it runs, it won't yet have been in the cache, so something else needs to be evicted to make space for it; in the process of accessing data while in operation, it will also initially push some other things out of the cache. And in cases where the overall workload of the whole machine on which this is running means that this code and the data it uses aren't permanently resident in L1 cache (and that would be the expected case for most realistic scenarios), an important question is: how much stuff does this evict from the cache to make room for its own workings?

The reason that question is important is that it determines the extent to which this code continues to have a negative impact even after it has finished running. In an extreme case, imagine a function that is so memory-intensive that it effectively commandeers the entire CPU cache. Once its work completes, the rest of the system will be running in slow motion for some time, as it fetches everything it needs back into the cache. Compare this with a function that is so efficient that in the worst case it requires only a single cache line. (This is as good as it can possibly get, ignoring cases where code was optimized entirely out of existence at compile time.) Once such a function returns, the rest of the system will carry on much as before, with almost everything that was required still held in the cache.

In short, memory efficiency's impact on the recovery time can have a larger impact on overall performance than any impact on the actual operation being examined.

The fact that we see essentially no change for very small amounts of data, but a noticeable amount of change for larger volumes suggests that we are having a positive impact on memory efficiency, which gives good reason to think that there might be an effective reduction in the collateral damage this code inflicts on the CPU cache.

Finally, if we're going to reduce the number of bits available to store the length, we need to detect when we've got a pattern we can't represent. (Since I was using short, that would mean a pattern with one or more segments longer than 32,767 characters; changing to ushort would expand that to 65,535, but obviously doesn't make the problem go away.) A GlobToken's length is calculated from the input start and end arguments, and the simple, safe way to code this would be checked((short)((end - start) + 1)). The perf discussions above are for the more dangerous (short)((end - start) + 1), which simply won't notice when a token gets too big, and will fail in weird ways. The numbers below are reported for both checked and unchecked. Use of checked makes a small but measurable different to the performance. The reason I've been discussing perf comparisons for the unchecked case is that it's possible to push these checks up. For example, we could just choose to reject any pattern longer than 65,535 characters. If we go forward with this sort of optimization, we'd need to think carefully about whether a 64K pattern length is a reasonable limit, and whether we want to take a perf hit to support bigger globs.

BaselineRegexCompileAndMatchTrueBenchmarks

Before (12 byte GlobToken)

Method NumberOfMatches Pattern Mean Error StdDev Ratio RatioSD Gen 0 Gen 1 Allocated
Compiled_Regex_IsMatch 1 p?th/(...)].txt [21] 95,795.19 ns 600.229 ns 532.088 ns 980.41 7.72 2.6855 1.3428 22,656 B
DotNetGlob_IsMatch 1 p?th/(...)].txt [21] 975.50 ns 6.418 ns 6.003 ns 9.99 0.09 0.2270 - 1,904 B
CorvusGlob_IsMatch 1 p?th/(...)].txt [21] 97.71 ns 1.025 ns 0.909 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 1 p?th/(...)].txt [46] 110,283.52 ns 853.785 ns 798.631 ns 686.65 6.19 3.4180 1.7090 29,590 B
DotNetGlob_IsMatch 1 p?th/(...)].txt [46] 1,461.75 ns 26.424 ns 24.717 ns 9.10 0.17 0.2823 - 2,376 B
CorvusGlob_IsMatch 1 p?th/(...)].txt [46] 160.62 ns 1.202 ns 1.125 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 1 p?th/a[e-g].txt 93,500.78 ns 717.671 ns 636.196 ns 1,152.16 14.31 2.3193 1.0986 19,799 B
DotNetGlob_IsMatch 1 p?th/a[e-g].txt 713.02 ns 7.484 ns 6.634 ns 8.79 0.08 0.1659 - 1,392 B
CorvusGlob_IsMatch 1 p?th/a[e-g].txt 81.16 ns 0.967 ns 0.857 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 10000 p?th/(...)].txt [21] 299,482.82 ns 3,221.093 ns 2,855.415 ns 3.00 0.03 2.4414 0.9766 22,643 B
DotNetGlob_IsMatch 10000 p?th/(...)].txt [21] 98,222.27 ns 366.661 ns 325.036 ns 0.98 0.01 0.1221 - 1,904 B
CorvusGlob_IsMatch 10000 p?th/(...)].txt [21] 99,853.09 ns 718.093 ns 671.705 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 10000 p?th/(...)].txt [46] 316,112.07 ns 2,256.149 ns 2,110.403 ns 3.19 0.06 3.4180 1.4648 29,582 B
DotNetGlob_IsMatch 10000 p?th/(...)].txt [46] 104,588.66 ns 636.729 ns 595.596 ns 1.06 0.02 0.2441 - 2,376 B
CorvusGlob_IsMatch 10000 p?th/(...)].txt [46] 99,063.21 ns 1,766.275 ns 1,652.175 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 10000 p?th/a[e-g].txt 297,086.71 ns 2,961.690 ns 2,770.366 ns 2.89 0.04 1.9531 0.9766 19,784 B
DotNetGlob_IsMatch 10000 p?th/a[e-g].txt 103,599.55 ns 1,762.736 ns 2,583.796 ns 1.01 0.03 0.1221 - 1,392 B
CorvusGlob_IsMatch 10000 p?th/a[e-g].txt 102,617.50 ns 1,154.382 ns 963.961 ns 1.00 0.00 - - -

After (8-byte GlobToken)

Method NumberOfMatches Pattern Mean Error StdDev Ratio RatioSD Gen 0 Gen 1 Allocated
Compiled_Regex_IsMatch 1 p?th/(...)].txt [21] 94,592.59 ns 1,723.059 ns 1,527.447 ns 952.69 17.57 2.6855 1.3428 22,656 B
DotNetGlob_IsMatch 1 p?th/(...)].txt [21] 982.58 ns 19.179 ns 22.087 ns 9.83 0.27 0.2270 - 1,904 B
CorvusGlob_IsMatch 1 p?th/(...)].txt [21] 99.30 ns 1.117 ns 0.990 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 1 p?th/(...)].txt [46] 109,174.90 ns 1,063.767 ns 995.048 ns 704.87 8.78 3.4180 1.7090 29,590 B
DotNetGlob_IsMatch 1 p?th/(...)].txt [46] 1,464.30 ns 22.986 ns 21.502 ns 9.46 0.09 0.2823 - 2,376 B
CorvusGlob_IsMatch 1 p?th/(...)].txt [46] 154.79 ns 1.435 ns 1.272 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 1 p?th/a[e-g].txt 89,910.10 ns 1,107.855 ns 925.109 ns 1,175.72 14.21 2.3193 1.0986 19,799 B
DotNetGlob_IsMatch 1 p?th/a[e-g].txt 714.19 ns 8.633 ns 8.075 ns 9.35 0.14 0.1659 - 1,392 B
CorvusGlob_IsMatch 1 p?th/a[e-g].txt 76.39 ns 0.673 ns 0.630 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 10000 p?th/(...)].txt [21] 298,675.29 ns 3,117.943 ns 2,763.976 ns 3.25 0.03 2.4414 0.9766 22,643 B
DotNetGlob_IsMatch 10000 p?th/(...)].txt [21] 98,467.03 ns 740.750 ns 656.656 ns 1.07 0.01 0.1221 - 1,904 B
CorvusGlob_IsMatch 10000 p?th/(...)].txt [21] 91,787.58 ns 301.744 ns 282.251 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 10000 p?th/(...)].txt [46] 312,639.78 ns 2,415.470 ns 2,017.027 ns 3.27 0.04 3.4180 1.4648 29,582 B
DotNetGlob_IsMatch 10000 p?th/(...)].txt [46] 102,982.83 ns 364.905 ns 341.332 ns 1.08 0.01 0.2441 - 2,376 B
CorvusGlob_IsMatch 10000 p?th/(...)].txt [46] 95,750.15 ns 964.061 ns 805.034 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 10000 p?th/a[e-g].txt 295,169.25 ns 1,962.006 ns 1,739.267 ns 3.21 0.03 1.9531 0.9766 19,784 B
DotNetGlob_IsMatch 10000 p?th/a[e-g].txt 101,240.27 ns 1,225.350 ns 1,146.193 ns 1.10 0.01 0.1221 - 1,392 B
CorvusGlob_IsMatch 10000 p?th/a[e-g].txt 91,893.37 ns 759.772 ns 710.691 ns 1.00 0.00 - - -

8-byte GlobToken using checked in constructor

Method NumberOfMatches Pattern Mean Error StdDev Ratio RatioSD Gen 0 Gen 1 Allocated
Compiled_Regex_IsMatch 1 p?th/(...)].txt [21] 96,687.61 ns 1,898.613 ns 3,703.098 ns 999.89 47.53 2.6855 1.3428 22,656 B
DotNetGlob_IsMatch 1 p?th/(...)].txt [21] 961.02 ns 11.029 ns 10.316 ns 9.57 0.13 0.2270 - 1,904 B
CorvusGlob_IsMatch 1 p?th/(...)].txt [21] 100.42 ns 1.099 ns 1.028 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 1 p?th/(...)].txt [46] 109,649.50 ns 1,627.249 ns 1,522.130 ns 664.28 11.59 3.4180 1.7090 29,590 B
DotNetGlob_IsMatch 1 p?th/(...)].txt [46] 1,461.49 ns 14.080 ns 12.482 ns 8.84 0.12 0.2823 - 2,376 B
CorvusGlob_IsMatch 1 p?th/(...)].txt [46] 165.46 ns 2.755 ns 2.151 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 1 p?th/a[e-g].txt 92,031.70 ns 1,467.580 ns 1,802.320 ns 1,172.11 29.51 2.3193 1.0986 19,799 B
DotNetGlob_IsMatch 1 p?th/a[e-g].txt 712.60 ns 9.156 ns 8.116 ns 9.04 0.11 0.1659 - 1,392 B
CorvusGlob_IsMatch 1 p?th/a[e-g].txt 78.79 ns 0.510 ns 0.426 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 10000 p?th/(...)].txt [21] 298,107.45 ns 3,382.713 ns 2,998.687 ns 2.88 0.03 2.4414 0.9766 22,643 B
DotNetGlob_IsMatch 10000 p?th/(...)].txt [21] 99,287.46 ns 1,722.170 ns 1,438.090 ns 0.96 0.01 0.1221 - 1,904 B
CorvusGlob_IsMatch 10000 p?th/(...)].txt [21] 103,657.94 ns 565.973 ns 472.613 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 10000 p?th/(...)].txt [46] 312,567.33 ns 3,207.218 ns 3,000.034 ns 3.22 0.05 3.4180 1.4648 29,582 B
DotNetGlob_IsMatch 10000 p?th/(...)].txt [46] 105,099.14 ns 1,174.698 ns 1,041.339 ns 1.08 0.02 0.2441 - 2,376 B
CorvusGlob_IsMatch 10000 p?th/(...)].txt [46] 97,189.91 ns 1,194.456 ns 1,117.295 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 10000 p?th/a[e-g].txt 295,026.18 ns 2,900.614 ns 2,713.237 ns 3.01 0.03 1.9531 0.9766 19,784 B
DotNetGlob_IsMatch 10000 p?th/a[e-g].txt 99,972.78 ns 586.234 ns 519.681 ns 1.02 0.01 0.1221 - 1,392 B
CorvusGlob_IsMatch 10000 p?th/a[e-g].txt 97,965.08 ns 580.262 ns 514.387 ns 1.00 0.00 - - -

BaselineRegexCompileAndMatchFalseBenchmarks

Before (12 byte GlobToken):

Method NumberOfMatches Pattern Mean Error StdDev Ratio RatioSD Gen 0 Gen 1 Allocated
Compiled_Regex_IsMatch 1 p?th/(...)].txt [21] 94,174.20 ns 1,199.785 ns 1,001.875 ns 948.31 10.52 2.6855 1.3428 22,656 B
DotNetGlob_IsMatch 1 p?th/(...)].txt [21] 941.84 ns 18.107 ns 15.121 ns 9.48 0.13 0.2270 - 1,904 B
CorvusGlob_IsMatch 1 p?th/(...)].txt [21] 99.41 ns 0.889 ns 0.788 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 1 p?th/(...)].txt [46] 108,338.76 ns 996.372 ns 932.007 ns 716.33 4.64 3.4180 1.7090 29,590 B
DotNetGlob_IsMatch 1 p?th/(...)].txt [46] 1,446.65 ns 26.155 ns 23.186 ns 9.54 0.16 0.2823 - 2,376 B
CorvusGlob_IsMatch 1 p?th/(...)].txt [46] 151.44 ns 0.565 ns 0.472 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 1 p?th/a[e-g].txt 89,700.41 ns 1,014.611 ns 899.427 ns 1,130.51 12.58 2.3193 1.0986 19,799 B
DotNetGlob_IsMatch 1 p?th/a[e-g].txt 711.48 ns 8.265 ns 7.327 ns 8.97 0.10 0.1659 - 1,392 B
CorvusGlob_IsMatch 1 p?th/a[e-g].txt 79.33 ns 0.520 ns 0.486 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 10000 p?th/(...)].txt [21] 295,346.64 ns 2,122.020 ns 1,881.115 ns 3.28 0.03 2.4414 0.9766 22,643 B
DotNetGlob_IsMatch 10000 p?th/(...)].txt [21] 98,431.99 ns 238.696 ns 211.598 ns 1.09 0.00 0.1221 - 1,904 B
CorvusGlob_IsMatch 10000 p?th/(...)].txt [21] 90,156.42 ns 351.557 ns 311.646 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 10000 p?th/(...)].txt [46] 313,492.40 ns 2,189.189 ns 2,047.768 ns 3.00 0.03 3.4180 1.4648 29,582 B
DotNetGlob_IsMatch 10000 p?th/(...)].txt [46] 103,731.02 ns 286.383 ns 223.589 ns 0.99 0.00 0.2441 - 2,376 B
CorvusGlob_IsMatch 10000 p?th/(...)].txt [46] 104,633.11 ns 352.964 ns 294.741 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 10000 p?th/a[e-g].txt 293,547.58 ns 2,001.637 ns 1,872.333 ns 3.42 0.03 1.9531 0.9766 19,784 B
DotNetGlob_IsMatch 10000 p?th/a[e-g].txt 100,390.66 ns 727.080 ns 680.112 ns 1.17 0.01 0.1221 - 1,392 B
CorvusGlob_IsMatch 10000 p?th/a[e-g].txt 85,816.24 ns 344.880 ns 287.990 ns 1.00 0.00 - - -

After: (8 byte GlobToken)

Method NumberOfMatches Pattern Mean Error StdDev Ratio RatioSD Gen 0 Gen 1 Allocated
Compiled_Regex_IsMatch 1 p?th/(...)].txt [21] 93,831.48 ns 821.521 ns 768.452 ns 958.26 9.38 2.6855 1.3428 22,656 B
DotNetGlob_IsMatch 1 p?th/(...)].txt [21] 957.20 ns 6.269 ns 5.557 ns 9.78 0.07 0.2270 - 1,904 B
CorvusGlob_IsMatch 1 p?th/(...)].txt [21] 97.85 ns 0.423 ns 0.375 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 1 p?th/(...)].txt [46] 107,852.37 ns 686.971 ns 573.652 ns 699.75 5.16 3.4180 1.7090 29,590 B
DotNetGlob_IsMatch 1 p?th/(...)].txt [46] 1,482.14 ns 21.211 ns 19.841 ns 9.61 0.12 0.2823 - 2,376 B
CorvusGlob_IsMatch 1 p?th/(...)].txt [46] 154.20 ns 0.791 ns 0.740 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 1 p?th/a[e-g].txt 90,233.37 ns 1,311.724 ns 1,226.987 ns 1,180.04 19.95 2.3193 1.0986 19,799 B
DotNetGlob_IsMatch 1 p?th/a[e-g].txt 719.24 ns 10.262 ns 9.599 ns 9.41 0.20 0.1659 - 1,392 B
CorvusGlob_IsMatch 1 p?th/a[e-g].txt 76.47 ns 0.957 ns 0.895 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 10000 p?th/(...)].txt [21] 299,515.26 ns 3,457.838 ns 3,234.464 ns 3.24 0.04 2.4414 0.9766 22,643 B
DotNetGlob_IsMatch 10000 p?th/(...)].txt [21] 99,624.34 ns 859.323 ns 803.812 ns 1.08 0.01 0.1221 - 1,904 B
CorvusGlob_IsMatch 10000 p?th/(...)].txt [21] 92,458.55 ns 915.452 ns 811.525 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 10000 p?th/(...)].txt [46] 313,737.50 ns 2,605.794 ns 2,437.461 ns 3.27 0.04 3.4180 1.4648 29,582 B
DotNetGlob_IsMatch 10000 p?th/(...)].txt [46] 104,991.37 ns 898.791 ns 840.729 ns 1.09 0.01 0.2441 - 2,376 B
CorvusGlob_IsMatch 10000 p?th/(...)].txt [46] 95,917.34 ns 524.277 ns 437.795 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 10000 p?th/a[e-g].txt 295,231.08 ns 3,635.725 ns 3,222.976 ns 3.22 0.03 1.9531 0.9766 19,784 B
DotNetGlob_IsMatch 10000 p?th/a[e-g].txt 101,529.01 ns 684.414 ns 606.715 ns 1.11 0.01 0.1221 - 1,392 B
CorvusGlob_IsMatch 10000 p?th/a[e-g].txt 91,612.27 ns 427.833 ns 379.263 ns 1.00 0.00 - - -

8-byte GlobToken using checked in constructor

Method NumberOfMatches Pattern Mean Error StdDev Ratio RatioSD Gen 0 Gen 1 Allocated
Compiled_Regex_IsMatch 1 p?th/(...)].txt [21] 93,955.38 ns 1,477.181 ns 1,381.756 ns 940.68 14.82 2.6855 1.3428 22,656 B
DotNetGlob_IsMatch 1 p?th/(...)].txt [21] 965.40 ns 12.456 ns 11.042 ns 9.67 0.13 0.2270 - 1,904 B
CorvusGlob_IsMatch 1 p?th/(...)].txt [21] 99.87 ns 0.795 ns 0.705 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 1 p?th/(...)].txt [46] 109,556.31 ns 1,342.473 ns 1,121.025 ns 671.07 7.47 3.4180 1.7090 29,590 B
DotNetGlob_IsMatch 1 p?th/(...)].txt [46] 1,465.73 ns 19.987 ns 17.718 ns 8.96 0.11 0.2823 - 2,376 B
CorvusGlob_IsMatch 1 p?th/(...)].txt [46] 163.29 ns 0.393 ns 0.307 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 1 p?th/a[e-g].txt 91,416.59 ns 1,429.424 ns 1,337.085 ns 1,161.26 19.49 2.3193 1.0986 19,799 B
DotNetGlob_IsMatch 1 p?th/a[e-g].txt 702.86 ns 7.549 ns 6.692 ns 8.91 0.09 0.1659 - 1,392 B
CorvusGlob_IsMatch 1 p?th/a[e-g].txt 78.82 ns 0.458 ns 0.383 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 10000 p?th/(...)].txt [21] 297,509.87 ns 2,717.840 ns 2,409.295 ns 2.90 0.05 2.4414 0.9766 22,643 B
DotNetGlob_IsMatch 10000 p?th/(...)].txt [21] 99,053.46 ns 635.473 ns 594.422 ns 0.97 0.01 0.1221 - 1,904 B
CorvusGlob_IsMatch 10000 p?th/(...)].txt [21] 102,601.60 ns 1,431.550 ns 1,339.073 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 10000 p?th/(...)].txt [46] 313,212.86 ns 3,045.308 ns 2,699.586 ns 3.24 0.04 3.4180 1.4648 29,582 B
DotNetGlob_IsMatch 10000 p?th/(...)].txt [46] 103,700.68 ns 685.337 ns 641.065 ns 1.07 0.01 0.2441 - 2,376 B
CorvusGlob_IsMatch 10000 p?th/(...)].txt [46] 96,674.95 ns 838.503 ns 743.311 ns 1.00 0.00 - - -
Compiled_Regex_IsMatch 10000 p?th/a[e-g].txt 296,002.27 ns 2,809.514 ns 2,490.561 ns 3.02 0.03 1.9531 0.9766 19,784 B
DotNetGlob_IsMatch 10000 p?th/a[e-g].txt 102,155.39 ns 1,268.859 ns 1,604.701 ns 1.04 0.02 0.1221 - 1,392 B
CorvusGlob_IsMatch 10000 p?th/a[e-g].txt 98,313.06 ns 1,139.488 ns 1,065.878 ns 1.00 0.00 - - -
mwadams commented 1 year ago

@idg10 Just to bring this alive again - are we happy that the 8-byte glob token is the way to go? And if so, do you want to apply your changes and maybe discuss the issues in a blog post?

cc @Lmooney25

idg10 commented 1 year ago

Having read through my comment and measurements, I'm now definitely uncertain.

My concern is that this adds complication for an extremely small gain. The risks of that complication include:

The upside is so small that I'm not convinced it offsets the downside. It wasn't even possible to say for certain that there was an upside until we got to relatively large numbers of repeat applications of the same glob pattern. (It's definitely possible that the additional complexity increases the cost for simpler cases.)

I can't remember the specific use cases for this. Do we have reason to believe that this is featuring in critical loops somewhere?

mwadams commented 1 year ago

I'm looking at what it looks like if we reduce the max string length to 12bits (i.e. 4095 characters) which is fine for URIs as the max usable length is 2,048 characters, along with validation in the GlobToken constructor (which just checks the bits).

This allows us to put a packed value in a uint with 2x12 bits for the start and end, and 8 bits for the token type, along with unchecked unpacking (because we know we are safe from the constructor validation).

A couple of tweaks to the implementation to pull ensure the unpacking of those values occurs at a minimum, gives us measurable performance improvements.

mwadams commented 1 year ago

(I failed to answer this question earlier: I can't remember the specific use cases for this. Do we have reason to believe that this is featuring in critical loops somewhere?)

The most important use case for us is in permissions validation. We use glob matching of URIs to determine Allow/Deny permission in permission sets.

This is then used both in determining access to those resources, and in e.g. stripping linked/embedded resources from HAL documents based on those permissions.

This is another example where every request will do many such checks, and shaving cost out of this operation is another of those "cumulative savings" - especially under load.