Open idg10 opened 2 years ago
It is defintely true that it gets worse if you make Length a non-calculated value (i.e. caching it in the GlobToken is more expensive than a smaller GlobToken and calculating it). And you are right - we are in the realm of alignment issues, 16 bit arithmetic and caches.
Am I right in thinking it is a 64k L1 cache still? So our 10k strings may all be small enough to fit in the cache? We might want to ensure we are testing both the all-in-cache and guaranteed-not-to-fit-in-cache cases?
Is it worth optimising for the claims use case? As this is our primary reason for building this library?
Cache sizes vary a lot. My i9-9900K reports:
L1: 512KB L2: 2.0MB L3: 16.0MB
But the thing I think we need to be careful of is having a benchmarking setup that repeatedly loops around parsing the same thing over and over again, meaning that the cache is probably not seeing many evictions at all. Our real-life use case could well look a lot more like:
This broader context is interesting because step 1 will tend to pepper the cache, or, depending on cache sizes and the workload you've got, it may even completely evict everything glob-related from the cache. This could have very different performance characteristics from the "run the same glob over the same input millions of times" case that benchmarks can end up measuring.
(I've even seen weird examples where the "recalculate vs cache" tradeoff for an example involving square roots started off faster if you cached, but then for certain sizes of workload was actually faster if you recomputed every time, but then for larger workloads still, caching again looked better!)
This of course it what makes it hard to pick the right microbenchmarks... Ultimately, it is the performance in situ (e.g., @HowardvanRooijen 's suggested Claims scenario) that matters.
But for now I'm just thinking of trying it out to see what happens.
I did a preliminary experiment. This is a bit rough because I wasn't diligent about ensuring nothing else was happening on the machine at the time. I was looking to get a rough picture at this stage.
I tried adjusting GlobToken
so that it was 8 bytes long, and here's what I saw:
NumberOfMatches
is 10,000
p?th/(...)].txt [21]
and p?th/a[e-g].txt
p?th/(...)].txt [46]
buried in experimental noisechecked
to detect overflow needs careIt's possible that in some cases this change actually slows things down a bit. The p?th/(...)].txt [21]
pattern against a single match looked about 2% slower in the BaselineRegexCompileAndMatchTrueBenchmarks
. However, the confidence intervals overlap, so this could just be experimental noise, particularly since testing the same pattern in the BaselineRegexCompileAndMatchFalseBenchmarks
looks faster by about the same amount (again, with overlapping confidence intervals). It's plausible that it would be slower in some cases. The change here improves efficiency of memory usage in the GlobToken
array, but it does so at the cost of requiring some additional computational work. (Specifically, the CPU will need to perform some 16-bit to 32-bit extensions before performing some of the arithmetic.) So you might expect that when looking in isolation at an example that uses very little memory, you'd take the CPU cost hit while apparently gaining nothing from the more efficient memory use.
The expected gains from this would come in two ways:
That last one is the most subtle, the one that is hardest to capture in microbenchmarks, but which is likely in practice to be the single biggest effect in this case. The issue here is that when code runs, it will inevitably have some sort of effect on the CPU cache. The first time it runs, it won't yet have been in the cache, so something else needs to be evicted to make space for it; in the process of accessing data while in operation, it will also initially push some other things out of the cache. And in cases where the overall workload of the whole machine on which this is running means that this code and the data it uses aren't permanently resident in L1 cache (and that would be the expected case for most realistic scenarios), an important question is: how much stuff does this evict from the cache to make room for its own workings?
The reason that question is important is that it determines the extent to which this code continues to have a negative impact even after it has finished running. In an extreme case, imagine a function that is so memory-intensive that it effectively commandeers the entire CPU cache. Once its work completes, the rest of the system will be running in slow motion for some time, as it fetches everything it needs back into the cache. Compare this with a function that is so efficient that in the worst case it requires only a single cache line. (This is as good as it can possibly get, ignoring cases where code was optimized entirely out of existence at compile time.) Once such a function returns, the rest of the system will carry on much as before, with almost everything that was required still held in the cache.
In short, memory efficiency's impact on the recovery time can have a larger impact on overall performance than any impact on the actual operation being examined.
The fact that we see essentially no change for very small amounts of data, but a noticeable amount of change for larger volumes suggests that we are having a positive impact on memory efficiency, which gives good reason to think that there might be an effective reduction in the collateral damage this code inflicts on the CPU cache.
Finally, if we're going to reduce the number of bits available to store the length, we need to detect when we've got a pattern we can't represent. (Since I was using short
, that would mean a pattern with one or more segments longer than 32,767 characters; changing to ushort
would expand that to 65,535, but obviously doesn't make the problem go away.) A GlobToken
's length is calculated from the input start
and end
arguments, and the simple, safe way to code this would be checked((short)((end - start) + 1))
. The perf discussions above are for the more dangerous (short)((end - start) + 1)
, which simply won't notice when a token gets too big, and will fail in weird ways. The numbers below are reported for both checked
and unchecked. Use of checked
makes a small but measurable different to the performance. The reason I've been discussing perf comparisons for the unchecked case is that it's possible to push these checks up. For example, we could just choose to reject any pattern longer than 65,535 characters. If we go forward with this sort of optimization, we'd need to think carefully about whether a 64K pattern length is a reasonable limit, and whether we want to take a perf hit to support bigger globs.
BaselineRegexCompileAndMatchTrueBenchmarks
GlobToken
)Method | NumberOfMatches | Pattern | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|
Compiled_Regex_IsMatch | 1 | p?th/(...)].txt [21] | 95,795.19 ns | 600.229 ns | 532.088 ns | 980.41 | 7.72 | 2.6855 | 1.3428 | 22,656 B |
DotNetGlob_IsMatch | 1 | p?th/(...)].txt [21] | 975.50 ns | 6.418 ns | 6.003 ns | 9.99 | 0.09 | 0.2270 | - | 1,904 B |
CorvusGlob_IsMatch | 1 | p?th/(...)].txt [21] | 97.71 ns | 1.025 ns | 0.909 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 1 | p?th/(...)].txt [46] | 110,283.52 ns | 853.785 ns | 798.631 ns | 686.65 | 6.19 | 3.4180 | 1.7090 | 29,590 B |
DotNetGlob_IsMatch | 1 | p?th/(...)].txt [46] | 1,461.75 ns | 26.424 ns | 24.717 ns | 9.10 | 0.17 | 0.2823 | - | 2,376 B |
CorvusGlob_IsMatch | 1 | p?th/(...)].txt [46] | 160.62 ns | 1.202 ns | 1.125 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 1 | p?th/a[e-g].txt | 93,500.78 ns | 717.671 ns | 636.196 ns | 1,152.16 | 14.31 | 2.3193 | 1.0986 | 19,799 B |
DotNetGlob_IsMatch | 1 | p?th/a[e-g].txt | 713.02 ns | 7.484 ns | 6.634 ns | 8.79 | 0.08 | 0.1659 | - | 1,392 B |
CorvusGlob_IsMatch | 1 | p?th/a[e-g].txt | 81.16 ns | 0.967 ns | 0.857 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 10000 | p?th/(...)].txt [21] | 299,482.82 ns | 3,221.093 ns | 2,855.415 ns | 3.00 | 0.03 | 2.4414 | 0.9766 | 22,643 B |
DotNetGlob_IsMatch | 10000 | p?th/(...)].txt [21] | 98,222.27 ns | 366.661 ns | 325.036 ns | 0.98 | 0.01 | 0.1221 | - | 1,904 B |
CorvusGlob_IsMatch | 10000 | p?th/(...)].txt [21] | 99,853.09 ns | 718.093 ns | 671.705 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 10000 | p?th/(...)].txt [46] | 316,112.07 ns | 2,256.149 ns | 2,110.403 ns | 3.19 | 0.06 | 3.4180 | 1.4648 | 29,582 B |
DotNetGlob_IsMatch | 10000 | p?th/(...)].txt [46] | 104,588.66 ns | 636.729 ns | 595.596 ns | 1.06 | 0.02 | 0.2441 | - | 2,376 B |
CorvusGlob_IsMatch | 10000 | p?th/(...)].txt [46] | 99,063.21 ns | 1,766.275 ns | 1,652.175 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 10000 | p?th/a[e-g].txt | 297,086.71 ns | 2,961.690 ns | 2,770.366 ns | 2.89 | 0.04 | 1.9531 | 0.9766 | 19,784 B |
DotNetGlob_IsMatch | 10000 | p?th/a[e-g].txt | 103,599.55 ns | 1,762.736 ns | 2,583.796 ns | 1.01 | 0.03 | 0.1221 | - | 1,392 B |
CorvusGlob_IsMatch | 10000 | p?th/a[e-g].txt | 102,617.50 ns | 1,154.382 ns | 963.961 ns | 1.00 | 0.00 | - | - | - |
GlobToken
)Method | NumberOfMatches | Pattern | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|
Compiled_Regex_IsMatch | 1 | p?th/(...)].txt [21] | 94,592.59 ns | 1,723.059 ns | 1,527.447 ns | 952.69 | 17.57 | 2.6855 | 1.3428 | 22,656 B |
DotNetGlob_IsMatch | 1 | p?th/(...)].txt [21] | 982.58 ns | 19.179 ns | 22.087 ns | 9.83 | 0.27 | 0.2270 | - | 1,904 B |
CorvusGlob_IsMatch | 1 | p?th/(...)].txt [21] | 99.30 ns | 1.117 ns | 0.990 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 1 | p?th/(...)].txt [46] | 109,174.90 ns | 1,063.767 ns | 995.048 ns | 704.87 | 8.78 | 3.4180 | 1.7090 | 29,590 B |
DotNetGlob_IsMatch | 1 | p?th/(...)].txt [46] | 1,464.30 ns | 22.986 ns | 21.502 ns | 9.46 | 0.09 | 0.2823 | - | 2,376 B |
CorvusGlob_IsMatch | 1 | p?th/(...)].txt [46] | 154.79 ns | 1.435 ns | 1.272 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 1 | p?th/a[e-g].txt | 89,910.10 ns | 1,107.855 ns | 925.109 ns | 1,175.72 | 14.21 | 2.3193 | 1.0986 | 19,799 B |
DotNetGlob_IsMatch | 1 | p?th/a[e-g].txt | 714.19 ns | 8.633 ns | 8.075 ns | 9.35 | 0.14 | 0.1659 | - | 1,392 B |
CorvusGlob_IsMatch | 1 | p?th/a[e-g].txt | 76.39 ns | 0.673 ns | 0.630 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 10000 | p?th/(...)].txt [21] | 298,675.29 ns | 3,117.943 ns | 2,763.976 ns | 3.25 | 0.03 | 2.4414 | 0.9766 | 22,643 B |
DotNetGlob_IsMatch | 10000 | p?th/(...)].txt [21] | 98,467.03 ns | 740.750 ns | 656.656 ns | 1.07 | 0.01 | 0.1221 | - | 1,904 B |
CorvusGlob_IsMatch | 10000 | p?th/(...)].txt [21] | 91,787.58 ns | 301.744 ns | 282.251 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 10000 | p?th/(...)].txt [46] | 312,639.78 ns | 2,415.470 ns | 2,017.027 ns | 3.27 | 0.04 | 3.4180 | 1.4648 | 29,582 B |
DotNetGlob_IsMatch | 10000 | p?th/(...)].txt [46] | 102,982.83 ns | 364.905 ns | 341.332 ns | 1.08 | 0.01 | 0.2441 | - | 2,376 B |
CorvusGlob_IsMatch | 10000 | p?th/(...)].txt [46] | 95,750.15 ns | 964.061 ns | 805.034 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 10000 | p?th/a[e-g].txt | 295,169.25 ns | 1,962.006 ns | 1,739.267 ns | 3.21 | 0.03 | 1.9531 | 0.9766 | 19,784 B |
DotNetGlob_IsMatch | 10000 | p?th/a[e-g].txt | 101,240.27 ns | 1,225.350 ns | 1,146.193 ns | 1.10 | 0.01 | 0.1221 | - | 1,392 B |
CorvusGlob_IsMatch | 10000 | p?th/a[e-g].txt | 91,893.37 ns | 759.772 ns | 710.691 ns | 1.00 | 0.00 | - | - | - |
GlobToken
using checked
in constructorMethod | NumberOfMatches | Pattern | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|
Compiled_Regex_IsMatch | 1 | p?th/(...)].txt [21] | 96,687.61 ns | 1,898.613 ns | 3,703.098 ns | 999.89 | 47.53 | 2.6855 | 1.3428 | 22,656 B |
DotNetGlob_IsMatch | 1 | p?th/(...)].txt [21] | 961.02 ns | 11.029 ns | 10.316 ns | 9.57 | 0.13 | 0.2270 | - | 1,904 B |
CorvusGlob_IsMatch | 1 | p?th/(...)].txt [21] | 100.42 ns | 1.099 ns | 1.028 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 1 | p?th/(...)].txt [46] | 109,649.50 ns | 1,627.249 ns | 1,522.130 ns | 664.28 | 11.59 | 3.4180 | 1.7090 | 29,590 B |
DotNetGlob_IsMatch | 1 | p?th/(...)].txt [46] | 1,461.49 ns | 14.080 ns | 12.482 ns | 8.84 | 0.12 | 0.2823 | - | 2,376 B |
CorvusGlob_IsMatch | 1 | p?th/(...)].txt [46] | 165.46 ns | 2.755 ns | 2.151 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 1 | p?th/a[e-g].txt | 92,031.70 ns | 1,467.580 ns | 1,802.320 ns | 1,172.11 | 29.51 | 2.3193 | 1.0986 | 19,799 B |
DotNetGlob_IsMatch | 1 | p?th/a[e-g].txt | 712.60 ns | 9.156 ns | 8.116 ns | 9.04 | 0.11 | 0.1659 | - | 1,392 B |
CorvusGlob_IsMatch | 1 | p?th/a[e-g].txt | 78.79 ns | 0.510 ns | 0.426 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 10000 | p?th/(...)].txt [21] | 298,107.45 ns | 3,382.713 ns | 2,998.687 ns | 2.88 | 0.03 | 2.4414 | 0.9766 | 22,643 B |
DotNetGlob_IsMatch | 10000 | p?th/(...)].txt [21] | 99,287.46 ns | 1,722.170 ns | 1,438.090 ns | 0.96 | 0.01 | 0.1221 | - | 1,904 B |
CorvusGlob_IsMatch | 10000 | p?th/(...)].txt [21] | 103,657.94 ns | 565.973 ns | 472.613 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 10000 | p?th/(...)].txt [46] | 312,567.33 ns | 3,207.218 ns | 3,000.034 ns | 3.22 | 0.05 | 3.4180 | 1.4648 | 29,582 B |
DotNetGlob_IsMatch | 10000 | p?th/(...)].txt [46] | 105,099.14 ns | 1,174.698 ns | 1,041.339 ns | 1.08 | 0.02 | 0.2441 | - | 2,376 B |
CorvusGlob_IsMatch | 10000 | p?th/(...)].txt [46] | 97,189.91 ns | 1,194.456 ns | 1,117.295 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 10000 | p?th/a[e-g].txt | 295,026.18 ns | 2,900.614 ns | 2,713.237 ns | 3.01 | 0.03 | 1.9531 | 0.9766 | 19,784 B |
DotNetGlob_IsMatch | 10000 | p?th/a[e-g].txt | 99,972.78 ns | 586.234 ns | 519.681 ns | 1.02 | 0.01 | 0.1221 | - | 1,392 B |
CorvusGlob_IsMatch | 10000 | p?th/a[e-g].txt | 97,965.08 ns | 580.262 ns | 514.387 ns | 1.00 | 0.00 | - | - | - |
BaselineRegexCompileAndMatchFalseBenchmarks
Before (12 byte GlobToken
):
Method | NumberOfMatches | Pattern | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|
Compiled_Regex_IsMatch | 1 | p?th/(...)].txt [21] | 94,174.20 ns | 1,199.785 ns | 1,001.875 ns | 948.31 | 10.52 | 2.6855 | 1.3428 | 22,656 B |
DotNetGlob_IsMatch | 1 | p?th/(...)].txt [21] | 941.84 ns | 18.107 ns | 15.121 ns | 9.48 | 0.13 | 0.2270 | - | 1,904 B |
CorvusGlob_IsMatch | 1 | p?th/(...)].txt [21] | 99.41 ns | 0.889 ns | 0.788 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 1 | p?th/(...)].txt [46] | 108,338.76 ns | 996.372 ns | 932.007 ns | 716.33 | 4.64 | 3.4180 | 1.7090 | 29,590 B |
DotNetGlob_IsMatch | 1 | p?th/(...)].txt [46] | 1,446.65 ns | 26.155 ns | 23.186 ns | 9.54 | 0.16 | 0.2823 | - | 2,376 B |
CorvusGlob_IsMatch | 1 | p?th/(...)].txt [46] | 151.44 ns | 0.565 ns | 0.472 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 1 | p?th/a[e-g].txt | 89,700.41 ns | 1,014.611 ns | 899.427 ns | 1,130.51 | 12.58 | 2.3193 | 1.0986 | 19,799 B |
DotNetGlob_IsMatch | 1 | p?th/a[e-g].txt | 711.48 ns | 8.265 ns | 7.327 ns | 8.97 | 0.10 | 0.1659 | - | 1,392 B |
CorvusGlob_IsMatch | 1 | p?th/a[e-g].txt | 79.33 ns | 0.520 ns | 0.486 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 10000 | p?th/(...)].txt [21] | 295,346.64 ns | 2,122.020 ns | 1,881.115 ns | 3.28 | 0.03 | 2.4414 | 0.9766 | 22,643 B |
DotNetGlob_IsMatch | 10000 | p?th/(...)].txt [21] | 98,431.99 ns | 238.696 ns | 211.598 ns | 1.09 | 0.00 | 0.1221 | - | 1,904 B |
CorvusGlob_IsMatch | 10000 | p?th/(...)].txt [21] | 90,156.42 ns | 351.557 ns | 311.646 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 10000 | p?th/(...)].txt [46] | 313,492.40 ns | 2,189.189 ns | 2,047.768 ns | 3.00 | 0.03 | 3.4180 | 1.4648 | 29,582 B |
DotNetGlob_IsMatch | 10000 | p?th/(...)].txt [46] | 103,731.02 ns | 286.383 ns | 223.589 ns | 0.99 | 0.00 | 0.2441 | - | 2,376 B |
CorvusGlob_IsMatch | 10000 | p?th/(...)].txt [46] | 104,633.11 ns | 352.964 ns | 294.741 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 10000 | p?th/a[e-g].txt | 293,547.58 ns | 2,001.637 ns | 1,872.333 ns | 3.42 | 0.03 | 1.9531 | 0.9766 | 19,784 B |
DotNetGlob_IsMatch | 10000 | p?th/a[e-g].txt | 100,390.66 ns | 727.080 ns | 680.112 ns | 1.17 | 0.01 | 0.1221 | - | 1,392 B |
CorvusGlob_IsMatch | 10000 | p?th/a[e-g].txt | 85,816.24 ns | 344.880 ns | 287.990 ns | 1.00 | 0.00 | - | - | - |
GlobToken
)Method | NumberOfMatches | Pattern | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|
Compiled_Regex_IsMatch | 1 | p?th/(...)].txt [21] | 93,831.48 ns | 821.521 ns | 768.452 ns | 958.26 | 9.38 | 2.6855 | 1.3428 | 22,656 B |
DotNetGlob_IsMatch | 1 | p?th/(...)].txt [21] | 957.20 ns | 6.269 ns | 5.557 ns | 9.78 | 0.07 | 0.2270 | - | 1,904 B |
CorvusGlob_IsMatch | 1 | p?th/(...)].txt [21] | 97.85 ns | 0.423 ns | 0.375 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 1 | p?th/(...)].txt [46] | 107,852.37 ns | 686.971 ns | 573.652 ns | 699.75 | 5.16 | 3.4180 | 1.7090 | 29,590 B |
DotNetGlob_IsMatch | 1 | p?th/(...)].txt [46] | 1,482.14 ns | 21.211 ns | 19.841 ns | 9.61 | 0.12 | 0.2823 | - | 2,376 B |
CorvusGlob_IsMatch | 1 | p?th/(...)].txt [46] | 154.20 ns | 0.791 ns | 0.740 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 1 | p?th/a[e-g].txt | 90,233.37 ns | 1,311.724 ns | 1,226.987 ns | 1,180.04 | 19.95 | 2.3193 | 1.0986 | 19,799 B |
DotNetGlob_IsMatch | 1 | p?th/a[e-g].txt | 719.24 ns | 10.262 ns | 9.599 ns | 9.41 | 0.20 | 0.1659 | - | 1,392 B |
CorvusGlob_IsMatch | 1 | p?th/a[e-g].txt | 76.47 ns | 0.957 ns | 0.895 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 10000 | p?th/(...)].txt [21] | 299,515.26 ns | 3,457.838 ns | 3,234.464 ns | 3.24 | 0.04 | 2.4414 | 0.9766 | 22,643 B |
DotNetGlob_IsMatch | 10000 | p?th/(...)].txt [21] | 99,624.34 ns | 859.323 ns | 803.812 ns | 1.08 | 0.01 | 0.1221 | - | 1,904 B |
CorvusGlob_IsMatch | 10000 | p?th/(...)].txt [21] | 92,458.55 ns | 915.452 ns | 811.525 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 10000 | p?th/(...)].txt [46] | 313,737.50 ns | 2,605.794 ns | 2,437.461 ns | 3.27 | 0.04 | 3.4180 | 1.4648 | 29,582 B |
DotNetGlob_IsMatch | 10000 | p?th/(...)].txt [46] | 104,991.37 ns | 898.791 ns | 840.729 ns | 1.09 | 0.01 | 0.2441 | - | 2,376 B |
CorvusGlob_IsMatch | 10000 | p?th/(...)].txt [46] | 95,917.34 ns | 524.277 ns | 437.795 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 10000 | p?th/a[e-g].txt | 295,231.08 ns | 3,635.725 ns | 3,222.976 ns | 3.22 | 0.03 | 1.9531 | 0.9766 | 19,784 B |
DotNetGlob_IsMatch | 10000 | p?th/a[e-g].txt | 101,529.01 ns | 684.414 ns | 606.715 ns | 1.11 | 0.01 | 0.1221 | - | 1,392 B |
CorvusGlob_IsMatch | 10000 | p?th/a[e-g].txt | 91,612.27 ns | 427.833 ns | 379.263 ns | 1.00 | 0.00 | - | - | - |
GlobToken
using checked
in constructorMethod | NumberOfMatches | Pattern | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|
Compiled_Regex_IsMatch | 1 | p?th/(...)].txt [21] | 93,955.38 ns | 1,477.181 ns | 1,381.756 ns | 940.68 | 14.82 | 2.6855 | 1.3428 | 22,656 B |
DotNetGlob_IsMatch | 1 | p?th/(...)].txt [21] | 965.40 ns | 12.456 ns | 11.042 ns | 9.67 | 0.13 | 0.2270 | - | 1,904 B |
CorvusGlob_IsMatch | 1 | p?th/(...)].txt [21] | 99.87 ns | 0.795 ns | 0.705 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 1 | p?th/(...)].txt [46] | 109,556.31 ns | 1,342.473 ns | 1,121.025 ns | 671.07 | 7.47 | 3.4180 | 1.7090 | 29,590 B |
DotNetGlob_IsMatch | 1 | p?th/(...)].txt [46] | 1,465.73 ns | 19.987 ns | 17.718 ns | 8.96 | 0.11 | 0.2823 | - | 2,376 B |
CorvusGlob_IsMatch | 1 | p?th/(...)].txt [46] | 163.29 ns | 0.393 ns | 0.307 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 1 | p?th/a[e-g].txt | 91,416.59 ns | 1,429.424 ns | 1,337.085 ns | 1,161.26 | 19.49 | 2.3193 | 1.0986 | 19,799 B |
DotNetGlob_IsMatch | 1 | p?th/a[e-g].txt | 702.86 ns | 7.549 ns | 6.692 ns | 8.91 | 0.09 | 0.1659 | - | 1,392 B |
CorvusGlob_IsMatch | 1 | p?th/a[e-g].txt | 78.82 ns | 0.458 ns | 0.383 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 10000 | p?th/(...)].txt [21] | 297,509.87 ns | 2,717.840 ns | 2,409.295 ns | 2.90 | 0.05 | 2.4414 | 0.9766 | 22,643 B |
DotNetGlob_IsMatch | 10000 | p?th/(...)].txt [21] | 99,053.46 ns | 635.473 ns | 594.422 ns | 0.97 | 0.01 | 0.1221 | - | 1,904 B |
CorvusGlob_IsMatch | 10000 | p?th/(...)].txt [21] | 102,601.60 ns | 1,431.550 ns | 1,339.073 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 10000 | p?th/(...)].txt [46] | 313,212.86 ns | 3,045.308 ns | 2,699.586 ns | 3.24 | 0.04 | 3.4180 | 1.4648 | 29,582 B |
DotNetGlob_IsMatch | 10000 | p?th/(...)].txt [46] | 103,700.68 ns | 685.337 ns | 641.065 ns | 1.07 | 0.01 | 0.2441 | - | 2,376 B |
CorvusGlob_IsMatch | 10000 | p?th/(...)].txt [46] | 96,674.95 ns | 838.503 ns | 743.311 ns | 1.00 | 0.00 | - | - | - |
Compiled_Regex_IsMatch | 10000 | p?th/a[e-g].txt | 296,002.27 ns | 2,809.514 ns | 2,490.561 ns | 3.02 | 0.03 | 1.9531 | 0.9766 | 19,784 B |
DotNetGlob_IsMatch | 10000 | p?th/a[e-g].txt | 102,155.39 ns | 1,268.859 ns | 1,604.701 ns | 1.04 | 0.02 | 0.1221 | - | 1,392 B |
CorvusGlob_IsMatch | 10000 | p?th/a[e-g].txt | 98,313.06 ns | 1,139.488 ns | 1,065.878 ns | 1.00 | 0.00 | - | - | - |
@idg10 Just to bring this alive again - are we happy that the 8-byte glob token is the way to go? And if so, do you want to apply your changes and maybe discuss the issues in a blog post?
cc @Lmooney25
Having read through my comment and measurements, I'm now definitely uncertain.
My concern is that this adds complication for an extremely small gain. The risks of that complication include:
The upside is so small that I'm not convinced it offsets the downside. It wasn't even possible to say for certain that there was an upside until we got to relatively large numbers of repeat applications of the same glob pattern. (It's definitely possible that the additional complexity increases the cost for simpler cases.)
I can't remember the specific use cases for this. Do we have reason to believe that this is featuring in critical loops somewhere?
I'm looking at what it looks like if we reduce the max string length to 12bits (i.e. 4095 characters) which is fine for URIs as the max usable length is 2,048 characters, along with validation in the GlobToken
constructor (which just checks the bits).
This allows us to put a packed value in a uint with 2x12 bits for the start and end, and 8 bits for the token type, along with unchecked
unpacking (because we know we are safe from the constructor validation).
A couple of tweaks to the implementation to pull ensure the unpacking of those values occurs at a minimum, gives us measurable performance improvements.
(I failed to answer this question earlier: I can't remember the specific use cases for this. Do we have reason to believe that this is featuring in critical loops somewhere?
)
The most important use case for us is in permissions validation. We use glob matching of URIs to determine Allow/Deny permission in permission sets.
This is then used both in determining access to those resources, and in e.g. stripping linked/embedded resources from HAL documents based on those permissions.
This is another example where every request will do many such checks, and shaving cost out of this operation is another of those "cumulative savings" - especially under load.
The
GlobToken
struct
currently has threeint
-sized properties, meaning that on a 64-bit system, alignment rules mean it will typically be the size of 4ints
, i.e. 16 bytes.This is not massive, of course, but in tight loops that execute while churning through large quantities of data, byte shaving can start to have a measurable effect. There are a couple of reasons for this. One is simply that the number of cache lines required to maintain the working state of the inner loop goes down, meaning that if cache evictions of important state occur (CPUs don't necessarily maintain a MRU list for cache lines, so badly-chosen discards do happen) the amount of work it takes to recover goes down. Another is that if data structures get smaller, the JIT compiler can sometimes get more clever using registers, or even with SIMD code generation.
There are currently only 9 distinct values for
GlobTokenType
, making the 32 bits of space it currently gets in this structure seem like overengineering. And the use ofint
properties means that we support glob pattern lengths of up to 2 billion characters, with individual segments also able to be 2 billion characters long.One relatively simple change we could try would be to flip the source of truth for
Length
andEnd
. CurrentlyEnd
is the source of truth, andLength
is calculated, but we could reverse this, and then makeLength
aushort
. This would limit individual segments to 65535 characters, while still allowing the whole pattern to be as long as strings can get. Combining this with a change to makeGlobTokenType
useushort
as its backing type would mean thatGlobToken
would now be only 64 bits in size. This would enable it to fit inside a single CPU register, which might open up more optimization options to the JIT, and would in any event half the size of the tokenized pattern.I have no idea whether this would make the slightest difference to anything. It might even make things worse, by forcing the use of 16-bit arithmetic. But it would be worth an experiment to see if it does have any impact
Note to benchmark this effectively, it might be necessary to run some tests with much larger volumes of input than we are currently using. We want to know what happens when ploughing through enough data to start causing cache evictions.