Would making `GlobToken` smaller help?

idg10 commented 2 years ago

The GlobToken struct currently has three int-sized properties, meaning that on a 64-bit system, alignment rules mean it will typically be the size of 4 ints, i.e. 16 bytes.

This is not massive, of course, but in tight loops that execute while churning through large quantities of data, byte shaving can start to have a measurable effect. There are a couple of reasons for this. One is simply that the number of cache lines required to maintain the working state of the inner loop goes down, meaning that if cache evictions of important state occur (CPUs don't necessarily maintain a MRU list for cache lines, so badly-chosen discards do happen) the amount of work it takes to recover goes down. Another is that if data structures get smaller, the JIT compiler can sometimes get more clever using registers, or even with SIMD code generation.

There are currently only 9 distinct values for GlobTokenType, making the 32 bits of space it currently gets in this structure seem like overengineering. And the use of int properties means that we support glob pattern lengths of up to 2 billion characters, with individual segments also able to be 2 billion characters long.

One relatively simple change we could try would be to flip the source of truth for Length and End. Currently End is the source of truth, and Length is calculated, but we could reverse this, and then make Length a ushort. This would limit individual segments to 65535 characters, while still allowing the whole pattern to be as long as strings can get. Combining this with a change to make GlobTokenType use ushort as its backing type would mean that GlobToken would now be only 64 bits in size. This would enable it to fit inside a single CPU register, which might open up more optimization options to the JIT, and would in any event half the size of the tokenized pattern.

I have no idea whether this would make the slightest difference to anything. It might even make things worse, by forcing the use of 16-bit arithmetic. But it would be worth an experiment to see if it does have any impact

Note to benchmark this effectively, it might be necessary to run some tests with much larger volumes of input than we are currently using. We want to know what happens when ploughing through enough data to start causing cache evictions.

mwadams commented 2 years ago

It is defintely true that it gets worse if you make Length a non-calculated value (i.e. caching it in the GlobToken is more expensive than a smaller GlobToken and calculating it). And you are right - we are in the realm of alignment issues, 16 bit arithmetic and caches.

Am I right in thinking it is a 64k L1 cache still? So our 10k strings may all be small enough to fit in the cache? We might want to ensure we are testing both the all-in-cache and guaranteed-not-to-fit-in-cache cases?

HowardvanRooijen commented 2 years ago

Is it worth optimising for the claims use case? As this is our primary reason for building this library?

idg10 commented 2 years ago

Cache sizes vary a lot. My i9-9900K reports:

L1: 512KB L2: 2.0MB L3: 16.0MB

But the thing I think we need to be careful of is having a benchmarking setup that repeatedly loops around parsing the same thing over and over again, meaning that the cache is probably not seeing many evictions at all. Our real-life use case could well look a lot more like:

do loads of stuff that goes nowhere near this globbing code, and which touches data other than the text that will be searched for glob matches
do some work that causes the text being inspected to end up in the cache
look for a handful of glob matches
repeat

This broader context is interesting because step 1 will tend to pepper the cache, or, depending on cache sizes and the workload you've got, it may even completely evict everything glob-related from the cache. This could have very different performance characteristics from the "run the same glob over the same input millions of times" case that benchmarks can end up measuring.

(I've even seen weird examples where the "recalculate vs cache" tradeoff for an example involving square roots started off faster if you cached, but then for certain sizes of workload was actually faster if you recomputed every time, but then for larger workloads still, caching again looked better!)

This of course it what makes it hard to pick the right microbenchmarks... Ultimately, it is the performance in situ (e.g., @HowardvanRooijen 's suggested Claims scenario) that matters.

But for now I'm just thinking of trying it out to see what happens.

idg10 commented 2 years ago

I did a preliminary experiment. This is a bit rough because I wasn't diligent about ensuring nothing else was happening on the machine at the time. I was looking to get a rough picture at this stage.

I tried adjusting GlobToken so that it was 8 bytes long, and here's what I saw:

No change in memory allocation
Changes with short strings, 1 match uncertain due to experimental noise
Some improvements when NumberOfMatches is 10,000
- 5%-10% performance improvement for p?th/(...)].txt [21] and p?th/a[e-g].txt
- Change in p?th/(...)].txt [46] buried in experimental noise
Use of checked to detect overflow needs care

It's possible that in some cases this change actually slows things down a bit. The p?th/(...)].txt [21] pattern against a single match looked about 2% slower in the BaselineRegexCompileAndMatchTrueBenchmarks. However, the confidence intervals overlap, so this could just be experimental noise, particularly since testing the same pattern in the BaselineRegexCompileAndMatchFalseBenchmarks looks faster by about the same amount (again, with overlapping confidence intervals). It's plausible that it would be slower in some cases. The change here improves efficiency of memory usage in the GlobToken array, but it does so at the cost of requiring some additional computational work. (Specifically, the CPU will need to perform some 16-bit to 32-bit extensions before performing some of the arithmetic.) So you might expect that when looking in isolation at an example that uses very little memory, you'd take the CPU cost hit while apparently gaining nothing from the more efficient memory use.

The expected gains from this would come in two ways:

better performance when measured in isolation against data volumes large enough to cause cache spills
reduced collateral damage

That last one is the most subtle, the one that is hardest to capture in microbenchmarks, but which is likely in practice to be the single biggest effect in this case. The issue here is that when code runs, it will inevitably have some sort of effect on the CPU cache. The first time it runs, it won't yet have been in the cache, so something else needs to be evicted to make space for it; in the process of accessing data while in operation, it will also initially push some other things out of the cache. And in cases where the overall workload of the whole machine on which this is running means that this code and the data it uses aren't permanently resident in L1 cache (and that would be the expected case for most realistic scenarios), an important question is: how much stuff does this evict from the cache to make room for its own workings?

The reason that question is important is that it determines the extent to which this code continues to have a negative impact even after it has finished running. In an extreme case, imagine a function that is so memory-intensive that it effectively commandeers the entire CPU cache. Once its work completes, the rest of the system will be running in slow motion for some time, as it fetches everything it needs back into the cache. Compare this with a function that is so efficient that in the worst case it requires only a single cache line. (This is as good as it can possibly get, ignoring cases where code was optimized entirely out of existence at compile time.) Once such a function returns, the rest of the system will carry on much as before, with almost everything that was required still held in the cache.

In short, memory efficiency's impact on the recovery time can have a larger impact on overall performance than any impact on the actual operation being examined.

The fact that we see essentially no change for very small amounts of data, but a noticeable amount of change for larger volumes suggests that we are having a positive impact on memory efficiency, which gives good reason to think that there might be an effective reduction in the collateral damage this code inflicts on the CPU cache.

Finally, if we're going to reduce the number of bits available to store the length, we need to detect when we've got a pattern we can't represent. (Since I was using short, that would mean a pattern with one or more segments longer than 32,767 characters; changing to ushort would expand that to 65,535, but obviously doesn't make the problem go away.) A GlobToken's length is calculated from the input start and end arguments, and the simple, safe way to code this would be checked((short)((end - start) + 1)). The perf discussions above are for the more dangerous (short)((end - start) + 1), which simply won't notice when a token gets too big, and will fail in weird ways. The numbers below are reported for both checked and unchecked. Use of checked makes a small but measurable different to the performance. The reason I've been discussing perf comparisons for the unchecked case is that it's possible to push these checks up. For example, we could just choose to reject any pattern longer than 65,535 characters. If we go forward with this sort of optimization, we'd need to think carefully about whether a 64K pattern length is a reasonable limit, and whether we want to take a perf hit to support bigger globs.

`BaselineRegexCompileAndMatchTrueBenchmarks`

Before (12 byte `GlobToken`)

Method	NumberOfMatches	Pattern	Mean	Error	StdDev	Ratio	RatioSD	Gen 0	Gen 1	Allocated
Compiled_Regex_IsMatch	1	p?th/(...)].txt [21]	95,795.19 ns	600.229 ns	532.088 ns	980.41	7.72	2.6855	1.3428	22,656 B
DotNetGlob_IsMatch	1	p?th/(...)].txt [21]	975.50 ns	6.418 ns	6.003 ns	9.99	0.09	0.2270	-	1,904 B
CorvusGlob_IsMatch	1	p?th/(...)].txt [21]	97.71 ns	1.025 ns	0.909 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	1	p?th/(...)].txt [46]	110,283.52 ns	853.785 ns	798.631 ns	686.65	6.19	3.4180	1.7090	29,590 B
DotNetGlob_IsMatch	1	p?th/(...)].txt [46]	1,461.75 ns	26.424 ns	24.717 ns	9.10	0.17	0.2823	-	2,376 B
CorvusGlob_IsMatch	1	p?th/(...)].txt [46]	160.62 ns	1.202 ns	1.125 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	1	p?th/a[e-g].txt	93,500.78 ns	717.671 ns	636.196 ns	1,152.16	14.31	2.3193	1.0986	19,799 B
DotNetGlob_IsMatch	1	p?th/a[e-g].txt	713.02 ns	7.484 ns	6.634 ns	8.79	0.08	0.1659	-	1,392 B
CorvusGlob_IsMatch	1	p?th/a[e-g].txt	81.16 ns	0.967 ns	0.857 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	10000	p?th/(...)].txt [21]	299,482.82 ns	3,221.093 ns	2,855.415 ns	3.00	0.03	2.4414	0.9766	22,643 B
DotNetGlob_IsMatch	10000	p?th/(...)].txt [21]	98,222.27 ns	366.661 ns	325.036 ns	0.98	0.01	0.1221	-	1,904 B
CorvusGlob_IsMatch	10000	p?th/(...)].txt [21]	99,853.09 ns	718.093 ns	671.705 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	10000	p?th/(...)].txt [46]	316,112.07 ns	2,256.149 ns	2,110.403 ns	3.19	0.06	3.4180	1.4648	29,582 B
DotNetGlob_IsMatch	10000	p?th/(...)].txt [46]	104,588.66 ns	636.729 ns	595.596 ns	1.06	0.02	0.2441	-	2,376 B
CorvusGlob_IsMatch	10000	p?th/(...)].txt [46]	99,063.21 ns	1,766.275 ns	1,652.175 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	10000	p?th/a[e-g].txt	297,086.71 ns	2,961.690 ns	2,770.366 ns	2.89	0.04	1.9531	0.9766	19,784 B
DotNetGlob_IsMatch	10000	p?th/a[e-g].txt	103,599.55 ns	1,762.736 ns	2,583.796 ns	1.01	0.03	0.1221	-	1,392 B
CorvusGlob_IsMatch	10000	p?th/a[e-g].txt	102,617.50 ns	1,154.382 ns	963.961 ns	1.00	0.00	-	-	-

After (8-byte `GlobToken`)

Method	NumberOfMatches	Pattern	Mean	Error	StdDev	Ratio	RatioSD	Gen 0	Gen 1	Allocated
Compiled_Regex_IsMatch	1	p?th/(...)].txt [21]	94,592.59 ns	1,723.059 ns	1,527.447 ns	952.69	17.57	2.6855	1.3428	22,656 B
DotNetGlob_IsMatch	1	p?th/(...)].txt [21]	982.58 ns	19.179 ns	22.087 ns	9.83	0.27	0.2270	-	1,904 B
CorvusGlob_IsMatch	1	p?th/(...)].txt [21]	99.30 ns	1.117 ns	0.990 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	1	p?th/(...)].txt [46]	109,174.90 ns	1,063.767 ns	995.048 ns	704.87	8.78	3.4180	1.7090	29,590 B
DotNetGlob_IsMatch	1	p?th/(...)].txt [46]	1,464.30 ns	22.986 ns	21.502 ns	9.46	0.09	0.2823	-	2,376 B
CorvusGlob_IsMatch	1	p?th/(...)].txt [46]	154.79 ns	1.435 ns	1.272 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	1	p?th/a[e-g].txt	89,910.10 ns	1,107.855 ns	925.109 ns	1,175.72	14.21	2.3193	1.0986	19,799 B
DotNetGlob_IsMatch	1	p?th/a[e-g].txt	714.19 ns	8.633 ns	8.075 ns	9.35	0.14	0.1659	-	1,392 B
CorvusGlob_IsMatch	1	p?th/a[e-g].txt	76.39 ns	0.673 ns	0.630 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	10000	p?th/(...)].txt [21]	298,675.29 ns	3,117.943 ns	2,763.976 ns	3.25	0.03	2.4414	0.9766	22,643 B
DotNetGlob_IsMatch	10000	p?th/(...)].txt [21]	98,467.03 ns	740.750 ns	656.656 ns	1.07	0.01	0.1221	-	1,904 B
CorvusGlob_IsMatch	10000	p?th/(...)].txt [21]	91,787.58 ns	301.744 ns	282.251 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	10000	p?th/(...)].txt [46]	312,639.78 ns	2,415.470 ns	2,017.027 ns	3.27	0.04	3.4180	1.4648	29,582 B
DotNetGlob_IsMatch	10000	p?th/(...)].txt [46]	102,982.83 ns	364.905 ns	341.332 ns	1.08	0.01	0.2441	-	2,376 B
CorvusGlob_IsMatch	10000	p?th/(...)].txt [46]	95,750.15 ns	964.061 ns	805.034 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	10000	p?th/a[e-g].txt	295,169.25 ns	1,962.006 ns	1,739.267 ns	3.21	0.03	1.9531	0.9766	19,784 B
DotNetGlob_IsMatch	10000	p?th/a[e-g].txt	101,240.27 ns	1,225.350 ns	1,146.193 ns	1.10	0.01	0.1221	-	1,392 B
CorvusGlob_IsMatch	10000	p?th/a[e-g].txt	91,893.37 ns	759.772 ns	710.691 ns	1.00	0.00	-	-	-

8-byte `GlobToken` using `checked` in constructor

Method	NumberOfMatches	Pattern	Mean	Error	StdDev	Ratio	RatioSD	Gen 0	Gen 1	Allocated
Compiled_Regex_IsMatch	1	p?th/(...)].txt [21]	96,687.61 ns	1,898.613 ns	3,703.098 ns	999.89	47.53	2.6855	1.3428	22,656 B
DotNetGlob_IsMatch	1	p?th/(...)].txt [21]	961.02 ns	11.029 ns	10.316 ns	9.57	0.13	0.2270	-	1,904 B
CorvusGlob_IsMatch	1	p?th/(...)].txt [21]	100.42 ns	1.099 ns	1.028 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	1	p?th/(...)].txt [46]	109,649.50 ns	1,627.249 ns	1,522.130 ns	664.28	11.59	3.4180	1.7090	29,590 B
DotNetGlob_IsMatch	1	p?th/(...)].txt [46]	1,461.49 ns	14.080 ns	12.482 ns	8.84	0.12	0.2823	-	2,376 B
CorvusGlob_IsMatch	1	p?th/(...)].txt [46]	165.46 ns	2.755 ns	2.151 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	1	p?th/a[e-g].txt	92,031.70 ns	1,467.580 ns	1,802.320 ns	1,172.11	29.51	2.3193	1.0986	19,799 B
DotNetGlob_IsMatch	1	p?th/a[e-g].txt	712.60 ns	9.156 ns	8.116 ns	9.04	0.11	0.1659	-	1,392 B
CorvusGlob_IsMatch	1	p?th/a[e-g].txt	78.79 ns	0.510 ns	0.426 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	10000	p?th/(...)].txt [21]	298,107.45 ns	3,382.713 ns	2,998.687 ns	2.88	0.03	2.4414	0.9766	22,643 B
DotNetGlob_IsMatch	10000	p?th/(...)].txt [21]	99,287.46 ns	1,722.170 ns	1,438.090 ns	0.96	0.01	0.1221	-	1,904 B
CorvusGlob_IsMatch	10000	p?th/(...)].txt [21]	103,657.94 ns	565.973 ns	472.613 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	10000	p?th/(...)].txt [46]	312,567.33 ns	3,207.218 ns	3,000.034 ns	3.22	0.05	3.4180	1.4648	29,582 B
DotNetGlob_IsMatch	10000	p?th/(...)].txt [46]	105,099.14 ns	1,174.698 ns	1,041.339 ns	1.08	0.02	0.2441	-	2,376 B
CorvusGlob_IsMatch	10000	p?th/(...)].txt [46]	97,189.91 ns	1,194.456 ns	1,117.295 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	10000	p?th/a[e-g].txt	295,026.18 ns	2,900.614 ns	2,713.237 ns	3.01	0.03	1.9531	0.9766	19,784 B
DotNetGlob_IsMatch	10000	p?th/a[e-g].txt	99,972.78 ns	586.234 ns	519.681 ns	1.02	0.01	0.1221	-	1,392 B
CorvusGlob_IsMatch	10000	p?th/a[e-g].txt	97,965.08 ns	580.262 ns	514.387 ns	1.00	0.00	-	-	-

`BaselineRegexCompileAndMatchFalseBenchmarks`

Before (12 byte GlobToken):

Method	NumberOfMatches	Pattern	Mean	Error	StdDev	Ratio	RatioSD	Gen 0	Gen 1	Allocated
Compiled_Regex_IsMatch	1	p?th/(...)].txt [21]	94,174.20 ns	1,199.785 ns	1,001.875 ns	948.31	10.52	2.6855	1.3428	22,656 B
DotNetGlob_IsMatch	1	p?th/(...)].txt [21]	941.84 ns	18.107 ns	15.121 ns	9.48	0.13	0.2270	-	1,904 B
CorvusGlob_IsMatch	1	p?th/(...)].txt [21]	99.41 ns	0.889 ns	0.788 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	1	p?th/(...)].txt [46]	108,338.76 ns	996.372 ns	932.007 ns	716.33	4.64	3.4180	1.7090	29,590 B
DotNetGlob_IsMatch	1	p?th/(...)].txt [46]	1,446.65 ns	26.155 ns	23.186 ns	9.54	0.16	0.2823	-	2,376 B
CorvusGlob_IsMatch	1	p?th/(...)].txt [46]	151.44 ns	0.565 ns	0.472 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	1	p?th/a[e-g].txt	89,700.41 ns	1,014.611 ns	899.427 ns	1,130.51	12.58	2.3193	1.0986	19,799 B
DotNetGlob_IsMatch	1	p?th/a[e-g].txt	711.48 ns	8.265 ns	7.327 ns	8.97	0.10	0.1659	-	1,392 B
CorvusGlob_IsMatch	1	p?th/a[e-g].txt	79.33 ns	0.520 ns	0.486 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	10000	p?th/(...)].txt [21]	295,346.64 ns	2,122.020 ns	1,881.115 ns	3.28	0.03	2.4414	0.9766	22,643 B
DotNetGlob_IsMatch	10000	p?th/(...)].txt [21]	98,431.99 ns	238.696 ns	211.598 ns	1.09	0.00	0.1221	-	1,904 B
CorvusGlob_IsMatch	10000	p?th/(...)].txt [21]	90,156.42 ns	351.557 ns	311.646 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	10000	p?th/(...)].txt [46]	313,492.40 ns	2,189.189 ns	2,047.768 ns	3.00	0.03	3.4180	1.4648	29,582 B
DotNetGlob_IsMatch	10000	p?th/(...)].txt [46]	103,731.02 ns	286.383 ns	223.589 ns	0.99	0.00	0.2441	-	2,376 B
CorvusGlob_IsMatch	10000	p?th/(...)].txt [46]	104,633.11 ns	352.964 ns	294.741 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	10000	p?th/a[e-g].txt	293,547.58 ns	2,001.637 ns	1,872.333 ns	3.42	0.03	1.9531	0.9766	19,784 B
DotNetGlob_IsMatch	10000	p?th/a[e-g].txt	100,390.66 ns	727.080 ns	680.112 ns	1.17	0.01	0.1221	-	1,392 B
CorvusGlob_IsMatch	10000	p?th/a[e-g].txt	85,816.24 ns	344.880 ns	287.990 ns	1.00	0.00	-	-	-

After: (8 byte `GlobToken`)

Method	NumberOfMatches	Pattern	Mean	Error	StdDev	Ratio	RatioSD	Gen 0	Gen 1	Allocated
Compiled_Regex_IsMatch	1	p?th/(...)].txt [21]	93,831.48 ns	821.521 ns	768.452 ns	958.26	9.38	2.6855	1.3428	22,656 B
DotNetGlob_IsMatch	1	p?th/(...)].txt [21]	957.20 ns	6.269 ns	5.557 ns	9.78	0.07	0.2270	-	1,904 B
CorvusGlob_IsMatch	1	p?th/(...)].txt [21]	97.85 ns	0.423 ns	0.375 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	1	p?th/(...)].txt [46]	107,852.37 ns	686.971 ns	573.652 ns	699.75	5.16	3.4180	1.7090	29,590 B
DotNetGlob_IsMatch	1	p?th/(...)].txt [46]	1,482.14 ns	21.211 ns	19.841 ns	9.61	0.12	0.2823	-	2,376 B
CorvusGlob_IsMatch	1	p?th/(...)].txt [46]	154.20 ns	0.791 ns	0.740 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	1	p?th/a[e-g].txt	90,233.37 ns	1,311.724 ns	1,226.987 ns	1,180.04	19.95	2.3193	1.0986	19,799 B
DotNetGlob_IsMatch	1	p?th/a[e-g].txt	719.24 ns	10.262 ns	9.599 ns	9.41	0.20	0.1659	-	1,392 B
CorvusGlob_IsMatch	1	p?th/a[e-g].txt	76.47 ns	0.957 ns	0.895 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	10000	p?th/(...)].txt [21]	299,515.26 ns	3,457.838 ns	3,234.464 ns	3.24	0.04	2.4414	0.9766	22,643 B
DotNetGlob_IsMatch	10000	p?th/(...)].txt [21]	99,624.34 ns	859.323 ns	803.812 ns	1.08	0.01	0.1221	-	1,904 B
CorvusGlob_IsMatch	10000	p?th/(...)].txt [21]	92,458.55 ns	915.452 ns	811.525 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	10000	p?th/(...)].txt [46]	313,737.50 ns	2,605.794 ns	2,437.461 ns	3.27	0.04	3.4180	1.4648	29,582 B
DotNetGlob_IsMatch	10000	p?th/(...)].txt [46]	104,991.37 ns	898.791 ns	840.729 ns	1.09	0.01	0.2441	-	2,376 B
CorvusGlob_IsMatch	10000	p?th/(...)].txt [46]	95,917.34 ns	524.277 ns	437.795 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	10000	p?th/a[e-g].txt	295,231.08 ns	3,635.725 ns	3,222.976 ns	3.22	0.03	1.9531	0.9766	19,784 B
DotNetGlob_IsMatch	10000	p?th/a[e-g].txt	101,529.01 ns	684.414 ns	606.715 ns	1.11	0.01	0.1221	-	1,392 B
CorvusGlob_IsMatch	10000	p?th/a[e-g].txt	91,612.27 ns	427.833 ns	379.263 ns	1.00	0.00	-	-	-

8-byte `GlobToken` using `checked` in constructor

Method	NumberOfMatches	Pattern	Mean	Error	StdDev	Ratio	RatioSD	Gen 0	Gen 1	Allocated
Compiled_Regex_IsMatch	1	p?th/(...)].txt [21]	93,955.38 ns	1,477.181 ns	1,381.756 ns	940.68	14.82	2.6855	1.3428	22,656 B
DotNetGlob_IsMatch	1	p?th/(...)].txt [21]	965.40 ns	12.456 ns	11.042 ns	9.67	0.13	0.2270	-	1,904 B
CorvusGlob_IsMatch	1	p?th/(...)].txt [21]	99.87 ns	0.795 ns	0.705 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	1	p?th/(...)].txt [46]	109,556.31 ns	1,342.473 ns	1,121.025 ns	671.07	7.47	3.4180	1.7090	29,590 B
DotNetGlob_IsMatch	1	p?th/(...)].txt [46]	1,465.73 ns	19.987 ns	17.718 ns	8.96	0.11	0.2823	-	2,376 B
CorvusGlob_IsMatch	1	p?th/(...)].txt [46]	163.29 ns	0.393 ns	0.307 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	1	p?th/a[e-g].txt	91,416.59 ns	1,429.424 ns	1,337.085 ns	1,161.26	19.49	2.3193	1.0986	19,799 B
DotNetGlob_IsMatch	1	p?th/a[e-g].txt	702.86 ns	7.549 ns	6.692 ns	8.91	0.09	0.1659	-	1,392 B
CorvusGlob_IsMatch	1	p?th/a[e-g].txt	78.82 ns	0.458 ns	0.383 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	10000	p?th/(...)].txt [21]	297,509.87 ns	2,717.840 ns	2,409.295 ns	2.90	0.05	2.4414	0.9766	22,643 B
DotNetGlob_IsMatch	10000	p?th/(...)].txt [21]	99,053.46 ns	635.473 ns	594.422 ns	0.97	0.01	0.1221	-	1,904 B
CorvusGlob_IsMatch	10000	p?th/(...)].txt [21]	102,601.60 ns	1,431.550 ns	1,339.073 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	10000	p?th/(...)].txt [46]	313,212.86 ns	3,045.308 ns	2,699.586 ns	3.24	0.04	3.4180	1.4648	29,582 B
DotNetGlob_IsMatch	10000	p?th/(...)].txt [46]	103,700.68 ns	685.337 ns	641.065 ns	1.07	0.01	0.2441	-	2,376 B
CorvusGlob_IsMatch	10000	p?th/(...)].txt [46]	96,674.95 ns	838.503 ns	743.311 ns	1.00	0.00	-	-	-

Compiled_Regex_IsMatch	10000	p?th/a[e-g].txt	296,002.27 ns	2,809.514 ns	2,490.561 ns	3.02	0.03	1.9531	0.9766	19,784 B
DotNetGlob_IsMatch	10000	p?th/a[e-g].txt	102,155.39 ns	1,268.859 ns	1,604.701 ns	1.04	0.02	0.1221	-	1,392 B
CorvusGlob_IsMatch	10000	p?th/a[e-g].txt	98,313.06 ns	1,139.488 ns	1,065.878 ns	1.00	0.00	-	-	-

mwadams commented 1 year ago

@idg10 Just to bring this alive again - are we happy that the 8-byte glob token is the way to go? And if so, do you want to apply your changes and maybe discuss the issues in a blog post?

cc @Lmooney25

idg10 commented 1 year ago

Having read through my comment and measurements, I'm now definitely uncertain.

My concern is that this adds complication for an extremely small gain. The risks of that complication include:

more likely to contain subtle bugs
harder to maintain
time spent doing this work (as compared with not doing it)

The upside is so small that I'm not convinced it offsets the downside. It wasn't even possible to say for certain that there was an upside until we got to relatively large numbers of repeat applications of the same glob pattern. (It's definitely possible that the additional complexity increases the cost for simpler cases.)

I can't remember the specific use cases for this. Do we have reason to believe that this is featuring in critical loops somewhere?

mwadams commented 1 year ago

I'm looking at what it looks like if we reduce the max string length to 12bits (i.e. 4095 characters) which is fine for URIs as the max usable length is 2,048 characters, along with validation in the GlobToken constructor (which just checks the bits).

This allows us to put a packed value in a uint with 2x12 bits for the start and end, and 8 bits for the token type, along with unchecked unpacking (because we know we are safe from the constructor validation).

A couple of tweaks to the implementation to pull ensure the unpacking of those values occurs at a minimum, gives us measurable performance improvements.

mwadams commented 1 year ago

(I failed to answer this question earlier: I can't remember the specific use cases for this. Do we have reason to believe that this is featuring in critical loops somewhere?)

The most important use case for us is in permissions validation. We use glob matching of URIs to determine Allow/Deny permission in permission sets.

This is then used both in determining access to those resources, and in e.g. stripping linked/embedded resources from HAL documents based on those permissions.

This is another example where every request will do many such checks, and shaving cost out of this operation is another of those "cumulative savings" - especially under load.

corvus-dotnet / Corvus.Globbing