stephentoub opened this issue 1 year ago
Tagging subscribers to this area: @dotnet/area-system-runtime. See info in area-owners.md if you want to be subscribed.
Author: | stephentoub |
---|---|
Assignees: | - |
Labels: | `area-System.Runtime`, `tenet-performance` |
Milestone: | 8.0.0 |
Some relevant info in #68616. If it has comparable security guarantees and is faster, it seems potentially a good idea.
I'm going to guess the 99% case is hashing less than 100 bytes. I wonder whether there are algorithms that perhaps don't perform as well across all input sizes as xxHash but do even better for such short inputs.
I wonder whether, with XXH3, we could even remove the fast non-randomized hash code that Dictionary uses until it hits collisions and falls back to the default randomized one.
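The fallback strategy mentioned here can be pictured with a toy model: keys start out hashed with a fast, non-randomized function, and once a single bucket chain grows past a threshold, the table switches to a seeded hash and rehashes every entry. The Python sketch below is an illustration, not the `Dictionary<TKey, TValue>` source: `ToyDict`, the default threshold of 100, FNV-1a as the unseeded hash, keyed BLAKE2b as the seeded hash, and the per-table random seed (the runtime instead falls back to the default randomized `String.GetHashCode`) are all assumptions.

```python
# Toy model (hypothetical names and threshold); FNV-1a and keyed BLAKE2b are stand-ins.
import hashlib
import os

MASK64 = (1 << 64) - 1


def fast_hash(key: str) -> int:
    """Non-randomized FNV-1a 64 over UTF-8 bytes (stand-in for the fast, unseeded hash)."""
    h = 0xCBF29CE484222325
    for b in key.encode("utf-8"):
        h ^= b
        h = (h * 0x100000001B3) & MASK64
    return h


def seeded_hash(key: str, seed: bytes) -> int:
    """Keyed BLAKE2b truncated to 64 bits (stand-in for the randomized, seeded hash)."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8, key=seed).digest()
    return int.from_bytes(digest, "little")


class ToyDict:
    """Separate-chaining map that switches to a seeded hash after too many collisions."""

    def __init__(self, nbuckets: int = 16, threshold: int = 100):
        self._buckets = [[] for _ in range(nbuckets)]
        self._threshold = threshold  # illustrative value, not taken from .NET
        self._seed = None  # None means "still using the fast, unseeded hash"

    @property
    def randomized(self) -> bool:
        return self._seed is not None

    def _chain(self, key: str) -> list:
        h = fast_hash(key) if self._seed is None else seeded_hash(key, self._seed)
        return self._buckets[h % len(self._buckets)]

    def __setitem__(self, key: str, value) -> None:
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)
                return
        chain.append((key, value))
        # One chain got too long while unseeded: treat it as possible hash flooding.
        if self._seed is None and len(chain) > self._threshold:
            self._switch_to_randomized()

    def __getitem__(self, key: str):
        for k, v in self._chain(key):
            if k == key:
                return v
        raise KeyError(key)

    def _switch_to_randomized(self) -> None:
        self._seed = os.urandom(16)  # per-table secret seed (an assumption of this toy)
        items = [kv for chain in self._buckets for kv in chain]
        self._buckets = [[] for _ in self._buckets]
        for k, v in items:
            self._chain(k).append((k, v))  # rehash every entry under the seeded hash
```

Passing `nbuckets=1` forces every key into one chain, so inserting more than `threshold` keys triggers the switch; lookups keep working afterwards because every entry is rehashed under the new seed. The toy never resizes, so with many keys and few buckets it would switch even without an attack, unlike a real table that grows as it fills.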
Does XxHash3 provide same or better protection against DoS attacks compared to Marvin? Is there a test that can measure it?
> I'm going to guess the 99% case is hashing less than 100 bytes. I wonder whether there are algorithms that perhaps don't perform as well across all input sizes as xxHash but do even better for such short inputs.

Most inputs here are in the 6-10 character range for strings that represent identifiers. Strings that represent people's names can of course be arbitrarily long, but fewer than 16 characters is a good rule of thumb.
The primary scenario for this is strings used within record-like types. Strings used as keys in collections are already special-cased to go down a faster code path, assuming that optimization remains in place.
> Does XxHash3 provide same or better protection against DoS attacks compared to Marvin? Is there a test that can measure it?

No one else I can find uses or evaluates Marvin, so it doesn't show up in any academic or industry comparisons I've seen, nor does it get much in the way of external scrutiny (that in and of itself might be a reason to move on). But XXH3 gets good marks on standard test suites like SMHasher. Levi can likely comment further. We could also adapt such a suite and measure it ourselves if desired.
My suggestion also isn't tied specifically to XXH3... we just have a good implementation of it now, it looks to be well-respected, and it's the fastest algorithm we've implemented. If something better came along, we could switch to that instead.
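As a rough idea of what adapting a suite like SMHasher could involve, here is a minimal Python sketch of one of its simplest measurements, an avalanche test: flip each input bit of a 64-bit value and record what fraction of output bits change; a well-mixed function should average close to 0.5. The function names and sample count are hypothetical, keyed-free BLAKE2b stands in for a well-mixed hash (neither Marvin nor XXH3 is in the Python standard library), and the identity function shows what a score with no mixing looks like. A real suite also checks per-bit bias and many other properties, not just this mean.

```python
# Sketch of a single avalanche measurement; names and sample count are hypothetical.
import hashlib
import random

MASK64 = (1 << 64) - 1


def avalanche_score(f, samples: int = 100, seed: int = 1) -> float:
    """Mean fraction of the 64 output bits that flip when one input bit flips."""
    rng = random.Random(seed)  # fixed seed so the measurement is reproducible
    flipped = 0
    for _ in range(samples):
        x = rng.getrandbits(64)
        fx = f(x)
        for bit in range(64):
            flipped += bin(fx ^ f(x ^ (1 << bit))).count("1")
    return flipped / (samples * 64 * 64)


def blake2b_64(x: int) -> int:
    """Stand-in for a well-mixed 64-bit hash of a 64-bit input."""
    digest = hashlib.blake2b(x.to_bytes(8, "little"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def identity(x: int) -> int:
    """No mixing at all: each input bit flip changes exactly one output bit."""
    return x & MASK64


print(f"identity:  {avalanche_score(identity):.4f}")
print(f"blake2b64: {avalanche_score(blake2b_64):.4f}")
```

The identity function scores exactly 1/64, while a good mixer lands near 0.5; running the same harness against Marvin and XXH3 ports would be the "measure it ourselves" step.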
FWIW, I was curious to see how XXH3 compared against FNV-1a, which is what C# uses for hashing strings in switches. I was wondering if it'd make sense for the C# compiler itself to switch to depend on XxHash3 if doing so didn't require an additional assembly reference. But on my machine, the crossover point where XxHash3 starts being faster is at ~13 characters, which I expect is larger than the average string length used in switches, so it's unlikely to be a win. (For very long string lengths, XXH3 is a clear winner; on my machine, at 100 characters, FNV-1a is ~5x slower than XXH3, and at 1000 characters is ~10x slower.)
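For reference, FNV-1a processes one input unit per step: XOR it into the state, then multiply by the FNV prime. The minimal Python sketch below hashes a string's UTF-16 code units with the standard 32-bit FNV-1a constants; that the C# compiler's string-switch hash iterates per UTF-16 code unit in exactly this way is our assumption, and the function name is hypothetical. The serial multiply per character is one plausible reason FNV-1a is hard to beat on very short strings but falls behind XXH3, which consumes input in wider blocks, as strings get longer.

```python
# Sketch: 32-bit FNV-1a over UTF-16 code units (hypothetical helper name).
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1a_32_utf16(text: str) -> int:
    """32-bit FNV-1a over UTF-16 code units (one unit per step, not one byte)."""
    data = text.encode("utf-16-le", "surrogatepass")
    h = FNV32_OFFSET_BASIS
    for i in range(0, len(data), 2):
        code_unit = data[i] | (data[i + 1] << 8)
        h ^= code_unit
        h = (h * FNV32_PRIME) & 0xFFFFFFFF  # wrap to 32 bits like unchecked uint math
    return h


for s in ["", "a", "Name", "CustomerId"]:
    print(f"{s!r:14} -> 0x{fnv1a_32_utf16(s):08x}")
```

For ASCII-only strings each code unit equals the byte value, so `fnv1a_32_utf16("a")` matches the published FNV-1a 32-bit test vector for `"a"`, `0xe40c292c`.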
Putting this into future as we won't be doing it for 8.
We can move it to 9 when and if work actually starts on this to avoid needing to keep pushing it out to the next milestone.
> Does XxHash3 provide same or better protection against DoS attacks compared to Marvin? Is there a test that can measure it?

The main point is that Marvin likely loses to XxHash3 on all criteria, with the latter being the optimal choice on both collision rate and performance across all input lengths.
Given that this would affect one of the foundational behaviors in .NET, it probably warrants more extensive research rather than a simple "go vs. no go", with non-trivial performance wins on the table across the board: improved dictionary/hashset lookups and code simplification (likely no more need for custom implementations for specific containers).
@stephentoub I'm seeing `String.GetNonRandomizedHashCode` on a hot path (I guess `Dictionary<string, T>` lookups) in one of our 1P customers' workloads. I wonder if we can use XxHash3 there easily, since DoS is not a concern: if we hit collisions, we'll switch to the default `String.GetHashCode` anyway. I can probably even tell you an average string's length for that customer.
> I can probably even tell you an average string's length for that customer.

Yes please
> Does XxHash3 provide same or better protection against DoS attacks compared to Marvin? Is there a test that can measure it?

DoS against hashmap data structures comes from key collisions, and while both Marvin and xxHash have good mixing properties and, therefore, a low rate of collision, protection against DoS should come from seeding the hash functions rather than their collision resistance.
That being said, the lack of (at least public) analysis of MarvinHash, and the fact that xxHash has been analyzed (flaws were found and fixed), will by default make xxHash the preferred choice.
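To make the seeding argument concrete, here is a minimal Python sketch: an attacker who knows an unseeded hash can search offline for keys that all land in one bucket, but once the hash is keyed with a secret per-process seed, those precomputed keys scatter. Unseeded FNV-1a and keyed BLAKE2b are stand-ins (not Marvin or XXH3), and the 61-bucket table and helper names are hypothetical. This only illustrates the mechanism; it says nothing about how resistant any particular seeded function is to attack.

```python
# Sketch of why seeding defeats precomputed collisions; all names are hypothetical.
import hashlib
import os

MASK64 = (1 << 64) - 1
NBUCKETS = 61  # prime bucket count, as many hash tables use


def unseeded_hash(key: str) -> int:
    """FNV-1a 64: identical in every process, so collisions can be precomputed."""
    h = 0xCBF29CE484222325
    for b in key.encode("utf-8"):
        h = ((h ^ b) * 0x100000001B3) & MASK64
    return h


def seeded_hash(key: str, seed: bytes) -> int:
    """Keyed BLAKE2b truncated to 64 bits; output depends on a secret seed."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8, key=seed).digest()
    return int.from_bytes(digest, "little")


def precompute_bucket_zero_keys(count: int) -> list:
    """Attacker's offline search: keys whose unseeded hash lands in bucket 0."""
    keys, i = [], 0
    while len(keys) < count:
        candidate = f"k{i}"
        if unseeded_hash(candidate) % NBUCKETS == 0:
            keys.append(candidate)
        i += 1
    return keys


attack_keys = precompute_bucket_zero_keys(50)
process_seed = os.urandom(16)  # chosen once per process, unknown to the attacker
unseeded_buckets = {unseeded_hash(k) % NBUCKETS for k in attack_keys}
seeded_buckets = {seeded_hash(k, process_seed) % NBUCKETS for k in attack_keys}
print("unseeded buckets used:", len(unseeded_buckets))  # 1: every key collides
print("seeded buckets used:  ", len(seeded_buckets))
```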
> I was wondering if it'd make sense for the C# compiler itself to switch to depend on XxHash3

It is worth noting that reducing the xxHash function to a mixer (ulong -> mixed ulong) considerably speeds up the function. Specializing the function rather than using a generic version often provides a very large speedup.
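To illustrate the mixer idea, the XXH64 finalizer ("avalanche") step alone is a cheap `ulong -> ulong` function: xor-shift, multiply, xor-shift, multiply, xor-shift, using two of the XXH64 primes. The Python sketch below follows that step as we understand it from the reference xxHash implementation, masking to 64 bits to emulate unsigned overflow; the function name is ours, and it is not claimed to be the specialized function the comment above refers to. Every step is invertible (an xor with a right-shift of itself, or multiplication by an odd constant modulo 2^64), so the mixer is a bijection: distinct inputs always produce distinct outputs.

```python
# Sketch of an XXH64-style avalanche step used as a standalone 64-bit mixer.
MASK64 = (1 << 64) - 1
PRIME64_2 = 0xC2B2AE3D27D4EB4F
PRIME64_3 = 0x165667B19E3779F9


def xxh64_avalanche(h: int) -> int:
    """XXH64-style final mix: spreads every input bit across the 64-bit output."""
    h &= MASK64
    h ^= h >> 33
    h = (h * PRIME64_2) & MASK64  # odd multiplier, so this step is invertible
    h ^= h >> 29
    h = (h * PRIME64_3) & MASK64
    h ^= h >> 32
    return h


# Sequential integer keys (other than 0, which maps to 0) come out scattered.
for key in range(4):
    print(key, hex(xxh64_avalanche(key)))
```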
My FastHash project has a benchmark suite for common hash functions with sizes 8, 32, 128, etc.
I've included Marvin, as well as both xxHash v2 and xxHash v3 hash functions. I've made little to no changes to the original algorithms. There is a test suite that tests against published test vectors to ensure correctness.
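The FastHash test suite itself isn't shown here; as a minimal sketch of the test-vector approach, the harness below checks implementations against published reference vectors. It uses a hand-written FNV-1a 64 and CRC-32 from Python's `zlib` because their reference vectors are short and well known; the table layout and function names are hypothetical.

```python
# Sketch of a test-vector harness; table layout and names are hypothetical.
import zlib

MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    h = 0xCBF29CE484222325  # FNV-1a 64-bit offset basis
    for b in data:
        h = ((h ^ b) * 0x100000001B3) & MASK64  # FNV 64-bit prime, wrapped to 64 bits
    return h


# (name, function, input, expected) rows; expected values are published reference vectors.
TEST_VECTORS = [
    ("fnv1a_64", fnv1a_64, b"", 0xCBF29CE484222325),
    ("fnv1a_64", fnv1a_64, b"a", 0xAF63DC4C8601EC8C),
    ("crc32", zlib.crc32, b"123456789", 0xCBF43926),
]


def run_vectors(vectors) -> list:
    """Return a description of every vector that does not match (empty list = all pass)."""
    failures = []
    for name, fn, data, expected in vectors:
        actual = fn(data)
        if actual != expected:
            failures.append(f"{name}({data!r}): expected {expected:#x}, got {actual:#x}")
    return failures


print(run_vectors(TEST_VECTORS) or "all vectors pass")
```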
General notes:

- The implementation of xxHash in `System.IO.Hashing` is rather slow compared to the original xxhash implementation. It also returns a byte array rather than a `uint`/`ulong`, which is not great for performance.
- `System.HashCode.Combine()` is based on the 32-bit variant of xxHash v2.

Benchmark:

| Method | Size | Mean | Error | StdDev |
|---|---|---|---|---|
| Xx2Hash32UnsafeTest | 2 | 1.372 ns | 0.0089 ns | 0.0083 ns |
| MarvinHash32Test | 2 | 1.837 ns | 0.0126 ns | 0.0118 ns |
| Xx2Hash64UnsafeTest | 2 | 1.895 ns | 0.0097 ns | 0.0086 ns |
| Xx2Hash64Test | 2 | 2.153 ns | 0.0031 ns | 0.0026 ns |
| Xx2Hash32Test | 2 | 2.329 ns | 0.0110 ns | 0.0103 ns |
| Xx3Hash64UnsafeTest | 2 | 3.460 ns | 0.0165 ns | 0.0155 ns |
| Xx3Hash64Test | 2 | 3.623 ns | 0.0412 ns | 0.0386 ns |
| Xx3Hash128UnsafeTest | 2 | 7.348 ns | 0.0436 ns | 0.0387 ns |
| Xx2Hash32UnsafeTest | 4 | 0.8460 ns | 0.0083 ns | 0.0074 ns |
| Xx2Hash64UnsafeTest | 4 | 1.2169 ns | 0.0107 ns | 0.0095 ns |
| MarvinHash32Test | 4 | 1.6236 ns | 0.0114 ns | 0.0095 ns |
| Xx2Hash32Test | 4 | 1.6752 ns | 0.0136 ns | 0.0113 ns |
| Xx2Hash64Test | 4 | 1.9443 ns | 0.0127 ns | 0.0106 ns |
| Xx3Hash64UnsafeTest | 4 | 4.0610 ns | 0.0153 ns | 0.0143 ns |
| Xx3Hash64Test | 4 | 4.0684 ns | 0.0103 ns | 0.0091 ns |
| Xx3Hash128UnsafeTest | 4 | 8.7722 ns | 0.0126 ns | 0.0112 ns |
| Xx2Hash64UnsafeTest | 8 | 1.062 ns | 0.0140 ns | 0.0131 ns |
| Xx2Hash32UnsafeTest | 8 | 1.357 ns | 0.0163 ns | 0.0145 ns |
| Xx2Hash64Test | 8 | 1.875 ns | 0.0198 ns | 0.0185 ns |
| MarvinHash32Test | 8 | 2.119 ns | 0.0120 ns | 0.0113 ns |
| Xx2Hash32Test | 8 | 2.513 ns | 0.0280 ns | 0.0234 ns |
| Xx3Hash64Test | 8 | 3.835 ns | 0.0257 ns | 0.0241 ns |
| Xx3Hash64UnsafeTest | 8 | 4.039 ns | 0.0135 ns | 0.0113 ns |
| Xx3Hash128UnsafeTest | 8 | 9.136 ns | 0.0630 ns | 0.0590 ns |
| Xx2Hash32UnsafeTest | 16 | 1.627 ns | 0.0058 ns | 0.0052 ns |
| Xx2Hash64UnsafeTest | 16 | 1.993 ns | 0.0069 ns | 0.0061 ns |
| Xx2Hash32Test | 16 | 2.554 ns | 0.0103 ns | 0.0091 ns |
| Xx2Hash64Test | 16 | 2.972 ns | 0.0024 ns | 0.0019 ns |
| Xx3Hash64UnsafeTest | 16 | 3.440 ns | 0.0051 ns | 0.0045 ns |
| MarvinHash32Test | 16 | 3.464 ns | 0.0086 ns | 0.0076 ns |
| Xx3Hash128UnsafeTest | 16 | 9.586 ns | 0.0144 ns | 0.0120 ns |
| Xx3Hash64Test | 16 | 11.475 ns | 0.0185 ns | 0.0173 ns |
| Xx2Hash32UnsafeTest | 32 | 2.805 ns | 0.0084 ns | 0.0078 ns |
| Xx2Hash32Test | 32 | 3.985 ns | 0.0103 ns | 0.0097 ns |
| Xx2Hash64UnsafeTest | 32 | 4.290 ns | 0.0144 ns | 0.0128 ns |
| Xx3Hash64Test | 32 | 4.545 ns | 0.0126 ns | 0.0105 ns |
| Xx2Hash64Test | 32 | 4.913 ns | 0.0189 ns | 0.0167 ns |
| Xx3Hash64UnsafeTest | 32 | 6.065 ns | 0.0201 ns | 0.0178 ns |
| MarvinHash32Test | 32 | 6.779 ns | 0.0161 ns | 0.0134 ns |
| Xx3Hash128UnsafeTest | 32 | 21.619 ns | 0.0387 ns | 0.0323 ns |

Benchmark notes:

- Methods with `Unsafe` in the name use `unsafe` code for less overhead. A "safer" version using the `Unsafe` class and `ref`/`out` would likely perform similarly, without needing `unsafe` code.

> The implementation of xxHash in System.IO.Hashing is rather slow compared to the original xxhash implementation.

Can you elaborate? On what hardware? This is contrary to what we previously saw when measuring optimizations here, e.g. https://github.com/dotnet/runtime/pull/77881#issue-1435656399. cc: @xoofx
> It also returns a byte array rather than a uint/ulong, which is not great for performance.

You're looking for the HashToUInt64 method. It doesn't create a byte array; it just returns a ulong. https://learn.microsoft.com/en-us/dotnet/api/system.io.hashing.xxhash3.hashtouint64
> You're looking for the HashToUInt64 method. It doesn't create a byte array; it just returns a ulong. https://learn.microsoft.com/en-us/dotnet/api/system.io.hashing.xxhash3.hashtouint64

I tested against the release version of the package. It seems `HashToUInt64` was introduced in the pre-release 8.0 packages. I'll redo my benchmark against the pre-release version. The byte array allocation overhead will certainly have impacted performance.
> I tested against the release version of the package.

Ok. In that case, you also weren't using XxHash3, which along with XxHash128 was added in the 8.0 package. Thanks.
Here are the new benchmarks. These functions are the ones from System.IO.Hashing:
The others are my implementations from Genbox.FastHash.
Method | Size | Mean | Error | StdDev |
---|---|---|---|---|
Xx2Hash32UnsafeTest | 2 | 1.355 ns | 0.0122 ns | 0.0109 ns |
MarvinHash32Test | 2 | 1.813 ns | 0.0337 ns | 0.0298 ns |
Xx2Hash64UnsafeTest | 2 | 1.827 ns | 0.0123 ns | 0.0109 ns |
Xx3Hash64NetTest | 2 | 1.864 ns | 0.0161 ns | 0.0150 ns |
Xx2Hash64Test | 2 | 2.142 ns | 0.0292 ns | 0.0273 ns |
Xx2Hash32Test | 2 | 2.640 ns | 0.0152 ns | 0.0142 ns |
Xx3Hash64Test | 2 | 3.458 ns | 0.0297 ns | 0.0264 ns |
Xx3Hash64UnsafeTest | 2 | 3.548 ns | 0.0396 ns | 0.0370 ns |
Xx2Hash64NetTest | 2 | 3.595 ns | 0.0164 ns | 0.0146 ns |
Xx2Hash32NetTest | 2 | 4.098 ns | 0.0267 ns | 0.0236 ns |
Xx2Hash32UnsafeTest | 4 | 1.167 ns | 0.0142 ns | 0.0126 ns |
Xx2Hash64UnsafeTest | 4 | 1.254 ns | 0.0094 ns | 0.0152 ns |
MarvinHash32Test | 4 | 1.625 ns | 0.0088 ns | 0.0083 ns |
Xx2Hash32Test | 4 | 1.662 ns | 0.0123 ns | 0.0115 ns |
Xx2Hash64Test | 4 | 1.748 ns | 0.0143 ns | 0.0133 ns |
Xx3Hash64NetTest | 4 | 1.894 ns | 0.0082 ns | 0.0076 ns |
Xx2Hash32NetTest | 4 | 3.015 ns | 0.0087 ns | 0.0082 ns |
Xx2Hash64NetTest | 4 | 3.295 ns | 0.0161 ns | 0.0150 ns |
Xx3Hash64UnsafeTest | 4 | 3.839 ns | 0.0119 ns | 0.0093 ns |
Xx3Hash64Test | 4 | 4.072 ns | 0.0242 ns | 0.0202 ns |
Xx2Hash32UnsafeTest | 8 | 1.331 ns | 0.0179 ns | 0.0167 ns |
Xx2Hash64UnsafeTest | 8 | 1.342 ns | 0.0094 ns | 0.0088 ns |
Xx2Hash32Test | 8 | 1.677 ns | 0.0145 ns | 0.0128 ns |
Xx3Hash64NetTest | 8 | 1.846 ns | 0.0095 ns | 0.0089 ns |
Xx2Hash64Test | 8 | 1.918 ns | 0.0249 ns | 0.0233 ns |
MarvinHash32Test | 8 | 2.154 ns | 0.0346 ns | 0.0307 ns |
Xx2Hash64NetTest | 8 | 3.183 ns | 0.0310 ns | 0.0259 ns |
Xx2Hash32NetTest | 8 | 3.261 ns | 0.0265 ns | 0.0221 ns |
Xx3Hash64Test | 8 | 3.871 ns | 0.0378 ns | 0.0354 ns |
Xx3Hash64UnsafeTest | 8 | 4.104 ns | 0.0867 ns | 0.0811 ns |
Xx3Hash64NetTest | 16 | 1.539 ns | 0.0095 ns | 0.0089 ns |
Xx2Hash32UnsafeTest | 16 | 1.946 ns | 0.0102 ns | 0.0095 ns |
Xx2Hash64UnsafeTest | 16 | 2.088 ns | 0.0126 ns | 0.0112 ns |
Xx2Hash32Test | 16 | 2.498 ns | 0.0101 ns | 0.0094 ns |
Xx2Hash64Test | 16 | 2.968 ns | 0.0178 ns | 0.0158 ns |
MarvinHash32Test | 16 | 3.415 ns | 0.0219 ns | 0.0205 ns |
Xx3Hash64UnsafeTest | 16 | 3.429 ns | 0.0193 ns | 0.0150 ns |
Xx2Hash32NetTest | 16 | 3.908 ns | 0.0250 ns | 0.0222 ns |
Xx2Hash64NetTest | 16 | 4.220 ns | 0.0443 ns | 0.0392 ns |
Xx3Hash64Test | 16 | 11.696 ns | 0.1103 ns | 0.0978 ns |
Xx3Hash64NetTest | 32 | 2.611 ns | 0.0122 ns | 0.0114 ns |
Xx2Hash32UnsafeTest | 32 | 2.793 ns | 0.0055 ns | 0.0052 ns |
Xx2Hash32Test | 32 | 3.977 ns | 0.0099 ns | 0.0088 ns |
Xx2Hash64UnsafeTest | 32 | 4.259 ns | 0.0107 ns | 0.0095 ns |
Xx3Hash64Test | 32 | 4.591 ns | 0.0086 ns | 0.0076 ns |
Xx2Hash64Test | 32 | 4.891 ns | 0.0256 ns | 0.0227 ns |
Xx2Hash32NetTest | 32 | 6.040 ns | 0.0256 ns | 0.0239 ns |
Xx3Hash64UnsafeTest | 32 | 6.103 ns | 0.0124 ns | 0.0116 ns |
Xx2Hash64NetTest | 32 | 6.486 ns | 0.0147 ns | 0.0123 ns |
MarvinHash32Test | 32 | 6.771 ns | 0.0165 ns | 0.0146 ns |
Config/Hardware:
System.IO.Hashing: 8.0.0-rc.2.23479.6
Genbox.FastHash: 1.0.0-alpha.2
BenchmarkDotNet v0.13.7, Windows 10 (10.0.19045.3570/22H2/2022Update)
12th Gen Intel Core i7-12700K, 1 CPU, 20 logical and 12 physical cores
.NET SDK 8.0.100-rc.2.23502.2
[Host] : .NET 7.0.13 (7.0.1323.51816), X64 RyuJIT AVX2
My previous testing was against System.IO.Hashing 7.0.0, which did not have xxhash v3 or the HashToUInt64/HashToUInt32 functions which are a really nice addition.
xxhash v3 from System.IO.Hashing 8.0.0-rc is indeed fast. It gains traction on 16+ bytes, but has overall good performance.
Good to see Marvin being challenged! It performs well for small input sizes but significantly falls behind some other algorithms such as xxh3 as input size grows.
I've put some of my free time lately into making another non-cryptographic algorithm named gxhash, leveraging SIMD and with high ILP. It's very recent and probably hasn't been adopted by anyone yet, so you may not want to consider it (at least for now); still, it passes SMHasher (while xxh3 does not), so I believe it is interesting to have a look.
With the Rust implementation, the benchmark results are compelling, outperforming all counterparts I have found for all input sizes on my ARM and x64 machines.
As a quick test, I have ported the algorithm to fully managed C# (using portable SIMD from `System.Runtime.Intrinsics`). The implementation is not perfect and could probably be fine-tuned, but it already outperforms the proposed `Xxh3`.
Another benefit is that it compiles to much shorter assembly code than `Xxh3` and several other algorithms, but I need to get some numbers for .NET (better inlining opportunities?).
Please let me know if you think it's worth putting a little more effort into this. If you are interested I can clean the C# implementation a bit and share it. Of course any feedback is more than welcome.
Here are some numbers with the current implementation (.NET 8, MacBook M1 Pro, ARM).
The benchmark code is the same as in the first post of this issue, plus `GxHash_32`.
Method | Length | Mean | Error | StdDev | Ratio | RatioSD |
---|---|---|---|---|---|---|
Marvin | 0 | 1.369 ns | 0.0361 ns | 0.0056 ns | baseline | |
XXH3 | 0 | 2.326 ns | 0.0677 ns | 0.0105 ns | 1.70x slower | 0.01x |
GxHash_32 | 0 | 1.905 ns | 0.0420 ns | 0.0328 ns | 1.38x slower | 0.01x |
Marvin | 1 | 1.370 ns | 0.0296 ns | 0.0046 ns | baseline | |
XXH3 | 1 | 2.314 ns | 0.0574 ns | 0.0149 ns | 1.69x slower | 0.01x |
GxHash_32 | 1 | 1.904 ns | 0.0370 ns | 0.0096 ns | 1.39x slower | 0.01x |
Marvin | 2 | 1.375 ns | 0.0454 ns | 0.0118 ns | baseline | |
XXH3 | 2 | 2.337 ns | 0.0605 ns | 0.0269 ns | 1.71x slower | 0.03x |
GxHash_32 | 2 | 1.878 ns | 0.0269 ns | 0.0015 ns | 1.37x slower | 0.01x |
Marvin | 4 | 2.399 ns | 0.0710 ns | 0.0593 ns | baseline | |
XXH3 | 4 | 2.343 ns | 0.0345 ns | 0.0053 ns | 1.04x faster | 0.04x |
GxHash_32 | 4 | 1.935 ns | 0.0307 ns | 0.0017 ns | 1.28x faster | 0.05x |
Marvin | 8 | 3.662 ns | 0.0936 ns | 0.0416 ns | baseline | |
XXH3 | 8 | 2.330 ns | 0.0272 ns | 0.0042 ns | 1.58x faster | 0.02x |
GxHash_32 | 8 | 1.928 ns | 0.0236 ns | 0.0037 ns | 1.91x faster | 0.03x |
Marvin | 16 | 5.213 ns | 0.0436 ns | 0.0067 ns | baseline | |
XXH3 | 16 | 2.332 ns | 0.0683 ns | 0.0407 ns | 2.22x faster | 0.06x |
GxHash_32 | 16 | 1.935 ns | 0.0415 ns | 0.0064 ns | 2.69x faster | 0.01x |
Marvin | 32 | 15.791 ns | 0.2854 ns | 0.0156 ns | baseline | |
XXH3 | 32 | 2.339 ns | 0.0514 ns | 0.0183 ns | 6.73x faster | 0.08x |
GxHash_32 | 32 | 1.932 ns | 0.0426 ns | 0.0066 ns | 8.18x faster | 0.04x |
Marvin | 128 | 74.875 ns | 1.2884 ns | 0.0706 ns | baseline | |
XXH3 | 128 | 13.743 ns | 0.2111 ns | 0.0327 ns | 5.44x faster | 0.01x |
GxHash_32 | 128 | 9.366 ns | 0.1968 ns | 0.0108 ns | 7.99x faster | 0.02x |
Marvin | 1000 | 273.099 ns | 3.5320 ns | 0.1936 ns | baseline | |
XXH3 | 1000 | 63.287 ns | 0.4308 ns | 0.0667 ns | 4.32x faster | 0.00x |
GxHash_32 | 1000 | 6.635 ns | 0.0863 ns | 0.0133 ns | 41.14x faster | 0.11x |
Marvin | 10000 | 5,442.992 ns | 46.8482 ns | 2.5679 ns | baseline | |
XXH3 | 10000 | 300.116 ns | 3.5788 ns | 0.1962 ns | 18.14x faster | 0.01x |
GxHash_32 | 10000 | 73.945 ns | 0.8099 ns | 0.0444 ns | 73.61x faster | 0.08x |
@ogxd interesting! Interested in thoughts of others but I imagine we'd want to stick to something that's industry tested for something this fundamental and it sounds like it's relatively early days for your algorithm, although the performance sounds encouraging.
Yes I completely agree, I was just thinking it was worth mentioning it here, despite it being new, given that XxH3 performs worse than Marvin on small input sizes, making the switch to XxH3 more of a compromise than a 100% win.
As some seem to be interested, I took some more time on this C# implementation of `gxhash`. If more people are interested, the source code is here. `gxhash` passes all SMHasher quality/collision/avalanche tests (although the PR to have it integrated is still in progress) and uses rounds of the AES block cipher internally, so it should be pretty robust, but I'll let you be the judges; you have the source now.
Some benchmark results from a github runner:
| Method | Value | Mean | Error | StdDev | Ratio | Throughput (MiB/s) |
|--------- |--------------------- |-------------:|----------:|----------:|------:|-------------------:|
| Marvin | zk | 3.405 ns | 0.0041 ns | 0.0037 ns | 1.00 | 1120.19 ± 2.72 |
| XxH3 | zk | 3.756 ns | 0.0058 ns | 0.0048 ns | 1.10 | 1015.52 ± 3.12 |
| GxHash32 | zk | 1.592 ns | 0.0130 ns | 0.0108 ns | 0.47 | 2396.34 ± 39.01 |
| | | | | | | |
| Marvin | 9Iza | 4.345 ns | 0.0113 ns | 0.0106 ns | 1.00 | 1756.08 ± 9.15 |
| XxH3 | 9Iza | 3.765 ns | 0.0132 ns | 0.0117 ns | 0.87 | 2026.15 ± 14.19 |
| GxHash32 | 9Iza | 1.573 ns | 0.0149 ns | 0.0140 ns | 0.36 | 4849.14 ± 92.00 |
| | | | | | | |
| Marvin | kwqYIEC7 | 6.418 ns | 0.0325 ns | 0.0304 ns | 1.00 | 2377.46 ± 24.07 |
| XxH3 | kwqYIEC7 | 4.048 ns | 0.0077 ns | 0.0068 ns | 0.63 | 3769.81 ± 14.32 |
| GxHash32 | kwqYIEC7 | 1.568 ns | 0.0112 ns | 0.0105 ns | 0.24 | 9732.41 ± 139.11 |
| | | | | | | |
| Marvin | cO0w(...)mE36 [512] | 395.916 ns | 0.1137 ns | 0.1008 ns | 1.00 | 2466.59 ± 1.42 |
| XxH3 | cO0w(...)mE36 [512] | 53.312 ns | 0.2560 ns | 0.2395 ns | 0.13 | 18317.93 ± 175.94 |
| GxHash32 | cO0w(...)mE36 [512] | 15.647 ns | 0.2386 ns | 0.2232 ns | 0.04 | 62410.91 ± 1904.05 |
| | | | | | | |
| Marvin | F2fW(...)I1XD [8192] | 6,330.805 ns | 1.5850 ns | 1.4050 ns | 1.00 | 2468.09 ± 1.24 |
| XxH3 | F2fW(...)I1XD [8192] | 1,017.953 ns | 5.8021 ns | 5.4273 ns | 0.16 | 15349.43 ± 174.98 |
| GxHash32 | F2fW(...)I1XD [8192] | 223.868 ns | 2.1845 ns | 2.0434 ns | 0.04 | 69795.58 ± 1362.24 |
We could, for example, bring XxHash3 down into corelib from System.IO.Hashing, delete Marvin, and use XxHash3 in place of it.
The above is for case-sensitive hashing. I'm not sure exactly how we'd want to tweak the XXH3 algorithm for case-insensitive hashing, but we could probably do something similar to what we do with Marvin today.
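One possible reading of "something similar to what we do with Marvin today" is fold-then-hash: map each UTF-16 code unit to upper case (with a cheap ASCII path) and hash the folded units, so strings that differ only in case produce the same hash. The Python sketch below assumes that design; keyed BLAKE2b stands in for a seeded XXH3, the names and per-process seed are hypothetical, and the non-ASCII branch only applies one-to-one BMP mappings from Python's `str.upper`, which approximates but is not guaranteed to match .NET's ordinal ignore-case rules.

```python
# Sketch of fold-then-hash for case-insensitive hashing; keyed BLAKE2b stands in for XXH3.
import hashlib
import os

PROCESS_SEED = os.urandom(16)  # hypothetical per-process secret


def upper_code_unit(u: int) -> int:
    """Upper-case one UTF-16 code unit; anything without a 1:1 BMP mapping is kept."""
    if 0x61 <= u <= 0x7A:  # ASCII fast path: 'a'..'z' -> 'A'..'Z'
        return u - 0x20
    if u < 0x80 or 0xD800 <= u <= 0xDFFF:  # other ASCII, or a surrogate half
        return u
    upper = chr(u).upper()
    if len(upper) == 1 and ord(upper) <= 0xFFFF:
        return ord(upper)
    return u  # e.g. 'ß' upper-cases to "SS", so it is left unchanged


def hash_ignore_case(text: str, seed: bytes = PROCESS_SEED) -> int:
    """Fold the string's UTF-16 code units to upper case, then hash the folded bytes."""
    data = text.encode("utf-16-le", "surrogatepass")
    folded = bytearray()
    for i in range(0, len(data), 2):
        unit = upper_code_unit(data[i] | (data[i + 1] << 8))
        folded += unit.to_bytes(2, "little")
    digest = hashlib.blake2b(bytes(folded), digest_size=8, key=seed).digest()
    return int.from_bytes(digest, "little")


print(hash_ignore_case("CustomerId") == hash_ignore_case("customerID"))  # True
```

Folding into a separate buffer keeps the hash function itself unchanged; a production version would presumably fold in place in chunks instead of allocating.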
(It's also possible to improve on these numbers a bit for this purpose for XxHash3. There are some calculations in the current implementation that are done on every call, based on the seed supplied to that call, but for string hashing purposes where the seed is constant for the process lifetime, those calculations could be done once for the process when the random seed for the process is selected.)
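A minimal sketch of the hoisting idea: do the seed-dependent setup once, when the process seed is chosen, and reuse the result on every call. The derivation here is modeled on how the reference XXH3 implementation builds a custom secret from a seed (adding the seed to one 64-bit lane and subtracting it from the next), but `DEFAULT_SECRET` is a hypothetical placeholder rather than XXH3's real secret, and `toy_hash` is a toy loop that merely consumes the secret; it is not XXH3.

```python
# Sketch of hoisting per-seed setup out of the per-call path; not XXH3 itself.
import hashlib
import struct

MASK64 = (1 << 64) - 1
# Hypothetical placeholder: 64 arbitrary bytes, NOT XXH3's default secret.
DEFAULT_SECRET = hashlib.sha512(b"placeholder secret").digest()


def derive_secret(seed: int) -> bytes:
    """Seed-dependent setup: +seed on even 64-bit lanes, -seed on odd lanes."""
    lanes = struct.unpack(f"<{len(DEFAULT_SECRET) // 8}Q", DEFAULT_SECRET)
    derived = [
        (lane + seed) & MASK64 if i % 2 == 0 else (lane - seed) & MASK64
        for i, lane in enumerate(lanes)
    ]
    return struct.pack(f"<{len(derived)}Q", *derived)


def toy_hash(data: bytes, secret: bytes) -> int:
    """Toy per-call work that reads the precomputed secret (not XXH3)."""
    lanes = struct.unpack(f"<{len(secret) // 8}Q", secret)
    padded = data + b"\0" * (-len(data) % 8)
    acc = (len(data) * 0x9E3779B185EBCA87) & MASK64
    for i, (word,) in enumerate(struct.iter_unpack("<Q", padded)):
        acc ^= word ^ lanes[i % len(lanes)]
        acc = (acc * 0x9E3779B185EBCA87) & MASK64
        acc ^= acc >> 29
    return acc


def hash_with_seed_per_call(data: bytes, seed: int) -> int:
    """Re-derives the secret on every call."""
    return toy_hash(data, derive_secret(seed))


class ProcessStringHasher:
    """Derives the secret once, when the process-wide seed is selected."""

    def __init__(self, seed: int):
        self._secret = derive_secret(seed)

    def __call__(self, data: bytes) -> int:
        return toy_hash(data, self._secret)


hasher = ProcessStringHasher(seed=0x1234)
print(hasher(b"hello") == hash_with_seed_per_call(b"hello", 0x1234))  # True
```

Both paths return identical hashes; the difference is only where `derive_secret` runs, which is the saving the comment above describes for a seed that is fixed for the process lifetime.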
cc: @GrabYourPitchforks, @jkotas