@jnm2, that's one reason this feature is worth building -- so we can go replace substandard hashes across the framework.
Large table of hash functions with performance and quality characteristics: https://github.com/leo-yuriev/t1ha
@arespr I think the team is looking for a C# implementation of the hash functions. Thank you for sharing, though.
@tannergooding Are you still unable to pick this issue back up? If so then I'll post on Reddit/Twitter that we're looking for a hash expert.
edit: Made a post on Reddit. https://www.reddit.com/r/csharp/comments/6qsysm/looking_for_hash_expert_to_help_net_core_team/?ref=share&ref_source=link
@jamesqo, I have a few higher priority things on my plate and won't be able to get to this within the next 3 weeks.
Also, the current measurements will be limited by what we can currently code in C#, however, if/when this becomes a thing (https://github.com/dotnet/designs/issues/13), the measurements will likely change somewhat ;)
That's OK -- we can always change the hash algorithm once intrinsics become available; encapsulating/randomizing the hash code enables us to do that. We're just looking for something that offers the best performance/distribution tradeoff for the runtime in its current state.
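To illustrate the encapsulate/randomize point, here is a hypothetical sketch (not the corefx implementation; the names and the mixing function are placeholders):

```csharp
using System;

// Sketch: a per-process random seed is mixed into every hash, so callers
// can't persist hash codes or depend on a particular algorithm -- which
// leaves the team free to swap the algorithm later without breaking anyone.
static class RandomizedHash
{
    // Chosen once per process; hash codes intentionally differ between runs.
    private static readonly uint s_seed = (uint)new Random().Next();

    public static int Combine(int h1, int h2)
    {
        unchecked
        {
            // Placeholder mixer; choosing the real one is what this thread is about.
            uint hash = s_seed;
            hash = hash * 31 + (uint)h1;
            hash = hash * 31 + (uint)h2;
            return (int)hash;
        }
    }
}
```

Because the seed changes per process, any code that persists or compares these values across runs breaks immediately, which is exactly the property that keeps the algorithm swappable.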
@jamesqo, thanks for looking for folks to help out. We'd be happy to have somebody who isn't a hash expert work on this too -- we really just need somebody who can port some algorithms to C# from other languages or designs and then do performance measurements. Once we've chosen candidates, our experts will do what we do on any change -- review the code for correctness, performance, security, etc.
Hi! I've just read through the discussion, and at least to me it seems the case is closed strongly in favor of the murmur3-32 PoC. Which BTW seems like a very fine choice to me, and I'd recommend not spending any more needless work (but maybe even drop the .Add() members...).
But in the unlikely case that somebody wants to continue with more performance work, I could supply some code for xx32, xx64, hsip13/24, seahash, murmur3-x86/32 (and I integrated the marvin32 impl from above), and (yet unoptimized) sip13/24, spookyv2. Some versions of City look easy enough to port, should the need arise. That half-abandoned project had a slightly different use-case in mind, so there is no HashCode class with the proposed API; but for benchmarking it should not matter much.
Definitely not production-ready: the code applies generous amounts of brute-force copy-pasta and a cancerous sprawl of aggressive-inlining and unsafe; endianness doesn't exist, neither do unaligned reads. Even the tests against reference-implementation test vectors are, euphemistically speaking, "incomplete".
If this is any help at all, I should find enough time during the next two weeks to fix up the most egregious issues, and make the code and some preliminary results available.
@gimpf
> I've just read through the discussion, and at least to me it seems the case is closed strongly in favor of the murmur3-32 PoC. Which BTW seems like a very fine choice to me, and I'd recommend not spending any more needless work
No, people aren't favoring Murmur3 yet. We want to make sure we're picking the absolute best algorithm in terms of balance between performance/distribution, so we can't leave any stone unturned.
> But in the unlikely case that somebody wants to continue with more performance work, I could supply some code for xx32, xx64, hsip13/24, seahash, murmur3-x86/32 (and I integrated the marvin32 impl from above), and (yet unoptimized) sip13/24, spookyv2. Some versions of City look easy enough to port, should the need arise.
Yes, please! We want to gather code for as many algorithms as possible to test against. Every new algorithm you can contribute is valuable. It would be highly appreciated if you could port the City algorithms, too.
> Definitely not production-ready: the code applies generous amounts of brute-force copy-pasta and a cancerous sprawl of aggressive-inlining and unsafe; endianness doesn't exist, neither do unaligned reads. Even the tests against reference-implementation test vectors are, euphemistically speaking, "incomplete".
That's OK. Just bring the code in, and someone else can find it if the need arises.
> If this is any help at all, I should find enough time during the next two weeks to fix up the most egregious issues, and make the code and some preliminary results available.
Yes, that would be great!
@jamesqo Ok, I'll drop a note once I've something to show.
@gimpf that sounds really great and we'd love to hear about your progress as you go (no need to wait until you get to work through every algorithm!). Not production-ready is fine as long as you believe the code produces correct results and that the performance is a good representation of what we'd see in a production-ready implementation. Once we pick candidates, we can work with you on getting to high quality implementations.
I haven't seen analysis of how seahash's entropy compares to other algorithms. Do you have any pointers on that? It has interesting sounding perf tradeoffs... vectorization sounds fast, but modular arithmetic sounds slow.
@morganbr I've got a teaser ready.
About SeaHash: No, I don't know about its quality yet; if the performance is interesting, I'd add it to SMHasher. At least the author claims it is good (he uses it for checksums in a filesystem), and also claims that no entropy is thrown away during mixing.
About the hashes and benchmarks: Project Haschisch.Kastriert, wiki page with first benchmarking results comparing xx32, xx64, hsip13, hsip24, marvin32, sea and murmur3-32.
Some important caveats:
First impressions: the HashSet<> benchmark needs work, as everything is almost within measurement error (I've seen larger differences, but still not worth talking about). I'll write you again once I've improved the situation a bit.
@gimpf, that's a fantastic start! I took a look at the code and results and I have a few questions.
Your HashSet results are particularly interesting. If they hold up, that's a possible case for preferring better entropy over faster hash time.
@morganbr This weekend was more on-and-off, so progress is limited.
About your questions:
- Your results show SimpleMultiplyAdd as about 5x slower than @tannergooding's Murmur3a. That seems odd ...
I was wondering myself. That was a copy/paste error: SimpleMultiplyAdd was always combining four values... Also, by reordering some statements, the multiply-add combiner got slightly faster (~60% higher throughput).
Is it possible that your implementations have a common inefficiency that isn't in that Murmur implementation or should I read this as custom implementations having a big advantage over general-purpose ones?
I'm likely missing some things, but it seems that for .NET, general-purpose implementations are not usable for this use-case. I've written Combine-style methods for all algorithms, and w.r.t. hash-code combining most of them perform much better than the general-purpose ones.
However, even those implementations remain too slow; further work is needed. .NET performance in this area is absolutely opaque to me; adding or removing a copy of a local variable can easily change performance by a factor of two. I will likely not be able to provide implementations that are sufficiently well-optimized for the purpose of selecting the best option.
- Having results for 1, 2, and 4 combinations is good, but this API goes up to 8.
I've extended the combine-benchmarks. No surprises on that front.
- I saw that you ran on X64 (...), Is it easy for you to also get X86 results?
It once was, but then I ported to .NET Standard. Now I'm in dependency hell, and only .NET Core 2 and 64-bit CLR benchmarks work. This can be resolved easily enough once I've sorted out the current issues.
Do you think this will make it into the v2.1 release?
@gimpf You haven't posted in a while-- do you have a progress update on your implementations? :smiley:
@jamesqo I've fixed some benchmark which caused weird results, and added City32, SpookyV2, Sip13 and Sip24 to the list of available algorithms. The Sips are as fast as expected (relative to the throughput of xx64), City and Spooky are not (same is still true for SeaHash).
For combining hash-codes, Murmur3-32 still looks like a good bet, but I have yet to run a more exhaustive comparison.
On another note, the streaming API (.Add()) has the unfortunate side effect of removing some hash algorithms from the list of candidates. Given that the performance of such an API is also questionable, you might want to rethink whether to offer it from the beginning.
If the .Add() part were avoided, and given that the hash-combiner uses a seed, I don't think there would be any harm in cleaning up tg's combiner, creating a small test-suite, and calling it a day. Since I only have a few hours every weekend, and the performance optimization is somewhat tedious, making the gold-plated version could drag on a bit...
@gimpf , that sounds like great progress. Do you have a results table handy so we can see if there's enough to make a decision and move forward?
@morganbr I've updated my benchmarking results.
For now I only have 64bit results on .NET Core 2. For that platform, City64 w/o seed is the fastest across all sizes. Incorporating a seed, XX-32 is tied with Murmur-3-32. Luckily these are the same algorithms that have a reputation for being fast on 32bit platforms, but obviously we need to verify that this holds true for my implementation as well. The results seem to be representative of real-world performance, except that Sea and SpookyV2 seem unusually slow.
You will need to consider how much you really need hash-dos protection for hash-code-combiners. If seeding is only needed to make the hash obviously unusable for persistence, city64 once XOR'd with a 32bit seed would be an improvement. As this utility is only there for combining hashes (and not replace for instance the hash-code for strings, or be a drop-in hasher for integer arrays etc.), that might be good enough.
If OTOH you think you need it, you'll be happy to see that Sip13 is usually less than 50% slower than XX-32 (on 64bit platforms), but that result will likely be significantly different for 32bit apps.
Don't know how much it's relevant to corefx, but I've added LegacyJit 32bit (w/FW 4.7) results.
I'd like to say that the results are ludicrously slow. However, as an example, at 56 MiB/s vs. 319 MiB/s I'm not laughing (that's Sip, it's missing the rotate-left optimization the most). I think I remember why I canceled my .NET hash-algorithm project in January...
So, RyuJit-32bit is still missing, and will (hopefully) give very different results, but for LegacyJit-x86, Murmur-3-32 wins handily, and only City-32 and xx-32 can come close. Murmur still has bad performance at only around 0.4 to 1.1 GB/s instead of 0.6 to 2 GB/s (on the same machine), but at least it's in the right ballpark.
I'm going to run the benchmarks on a few of my boxes tonight and post results (Ryzen, i7, Xeon, A10, i7 Mobile, and I think a couple others).
@tannergooding @morganbr Some nice and some important updates.
Important first:
Nice things:
To run a suite on all prime implementations for combining hash-codes, including "Empty" (pure overhead) and "multiply-add" (speed-optimized version of famous SO answer):
```
bin\Release\net47\Haschisch.Benchmarks.Net47.exe -j:clr_x86 -j:clr_x64_legacy -j:clr_x64 -j:core_x64 -- CombineHashCode --allcategories=prime
```
(Running 32bit Core benchmarks conveniently seems to require a prerelease BenchmarkDotNet (or maybe a 32bit-only set-up plus using the Core based bench-runner). It should then work using -j:corex86, hopefully.)
Results: After all bugfixing, xx32 seems to win for all overloads w/64 bit RyuJIT, on Windows 10 on a mobile Haswell i7, in a "quick" run. Between the Sips and marvin32, Sip-1-3 always wins. Sip-1-3 is about 4 times slower than xx32, which again is about 2 times slower than a primitive multiply-add combiner. 32bit Core results are still missing, but I am more or less waiting for a stable BenchmarkDotNet release that will solve that issue for me.
(Edit) I just added a quick-run of a benchmark for accessing a hash-set. This is obviously much more dependent on details than the µ-benchmarks above, but you might want to give it a look.
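For reference, the "multiply-add" baseline in the benchmark suite above is essentially the well-known Stack Overflow pattern. A minimal sketch (the 17/31 constants are the traditional choices from that answer, not anything measured in this thread):

```csharp
// Classic multiply-add hash combiner, used here as a speed baseline:
// start from a small prime and fold each value in with multiply-by-31.
static class MultiplyAddHash
{
    public static int Combine(int h1, int h2, int h3, int h4)
    {
        unchecked // let the arithmetic wrap instead of throwing
        {
            int hash = 17;
            hash = hash * 31 + h1;
            hash = hash * 31 + h2;
            hash = hash * 31 + h3;
            hash = hash * 31 + h4;
            return hash;
        }
    }
}
```

It is fast precisely because it does almost no mixing, which is also why its entropy is the weak point the benchmarks are comparing against.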
Thanks once again @gimpf for the fantastic data! Let's see if we can turn that into a decision.
To start with, I'd divide the algorithms like this: Fast+Good entropy (ordered by speed):
HashDoS resistant:
Out of contention (slow):
Out of contention (bad entropy):
Before we pick a winner, I'd like to make sure other folks agree with my bucketing above. If it holds, I'd think we just need to choose whether to pay 2x for HashDoS resistance and then go by speed.
@morganbr Your grouping seems fine. As a data-point on SipHash rounds, the Rust project asked Jean-Philippe Aumasson, who co-authored SipHash with DJB. After that discussion they decided to go with sip-1-3 for hash tables.
(See PR rust:#33940 and the accompanying issue rust:#29754).
Based on the data and comments, I'd like to propose that we use xxHash32 on all architectures. The next step is to get it implemented. @gimpf, are you interested in putting together a PR for that?
For those concerned about HashDoS, I will follow up soon with a proposal for a general-purpose hashing API that should include Marvin32 and may include SipHash. That will also be an appropriate place for the other implementations @gimpf and @tannergooding have worked on.
@morganbr I can put together a PR as time permits. Also, I personally would prefer xx32 too, as long as it doesn't reduce acceptance.
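For readers unfamiliar with xx32's structure, here is a sketch of xxHash32's small-input path, which is the part relevant to combining a few 32-bit values (the constants are xxHash32's published primes; this is an illustration, not the eventual corefx code):

```csharp
using System;

// Sketch of xxHash32's small-input path (the part relevant to combining a
// few 32-bit hash codes). Constants are xxHash32's published primes.
static class XxHash32Sketch
{
    private const uint Prime2 = 2246822519U, Prime3 = 3266489917U,
                       Prime4 = 668265263U, Prime5 = 374761393U;

    private static uint Rol(uint value, int count)
        => (value << count) | (value >> (32 - count));

    public static uint Hash(uint seed, params uint[] values)
    {
        // Small-input initialization: seed + PRIME32_5 + input length in bytes.
        uint h = seed + Prime5 + (uint)values.Length * 4;

        // Mix in each 32-bit lane.
        foreach (uint v in values)
            h = Rol(h + v * Prime3, 17) * Prime4;

        // Final avalanche.
        h ^= h >> 15; h *= Prime2;
        h ^= h >> 13; h *= Prime3;
        h ^= h >> 16;
        return h;
    }
}
```

With no input lanes this reduces to xxHash32 of the empty input, which for seed 0 is the published test vector 0x02CC5D05 -- a handy sanity check for any port.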
@gimpf , how's your time looking? If you don't really have time, we can also see if anyone else would like to give it a shot.
@morganbr I'd planned to do it until 5th of November, and it still looks good that I'll find the time in the next two weeks.
@gimpf , sounds great. Thanks for the update!
@terrajobst - I'm a bit late to the party (sorry), but can't we change the return type of the Add method?
```csharp
public HashCode Add<T>(T value);
public HashCode Add<T>(T value, IEqualityComparer<T> comparer);
```
The params code is clearly there for scenarios where you have multiple fields, e.g.
```csharp
public override int GetHashCode() => new HashCode().Add(Name, Surname).ToHashCode();
```
However, exactly the same thing can be achieved like this, albeit with one less wasteful array allocation:
```csharp
public override int GetHashCode() => new HashCode().Add(Name).Add(Surname).Add(Age).ToHashCode();
```
Note that types can also be mixed. This could obviously be done by not calling it fluently inside of a regular method. Given this argument that the fluent interface is not absolutely necessary, why does the wasteful params overload exist to begin with? If this suggestion is a bad suggestion, then the params overload falls to the very same axe. That, and forcing a regular method for a trivial yet optimal hash code seems like a lot of ceremony.
Edit: An implicit operator int would also be nice for DRY, but not exactly crucial.
@jcdickinson
> can't we change the return type of the Add method?
We already discussed that in the old proposal, and it was rejected.
> why does the wasteful params overload exist to begin with?
We are not adding any params overloads. Do a Ctrl+F for "params" on this webpage, and you'll see that your comment is the only place where that word pops up.
> An implicit operator int would also be nice for DRY, but not exactly crucial.
I believe that was also discussed somewhere above...
@jamesqo thanks for the explanation.
> params overloads

I meant AddRange, but I guess there won't be any traction on this.
@jcdickinson AddRange was in the original proposal, but it's not in the current version. It was rejected by API review (see https://github.com/dotnet/corefx/issues/14354#issuecomment-308190321 by @terrajobst):

> We should remove all AddRange methods because the scenario is unclear. Arrays are somewhat unlikely to show up very often. And once larger arrays are involved, the question is whether the computation should be cached. Seeing the for loop on the calling side makes it clear that you need to think about that.
@gimpf I went ahead and polyfilled the proposal with xxHash32. Feel free to grab that implementation. It has tests against actual xxHash32 vectors.
Regarding the interface. I am fully aware that I am making a mountain out of a molehill - feel free to ignore. I am using the current proposal against real stuff and it is a lot of annoying repetition.
I've been playing around with the interface and now understand why the fluent interface was rejected; it's significantly slower.
```
BenchmarkDotNet=v0.10.9, OS=Windows 10 Redstone 2 (10.0.15063)
Processor=Intel Core i7-4800MQ CPU 2.70GHz (Haswell), ProcessorCount=8
Frequency=2630626 Hz, Resolution=380.1377 ns, Timer=TSC
.NET Core SDK=2.0.2
  [Host]     : .NET Core 2.0.0 (Framework 4.6.00001.0), 64bit RyuJIT
  DefaultJob : .NET Core 2.0.0 (Framework 4.6.00001.0), 64bit RyuJIT
```
Using a non-inlined method as a hash code source; 50 invocations of Add vs a fluent extension method:
Method | Mean | Error | StdDev | Scaled |
---|---|---|---|---|
Add | 401.6 ns | 1.262 ns | 1.180 ns | 1.00 |
Tally | 747.8 ns | 2.329 ns | 2.178 ns | 1.86 |
However, the following pattern does work:
```csharp
public struct HashCode : System.Collections.IEnumerable
{
    [EditorBrowsable(EditorBrowsableState.Never)]
    [Obsolete("This method is provided for collection initializer syntax.", error: true)]
    public IEnumerator GetEnumerator() => throw new NotImplementedException();
}
```
```csharp
public override int GetHashCode() => new HashCode()
{
    Age, // int
    { Name, StringComparer.Ordinal }, // use Comparer
    Hat // some arbitrary object
}.ToHashCode();
```
It also has identical performance characteristics to the current proposal:
Method | Mean | Error | StdDev | Scaled |
---|---|---|---|---|
Add | 405.0 ns | 2.130 ns | 1.889 ns | 1.00 |
Initializer | 400.8 ns | 4.821 ns | 4.274 ns | 0.99 |
Sadly it is somewhat of a hack, as IEnumerable has to be implemented to keep the compiler happy. That being said, the Obsolete error will trip even on foreach - you'd have to really want to break things in order to run into the exception. The MSIL across the two is essentially identical.
@jcdickinson thanks for grabbing the issue. I sent you Collaborator invite, let me know when you accept and I will be able to assign this issue to you (assigning to myself in the mean time).
Pro-tip: Once you accept, GitHub will automatically sign you up for all notifications from the repo (500+ per day), I would recommend to change it to just "Not Watching" which will send you all your mentions and notifications for issues you subscribed to.
@jcdickinson, I'm definitely interested in ways to avoid annoying repetition (though I have no idea how folks would feel about initializer syntax). I seem to recall that there were two problems with fluent:

```csharp
var hc = new HashCode();
var newHc = hc.Add(foo);
hc.Add(bar);
return newHc.ToHashCode();
```
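A runnable illustration of that pitfall (the Add/ToHashCode shapes here are hypothetical, chosen only to show the struct-copy semantics):

```csharp
using System;

// Why fluent-on-a-mutable-struct is error-prone: an Add that returns a new
// value leaves the original untouched, so mixing the fluent and mutating
// call styles silently drops data.
struct FluentHash
{
    private int _state;

    public FluentHash Add(int value)
    {
        var copy = this;                        // value-type copy
        copy._state = copy._state * 31 + value;
        return copy;                            // original is unchanged
    }

    public int ToHashCode() => _state;
}

static class Demo
{
    public static void Main()
    {
        var hc = new FluentHash();
        var newHc = hc.Add(1); // result captured: includes 1
        hc.Add(2);             // result discarded: the 2 is silently lost
        Console.WriteLine(newHc.ToHashCode()); // only the first Add took effect
    }
}
```

Neither call site looks wrong in isolation, which is exactly why the API review preferred a void-returning Add.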
Since the proposal on this thread is already approved (and you're well on your way to getting it merged), I'd suggest starting a new API proposal for any changes.
@karelz I believe @gimpf already grabbed this issue beforehand. Since he has more familiarity with the implementation, please assign this issue to @gimpf instead. (edit: nvm)
@terrajobst One kind of last-minute API request for this. Since we marked GetHashCode obsolete, we're implicitly telling the user that HashCodes are not values meant to be compared, despite being structs, which are typically immutable/comparable. In that case, should we mark Equals obsolete too?
```csharp
[Obsolete("HashCode is a mutable struct and should not be compared with other HashCodes.", error: true)]
[EditorBrowsable(EditorBrowsableState.Never)]
// If this is too harsh, base.Equals() is fine as long as the [Obsolete] stays
public override bool Equals(object obj) => throw new NotSupportedException("HashCode is a mutable struct and should not be compared with other HashCodes.");
```
I think something similar was done with Span.
If that's accepted, then I think "should not", or "may not", instead of "cannot" in the Obsolete message.

@Joe4evr Fine with me; I've updated the comment. It may also be beneficial to include the same message in the GetHashCode exception too, then:

```csharp
public override int GetHashCode() => throw new NotSupportedException("HashCode is a mutable struct and should not be compared with other HashCodes.");
```
@morganbr Why did you reopen this?
The PR to get it exposed in CoreFX has not gone through yet.
@gimpf do you have the code you benchmarked available and/or would you be able to quickly see how the SpookilySharp nuget package fairs. I'm looking to dust that project off after a couple of year's stagnation and I'm curious to see how it stands up.
@JonHanna He posted it here: https://github.com/gimpf/Haschisch.Kastriert
@JonHanna, I'd be interested to hear how your testing goes so we can start thinking about what would be useful in a general-purpose non-cryptographic hashing API.
@morganbr Where would be an appropriate forum to discuss such an API? I expect that such an API would consist of more than just the lowest common denominator, and maybe a good API will also need an improved JIT w.r.t. handling of larger structs. Discussing all that might be better done in a separate issue...
@gimpf Opened one for you. dotnet/corefx#25666
@morganbr - Can we get the package name & version number that will include this commit?
@karelz, might you be able to help @smitpatel with package/version info?
I would try a daily build of .NET Core - I'd wait until tomorrow. I don't think there is a package you can simply take a dependency on.
Update 6/16/17: Looking for volunteers
The API shape has been finalized. However, we're still deciding on the best hash algorithm out of a list of candidates to use for the implementation, and we need someone to help us measure the throughput/distribution of each algorithm. If you'd like to take that role up, please leave a comment below and @karelz will assign this issue to you.
Update 6/13/17: Proposal accepted!
Here's the API that was approved by @terrajobst at https://github.com/dotnet/corefx/issues/14354#issuecomment-308190321:
The original text of this proposal follows.
Rationale
Generating a good hash code should not require use of ugly magic constants and bit twiddling in our code. It should be less tempting to write a bad-but-concise GetHashCode implementation such as:

Proposal

We should add a HashCode type to encapsulate hash code creation and avoid forcing devs to get mixed up in the messy details. Here is my proposal, which is based off of https://github.com/dotnet/corefx/issues/14354#issuecomment-305019329, with a few minor revisions.

Remarks
See @terrajobst's comment at https://github.com/dotnet/corefx/issues/14354#issuecomment-305019329 for the goals of this API; all of his remarks are valid. I would like to point out these ones in particular, however: