@jnm2, that's one reason this feature is worth building -- so we can go replace substandard hashes across the framework.
Large table of hash functions with performance and quality characteristics: https://github.com/leo-yuriev/t1ha
@arespr I think the team is looking for a C# implementation of the hash functions. Thank you for sharing, though.
@tannergooding Are you still unable to pick this issue back up? If so then I'll post on Reddit/Twitter that we're looking for a hash expert.
edit: Made a post on Reddit. https://www.reddit.com/r/csharp/comments/6qsysm/looking_for_hash_expert_to_help_net_core_team/?ref=share&ref_source=link
@jamesqo, I have a few higher priority things on my plate and won't be able to get to this within the next 3 weeks.
Also, the current measurements will be limited by what we can currently code in C#, however, if/when this becomes a thing (https://github.com/dotnet/designs/issues/13), the measurements will likely change somewhat ;)
That's OK -- we can always change the hash algorithm once intrinsics become available; encapsulating/randomizing the hash code enables us to do that. We're just looking for something that offers the best performance/distribution tradeoff for the runtime in its current state.
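To illustrate the encapsulate/randomize point, here is a hypothetical sketch (not the corefx implementation; the names and the mixing function are placeholders):

```csharp
using System;

// Sketch: a per-process random seed is mixed into every hash, so callers
// can't persist hash codes or depend on a particular algorithm -- which
// leaves the team free to swap the algorithm later without breaking anyone.
static class RandomizedHash
{
    // Chosen once per process; hash codes intentionally differ between runs.
    private static readonly uint s_seed = (uint)new Random().Next();

    public static int Combine(int h1, int h2)
    {
        unchecked
        {
            // Placeholder mixer; choosing the real one is what this thread is about.
            uint hash = s_seed;
            hash = hash * 31 + (uint)h1;
            hash = hash * 31 + (uint)h2;
            return (int)hash;
        }
    }
}
```

Because the seed changes per process, any code that persists or compares these values across runs breaks immediately, which is exactly the property that keeps the algorithm swappable.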
@jamesqo, thanks for looking for folks to help out. We'd be happy to have somebody who isn't a hash expert work on this too -- we really just need somebody who can port some algorithms to C# from other languages or designs and then do performance measurements. Once we've chosen candidates, our experts will do what we do on any change -- review the code for correctness, performance, security, etc.
Hi! I've just read through the discussion, and at least to me it seems the case is closed strongly in favor of the murmur3-32 PoC. Which BTW seems like a very fine choice to me, and I'd recommend not spending any more needless work (but maybe even drop the .Add() members...).
But in the unlikely case that somebody wants to continue with more performance work, I could supply some code for xx32, xx64, hsip13/24, seahash, murmur3-x86/32 (and I integrated the marvin32 impl from above), and (yet unoptimized) sip13/24, spookyv2. Some versions of City look easy enough to port, should the need arise. That half-abandoned project had a slightly different use-case in mind, so there is no HashCode class with the proposed API; but for benchmarking it should not matter much.
Definitely not production-ready: the code applies generous amounts of brute-force copy-pasta and a cancerous sprawl of aggressive-inlining and unsafe; endianness doesn't exist, neither do unaligned reads. Even the tests against reference-implementation test vectors are, euphemistically speaking, "incomplete".
If this is any help at all, I should find enough time during the next two weeks to fix up the most egregious issues, and make the code and some preliminary results available.
@gimpf
> I've just read through the discussion, and at least to me it seems the case is closed strongly in favor of the murmur3-32 PoC. Which BTW seems like a very fine choice to me, and I'd recommend not spending any more needless work
No, people aren't favoring Murmur3 yet. We want to make sure we're picking the absolute best algorithm in terms of balance between performance/distribution, so we can't leave any stone unturned.
> But in the unlikely case that somebody wants to continue with more performance work, I could supply some code for xx32, xx64, hsip13/24, seahash, murmur3-x86/32 (and I integrated the marvin32 impl from above), and (yet unoptimized) sip13/24, spookyv2. Some versions of City look easy enough to port, should the need arise.
Yes, please! We want to gather code for as many algorithms as possible to test against. Every new algorithm you can contribute is valuable. It would be highly appreciated if you could port the City algorithms, too.
> Definitely not production-ready: the code applies generous amounts of brute-force copy-pasta and a cancerous sprawl of aggressive-inlining and unsafe; endianness doesn't exist, neither do unaligned reads. Even the tests against reference-implementation test vectors are, euphemistically speaking, "incomplete".
That's OK. Just bring the code in, and someone else can find it if the need arises.
> If this is any help at all, I should find enough time during the next two weeks to fix up the most egregious issues, and make the code and some preliminary results available.
Yes, that would be great!
@jamesqo Ok, I'll drop a note once I've something to show.
@gimpf that sounds really great and we'd love to hear about your progress as you go (no need to wait until you get to work through every algorithm!). Not production-ready is fine as long as you believe the code produces correct results and that the performance is a good representation of what we'd see in a production-ready implementation. Once we pick candidates, we can work with you on getting to high quality implementations.
I haven't seen analysis of how seahash's entropy compares to other algorithms. Do you have any pointers on that? It has interesting sounding perf tradeoffs... vectorization sounds fast, but modular arithmetic sounds slow.
@morganbr I've got a teaser ready.
About SeaHash: No, I don't know about its quality yet; if the performance is interesting, I'd add it to SMHasher. At least the author claims it is good (he uses it for checksums in a filesystem), and also claims that no entropy is thrown away during mixing.
About the hashes and benchmarks: Project Haschisch.Kastriert, wiki page with first benchmarking results comparing xx32, xx64, hsip13, hsip24, marvin32, sea and murmur3-32.
Some important caveats:
First impressions: the HashSet<> benchmark needs work, as everything is almost within measurement error (I've seen larger differences, but still not worth talking about). I'll write you again once I've improved the situation a bit.
@gimpf, that's a fantastic start! I took a look at the code and results and I have a few questions.
Your HashSet results are particularly interesting. If they hold up, that's a possible case for preferring better entropy over faster hash time.
@morganbr This weekend was more on-and-off, so progress is limited.
About your questions:
- Your results show SimpleMultiplyAdd as about 5x slower than @tannergooding's Murmur3a. That seems odd ...
I was wondering myself. That was a copy/paste error: SimpleMultiplyAdd was always combining four values... Also, by reordering some statements, the multiply-add combiner got slightly faster (~60% higher throughput).
Is it possible that your implementations have a common inefficiency that isn't in that Murmur implementation or should I read this as custom implementations having a big advantage over general-purpose ones?
I'm likely missing some things, but it seems that for .NET, general-purpose implementations are not usable for this use-case. I've written Combine-style methods for all algorithms, and w.r.t. hash-code combining most of them perform much better than the general-purpose ones.
However, even those implementations remain too slow; further work is needed. .NET performance in this area is absolutely opaque to me; adding or removing a copy of a local variable can easily change performance by a factor of two. I will likely not be able to provide implementations that are sufficiently well-optimized for the purpose of selecting the best option.
- Having results for 1, 2, and 4 combinations is good, but this API goes up to 8.
I've extended the combine-benchmarks. No surprises on that front.
- I saw that you ran on X64 (...), Is it easy for you to also get X86 results?
It once was, but then I ported to .NET Standard. Now I'm in dependency hell, and only .NET Core 2 and 64-bit CLR benchmarks work. This can be resolved easily enough once I've sorted out the current issues.
Do you think this will make it into the v2.1 release?
@gimpf You haven't posted in a while-- do you have a progress update on your implementations? :smiley:
@jamesqo I've fixed some benchmark which caused weird results, and added City32, SpookyV2, Sip13 and Sip24 to the list of available algorithms. The Sips are as fast as expected (relative to the throughput of xx64), City and Spooky are not (same is still true for SeaHash).
For combining hash-codes, Murmur3-32 still looks like a good bet, but I have yet to run a more exhaustive comparison.
On another note, the streaming API (.Add()) has the unfortunate side effect of removing some hash algorithms from the list of candidates. Given that the performance of such an API is also questionable, you might want to rethink whether to offer it from the beginning.
If the .Add() part were avoided, and given that the hash-combiner uses a seed, I don't think there would be any harm in cleaning up tg's combiner, creating a small test-suite, and calling it a day. Since I only have a few hours every weekend, and the performance optimization is somewhat tedious, making the gold-plated version could drag on a bit...
@gimpf , that sounds like great progress. Do you have a results table handy so we can see if there's enough to make a decision and move forward?
@morganbr I've updated my benchmarking results.
For now I only have 64bit results on .NET Core 2. For that platform, City64 w/o seed is the fastest across all sizes. Incorporating a seed, XX-32 is tied with Murmur-3-32. Luckily these are the same algorithms that have a reputation for being fast on 32bit platforms, but obviously we need to verify that this holds true for my implementation as well. The results seem to be representative of real-world performance, except that Sea and SpookyV2 seem unusually slow.
You will need to consider how much you really need hash-dos protection for hash-code-combiners. If seeding is only needed to make the hash obviously unusable for persistence, city64 once XOR'd with a 32bit seed would be an improvement. As this utility is only there for combining hashes (and not replace for instance the hash-code for strings, or be a drop-in hasher for integer arrays etc.), that might be good enough.
If OTOH you think you need it, you'll be happy to see that Sip13 is usually less than 50% slower than XX-32 (on 64bit platforms), but that result will likely be significantly different for 32bit apps.
Don't know how much it's relevant to corefx, but I've added LegacyJit 32bit (w/FW 4.7) results.
I'd like to say that the results are ludicrously slow. However, as an example, at 56 MiB/s vs. 319 MiB/s I'm not laughing (that's Sip, it's missing the rotate-left optimization the most). I think I remember why I canceled my .NET hash-algorithm project in January...
So, RyuJit-32bit is still missing, and will (hopefully) give very different results, but for LegacyJit-x86, Murmur-3-32 wins handily, and only City-32 and xx-32 can come close. Murmur still has bad performance at only around 0.4 to 1.1 GB/s instead of 0.6 to 2 GB/s (on the same machine), but at least it's in the right ballpark.
I'm going to run the benchmarks on a few of my boxes tonight and post results (Ryzen, i7, Xeon, A10, i7 Mobile, and I think a couple others).
@tannergooding @morganbr Some nice and some important updates.
Important first:
Nice things:
To run a suite on all prime implementations for combining hash-codes, including "Empty" (pure overhead) and "multiply-add" (speed-optimized version of famous SO answer):
```
bin\Release\net47\Haschisch.Benchmarks.Net47.exe -j:clr_x86 -j:clr_x64_legacy -j:clr_x64 -j:core_x64 -- CombineHashCode --allcategories=prime
```
(Running 32bit Core benchmarks conveniently seems to require a prerelease BenchmarkDotNet (or maybe a 32bit-only set-up plus using the Core based bench-runner). It should then work using -j:corex86, hopefully.)
Results: After all bugfixing, xx32 seems to win for all overloads w/64 bit RyuJIT, on Windows 10 on a mobile Haswell i7, in a "quick" run. Between the Sips and marvin32, Sip-1-3 always wins. Sip-1-3 is about 4 times slower than xx32, which again is about 2 times slower than a primitive multiply-add combiner. 32bit Core results are still missing, but I am more or less waiting for a stable BenchmarkDotNet release that will solve that issue for me.
(Edit) I just added a quick-run of a benchmark for accessing a hash-set. This is obviously much more dependent on details than the µ-benchmarks above, but you might want to give it a look.
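For reference, the "multiply-add" baseline in the benchmark suite above is essentially the well-known Stack Overflow pattern. A minimal sketch (the 17/31 constants are the traditional choices from that answer, not anything measured in this thread):

```csharp
// Classic multiply-add hash combiner, used here as a speed baseline:
// start from a small prime and fold each value in with multiply-by-31.
static class MultiplyAddHash
{
    public static int Combine(int h1, int h2, int h3, int h4)
    {
        unchecked // let the arithmetic wrap instead of throwing
        {
            int hash = 17;
            hash = hash * 31 + h1;
            hash = hash * 31 + h2;
            hash = hash * 31 + h3;
            hash = hash * 31 + h4;
            return hash;
        }
    }
}
```

It is fast precisely because it does almost no mixing, which is also why its entropy is the weak point the benchmarks are comparing against.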
Thanks once again @gimpf for the fantastic data! Let's see if we can turn that into a decision.
To start with, I'd divide the algorithms like this: Fast+Good entropy (ordered by speed):
HashDoS resistant:
Out of contention (slow):
Out of contention (bad entropy):
Before we pick a winner, I'd like to make sure other folks agree with my bucketing above. If it holds, I'd think we just need to choose whether to pay 2x for HashDoS resistance and then go by speed.
@morganbr Your grouping seems fine. As a data-point on SipHash rounds, the Rust project asked Jean-Philippe Aumasson, who co-authored SipHash with DJB. After that discussion they decided to go with sip-1-3 for hash tables.
(See PR rust:#33940 and the accompanying issue rust:#29754).
Based on the data and comments, I'd like to propose that we use xxHash32 on all architectures. The next step is to get it implemented. @gimpf, are you interested in putting together a PR for that?
For those concerned about HashDoS, I will follow up soon with a proposal for a general-purpose hashing API that should include Marvin32 and may include SipHash. That will also be an appropriate place for the other implementations @gimpf and @tannergooding have worked on.
@morganbr I can put together a PR as time permits. Also, I personally would prefer xx32 too, as long as it doesn't reduce acceptance.
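For readers unfamiliar with xx32's structure, here is a sketch of xxHash32's small-input path, which is the part relevant to combining a few 32-bit values (the constants are xxHash32's published primes; this is an illustration, not the eventual corefx code):

```csharp
using System;

// Sketch of xxHash32's small-input path (the part relevant to combining a
// few 32-bit hash codes). Constants are xxHash32's published primes.
static class XxHash32Sketch
{
    private const uint Prime2 = 2246822519U, Prime3 = 3266489917U,
                       Prime4 = 668265263U, Prime5 = 374761393U;

    private static uint Rol(uint value, int count)
        => (value << count) | (value >> (32 - count));

    public static uint Hash(uint seed, params uint[] values)
    {
        // Small-input initialization: seed + PRIME32_5 + input length in bytes.
        uint h = seed + Prime5 + (uint)values.Length * 4;

        // Mix in each 32-bit lane.
        foreach (uint v in values)
            h = Rol(h + v * Prime3, 17) * Prime4;

        // Final avalanche.
        h ^= h >> 15; h *= Prime2;
        h ^= h >> 13; h *= Prime3;
        h ^= h >> 16;
        return h;
    }
}
```

With no input lanes this reduces to xxHash32 of the empty input, which for seed 0 is the published test vector 0x02CC5D05 -- a handy sanity check for any port.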
@gimpf , how's your time looking? If you don't really have time, we can also see if anyone else would like to give it a shot.
@morganbr I'd planned to do it until 5th of November, and it still looks good that I'll find the time in the next two weeks.
@gimpf , sounds great. Thanks for the update!
@terrajobst - I'm a bit late to the party (sorry), but can't we change the return type of the Add method?
```csharp
public HashCode Add<T>(T value);
public HashCode Add<T>(T value, IEqualityComparer<T> comparer);
```
The params code is clearly there for scenarios where you have multiple fields, e.g.
```csharp
public override int GetHashCode() => new HashCode().Add(Name, Surname).ToHashCode();
```
However, exactly the same thing can be achieved like this, albeit with one less wasteful array allocation:
```csharp
public override int GetHashCode() => new HashCode().Add(Name).Add(Surname).Add(Age).ToHashCode();
```
Note that types can also be mixed. This could obviously be done by not calling it fluently inside of a regular method. Given this argument that the fluent interface is not absolutely necessary, why does the wasteful params overload exist to begin with? If this suggestion is a bad suggestion, then the params overload falls to the very same axe. That, and forcing a regular method for a trivial yet optimal hash code seems like a lot of ceremony.
Edit: An implicit operator int would also be nice for DRY, but not exactly crucial.
@jcdickinson
> can't we change the return type of the Add method?
We already discussed that in the old proposal, and it was rejected.
> why does the wasteful params overload exist to begin with?
We are not adding any params overloads. Do a Ctrl+F for "params" on this webpage, and you'll see that your comment is the only place where that word pops up.
> An implicit operator int would also be nice for DRY, but not exactly crucial.
I believe that was also discussed somewhere above...
@jamesqo thanks for the explanation.
> params overloads

I meant AddRange, but I guess there won't be any traction on this.
@jcdickinson AddRange was in the original proposal, but it's not in the current version. It was rejected by API review (see https://github.com/dotnet/corefx/issues/14354#issuecomment-308190321 by @terrajobst):

> We should remove all AddRange methods because the scenario is unclear. Arrays are somewhat unlikely to show up very often. And once larger arrays are involved, the question is whether the computation should be cached. Seeing the for loop on the calling side makes it clear that you need to think about that.
@gimpf I went ahead and polyfilled the proposal with xxHash32. Feel free to grab that implementation. It has tests against actual xxHash32 vectors.
Regarding the interface. I am fully aware that I am making a mountain out of a molehill - feel free to ignore. I am using the current proposal against real stuff and it is a lot of annoying repetition.
I've been playing around with the interface and now understand why the fluent interface was rejected; it's significantly slower.
```
BenchmarkDotNet=v0.10.9, OS=Windows 10 Redstone 2 (10.0.15063)
Processor=Intel Core i7-4800MQ CPU 2.70GHz (Haswell), ProcessorCount=8
Frequency=2630626 Hz, Resolution=380.1377 ns, Timer=TSC
.NET Core SDK=2.0.2
  [Host]     : .NET Core 2.0.0 (Framework 4.6.00001.0), 64bit RyuJIT
  DefaultJob : .NET Core 2.0.0 (Framework 4.6.00001.0), 64bit RyuJIT
```
Using a non-inlined method as a hash code source; 50 invocations of Add vs a fluent extension method:
Method | Mean | Error | StdDev | Scaled |
---|---|---|---|---|
Add | 401.6 ns | 1.262 ns | 1.180 ns | 1.00 |
Tally | 747.8 ns | 2.329 ns | 2.178 ns | 1.86 |
However, the following pattern does work:
```csharp
public struct HashCode : System.Collections.IEnumerable
{
    [EditorBrowsable(EditorBrowsableState.Never)]
    [Obsolete("This method is provided for collection initializer syntax.", error: true)]
    public IEnumerator GetEnumerator() => throw new NotImplementedException();
}
```
```csharp
public override int GetHashCode() => new HashCode()
{
    Age, // int
    { Name, StringComparer.Ordinal }, // use Comparer
    Hat // some arbitrary object
}.ToHashCode();
```
It also has identical performance characteristics to the current proposal:
Method | Mean | Error | StdDev | Scaled |
---|---|---|---|---|
Add | 405.0 ns | 2.130 ns | 1.889 ns | 1.00 |
Initializer | 400.8 ns | 4.821 ns | 4.274 ns | 0.99 |
Sadly it is somewhat of a hack, as IEnumerable has to be implemented to keep the compiler happy. That being said, the Obsolete error will trip even on foreach - you'd have to really want to break things in order to run into the exception. The MSIL across the two is essentially identical.
@jcdickinson thanks for grabbing the issue. I sent you Collaborator invite, let me know when you accept and I will be able to assign this issue to you (assigning to myself in the mean time).
Pro-tip: Once you accept, GitHub will automatically sign you up for all notifications from the repo (500+ per day), I would recommend to change it to just "Not Watching" which will send you all your mentions and notifications for issues you subscribed to.
@jcdickinson, I'm definitely interested in ways to avoid annoying repetition (though I have no idea how folks would feel about initializer syntax). I seem to recall that there were two problems with fluent:

```csharp
var hc = new HashCode();
var newHc = hc.Add(foo);
hc.Add(bar);
return newHc.ToHashCode();
```
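A runnable illustration of that pitfall (the Add/ToHashCode shapes here are hypothetical, chosen only to show the struct-copy semantics):

```csharp
using System;

// Why fluent-on-a-mutable-struct is error-prone: an Add that returns a new
// value leaves the original untouched, so mixing the fluent and mutating
// call styles silently drops data.
struct FluentHash
{
    private int _state;

    public FluentHash Add(int value)
    {
        var copy = this;                        // value-type copy
        copy._state = copy._state * 31 + value;
        return copy;                            // original is unchanged
    }

    public int ToHashCode() => _state;
}

static class Demo
{
    public static void Main()
    {
        var hc = new FluentHash();
        var newHc = hc.Add(1); // result captured: includes 1
        hc.Add(2);             // result discarded: the 2 is silently lost
        Console.WriteLine(newHc.ToHashCode()); // only the first Add took effect
    }
}
```

Neither call site looks wrong in isolation, which is exactly why the API review preferred a void-returning Add.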
Since the proposal on this thread is already approved (and you're well on your way to getting it merged), I'd suggest starting a new API proposal for any changes.
@karelz I believe @gimpf already grabbed this issue beforehand. Since he has more familiarity with the implementation, please assign this issue to @gimpf instead. (edit: nvm)
@terrajobst One kind of last-minute API request for this. Since we marked GetHashCode obsolete, we're implicitly telling the user that HashCodes are not values meant to be compared, despite being structs, which are typically immutable/comparable. In that case, should we mark Equals obsolete too?
```csharp
[Obsolete("HashCode is a mutable struct and should not be compared with other HashCodes.", error: true)]
[EditorBrowsable(EditorBrowsableState.Never)]
// If this is too harsh, base.Equals() is fine as long as the [Obsolete] stays
public override bool Equals(object obj) => throw new NotSupportedException("HashCode is a mutable struct and should not be compared with other HashCodes.");
```
I think something similar was done with Span.
If that's accepted, then I think "should not", or "may not", instead of "cannot" in the Obsolete message.

@Joe4evr Fine with me; I've updated the comment. It may also be beneficial to include the same message in the GetHashCode exception too, then:

```csharp
public override int GetHashCode() => throw new NotSupportedException("HashCode is a mutable struct and should not be compared with other HashCodes.");
```
@morganbr Why did you reopen this?
The PR to get it exposed in CoreFX has not gone through yet.
@gimpf do you have the code you benchmarked available and/or would you be able to quickly see how the SpookilySharp nuget package fairs. I'm looking to dust that project off after a couple of year's stagnation and I'm curious to see how it stands up.
@JonHanna He posted it here: https://github.com/gimpf/Haschisch.Kastriert
@JonHanna, I'd be interested to hear how your testing goes so we can start thinking about what would be useful in a general-purpose non-cryptographic hashing API.
@morganbr Where would be an appropriate forum to discuss such an API? I expect that such an API would consist of more than just the lowest common denominator, and maybe a good API will also need an improved JIT w.r.t. handling of larger structs. Discussing all that might be better done in a separate issue...
@gimpf Opened one for you. dotnet/corefx#25666
@morganbr - Can we get the package name & version number that will include this commit?
@karelz, might you be able to help @smitpatel with package/version info?
I would try a daily build of .NET Core - I'd wait until tomorrow. I don't think there is a package you can simply take a dependency on.
Update 6/16/17: Looking for volunteers
The API shape has been finalized. However, we're still deciding on the best hash algorithm out of a list of candidates to use for the implementation, and we need someone to help us measure the throughput/distribution of each algorithm. If you'd like to take that role up, please leave a comment below and @karelz will assign this issue to you.
Update 6/13/17: Proposal accepted!
Here's the API that was approved by @terrajobst at https://github.com/dotnet/corefx/issues/14354#issuecomment-308190321:
The original text of this proposal follows.
Rationale
Generating a good hash code should not require use of ugly magic constants and bit twiddling in our code. It should be less tempting to write a bad-but-concise GetHashCode implementation such as:

Proposal

We should add a HashCode type to encapsulate hash code creation and avoid forcing devs to get mixed up in the messy details. Here is my proposal, which is based off of https://github.com/dotnet/corefx/issues/14354#issuecomment-305019329, with a few minor revisions.

Remarks
See @terrajobst's comment at https://github.com/dotnet/corefx/issues/14354#issuecomment-305019329 for the goals of this API; all of his remarks are valid. I would like to point out these ones in particular, however: