Proposal: add hash randomization support
public static HashCode Randomized<T> { get; } // or CreateRandomized<T>
or
public static HashCode Randomized(Type type); // or CreateRandomized(Type type)
T or Type type is needed to get the same randomized hash for the same type.
Proposal: add support for collections
public HashCode Combine<T>(T[] values);
public HashCode Combine<T>(T[] values, IEqualityComparer<T> comparer);
public HashCode Combine<T>(Span<T> values);
public HashCode Combine<T>(Span<T> values, IEqualityComparer<T> comparer);
public HashCode Combine<T>(IEnumerable<T> values);
public HashCode Combine<T>(IEnumerable<T> values, IEqualityComparer<T> comparer);
I think there is no need for overloads like Combine(_field1, _field2, _field3, _field4, _field5), because code such as HashCode.Empty.Combine(_field1).Combine(_field2).Combine(_field3).Combine(_field4).Combine(_field5); should be inlined and optimized so that the intermediate Combine calls disappear.
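For illustration, here is a minimal sketch of the fluent, immutable shape this assumes (the names and the mixing function are placeholders, not part of the proposal):

```csharp
public readonly struct HashCode
{
    private readonly int _value;

    private HashCode(int value) => _value = value;

    // Starting point for a fluent chain of Combine calls.
    public static HashCode Empty => default(HashCode);

    public HashCode Combine(int hash)
    {
        // Placeholder mixing step (rotate-left-5, add, xor), similar in
        // spirit to the internal HashHelpers.Combine used for tuples.
        uint rol5 = ((uint)_value << 5) | ((uint)_value >> 27);
        return new HashCode(((int)rol5 + _value) ^ hash);
    }

    public HashCode Combine<T>(T value) => Combine(value?.GetHashCode() ?? 0);

    // Lets the final chained value be returned directly from GetHashCode().
    public static implicit operator int(HashCode hashCode) => hashCode._value;
}
```

With a shape like this, HashCode.Empty.Combine(_field1).Combine(_field2) boils down to a few shifts and xors once the struct copies are optimized away.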
@AlexRadch
Proposal: add support for collections
Yes, that was part of my eventual plan for this proposal. I think it's important to focus on how we want the API to look before we go about adding those methods, though.
He wanted to use a different algorithm, like the Marvin32 hash which is used for strings in coreclr. This would require expanding the size of HashCode to 8 bytes.
What about having Hash32 and Hash64 types that would internally store 4 or 8 bytes worth of data? Document the pros/cons of each. Hash64 being good for X, but being potentially slower. Hash32 being faster, but potentially not as distributed (or whatever the tradeoff actually is).
He wanted to randomize the hash seed, so hashes would not be deterministic.
This seems like useful behavior. But i could see people wanting to control this. So perhaps there should be two ways to create the Hash, one that takes no seed (and uses a random seed) and one that allows the seed to be provided.
Note: Roslyn would love if this could be provided in the Fx. We're adding a feature to spit out a GetHashCode for the user. Currently, it generates code like:
public override int GetHashCode()
{
var hashCode = -1923861349;
hashCode = hashCode * -1521134295 + this.b.GetHashCode();
hashCode = hashCode * -1521134295 + this.i.GetHashCode();
hashCode = hashCode * -1521134295 + EqualityComparer<string>.Default.GetHashCode(this.s);
return hashCode;
}
This is not a great experience, and it exposes many ugly concepts. We would be thrilled to have a Hash.Whatever API that we could call through instead.
Thanks!
What about MurmurHash? It is reasonably fast and has very good hashing properties. There are also two different implementations, one that spits out 32-bit hashes and another that spits out 128-bit hashes.
There are also vectorized implementations for both the 32-bit and 128-bit formats.
@tannergooding MurmurHash is fast, but not secure, from the sounds of this blog post.
@jkotas, has there been any work in the JIT around generating better code for >4-byte structs on 32-bit since our discussions last year? Also, what do you think of @CyrusNajmabadi's proposal:
What about having Hash32 and Hash64 types that would internally store 4 or 8 bytes worth of data? Document the pros/cons of each. Hash64 being good for X, but being potentially slower. Hash32 being faster, but potentially not as distributed (or whatever the tradeoff actually is).
I still think this type would be very valuable to offer to developers and it would be great to have it in 2.0.
@jamesqo, I don't think this implementation needs to be cryptographically secure (that is the purpose of the explicit cryptographically hashing functions).
Also, that article applies to Murmur2. The issue has been resolved in the Murmur3 algorithm.
the JIT around generating better code for >4-byte structs on 32-bit since our discussions last year
I am not aware of any.
what do you think of @CyrusNajmabadi's proposal
The framework types should be simple choices that work well for 95%+ of cases. They may not be the fastest ones, but that's fine. Having to choose between Hash32 and Hash64 is not a simple choice.
That's fine with me. But can we at least have a good-enough solution for those 95% cases? Right now there's nothing... :-/
hashCode = hashCode * -1521134295 + EqualityComparer<string>.Default.GetHashCode(this.s);
@CyrusNajmabadi Why are you calling EqualityComparer here, and not just this.s.GetHashCode()?
For non-structs: so that we don't need to check for null.
This is close to what we generate for anonymous types behind the scenes as well. I optimize the case of known non-null values to generate code that would be more pleasing to users. But it would be nice to just have a built in API for this.
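Concretely, the two patterns being compared look roughly like this (illustrative helper methods, not the actual generated code):

```csharp
using System.Collections.Generic;

static class NullHandlingExamples
{
    // Comparer-based: EqualityComparer<string>.Default returns 0 for null,
    // so the generated expression needs no explicit null check.
    static int ViaComparer(string s) => EqualityComparer<string>.Default.GetHashCode(s);

    // Null-check-based: skips the comparer call entirely.
    static int ViaNullCheck(string s) => s?.GetHashCode() ?? 0;
}
```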
The call to EqualityComparer.Default.GetHashCode is like 10x+ more expensive than a check for null...
The call to EqualityComparer.Default.GetHashCode is like 10x+ more expensive than a check for null...
Sounds like a problem. If only there were a good hash code API we could call in the Fx that i could defer to :)
(also, we have that problem then in our anonymous types as that's what we generate there as well).
Not sure what we do for tuples, but i'm guessing it's similar.
Not sure what we do for tuples, but i'm guessing it's similar.
System.Tuple goes through EqualityComparer<Object>.Default for historic reasons. System.ValueTuple calls Object.GetHashCode with a null check - https://github.com/dotnet/coreclr/blob/master/src/mscorlib/shared/System/ValueTuple.cs#L809.
Oh no. Looks like tuple can just use "HashHelpers". Could that be exposed so that users can get the same benefit?
Great. I'm happy to do something similar. I started from our anonymous types because i figured they were reasonable best practices. If not, that's fine. :)
But that's not why i'm here. I'm here to get some system that actually combines the hashes effectively. If/when that can be provided we'll gladly move to calling into that instead of hardcoding in random numbers and combining hash values ourselves.
What would be the API shape that you think would work best for the compiler generated code?
Literally any of the 32bit solutions that were presented earlier would be fine with me. Heck, 64bit solutions are fine with me. Just some sort of API that you can get that says "i can combine hashes in some sort of reasonable fashion and produce a reasonably distributed result".
I can't reconcile these statements:
We had an immutable HashCode struct that was 4 bytes in size. It had a Combine(int) method, which mixed in the provided hash code with its own hash code via a DJBX33X-like algorithm, and returned a new HashCode.
@jkotas did not think the DJBX33X-like algorithm was robust enough.
And
The framework types should be simple choices that work well for 95%+ of cases.
Can we not come up with a simple 32bit accumulating hash that works well enough for 95% of cases? What are the cases that aren't handled well here, and why do we think they're in the 95% case?
@jkotas, is performance really that critical for this type? I think on average things like hashtable lookups and this would take up way more time than a few struct copies. If it does turn out to be a bottleneck, would it be reasonable to ask the JIT team to optimize 32-bit struct copies after the API is released so they have some incentive, rather than blocking this API on that when nobody is working on optimizing copies?
Can we not come up with a simple 32bit accumulating hash that works well enough for 95% of cases?
We have been burnt really badly by the default 32-bit accumulating hash for strings, and that's why we use the Marvin hash for strings in .NET Core - https://github.com/dotnet/corert/blob/87e58839d6629b5f90777f886a2f52d7a99c076f/src/System.Private.CoreLib/src/System/Marvin.cs#L25. I do not think we want to repeat the same mistake here.
@jkotas, is performance really that critical for this type?
I do not think the performance is critical. Since it looks like that this API is going to be used by auto-generated compiler code, I think we should be preferring smaller generated code over how it looks. The non-fluent pattern is smaller code.
We have been burnt really badly by default 32bit accumulating hash for string
That doesn't seem like the 95% case. We're talking about normal developers just wanting a "good enough" hash for all those types where they manually do things today.
Since it looks like that this API is going to be used by auto-generated compiler code, I think we should be preferring smaller generated code over how it looks. The non-fluent pattern is smaller code.
This is not for use by the Roslyn compiler. This is for use by the Roslyn IDE when we help users generate GetHashCodes for their types. This is code that the user will see and have to maintain, and having something sensible like:
return Hash.Combine(this.A?.GetHashCode() ?? 0,
this.B?.GetHashCode() ?? 0,
this.C?.GetHashCode() ?? 0);
is a lot nicer than a user seeing and having to maintain:
var hashCode = -1923861349;
hashCode = hashCode * -1521134295 + this.b.GetHashCode();
hashCode = hashCode * -1521134295 + this.i.GetHashCode();
hashCode = hashCode * -1521134295 + EqualityComparer<string>.Default.GetHashCode(this.s);
return hashCode;
I mean, we already have this code in the Fx (the internal HashHelpers type):
We think it's good enough for tuples. It's unclear to me why it would be such a problem to make it available for users who want it for their own types.
Note: we've even considered doing this in roslyn:
return (this.A, this.B, this.C).GetHashCode();
But now you're forcing people to generate a (potentially large) struct just to get some sort of reasonable default hashing behavior.
We're talking about normal developers just wanting a "good enough" hash for all those types where they manually do things today.
The original string hash was a "good enough" hash that worked well for normal developers. But then it was discovered that ASP.NET webservers were vulnerable to DoS attacks because they tend to store received stuff in hashtables. So the "good enough" hash basically turned into a bad security issue.
We think it's good enough for tuples
Not necessarily. We made a back stop measure for tuples to make the hashcode randomized, which gives us the option to modify the algorithm later.
return Hash.Combine(this.A?.GetHashCode() ?? 0, this.B?.GetHashCode() ?? 0, this.C?.GetHashCode() ?? 0);
This looks reasonable to me.
I don't get your position. You seem to be saying two things:
The original string hash was a "good enough" hash that worked well for normal developers. But then it was discovered that ASP.NET webservers were vulnerable to DoS attacks because they tend to store received stuff in hashtables. So the "good enough" hash basically turned into a bad security issue.
Ok, if that's the case, then let's provide a hash code that's good for people who have security/DoS concerns.
The framework types should be simple choices that work well for 95%+ of cases.
Ok, if that's the case, then let's provide a hash code that's good enough for the 95% of cases. People who have security/DoS concerns can use the specialized forms that are documented for that purpose.
Not necessarily. We made a back stop measure for tuples to make the hashcode randomized, which gives us the option to modify the algorithm later.
Ok. Can we expose that so that users can use that same mechanism.
-- I'm really struggling here because it sounds like we're saying "because we can't make a universal solution, everyone has to roll their own". That seems like one of the worst places to be in. Because certainly most of our customers aren't thinking about rolling their own 'marvin hash' for DoS concerns. They're just adding, xoring, or otherwise poorly combining field hashes into one final hash.
If we care about the 95% case, then we should just make a generally good enough hash. If we care about the 5% case, we can supply a specialized solution for that.
This looks reasonable to me.
Great :) Can we then expose:
namespace System.Numerics.Hashing
{
    internal static class HashHelpers
    {
        public static readonly int RandomSeed = new Random().Next(Int32.MinValue, Int32.MaxValue);

        public static int Combine(int h1, int h2)
        {
            // RyuJIT optimizes this to use the ROL instruction
            // Related GitHub pull request: dotnet/coreclr#1830
            uint rol5 = ((uint)h1 << 5) | ((uint)h1 >> 27);
            return ((int)rol5 + h1) ^ h2;
        }
    }
}
Roslyn could then generate:
return Hash.Combine(Hash.RandomSeed,
this.A?.GetHashCode() ?? 0,
this.B?.GetHashCode() ?? 0,
this.C?.GetHashCode() ?? 0);
This would have the benefit of really being "good enough" for the vast majority of cases, while also leading people down the good path of initializing with random values so they don't take dependencies on non-random hashes.
People who have security/DoS concerns can use the specialized forms that are documented for that purpose.
Every ASP.NET app has security/DoS concern.
Great :) Can we then expose:
This is different from what I have said is reasonable.
What do you think about https://github.com/aspnet/Common/blob/dev/shared/Microsoft.Extensions.HashCodeCombiner.Sources/HashCodeCombiner.cs ? It is what is used internally in ASP.NET in a number of places today, and it is what I would be pretty happy with (except that the combining function needs to be stronger - an implementation detail that we can keep tweaking).
@jkotas I heard that :p
So the problem here is developers don't know when they're susceptible to DoS attacks, because it's not something they think about, which is why we switched strings to use Marvin32.
We should not head down the route of saying "95% of the cases don't matter", because we have no way to prove that, and we must err on the side of caution even when it has a performance cost. If you're going to move away from that then the hash code implementation needs Crypto Board review, not just us deciding "This looks good enough".
Every ASP.NET app has security/DoS concern.
Ok. So how are you dealing with the issue today that no one has any help with hashcodes, and thus is likely doing things poorly? Clearly it's been acceptable to have that state of the world. So what is harmed by providing a reasonable hashing system that likely performs better than what people are hand rolling today?
because we have no way to prove that, and we must err on the side of caution even when it has a performance cost
If you don't provide something, people will continue to just do things badly. The rejection of the "good enough" because there's nothing perfect just means the poor status quo we have today.
Every ASP.NET app has security/DoS concern.
Can you explain this? As i understand it, you have a DoS concern if you're accepting arbitrary input and then storing it in some data structure that performs poorly if the inputs can be specially crafted. Ok, i get how that's a concern with the strings one gets in web scenarios that have come from the user.
So how does that apply to the remainder of types out there that are not being used in this scenario?
We have these sets of types:
1. Types that take untrusted input and so have a real DoS concern.
2. Types that are never used in that scenario.
Basically, we think these cases are important, but not important enough to actually provide a solution to users to handle '1' or '2'. Because we're worried a solution for '2' won't be good for '1', we won't even provide it in the first place. And if we're not willing to even provide a solution for '1', it feels like we're in an incredibly strange position. We're worried about DoSing and ASP, but not worried enough to actually help people. And because we won't help people with that, we're not even willing to help them with the non-DoS cases.
--
If these two cases are important (which i'm willing to accept) then why not just give two APIs? Document them. Make them clear what they're for. If people use them properly, great. If people don't use them properly that's still fine. After all, they're likely not doing things properly today anyways, so how are things any worse?
What do you think about
I have no opinion one way or the other. If it's an API that customers can use which performs acceptably and which provides a simple API with clear code on their end, then i think that's fine.
I think it would be nice to have a simple static form that handles the 99% case of wanting to combine a set of fields/properties in an ordered fashion. It seems like such a thing could be added to this type fairly simply.
I think it would be nice to have a simple static form
Agree.
I think it would be nice to have a simple static form that handles the 99% case of wanting to combine a set of fields/properties in an ordered fashion. It seems like such a thing could be added to this type fairly simply.
Agree.
I am willing to meet you both halfway on this one because I really want to see some sort of API come through. @jkotas I still do not understand why you're opposed to adding an immutable instance-based API; first you said it was because 32-bit copies would be slow, then because the mutable API would be more terse (which is not true; h.Combine(a).Combine(b) (the immutable version) is shorter than h.Combine(a); h.Combine(b); (the mutable version)).
That said, I'm willing to go back to:
public static class HashCode
{
    public static int Combine<T>(T value1, T value2);
    public static int Combine<T>(T value1, T value2, IEqualityComparer<T> comparer);
    public static int Combine<T>(T value1, T value2, T value3);
    public static int Combine<T>(T value1, T value2, T value3, IEqualityComparer<T> comparer);
    public static int Combine<T>(T value1, T value2, T value3, T value4);
    public static int Combine<T>(T value1, T value2, T value3, T value4, IEqualityComparer<T> comparer);
    // ... All the way until value8
}
Does this seem reasonable?
I can't edit my post right now, but I just realized not all methods can accept T. In that case, we can just have 8 overloads accepting all ints and force the user to call GetHashCode.
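A rough sketch of what that int-only fallback could look like (the mixing function and the HashCodeHelper name are placeholders for this sketch):

```csharp
public static class HashCodeHelper
{
    public static int Combine(int h1, int h2)
    {
        // Placeholder mixing step; the real implementation would use whatever
        // algorithm ends up being chosen for this API.
        uint rol5 = ((uint)h1 << 5) | ((uint)h1 >> 27);
        return ((int)rol5 + h1) ^ h2;
    }

    public static int Combine(int h1, int h2, int h3) => Combine(Combine(h1, h2), h3);
    public static int Combine(int h1, int h2, int h3, int h4) => Combine(Combine(h1, h2, h3), h4);
    // ...and so on up to eight values; callers pass field.GetHashCode() themselves.
}
```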
If these two cases are important (which i'm willing to accept) then why not just give two APIs? Document them. Make them clear what they're for. If people use them properly, great. If people don't use them properly that's still fine. After all, they're likely not doing things properly today anyways, so how are things any worse?
Because people don't use things properly when they're there. Let's take a simple example, XSS. From the beginning even web forms had the ability to HTML encode output. However developers didn't know the risk, didn't know how to do it properly, and only found out when it was too late, their app was published, and oops, now their auth cookie has been lifted.
Giving people a security choice assumes they know the problem exists, understand the risks, and know how to evaluate the options.
Those assumptions don't generally hold for the majority of developers, they only find out about the problem when it's too late. Developers don't go to security conferences, they don't read white papers and they don't understand the solutions. So in the ASP.NET HashDoS scenario we made the choice for them, we protected them by default, because that was the right thing to do, and had the greatest impact. However we only applied it to strings, and that left people who were constructing custom classes from user input in a bad place. We should do the right thing, and help protect those customers now, and make it the default, having a pit of success, not failure. API design for security is sometimes not about choice, but about helping the user whether they know it or not.
A user can always create a non-security focused hash; so given the two options
Then the second is probably better; and what's suggested wouldn't have the perf impact of a full on crypto hash; so it makes a good compromise?
One of the running questions in these threads has been which algorithm is perfect for everybody. I think it's safe to say there isn't a single perfect algorithm. However, I don't think that should stop us from providing something better than code like what @CyrusNajmabadi has shown, which tends to have poor entropy for common .NET inputs as well as other common hasher bugs (like losing input data or being easily resettable).
I'd like to propose a couple of options to get around the "best algorithm" problem:
1. Explicit Choices: I'm planning to send out an API proposal soonish for a suite of non-cryptographic hashes (perhaps xxHash, Marvin32, and SpookyHash for example). Such an API has slightly different usage than a HashCode or HashCodeHelper type, but for the sake of discussion, assume we can work out those differences. If we use that API for GetHashCode (e.g. Marvin32.Create();), it lets power users know what it decided to do and they can easily change it to another algorithm in the suite if they like.
2. Randomization: Start with a properly randomized algorithm (the code @CyrusNajmabadi showed with a random initial value doesn't count since it's likely possible to wash out the randomness). This ensures that we can change the implementation with no compatibility issues. We would still need to be very sensitive about performance changes if we change the algorithm. However, that would also be a potential upside as we could make per-architecture (or even per-device) choices. For example, this site shows that xxHash is fastest on an x64 Mac while SpookyHash is fastest on Xbox and iPhone. If we do go down this route with an intent to change algorithms at some point, we may need to think about designing an API that still has reasonable performance if there is 64+ bit internal state.
CC @bartonjs, @terrajobst
@morganbr There isn't a single perfect algorithm, but I think that having some algorithm which works fairly well most of the time, exposed using a simple, easy to understand API, is the most useful thing that can be done. Having a suite of algorithms in addition to that, for advanced uses, is fine. But it shouldn't be the only option; I shouldn't have to learn who Marvin is just so that I can put my objects into a Dictionary.
I shouldn't have to learn who Marvin is just so that I can put my objects into a Dictionary.
I like the way you put that. I also like that you mentioned Dictionary itself. IDictionary is something that can have tons of different impls with all sorts of differing qualities (see the collections APIs in many platforms). However, we still just provide a base 'Dictionary' that does a decent job overall, even though it may not excel in every category.
I think that's what a ton of people are looking for in a hashing library. Something that gets the job done, even if it is not perfect for every purpose.
@morganbr I think people simply want a way to write GetHashCode that is better than what they're doing today (usually some grab-bag combination of math operations they copied from something on the web). If you can just provide a basic impl of that that runs well, then people will be happy. You can then have a behind-the-scenes API for advanced users if they have a strong need for specific hashing functions.
In other words, people writing hashcodes today aren't going to know or care why they would want Spooky vs Marvin vs Murmur. Only someone who has a particular need for one of those specific hash codes would go looking. But lots of people have a need to say "here's the state of my object, provide me a way to produce a well distributed hash that is fast that i can then use with dictionaries, and which i guess prevents me from being DOSed if i happen to take untrusted input and hash it and store it".
@CyrusNajmabadi The problem is that if we extend our current notions of compatibility into the future we find that once this type ships it can't ever change (unless we find that the algorithm is horribly broken in an "it makes all applications attackable" manner).
One can argue that if it starts off as stable-randomized then it becomes easy to change the implementation, since you couldn't depend on the value from run to run anyways. But if a couple of years later we find that there's an algorithm that provides as-good-if-not-better balancing of hash buckets with better performance in the general case, but makes a structure involving a List<string> of 1000 or more members where each member is over 900 characters long get significantly worse, we probably won't make the change... even though it would on net (across all programs ever run) reduce the number of CPU-hours spent hashing.
Under Morgan's suggestion, the code that you write today will have effectively the same performance characteristics forever. For the applications which could have gotten better, this is unfortunate. For the applications which would have gotten worse, this is fantastic. But when we find the new algorithm we get it checked in, and we change Roslyn (and suggest a change to ReSharper/etc) to start generating things with NewAwesomeThing2019 instead of SomeThingThatWasConsideredAwesomeIn2018.
Anything super black box like this only ever gets to be done once. And then we're stuck with it forever. Then someone writes the next one, which has better average performance, so there are two black box implementations that you don't know why you'd choose between them. And then... and then....
So, sure, you may not know why Roslyn/ReSharper/etc auto-wrote GetHashCode for you using Marvin32, or Murmur, or FastHash, or a combination/conditional based on IntPtr.Size. But you have the power to look into it. And you have the power to change it on your types later, as new information is revealed... but we've also given you the power to keep it the same. (It'd be sad if we write this, and in 3 years Roslyn/ReSharper/etc are explicitly avoiding calling it, because the new algorithm is So Much Better... Usually).
@bartonjs What makes hashing different from all the places where .NET provides you with a black-box algorithm or data structure? For example, sorting (introsort), Dictionary (array-based separate chaining), StringBuilder (linked list of 8k chunks), most of LINQ.
We've taken a deeper look at this today. Apologies for the delay and the back and forth on this issue.
// Will live in the core assembly
// .NET Framework : mscorlib
// .NET Core : System.Runtime / System.Private.CoreLib
namespace System
{
public struct HashCode
{
public static int Combine<T1>(T1 value1);
public static int Combine<T1, T2>(T1 value1, T2 value2);
public static int Combine<T1, T2, T3>(T1 value1, T2 value2, T3 value3);
public static int Combine<T1, T2, T3, T4>(T1 value1, T2 value2, T3 value3, T4 value4);
public static int Combine<T1, T2, T3, T4, T5>(T1 value1, T2 value2, T3 value3, T4 value4, T5 value5);
public static int Combine<T1, T2, T3, T4, T5, T6>(T1 value1, T2 value2, T3 value3, T4 value4, T5 value5, T6 value6);
public static int Combine<T1, T2, T3, T4, T5, T6, T7>(T1 value1, T2 value2, T3 value3, T4 value4, T5 value5, T6 value6, T7 value7);
public static int Combine<T1, T2, T3, T4, T5, T6, T7, T8>(T1 value1, T2 value2, T3 value3, T4 value4, T5 value5, T6 value6, T7 value7, T8 value8);
public void Add<T>(T value);
public void Add<T>(T value, IEqualityComparer<T> comparer);
public void Add<T>(T[] value);
public void Add<T>(T[] value, int index, int length);
public void Add(byte[] value);
public void Add(byte[] value, int index, int length);
public void Add(string value);
public void Add(string value, StringComparison comparisonType);
public int ToHashCode();
}
}
Notes:
* We decided not to override GetHashCode() to produce the hash code as this would be weird, both naming-wise as well as from a behavioral standpoint (GetHashCode() should return the object's hash code, not the one being computed).
* We decided to use Add for the builder pattern and Combine for the static construction.
* The hash needs to be seeded; Add will do this on first use.
* We want to keep GetHashCode() very cheap & not cause any allocations, while allowing the structure to be bigger than 32-bit so that the hash code algorithm can use more bits during accumulation.
* Combine will just call <value>.GetHashCode(), so it has the behavior of the value's type's GetHashCode() implementation.
The simple case is when someone just wants to produce a good hash code for a given type, like so:
public class Customer
{
public int Id { get; set; }
public string FirstName { get; set; }
public string LastName { get; set; }
public override int GetHashCode() => HashCode.Combine(Id, FirstName, LastName);
}
The more complicated case is when the developer needs to tweak how the hash is being computed. The idea is that the call site passes the desired hash rather than the object/value, like so:
public partial class Customer
{
public override int GetHashCode() =>
HashCode.Combine(
Id,
StringComparer.OrdinalIgnoreCase.GetHashCode(FirstName),
StringComparer.OrdinalIgnoreCase.GetHashCode(LastName)
);
}
And lastly, if the developer needs more flexibility, such as producing a hash code for more than eight values, we also provide a builder-style approach:
public partial class Customer
{
public override int GetHashCode()
{
var hashCode = new HashCode();
hashCode.Add(Id);
hashCode.Add(FirstName, StringComparison.OrdinalIgnoreCase);
hashCode.Add(LastName, StringComparison.OrdinalIgnoreCase);
return hashCode.ToHashCode();
}
}
This issue will remain up for grabs. In order to implement the API we need to decide which algorithm to use.
@morganbr will make a proposal for good candidates. Generally speaking, we don't want to write a hashing algorithm from scratch -- we want to use a well-known one whose properties are well-understood.
However, we should measure the implementation for typical .NET workloads and see which algorithm produces good results (throughput and distribution). It's likely that the answers will differ by CPU architecture, so we should consider this when measuring.
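As a rough illustration of the kind of distribution measurement meant here (a sketch, not an official harness): bucket the hash codes the way a hashtable would and compare the fullest bucket against a perfectly even spread.

```csharp
using System.Collections.Generic;
using System.Linq;

static class DistributionCheck
{
    // Returns how overloaded the fullest bucket is relative to a perfectly
    // even distribution (1.0 = even; larger values = more clustering).
    public static double MaxLoadFactor(IEnumerable<int> hashCodes, int bucketCount)
    {
        var buckets = new int[bucketCount];
        int total = 0;
        foreach (int h in hashCodes)
        {
            buckets[(int)((uint)h % (uint)bucketCount)]++;
            total++;
        }
        double expectedPerBucket = total / (double)bucketCount;
        return buckets.Max() / expectedPerBucket;
    }
}
```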
@jamesqo, are you still interested in working in this area? If so, please update the proposal accordingly.
@terrajobst, we might also want public static int Combine<T1>(T1 value). I know it looks a little funny, but it would provide a way of diffusing bits from something with a limited input hash space. For example, many enums only have a few possible hashes, only using the bottom few bits of the code. Some collections are built on the assumption that hashes are spread over a larger space, so diffusing the bits may help the collection work more efficiently.
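For example (Fruit and FruitKey are made-up types; this uses the single-argument Combine from the shape above):

```csharp
using System;

enum Fruit { Apple, Banana, Cherry }

class FruitKey
{
    public Fruit Fruit { get; }
    public FruitKey(Fruit fruit) => Fruit = fruit;

    public override bool Equals(object obj) => obj is FruitKey other && other.Fruit == Fruit;

    // The raw enum hash codes are just 0, 1, 2 (only the bottom bits vary);
    // Combine diffuses them across the full 32-bit range.
    public override int GetHashCode() => HashCode.Combine(Fruit);
}
```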
public void Add(string value, StringComparison comparison);
Nit: The StringComparison parameter should be named comparisonType to match the naming used everywhere else StringComparison is used as a parameter.
Update 6/16/17: Looking for volunteers
The API shape has been finalized. However, we're still deciding on the best hash algorithm out of a list of candidates to use for the implementation, and we need someone to help us measure the throughput/distribution of each algorithm. If you'd like to take that role up, please leave a comment below and @karelz will assign this issue to you.
Update 6/13/17: Proposal accepted!
Here's the API that was approved by @terrajobst at https://github.com/dotnet/corefx/issues/14354#issuecomment-308190321:
The original text of this proposal follows.
Rationale
Generating a good hash code should not require use of ugly magic constants and bit twiddling in our code. It should be less tempting to write a bad-but-concise GetHashCode implementation that just mashes field hashes together with xor or addition.
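For example, something like this xor-only implementation (an illustrative type):

```csharp
public class Point
{
    public int X { get; set; }
    public int Y { get; set; }

    // Concise but bad: xor is symmetric, so (1, 2) and (2, 1) collide,
    // and any point with X == Y hashes to 0.
    public override int GetHashCode() => X ^ Y;
}
```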
Proposal
We should add a HashCode type to encapsulate hash code creation and avoid forcing devs to get mixed up in the messy details. Here is my proposal, which is based off of https://github.com/dotnet/corefx/issues/14354#issuecomment-305019329, with a few minor revisions.
Remarks
See @terrajobst's comment at https://github.com/dotnet/corefx/issues/14354#issuecomment-305019329 for the goals of this API; all of his remarks are valid. I would like to point out these ones in particular, however: