dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.27k stars 4.73k forks source link

Remove [DisallowNull] from EqualityComparer.GetHashCode #30998

Closed jnm2 closed 4 years ago

jnm2 commented 5 years ago

Visual Studio 16.3.1 generates code like this for GetHashCode specifically because this way the null check is handled. EqualityComparer.GetHashCode is well-known to return 0 if you pass null:

struct Foo
{
    public string? Bar { get; }

    public override int GetHashCode()
    {
        // CS8604 Possible null reference argument for parameter 'obj' in
        // 'int EqualityComparer<string?>.GetHashCode(string? obj)'.      ↓
        return 1739646154 + EqualityComparer<string?>.Default.GetHashCode(Bar);
    }
}

Rather than generating a ! to go along with it, could [DisallowNull] be removed from EqualityComparer<>.GetHashCode as a kind of contravariance, even though IEqualityComparer<>.GetHashCode has it? It would reflect the true behavior of the EqualityComparer<> class.

jnm2 commented 5 years ago

I missed that it's only EqualityComparer<>.Default that has this property of not throwing when getting a hash code for null and that EqualityComparer.GetHashCode is documented to throw for null in general. Thanks @sharwell!

jnm2 commented 5 years ago

Reopening since @CyrusNajmabadi believes that the virtual EqualityComparer<>.GetHashCode class method should allow null and the docs should change, and I'm quite partial to this idea. See https://github.com/dotnet/roslyn/issues/37913#issuecomment-536084419.

CyrusNajmabadi commented 5 years ago

Looking into it, it looks like this is wrongly annotated in the BCL. This instance very specifically and intentionally does not throw. This is baked through so thoroughly that the C# compiler takes a hard dependency on the existence of this behavior when generating teh code for things like anonymous-types. i.e. it will make direct calls (without null checks) into EqualityComparer<whatever>.Default.Equals/GetHashCode precisely to be able to avoid synthesizing null checks and to benefit from the appropriate null-behavior this impl has.

It seems wrong to have this extraordinarily common value be null-tolerant, but then mark the api entrypoint into it as being null-intolerant. This forces code to have to suppress or null-check to avoid C#'s warnings, even though it's expected idiomatic behavior around using this API.

sharwell commented 5 years ago

I was under the impression that the type would be this:

public interface IEqualityComparer<in T>
{
  bool Equals(T x, T y);
  int GetHashCode(T obj);
}

With EqualityComparer<T> matching that. Roslyn assumes this is the behavior in many places, but we don't have any build-time validation because there are no annotations in .NET Standard. It's leading to increasing divergence between how we are using the framework, and how the framework is being defined.

scalablecory commented 5 years ago

Both EqualityComparer<> and IEqualityComparer<> docs state that they throw on null. It'd be a breaking change to remove that expectation, so I don't think this is a good change.

That C# and maybe parts of BCL rely on an implementation detail for EqualityComparer.Default should not dictate the annotation.

CyrusNajmabadi commented 5 years ago

It'd be a breaking change to remove that expectation

Wait... how? It's not a breaking change ot state that something that threw before won't throw anymore.

sharwell commented 5 years ago

@scalablecory I'm having a hard time finding implementations of this interface which throw on GetHashCode but do not throw on Equals.

CyrusNajmabadi commented 5 years ago

That C# and maybe parts of BCL rely on an implementation detail for EqualityComparer.Default should not dictate the annotation.

Yes it should. This feature is about ensuring that appropriate coding patterns aren't being unnecessarily penalized by incorrect annotations. Annotations should reflect the actual state of the world, so that code that is written properly isn't told it is incorrect and forced to change.

--

Note: we've had to do this in roslyn itself in a few places. There are areas where we'd like to annotate a certain way, but we found out that annotation isn't correct.

jnm2 commented 5 years ago

I was under the impression that the type would be this:

Me too. Both the [AllowNull] and [DisallowNull] overrides surprised me in the current definition:

https://github.com/dotnet/corefx/blob/e667c29636a622eb4f9493f75232b44e0ae90b29/src/Common/src/CoreLib/System/Collections/Generic/IEqualityComparer.cs#L12-L16

scalablecory commented 5 years ago

Annotations should reflect the actual state of the world, so that code that is written properly isn't told it is incorrect and forced to change.

You don't know the actual state of the world here because EqualityComparer<>.GetHashCode() is virtual. You're changing the contract for anyone who has implemented the class.

CyrusNajmabadi commented 5 years ago

You don't know the actual state of the world here because EqualityComparer<>.GetHashCode() is virtual.

We can err on the side of looking at our own most common impl that is used and depended on heavily.

Since this is a relaxing and not a restricting choice, i see no problem there.

--

I would say that those that want to restrict against the most common known form here should provide the argument why that would be desirable given how it hurts those upgrading to NRT whose code is totally fine.

You're changing the contract for anyone who has implemented the class.

No. I'm stating that EqualityComparer should say nothing at the NRT level about this one way or the other. If a user says EqualityComparer<string?> they should be allowed to pass in string?s. If they say EqualityComparer<string> then they can't pass in a string?. This accurately conveys how this API actually works and leaves it in control of the consumer of the API to specify the behavior they expect this to have.

CyrusNajmabadi commented 5 years ago

This bit is especially relevant:

With EqualityComparer<T> matching that. Roslyn assumes this is the behavior in many places, but we don't have any build-time validation because there are no annotations in .NET Standard. It's leading to increasing divergence between how we are using the framework, and how the framework is being defined.

This case is also a bit strange for me because i have no idea how these annoations could have been added in Core without causing Core itself to break (i.e. we must at least have tests that validate this behavior). How did they not trigger any issues here?

jnm2 commented 5 years ago

@CyrusNajmabadi

Since this is a relaxing and not a restricting choice, i see no problem there.

It's of course a new demand on implementers of the class to not throw if someone passes null.

@scalablecory Under what circumstances would someone implement the EqualityComparer class? It seems like implementers would be very rare, and consumers of anything but EqualityComparer<>.Default (or direct IEqualityComparer interface implementations) just as rare.

Then, what are the consequences for the class implementors if [DisallowNull] is removed?

safern commented 5 years ago

How did they not trigger any issues here?

I suspect that is because we didn't enable nullable annotations on test assemblies yet.

After looking at implementations in the framework of IEqualityComparer and derived types of EqualityComparer it seems like we never throw on null, we always return 0. So I think we might have annotated it that way because we followed the documentation and the intent of the API... I'm also inclining towards we should remove the DisallowNull and that EqualityComparer.GetHashCode(object) should accept null and suggest to return 0 on null. @stephentoub, thoughts?

jnm2 commented 5 years ago

If GetHashCode had [AllowNull], it would be possible to do EqualityComparer<T>.Default.GetHashCode(default) when T is unconstrained.

scalablecory commented 5 years ago

This does seem to be one of the safer places to change documented behavior:

So, I'm not strongly against this change.

What are the benefits of removing this annotation? What is currently being blocked or made inconvenient by it?

CyrusNajmabadi commented 5 years ago

What are the benefits of removing this annotation? What is currently being blocked or made inconvenient by it?

See https://github.com/dotnet/roslyn/issues/37913#issuecomment-536084419

jnm2 commented 5 years ago

@scalablecory The inconvenience is listed at the start of this thread. VS is generating GetHashCode in a way that causes a warning:

struct Foo
{
    public string? Bar { get; }

    public override int GetHashCode()
    {
        // CS8604 Possible null reference argument for parameter 'obj' in
        // 'int EqualityComparer<string?>.GetHashCode(string? obj)'.      ↓
        return 1739646154 + EqualityComparer<string?>.Default.GetHashCode(Bar);
    }
}

And none of us want it to have to start adding ! to that.

safern commented 5 years ago

What are the benefits of removing this annotation? What is currently being blocked or made inconvenient by it?

When using the Default comparer GetHashCode with a nullable variable it would generate warnings that people will just en up adding a ! because the Default comparers indeed handle null and return 0.

scalablecory commented 5 years ago

VS is generating GetHashCode in a way that causes a warning:

Presumably this isn't an issue in latest VS, which should be using HashCode.Combine, but I see your point. I wonder how frequently people use this feature.

When using the Default comparer GetHashCode with a nullable variable it would generate warnings that people will just en up adding a ! because the Default comparers indeed handle null and return 0.

Well, yea, that is the obvious use case here. My question is more to understand how common this pattern actually is, to lend some additional justification to this change -- even if it doesn't reflect reality, anyone doing this right now is explicitly going against documented behavior, so it might be rare. Or it might not... lots of people don't read docs.

safern commented 5 years ago

I did a little bit more digging and there are actually some public comparers in the framework which throw when a null is passed down.

https://source.dot.net/#System.Private.CoreLib/shared/System/StringComparer.cs,178

Which makes me think that we should base this decision more on, what's the common case, what's the recommended usage of GetHashCode(T) and what's the intent of that API when a null is passed down. I think we might need to stick with what we have been recommending all these years and we also need to be consistent.

With object.ToString() which is a virtual, we decided to make it return nullable, even though, it is documented that we don't recommend returning null, but we even did in some places in the framework, and there are so many places out there which return null.

So in this case, since it is virtual and an interface for IEqualityComparer<T>, I think we should DisallowNull as some implementers might throw (such as the example I pasted above) and it is called through the interface or virtual method where we could be giving a false positive.

jnm2 commented 5 years ago

@scalablecory

Presumably this isn't an issue in latest VS, which should be using HashCode.Combine, but I see your point. I wonder how frequently people use this feature.

Someone noticed :D This is latest VS but targeting .NET Standard and .NET Framework where HashCode is not available, and using @sharwell's excellent https://github.com/tunnelvisionlabs/ReferenceAssemblyAnnotator to borrow .NET Core's annotations into the matching APIs in the .NET Standard and .NET Framework reference assemblies. So effectively for this question it is the same as if I was using a .NET Core 3.0 that didn't have HashCode.

stephentoub commented 5 years ago

I am against removing the [DisallowNull].

This is a virtual method that has existed in the platform for years, and its contract is explicitly documented to be that nulls can throw; we do not have the right to pretend that's not the case. On top of that, we ourselves ship types that throw for null, not only in the framework, but in other libraries as well (a quick search of Roslyn shows at least a handful of GetHashCodes that will null ref if passed a null). Same for ASP.NET. And I'm 100% sure the same for 3rd-party libraries.

Whether we like it or not, this is the contract.

jnm2 commented 5 years ago

@safern There's an up-for-grabs issue to make StringComparer return 0 rather than throwing: https://github.com/dotnet/corefx/issues/28370

CyrusNajmabadi commented 5 years ago

I did a little bit more digging and there are actually some public comparers in the framework which throw when a null is passed down.

That's not an EqualityComparer, it's an IEqualityComparer.

CyrusNajmabadi commented 5 years ago

Whether we like it or not, this is the contract.

If it's the contract, then shouldn't BCL move to make this API throw to match the specified contract?

stephentoub commented 5 years ago

If it's the contract, then shouldn't BCL move to make this API throw to match the specified contract?

It's allowed to throw. It doesn't have to throw.

CyrusNajmabadi commented 5 years ago

It's allowed to throw. It doesn't have to throw.

Ok. so if it does'nt have to throw, we shouldn't try to prevent people from passing in null... :)

CyrusNajmabadi commented 5 years ago

Basically, the contract seems to be: the impl may or may not throw. As such, since there's no knowledge of what the impl can do, why are we annotated, which is implying what the impl will do?

stephentoub commented 5 years ago

Ok. so if it does'nt have to throw, we shouldn't try to prevent people from passing in null...

No one is prevented from doing so. If you know it's safe based on the concrete type you're working with, you can suppress the warning that it might not be safe to do so, whether with a ! or a pragma or your mechanism of choice.

jnm2 commented 5 years ago

@CyrusNajmabadi The function of [DisallowNull] is to say "we can't prove this is safe," not to prevent it. (Naming is hard)

stephentoub commented 5 years ago

Basically, the contract seems to be: the impl may or may not throw. As such, since there's no knowledge of what the impl can do, why are we annotated, which is implying what the impl will do?

The intent is "do not pass null, you can't expect it to work". If some implementations happen to not check or happen to work, bully for them, and feel free to ignore the warnings and call the method accordingly.

CyrusNajmabadi commented 5 years ago

Presumably this isn't an issue in latest VS, which should be using HashCode.Combine, but I see your point. I wonder how frequently people use this feature.

Note: this also affects all the correct code out there which is already using this pattern.

@CyrusNajmabadi The function of [DisallowNull] is to say "we can't prove this is safe," not to prevent it.

THe function of NRT is to give me a good set of true-positives about areas i should be concerned with. Letting me know that all my GetHashCodes are suspect, even though none of them, is not helpful. It's just noise.

stephentoub commented 5 years ago

THe function of NRT is to give me a good set of true-positives about areas i should be concerned with.

There are going to be false positives and there are going to be false negatives. That's just the way it is. We try to minimize them, but not at the expense of inaccuracy. When we can claim that the intent was always to allow nulls, then we can annotate it as such and treat "bugs" as such and fix them... the intent here is explicitly called out, though: nulls may not be accepted, don't pass null.

CyrusNajmabadi commented 5 years ago

The intent is "do not pass null, you can't expect it to work".

But we can expect this case to work. It has worked since the dawn of time. This is not IEqualityComparer. This is EqualityComparer<T>, which was produced and messaged for this purpose, even to the point that it's codified with how hte compiler emits its own code.

If some implementations happen to not check or happen to work, bully for them, and feel free to ignore the warnings and call the method accordingly.

...

if some impls don't throw... then why not mark those impls as being non-throwing? EqualityComparer<T>.Default doesn't give you an IEqualityComparer back, it gives and EqualityComparer back. Are there even other EqualityComparers that we produce? if so, what behavior do we give them?

CyrusNajmabadi commented 5 years ago

There are going to be false positives and there are going to be false negatives. The feature is far from perfect.

Yes. And, as we discussed even in the early days of NRT we should annotate in the manner that fits the majority case. i.e. if the annotation would be incorrect 99.9% of the time, but correct in very unlikely cases, it's better to not put that annotation. That's inherent with this being heuristicy and best-effort anyways. Since it's never going to 100% accurate, we should annotate things in hte direction of most value for the user. i.e. if most of the time nulls are accepted, the annotated that way. if most of the time it isn't, do the same. yes, that means there might be false-positives/negatives, but as long as it's the minority case that is better than being the majority case.

CyrusNajmabadi commented 5 years ago

We try to minimize them, but not at the expense of inaccuracy

Wait. When did this happen? NRT discussions were about hte idea of bringing high-value/likelihood warnings to users, esp. to prevent pointless fixes/fatigue. I don't remember this ever being stated as the mentality for NRT.

For example, with this approach, i don't see how we could ever mark teh return type of an interface as being not-null as we could never control every impl. Whereas, in practice, we likely would want to mark it this way if we strongly felt it was super unlikely that null woudl ever have been returned from any impl, and forcing people to have to always check/suppress would just be onerous in nearly all cases.

stephentoub commented 5 years ago

but correct in very unlikely cases

You and I are going to have to disagree on the unlikely-ness here I guess. When the docs for over a decade have said "nulls may not be accepted", when we have implementations that throw for null, when other 1st and 3rd party libraries throw for null, I believe we'd be doing a significant disservice to the feature and to the community by pretending otherwise.

CyrusNajmabadi commented 5 years ago

when we have implementations that throw for null,

What impls of EqualityComparer<T> throw for null?

jnm2 commented 5 years ago

VS and ReSharper push you towards implementing IEqualityComparer<>. I also want to know who has ever implemented EqualityComparer<>. (besides the BCL)

stephentoub commented 5 years ago

Are you drawing a distinction between EqualityComparer<T> and IEqualityComparer<T>?

The implementations I was referring to are IEqualityComparer<T>.

The docs for both are the same in this regard.

CyrusNajmabadi commented 5 years ago

Are you drawing a distinction between EqualityComparer and IEqualityComparer?

Yes. I mentioned that several times in the conversation :D But i can see how it could be missed.

The docs for both are the same in this regard.

I understand that. This i not about the docs for me, but about the actual behavior of this type (specifically EqualityComparer), and our ability to annotate it to match that behavior for the practical defacto behavior it has.

In other words, it's only unhelpful to annotate as per the docs when people have been actually reasonably coding to the defacto implementation. Since NRT is about helping me find real issues of concern, and since we know this behavior has been like this for like 15+ years, and isn't going to change, it's better to just have the annotation reflect reality so that the customer doesn't have to go working through NRT errors that are just going to be false-negatives for this.

--

Note: i'm find with IEqualityComparer being what it is. There are lots of impls of that that do throw null. As such, there's no clear winner between true-negative and false-negative for that case.

jnm2 commented 5 years ago

Yes, my initial request at the top of this thread mentions the distinction. I'm not asking for IEqualityComparer<>.GetHashCode to change but rather for EqualityComparer<>.GetHashCode to purposely diverge.

CyrusNajmabadi commented 5 years ago

I too think that the annotations for EqualityComparer<>.GetHashCode should purposefully diverge to match the defacto behavior that the ecosystem (and compilers/tools) have grown to depend on.

sharwell commented 5 years ago

I'm with @CyrusNajmabadi. In adopting NRT, it's better to alert the rare user who didn't handle null (and have them update their comparer) than to eliminate the general usability of EqualityComparer<T>.Default.

stephentoub commented 5 years ago

eliminate the general usability of EqualityComparer.Default

The language does not provide us the ability to special-case usage of GetHashCode when it is accessed off of EqualityComparer<T>.Default; if it did, I'd have no problem doing so.

CyrusNajmabadi commented 5 years ago

Why can't we just mark EqualityComparer<T>? Then it will be picked up through .Default. Again, to me it's going to be better here to have maybe an incredibly tiny amount of false-negatives by doing this rather than annotating as we have, and having a ton of false-positives.

stephentoub commented 5 years ago

Why can't we

We could. I think it's dangerous, though, and dangerous precedent that degrades the value and trustworthiness of the overall feature.

"Dear customer: Here are annotations that are meant to help you catch places where you're passing null to a method that may not allow it. Why is method A marked as accepting nulls when the docs explicitly say it may not? Ignore that; we didn't want to bother you with warnings, even though this whole feature set is about giving you warnings for places you may accidentally be passing nulls to things that may not accept them."

I realize you disagree with me.

CyrusNajmabadi commented 5 years ago

I realize you disagree with me.

:)

To me it's pretty easy to explain to the customer: these help describe hte actual behavior of APIs.

Right now, the feature seems broken to someone who has used any or our tooling or has just used the APIs since NRT is now telling them: this code is not ok, you need to fix it up, even thouh they don't. The more "false negatives/positives" one has, the less trustworthy it is.

Why is method A marked as accepting nulls when the docs explicitly say it may not? Ignore that;

You are already in this sitaution today. Customers could come today and say "why does EqualityComparer<T>.Default accept nulls, when the docs explicitly say it may not?". Why is the actual impl not an issue of trustworthyness, but the annotations now are? To me, i find it less trustworthy that i'm being told there's a problem in the majority of cases when there isn't a problem, versus not being told about a scant minority of cases**.

even though this whole feature set is about giving you warnings for places you may accidentally be passing nulls to things that may not accept them."

I disagree, and we were very explicit about that in the early days of NRT. That's why, for example, the language has adopted heuristics that may mask real bugs, but which are to prevent overall noise. For example, the assumption that multiple repeated expressions of the form a.b.c.d.e will have the same nullability results across all of them. Teh reason here is to strictly balance the need to be "correct" and to explicitly be at the "expense of inaccuracy" in order to not be overly pedantic with unhelpful warnings that will be wrong most of the time.

The feature at its core recognizes that its' about common coding patterns and about wanting to surface issues when they're likely to be a problem, not just possibly a problem.

Note: this is also the approach we took when doing NRT for TS. The documentation has always been about the expected behavior of the actual impls, not the doc'ed behavior when the impls have deviated. Similarly, we always went with the pramatic approach of optimizing for minimization of unhelpful warnings. If the majority expectation for a particular case is that the user will be fine, then we would opt to make the language (or API) not warn them for that particular pattern.

--

* Right now i'm even assuming there's a case where EqualityComparer<T> even throws null. My personal belief is that we really could live in a world where that has never happened and may never happen. But even if it does exist out tehre, i think it's so much less of a concern over the defacto impl that does* not throw that it's meaningless.

stephentoub commented 5 years ago

Like I said, I realize you disagree with me. I don't believe either of us is going to be convinced by the other's arguments; we've both heard them and internalized them before. We're now just repeating ourselves, both from this and previous conversations.