dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.16k stars 4.72k forks source link

Add "Create" method to EqualityComparer<> class #24460

Closed jkotas closed 2 years ago

jkotas commented 6 years ago

EDITED by @stephentoub on 4/26/2022 to update proposal:

namespace System.Collections.Generic
{
    public abstract class EqualityComparer<T> : IEqualityComparer<T>, IEqualityComparer
    {
+        public static EqualityComparer<T> Create(Func<T?, T?, bool> equals, Func<T, int> getHashCode);
    }
}

To discuss:


From @ViIvanov on December 14, 2017 20:15

Now we have a Create method in Comparer<> class:

public static Comparer<T> Create(Comparison<T> comparison)

I think, it will be usable to have an analog in EqualityComparer<>:

public static EqualityComparer<T> Create(Func<T, T, bool> equals, Func<T, int> getHashCode)

Copied from original issue: dotnet/coreclr#15526

JonHanna commented 6 years ago

Not quite as handy since we need two delegates, and there isn't as much of an existing use as there is with Comparison, but I could definitely see this as something I'd use while experimenting at least, even if I might end up replacing with a custom equality comparer.

karelz commented 6 years ago

What is the value? Example of usage today vs. in future? How commonly is it used? Ideally please follow formal API proposal (see the example there).

From a brief look it seems today's state - inheritance + overriding 2 methods is not too much extra code compared to the API proposed above. The case of Comparer is more needed IMO to avoid adding another class with boilerplate.

weitzhandler commented 6 years ago

I also find that very useful. It's annoying when you always have to implement an interface to be able to compare two property values.

Though I'd prefer the method to be in a shorter version:

public static IEqualityComparer<T> Create(params Expression<Func<T, object>>[] properties);

Which will compare the two objects if the specified properties are equal, and will generate hash-codes based on the provided lambda properties.

Usage:

var eqComparer1 = EqualityComparer<Contact>.Create(c => c.FullName);
var eqComparer2 = EqualityComparer<Contact>.Create(c => c.FirstName, c => c.LastName);

Here's some untested prototype:

public class EqualityComparerImpl<T> : IEqualityComparer<T>
{
  public static EqualityComparerImpl<T> Create(
    params Expression<Func<T, object>>[] properties) =>
    new EqualityComparerImpl<T>(properties);

  PropertyInfo[] _properties;
  EqualityComparerImpl(Expression<Func<T, object>>[] properties)
  {
    if (properties == null)
      throw new ArgumentNullException(nameof(properties));

    if (properties.Length == 0)
      throw new ArgumentOutOfRangeException(nameof(properties));

    var length = properties.Length;
    var extractions = new PropertyInfo[length];
    for (int i = 0; i < length; i++)
    {
      var property = properties[i];
      extractions[i] = ExtractProperty(property);
    }
    _properties = extractions;
  }

  public bool Equals(T x, T y)
  {
    if (ReferenceEquals(x, y))
      //covers both are null
      return true;
    if (x == null || y == null)
      return false;
    var len = _properties.Length;
    for (int i = 0; i < _properties.Length; i++)
    {
      var property = _properties[i];
      if (!Equals(property.GetValue(x), property.GetValue(y)))
        return false;
    }
    return true;
  }

  public int GetHashCode(T obj)
  {
    if (obj == null)
      return 0;

    var hashes = _properties.Select(pi => pi.GetValue(obj)?.GetHashCode() ?? 0).ToArray();
    return Combine(hashes);
  }

  static int Combine(int[] hashes)
  {
    int result = 0;
    foreach (var hash in hashes)
    {
      uint rol5 = ((uint)result << 5) | ((uint)result >> 27);
      result = ((int)rol5 + result) ^ hash;
    }
    return result;
  }

  static PropertyInfo ExtractProperty(Expression<Func<T, object>> property)
  {
    if (property.NodeType != ExpressionType.Lambda)
      throwEx();

    var body = property.Body;
    if (body.NodeType == ExpressionType.Convert)
      if (body is UnaryExpression unary)
        body = unary.Operand;
      else
        throwEx();

    if (!(body is MemberExpression member))
      throwEx();

    if (!(member.Member is PropertyInfo pi))
      throwEx();

    return pi;

    void throwEx() =>
      throw new NotSupportedException($"The expression '{property}' isn't supported.");
  }
}

And BTW, there IS demand for this.

peterduniho commented 6 years ago

Is it hard to write a class implementing IEqualityComparer<T>? Nope. But it's tedious, and wordy.

That's why in every private "util" library I use, I eventually wind up with a delegate-based implementation of IEqualityComparer<T> that has a Create() method to accept the appropriate lambda expression delegates for the Equals() and GetHashCode() implementations. I get to write (copy/paste) that class in once, and then when I need an equality comparer, it's a one-liner in the actual code.

And to be clear: writing a new IEqualityComparer<T> for each use case in a non-starter. It's way too much extra code, and it hides the implementation details away in some other class, when what you really want is to be able to see those right where the code is used.

Does it come up a ton? Again, nope. I may need this maybe two or three times a year, tops. But then, so what? It's such a small thing to add, and yet a glaring omission, from the otherwise very helpful and convenient .NET API.

Is there a place to vote for this? As you can see, my vote would be a solid "yes".

karelz commented 6 years ago

@peterduniho upvote top post. It is sortable on GitHub by number of votes there ...

peterduniho commented 6 years ago

@karelz thanks. I don't see an "upvote" option per se, but I assume you mean the "thumbs up reaction" option. "Vote" submitted. :)

stimpy77 commented 6 years ago

Can't even write a .Distinct(..) without also writing up a whole new IEqualityComparer class implementation, just for one LINQ clause, this is ridiculous.

Maybe Distinct(..) itself should tuck this away with its own overload(s) with a single lambda or a growable (params style) set of lambdas listing all the properties to find distinction on for just this one filter statement. The output of this being a generated implementation of IEqualityComparer. In which case I suppose it would be handled by a different team / group with a different github work item.

terrajobst commented 4 years ago

Video

Can someone update the sample to make a compelling case for the currently proposed API? The sample code was for a different proposal.

If Distinct is the only use case, then maybe we should fix Distinct itself.

peterduniho commented 4 years ago

Can someone update the sample to make a compelling case for the currently proposed API? The sample code was for a different proposal. If Distinct is the only use case, then maybe we should fix Distinct itself.

I'm not sure what sample you mean, so I'm probably missing part of the point of the request. My apologies about that.. But this isn't just an issue with Distinct(). Another common scenario would be various dictionary types (e.g. Dictionary<TKey, TValue> and ConcurrentDictionary<TKey, TValue>). Pretty much any place you'd want an instance of IEqualityComparer<T>, it would be much nicer to be able to quickly define one using lambdas, rather than having to implement a whole new class every time.

The fact that utility libriaries used internally in numerous projects have defined this functionality over and over strongly suggests it'd be a useful addition to the API (for what it's worth, this has come up in pretty much every non-trivial project I've been involved with over the last ten years or so).

Yes, it's true that this could be addressed in other ways. For example, just forget about it and let everyone keep doing what they've been doing (i.e. create the helper methods themselves). Or each of the places in .NET where IEqualityComparer<T> can be used can add an overload where instead of passing IEqualityComparer<T>, one can pass the necessary lambdas to define the comparer. But putting this in a single centralized place where it can be used for any place one needs IEqualityComparer<T> seems to me to be a lot more useful approach.

ericstj commented 4 years ago

Given where we are at in 5.0.0 I don't this this will make it. I do support the scenario, I've come across this before and have felt it jarring that I need to stop coding and create a new type, just when I want to tweak the comparison behavior of a collection.

daviburg commented 3 years ago

Needed that today to avoid repeatedly writing classes for Distinct in LINQ


    using System;
    using System.Collections.Generic;
    using System.Diagnostics.CodeAnalysis;

    /// <summary>
    /// Inline capable generic equality comparer
    /// </summary>
    /// <typeparam name="T">The type to compare for equality</typeparam>
    public class InlineableEqualityComparer<T> : EqualityComparer<T>
    {
        /// <summary>
        /// Type specific equality function
        /// </summary>
        private readonly Func<T, T, bool> equalityFunction;

        /// <summary>
        /// Type specific hash function
        /// </summary>
        private readonly Func<T, int> hashFunction;

        /// <summary>
        /// Prevents the creation of default instance, as it would be invalid
        /// </summary>
        private InlineableEqualityComparer()
        {
        }

        /// <summary>
        /// Creates a new instance of an equality comparer using the provided functions
        /// </summary>
        /// <param name="equalityFunction">The equality function</param>
        /// <param name="hashFunction">The hash function</param>
        public InlineableEqualityComparer(Func<T, T, bool> equalityFunction, Func<T, int> hashFunction)
        {
            this.equalityFunction = equalityFunction ?? throw new ArgumentNullException(nameof(equalityFunction));
            this.hashFunction = hashFunction ?? throw new ArgumentNullException(nameof(hashFunction));
        }

        /// <inheritdoc/>
        public override bool Equals(T x, T y) => this.equalityFunction(x, y);

        /// <inheritdoc/>
        public override int GetHashCode(T obj) => this.hashFunction(obj);

        /// <summary>
        /// Creates an equality comparer using the provided functions
        /// </summary>
        /// <param name="equalityFunction">The equality function</param>
        /// <param name="hashFunction">The hash function</param>
        /// <returns>The new comparer</returns>
        [SuppressMessage("Design", "CA1000:Do not declare static members on generic types", Justification = "Same pattern as .NET's Comparer<T>.Create(Comparison<T>) Method")]
        public static IEqualityComparer<T> Create(Func<T, T, bool> equalityFunction, Func<T, int> hashFunction) =>
            new InlineableEqualityComparer<T>(equalityFunction, hashFunction);
    }

Usage (with dummy lambdas for the sake of argument): .Distinct(InlineableEqualityComparer<MetadataRetrievalNode>.Create((x, y) => true, x => 0))

eiriktsarpalis commented 2 years ago

Came across the need for such a method today while working with Roslyn incremental values. I suspect there might be value in making the getHashCode delegate optional, which would help accelerate scenaria like the following

nodes1.SequenceEqual(nodes2, EqualityComparer<SyntaxNode>.Create((n1, n2) => n1.IsEquivalentTo(n2)));

Calling GetHashCode on a comparer that doesn't specify a getHashCode delegate would throw NotSupportedException. It is fairly common to construct comparers that only require the Equals method.

@tannergooding @terrajobst thoughts? It's fairly pervasive pattern that should eliminate the need for boilerplate classes for most APIs that accept IEqualityComparer<T>. If you agree I can update the OP and mark it ready for review.

ViIvanov commented 2 years ago

Not sure, that NotSupportedException is a part of equality comparer (IEqualityComparer<> or EqualityComparer<>) contract 🤔 Someone can expect NotimplementedException or even some ArgumentException in that case.

Maybe, for your cases it is better to use explicit delegate?

nodes1.SequenceEqual(nodes2, EqualityComparer<SyntaxNode>.Create((n1, n2) => n1.IsEquivalentTo(n2), _ => throw new NotSupportedException()));
bartonjs commented 2 years ago

Video

Looks good as proposed

namespace System.Collections.Generic
{
    public partial class EqualityComparer<T> : IEqualityComparer<T>, IEqualityComparer
    {
        public static EqualityComparer<T> Create(Func<T?, T?, bool> equals, Func<T, int> getHashCode);
    }
}
stephentoub commented 2 years ago

@bartonjs, in implementing this and using it in various places, I think we should make getHashCode optional. It's common to only need Equals and not need GetHashCode (e.g. passing a comparer to SequenceEquals), and I expect a significant number of the consumers for this API will be tests, in which case it's actually beneficial for GetHashCode to throw if it's not supplied in order to validate GetHashCode isn't being used in situations you don't expect it to be used. Since in the discussion yesterday the room was largely ambivalent about whether to require getHashCode or not and we effectively flipped a coin, I'm going to proceed under the assumption we're ok with it.

public static EqualityComparer<T> Create(Func<T?, T?, bool> equals, Func<T, int>? getHashCode = null);

The only real downside is if GetHashCode was actually needed for a use and someone neglected to supply it because it's optional, but in such cases, they'll likely get an exception almost immediately.

bartonjs commented 2 years ago

I'm not sure that I agree that "null-accepting" should immediately jump to "null-defaulting", but I guess I agree with the logic that the exception is fast-coming.

But, yeah, if anyone had said "oh, it's super common in writing tests to not specify an implementation for GetHashCode" then we'd've slapped on the question mark.