dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.36k stars 4.74k forks source link

HashSet<int>.CreateSetComparer implementation is not symmetric #69218

Open jaredpar opened 2 years ago

jaredpar commented 2 years ago

Description

The implementation of HashSetEqualityComparer<T>.Equals does not consider the Count of collections when they have different comparers. That means it's possible for two sets to compare equals when they have a different count of items. That leads to the implementation not being symmetric.

Reproduction Steps

var x = new HashSet<int>(new Comparer());
x.Add(1);
var y = new HashSet<int>(new Comparer());
y.Add(1);
y.Add(2);

var comparer = HashSet<int>.CreateSetComparer();

Console.WriteLine(comparer.Equals(x, y));
Console.WriteLine(comparer.Equals(y, x));

internal sealed class Comparer : IEqualityComparer<int>
{
    public bool Equals(int x, int y) => x == y;
    public int GetHashCode(int obj) => obj.GetHashCode();
}

Sharplab demonstration

Expected behavior

The expected behavior is to print

False
False

Actual behavior

This code prints:

False
True

Regression?

No

Known Workarounds

No response

Configuration

net6.0

Other information

No response

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-collections See info in area-owners.md if you want to be subscribed.

Issue Details
### Description The implementation of `HashSetEqualityComparer.Equals` does not consider the `Count` of collections when they have different comparers. That means it's possible for two sets to compare equals when they have a different count of items. In turn ### Reproduction Steps ```c# var x = new HashSet(new Comparer()); x.Add(1); var y = new HashSet(new Comparer()); y.Add(1); y.Add(2); var comparer = HashSet.CreateSetComparer(); Console.WriteLine(comparer.Equals(x, y)); Console.WriteLine(comparer.Equals(y, x)); internal sealed class Comparer : IEqualityComparer { public bool Equals(int x, int y) => x == y; public int GetHashCode(int obj) => obj.GetHashCode(); } ``` [Sharplab demonstration](https://sharplab.io/#v2:C4LgTgrgdgNAJiA1AHwLACgACAmAjBzABgAJNcAWAbgwwDcBDMYgD2IF5ioBTAd2IAl6AZwAWAZS7AAPAEsowAHwAKbnwDCAewC2AB0ZcwSgJRHq6ZgDoAgnDhLcpuo2IBPdp14Dh4ybPnLVYk1dfUMTMxdrW3tHdEibO2xYpyYAY209MAN3QVEJaTlFCzUs+mAufODMg2MzAlwATiV0kKywCwBRAEcIegAbISVmGFdw+qaW6vbu3oGlFxHmMfQMQoMofuIhLn6uOFJsIIzQ4hBiAEkZ/plgFyrQv0UMAG8MYnfSAGZiACMNDT6xCuc0KLBGoJcRnYChY7A4LjMHy+xFBAHFJLkRJo4FwlKCND8AFZQtgwgmEizo4CY7G42IAXwwQA==) ### Expected behavior The expected behavior is to print ```txt False False ``` ### Actual behavior This code prints: ```txt False True ``` ### Regression? No ### Known Workarounds _No response_ ### Configuration `net6.0` ### Other information _No response_
Author: jaredpar
Assignees: -
Labels: `area-System.Collections`
Milestone: -
stephentoub commented 2 years ago

Certainly seems like a bug. The semantics of this comparer are suspect to begin with (it's always using EqualityComparer<T>.Default), but if the goal is to determine whether a set created from one using the default comparer is equal to a set created from the other using the default comparer, then it makes sense that this path doesn't just check the count of each collection; it's feasible you could have two collections that produce the same set using the default comparer but that have different counts, e.g.

using System.Diagnostics.CodeAnalysis;

var set1 = new HashSet<Person>()
{
    new Person("Some", "One"),
};

var set2 = new HashSet<Person>(new BothNamesComparer())
{
    new Person("Some", "One"),
    new Person("Some", "Other"),
};

Console.WriteLine(set1.Count);
Console.WriteLine(set2.Count);
Console.WriteLine(new HashSet<Person>(set1).Count);
Console.WriteLine(new HashSet<Person>(set2).Count);

class Person
{
    public string FirstName, LastName;
    public Person(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
    }
    public override bool Equals(object? obj) => obj is Person p && p.FirstName == FirstName;
    public override int GetHashCode() => FirstName.GetHashCode();
}

class BothNamesComparer : IEqualityComparer<Person>
{
    public bool Equals(Person? x, Person? y) => x?.FirstName == y?.FirstName && x?.LastName == y?.LastName;
    public int GetHashCode([DisallowNull] Person obj) => HashCode.Combine(obj.FirstName, obj.LastName);
}

But the way it compensates for that is, as you say, asymmetrical. It's iterating through one set validating that each item exists (under the default comparer) in the other set, but it's not doing the inverse, which means the second set could contain additional items not in the first set.

eiriktsarpalis commented 2 years ago

Related to #18931. There were attempts made in the past to improve the implementation but the risk of breaking folks depending on the current behavior probably outweighs any potential benefits.

jaredpar commented 2 years ago

To be clear: I don't have a pressing need for this to be fixed. It came up as a part of a code review and the behavior just seemed off to me and wanted to make sure it was tracked.

My 2 cents: if the risk of breaking change is high then I'd agree with the sentiment around obsoleting the behavior.