Open oliverzick opened 2 years ago
Tagging subscribers to this area: @dotnet/area-system-collections See info in area-owners.md if you want to be subscribed.
Author: | oliverzick |
---|---|
Assignees: | - |
Labels: | `area-System.Collections` |
Milestone: | - |
The
ImmutableArray<T>
struct implements theIEquatable<T>
interface but does not stick to the rules of implementing value semantics acoording to the IEquatable interface.
IEquatable<T>
doesn't require types implement deep value semantics. It implies no contract other than some loosely defined "equality" that's up for interpretation by types that implement it. It's not completely uncommon for either shallow equality or ignored properties to be common in implementations in C# or other languages.
And even if we wanted to implement structural equality at this point, it would most likely be a breaking change.
I think the typical guidance here is to use SequenceEqual, but the behavior can be a little troubling if you want to deeply compare records which contain collections, for example. (Records use EqualityComparer
using System;
using System.Collections.Generic;
using System.Linq;
var r1 = new R1(new List<R2> { new R2(new List<int> { 1 }) });
var r2 = new R1(new List<R2> { new R2(new List<int> { 1 }) });
Console.WriteLine(r1.Equals(r2)); // False
Console.WriteLine(Enumerable.SequenceEqual(r1.Items, r2.Items)); // False
Console.WriteLine(r1.Equals(r2)); // False
Console.WriteLine(Enumerable.SequenceEqual(r1.Items[0].NestedItems, r2.Items[0].NestedItems)); // True
record R1(List<R2> Items);
record R2(List<int> NestedItems);
That said, I'm not sure the problem is really "fixable", except that if you want these kinds of equality semantics, to take care to use SequenceEqual or equivalent methods when comparing collections, and perhaps do the same when you implement IEquatable.Equals.
(Partially unrelated) Just wanted to say that while I agree that of course changing the behavior of IEquatable<T>
here is definitely a breaking change and not doable, the current behavior is indeed extremely unfortunate when using lots of records like @RikkiGibson mentioned. This is extremely common when writing source generators (as incrementa generators require you to define models for your incremental steps), and the current situation is that the second you have even a single record parameter that's an immutable array, you just fall off a cliff and have to go all the way down to implementing equality manually for the whole record type, which is a pain. It would be nice if there was some way to improve this scenario, as it's arguably pretty common and unfortunately (1) extremely verbose and (2) extremely error prone, as if you don't know this behavior or just forget when refactoring, you'll silently break your entire incremental pipeline after that step π
@Sergio0694 Perhaps exposing an EqualityComparer combinator could help in that regard?
public static class EqualityComparer
{
public static IEqualityComparer<TList> CreateListComparer<TList, TElement>(IEqualityComparer<TElement> elementComparer)
where TList : IList<TElement> { }
}
Related to https://github.com/dotnet/runtime/issues/33873 and https://github.com/dotnet/runtime/issues/19644
As @RikkiGibson and @Sergio0694 already mentioned the current equatable behavior of the ImmutableArray<T>
struct - whether this was intended or not - is a pitfall, unfortunately.
At least I have experienced this on heavily utilizing records (e.g. when working with DDD or object-functional techniques that involve a lot of immutability and types that come with value semantics). So I was introducing new records representing composites that own their contained collections. Then I was looking for some out-of-the-box immutable collection and the ImmutableArray<T>
was the only one I could find which is implementing the IEquatable<T>
interface.
After replacing a custom immutable collection type that comes with value semantics the corresponding specs and tests were broken. So I digged a bit deeper and detected that the ImmutableArray<T>
was actually not what I thought it is, according to its name and implementation of the IEquatable<T>
interface. It looks like @Sergio0694 came across a similar situation. :sweat_smile:
@eiriktsarpalis I know that changing this behavior afterwards would be a breaking change. In my opinion it would be great if there was an out-of-the-box immutable collection or bare enumerable implementation that comes with value semantics and can be used within records.
"it would be great if there was an out-of-the-box immutable collection or bare enumerable implementation that comes with value semantics and can be used within records."
You know what, that's... Actually a very good idea. I reckon using something like that might very well make authoring records much much simpler, as there would be no more a need to handroll custom comparers every single time π€
I've hacked together really quick just to play around:
See updated code here.
And you'd use it like so:
record MyRecord(EquatableArray<string> Names, EquatableArray<int> Numbers);
ImmutableArray<string> names = GetNames();
ImmutableArray<int> numbers = GetNumbers();
MyRecord myRecord = new(names, numbers); // Implicit conversion
names = myRecord.Names;
numbers = myRecord.Numbers; // Also implicit conversion
I might actually try this out in a branch of the MVVM Toolkit and see how this goes π It would seem like this would simplify a ton of code!
UPDATE: well this was an absolute success, this is amazing π
@oliverzick @Sergio0694 I had that problem as well. I wrote myself a struct wrapper that confers value semantics to any list. Something like:
struct ValueList<T> : IList<T>
{
IList<T> inner;
IEqualityComparer<T> elementComparer;
//...
}
static class ValueList
{
public static ValueList<T> Create<T>(IList<T> inner) => ...
}
var bytes = ValueList.Create(myByteArray);
That type can ease a lot of pain, especially in the context of LINQ. The main drawback is that the ValueList
type leaks into the public surface area of APIs.
Yup this is not something I'd recommend for public APIs. In my scenario though I was just referring to models in incremental generator steps, which are all internal anyway, so that's not an issue. @eiriktsarpalis that experiment was really a success both in the MVVM Toolkit and ComputeSharp, and I've also discovered yesterday that @jkoritzinsky has also been using an approach much like this in all the various COM interop generators as well. I wonder if this same idea could also help you simplify things in the System.Text.Json generators? Just really feels like a much simpler approach for incremental models π
@Sergio0694 It's good to see that a discussion like this one creates new ideas. And even what you have posted looks great. Beside this in our projects we mostly require only a bare immutable enumerable to use e.g. in composites by hiding the implementation detail and only want to have an fully equatable and immutable enumerable, something like:
public sealed class ImmutableEnumerable<T> : IEnumerable<T>, IEquatable<ImmutableEnumerable<T>>
{
// This type is owner of the contained collection; no one can change this from outside.
// Use normal array for performance and memory footprint reasons; could also be an ImmutableArray<T> to prevent changing items by mistake once it has been created.
private readonly T[] items;
private ImmutableEnumerable(T[] items)
{
this.items = items;
}
public static ImmutableEnumerable<T> Create(IEnumerable<T> items)
=> new(items.ToArray());
// ToDo: Provide implicit type conversion, if required
public static bool operator ==(ImmutableEnumerable<T> left, ImmutableEnumerable<T> right)
=> Equals(left, right);
public static bool operator !=(ImmutableEnumerable<T> left, ImmutableEnumerable<T> right)
=> !(left == right);
public override int GetHashCode()
{
var hashCode = new HashCode();
foreach (var item in this.items)
{
hashCode.Add(item);
}
return hashCode.ToHashCode();
}
public override bool Equals(object? obj)
=> this.Equals(obj as ImmutableEnumerable<T>);
public bool Equals(ImmutableEnumerable<T>? other)
{
if (ReferenceEquals(null, other))
{
return false;
}
if (ReferenceEquals(this, other))
{
return true;
}
return this.items.SequenceEqual(other.items);
}
public IEnumerator<T> GetEnumerator()
=> this.items.AsEnumerable().GetEnumerator();
IEnumerator IEnumerable.GetEnumerator()
=> this.GetEnumerator();
}
We are not exposing the owned collection which in turn prevents access from outside. In addition anyone can create new collection instances by using corresponding ToXxx
methods - but I'm telling not something new. π
And even there is no restriction on the generic type parameter that enforces to implement IEquatable<T>
interface, this is completely up to the used type what kind of equality it comes with.
I came here to report the same issue.
Without deep comparison, the automatic Equals
implemented by record
classes doesn't tell us two records are equal, it only tells us if they are equal and they share a common ancestor that created the Immutable*
collection, and that's not really very useful.
The same problem exists for all the Immutable*
types.
Just wanted to mention that this issue keeps popping up very often whenever I see people writing source generators (myself included). I had to copy-paste my EquatableArray<T>
type in over 4 different projects for the same reason, and it's just not ideal. This feels like a common enough issue for source generator authors that it'd be nice if this could either be addressed in ImmutableArray<T>
itself, or maybe we should propose a separate API that could be used in these cases? I know some other generator authors (eg. @jkoritzinsky) have alternative solutions to this, but it just seels like everyone is reinventing ways to solve the same issue, so maybe just having a proper solution available for everyone would be better? π€
Ran into this today, but in my case the lack of structural equality by default didn't just break the caching in the incremental pipeline. My generation function is doing a GroupBy
on an element of the model that contains an ImmutableArray<ISymbol>
, which results in the generator producing incorrect output. Fortunately my tests caught it, but this is definitely a pit of failure for generators (one of many, unfortunately).
How about special-casing ImmutableArray in the default equality implementation of records? It's not ideal, but just throwing in the idea.
"My generation function is doing a
GroupBy
on an element of the model that contains anImmutableArray<ISymbol>
, which results in the generator producing incorrect output."
@eatdrinksleepcode Just read that message β this is a side point to the rest of this conversation, but definitely do not capture symbols in your incremental models, ever. It doesn't matter whether you use a custom comparer or whatever to fix incrementality, symbols should just never flow across incremental steps because they cause compilations to be rooted across incremental runs.
@Sergio0694 aren't they just data structures?
What is the alternative?
@mrpmorris Create your own type(s) that capture the symbol info you want and copy those over.
I find that this article captures good guidance on how to write good incremental source generators.
@Joe4evr that's what I hoped you'd say :)
Yup this is not something I'd recommend for public APIs.
why?
FWIW I ended up defining a bunch of equatable collection types for my own pet project source generators:
https://github.com/eiriktsarpalis/typeshape-csharp/tree/main/src/TypeShape.Roslyn/IncrementalTypes
+1 to fixing this. Requiring 99% of uses of source generators to use a custom type to work at all is a terrible user experience.
None of the built-in collection types do equality across the entire sequence of elements for IEquatable<T>
and it has never been part of the contract of IEquatable<T>
to behave as such. IEquatable<T>
in no way requires a specific behavior for value types, reference types, etc. The only thing IEquatable<T>
requires is in relation to Equals(object obj)
and GetHashCode()
where it requires some them to behave consistently across the 3 to ensure that hash sets and other collection types work as expected.
It is an explicit design decision for them to be this way, going back to .NET Framework 1.0. It is non-desirable, potentially very expensive, and may even have security concerns/implications to do element-wise equality by default. -- Security concerns can come in from potentially unbounded or unknown processing time which can lead to things like DDoS opportunities or the like. It can lead to catastrophic performance when used with other collections, dictionaries, etc.
Operations like comparing or formatting collections should do the simple thing by default, because these operations can be expected to be frequently called. You should then have separate APIs, such as SequenceEquals
, when you intentionally want to do element-wise operations between collections.
This is highly unlikely to be changed, but a different API/interface might be possible if there is sufficient justification.
I am interested in the possibility of 'IStructuralEquatable'. I would want for records to be able to easily use that interface for equality checks on fields, and perhaps to add automatic implementation of the interface on record types as well.
However, I also feel like I have seen relatively few requests/upvotes to address the "half-structural" equality that records have in practice. So there's a lot of other features I would want to do first.
I thought c# had gotten inherent deep value-equality through records + immutable-collections. In fact a custom-collection is required? After reading this thread I see why. Still, I wish for some built-in way to complete the "half-structural" equality of records.
Background: While writing equals-assertion in a test, instead of using the assertion-library's slow, 'magic' value-comparison I wanted to utilise fast, well-defined behaviour in c# records. (Better to require c# knowledge.)
To get 'free/no-code' deep comparisons the plan was to use records all the way down. At first mindlessly stumbled with collection-properties that were regular arrays. Then took the immutability and value-equality of records as a queue and tried immutable-array. Was surprised to discover it only did reference-equals.
Then reasoned:
No collection with value-comparison exist -> Have to create custom collection to override Equals
.
Don't want to wrap immutable-array to keep no-allocations-benefit -> Must extend it -> Oh no, it is sealed! -> came here.
Surprised at this. Has anyone created a package for an equatable collection?
Has anyone created a package for an equatable collection?
I've published this package that includes a number of immutable equatable collections that I use in my source generators. Variations of the same types are also used by generators in this repo.
Why not introduce a warning when an ImmutableArray
is cast to IEquatable<'T>
?
There seems to be consensus that:
IEquatable<'T>
suggests to many users that equality is structural and will therefore cause bugs if users make this assumption.A warning would remove the pitfall without any breaking changes. Even if it is off by default, it would allow users to remove the danger of using this interface by mistake.
structural equality
is not the same as content equality
or sequence equality
or whatever other terminology you want to define. That is, structural equality
does not mean that it deeply compares all members using structural equality
.
Rather, reference equality
is the default for reference types
and simply means that the reference is compared, not the fields of the type. While structural equality
is the default for value types
and simply means that Equals
is called on each field within the type. -- Put in other words, structural equality has always defaulted to being shallow, not deep. You can have reference types (like records) which do structural equality by default, but that is not the norm and there are potential downsides and even security considerations (such as potential for DDoS) when comparing every element in a sequence.
Thus, while comparing two T[]
uses reference equality by default and comparing two struct S { T[] field; }
uses structural equality
by default, the latter is going to functionally behave the same as the former because the struct only contains a single field that is itself a reference type. This is how .NET has always worked going back to v1.0 over 20 years ago.
This is likewise exactly how types like Span<T>
work, how some theoretical ValueList<T>
would be expected to work, etc. We expose a separate set of APIs such as SequenceEquals
where relevant that are appropriate for doing the type of deeper equality on collection types that's being asked about in this thread. -- It might be viable to request some new ISequenceEquatable
interface that allows access to such a SequenceEquals
member so that these types of comparisons can be done efficiently (which is a bit more direct and explicitly than simply checking if it implements IEnumerable
, which may miss some things and may not be valid for all collection types).
There are cases where reference equality is useful but these are extremely small in domain modelling. It's a 0.01% case among the code that I am familiar with (which is biased to be predominantly F#, where modelling of domains is prominent). Your "not the norm" includes records and tuples but also strings. Add structural equality on lists, DUs, and record types and you have the F# world in which no one needs to bother with reference equality. The shallow vs deep issue and the content/structural/sequence distinctions go away assuming everything inside has structural equality.
Having equality (semantics) depend on whether something is a struct or a reference type (implementation details) is a horrible idea and something dotnet1.0 clearly got wrong, and something that the generic IEquatable<'T>
interface (which dotnet1.0 didn't have) can help to fix. Clearly C# users are also taking IEquatable<'T>
to mean semantic equality (i.e. some type of structural) and that is why these users post here that this a pitfall.
For F# and domain modelling the IEquatable<'T>
interface has the potential to make equality efficient but preserving structural semantics. So it's important not to have these pitfalls.
Such equality is not a viable default in many (if not most) domains due to the potential for security issues, such as DDoS style attacks. Itβs something that already exists has to be explicitly considered and specially handled for strings and which causes numerous potential for issues to otherwise unassuming users.
you ultimately want to support multiple types of equality and need them to be differentiated by some form of contract or interface.
IEquatable is, by design, intended for allowing use of types with collections and sets. This in part also implies having bounded defaults so that otherwise basic operations do not present pita of failure by default.
Sequence equality is a different concept that may warrant its own separate interface
Description
The interface.
ImmutableArray<T>
struct implements theIEquatable<T>
interface but does not stick to the rules of implementing value semantics acoording to the IEquatableWhen comparing two instance of a
ImmutableArray<T>
both instance are considered equal only when both instances refer to the same contained array instance. Instead both instances should be considered equal when the contents of the contained collections are equal.Reproduction Steps
Expected behavior
Actual behavior
Already described in Description and Reproduction Steps sections.
Regression?
No response
Known Workarounds
No response
Configuration
No response
Other information
The
IEquatable<T>
interface should by properly implemented according to its specification to ensure value semantics ofImmutableArray<T>
. This also includes desired behavior of Equals and GetHashCode methods according to interface specification.