dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.29k stars 4.74k forks source link

[System.Linq] Consider adding runtime checks for IReadOnlyCollection<T> in input sources #42254

Open cmeyertons opened 4 years ago

cmeyertons commented 4 years ago

There are many places in the Linq / Collection code that leverage detecting if an IEnumerable<T> is an ICollection<T> to perform optimizations (e.g. presizing a new array, etc.)

List.cs

Because ICollection<T> implements IReadonlyCollection<T>, IReadonlyCollection<T> should be exclusively used in these scenarios to support custom IReadonlyCollection<T> implementations that don't necessary want to expose Add(T item)

Currently, collection authors have to implement ICollection to take advantage of the performance gains and leave Add throwing NotImplementedException to convey proper usage.

ghost commented 4 years ago

Tagging subscribers to this area: @eiriktsarpalis, @jeffhandley See info in area-owners.md if you want to be subscribed.

stephentoub commented 4 years ago

Because ICollection<T> implements IReadonlyCollection<T>

It doesn't.

EgorBo commented 4 years ago

If I recall correctly such suggestions to add (not to replace ICollection) fast paths for IReadOnlyCollection here and there were rejected several times because such casts to covariant interfaces were super slow, however these performance issues were fixed as far as I know (cast caches, inlined checks) so maybe it worth checking if we can add them in some places?

danmoseley commented 4 years ago

@davidwrighton are these kinds of "optimistic checks for interfaces" indeed much cheaper now than in the past?

EgorBo commented 4 years ago

cc @VSadov

cmeyertons commented 4 years ago

Because ICollection<T> implements IReadonlyCollection<T>

It doesn't.

As I was! Egg on my face for sure. Apologies, i thought this would be a drop-in request. Thanks for the quick replies

EgorBo commented 4 years ago

@danmosemsft a quick benchmark:

static IEnumerable<string> strings = new List<string>();

[Benchmark]
public bool IsCollection() =>   strings is ICollection<string>;

[Benchmark]
public bool IsReadOnlyList() => strings is IReadOnlyCollection<string>;

.NET Core 2.2:

|          Method |      Mean |     Error |    StdDev |
|---------------- |----------:|----------:|----------:|
|   IsCollection  |  2.637 ns | 0.0038 ns | 0.0035 ns |
| IsReadOnlyList  | 41.492 ns | 0.0911 ns | 0.0808 ns |

.NET Core 3.0:

|          Method |      Mean |     Error |    StdDev |
|---------------- |----------:|----------:|----------:|
|   IsCollection  |  1.069 ns | 0.0008 ns | 0.0007 ns |
| IsReadOnlyList  | 40.578 ns | 0.0316 ns | 0.0264 ns |

.NET 5.0:

|          Method |      Mean |     Error |    StdDev |
|---------------- |----------:|----------:|----------:|
|   IsCollection  |  1.121 ns | 0.0010 ns | 0.0009 ns |
| IsReadOnlyList  |  2.976 ns | 0.0094 ns | 0.0088 ns |

Related PR: https://github.com/dotnet/coreclr/pull/23548

VSadov commented 4 years ago

Covariant interfaces are not super slow now.

Cost can vary for both regular interface casts and for fancy ones. Regular interface cast is a linear search, but typically does not need to search far. Cached cast may need to deal with hash collisions, but typically just gets a cached value.

As a veeery rough estimate a fancy cast can be counted as a 2X of a regular interface cast.

In the past the cost of complicated casts was technically unbounded. As you nest variant generics, the cost would go up and considerably. Thus they were avoided by library owners.

danmoseley commented 4 years ago

Thanks @EgorBo that is indeed much faster.

davidwrighton commented 4 years ago

It is definitely faster than before, but there is still a non-zero cost to performance when making the suggested change.

I have a few possible concerns here.

  1. What about customers that only implement ICollection<T> and not IReadOnlyCollection<T>?
  2. If we mitigate concern #1 by having checks for both ICollection<T> and IReadOnlyCollection<T> how much of a penalty does making the LINQ functions larger have?
  3. Do we have any concerns around customers who may have implemented IReadOnlyCollection<T> in such a way that it does not match with the behavior of IEnumerable<T>? My guess is that we would not treat such scenarios specially, but it is a real possibility that customers with custom written collections may have incorrect implementations of code that hasn't been tested.
  4. As @VSadov notes, the performance impact is now much less severe, but its not nothing.
EgorBo commented 4 years ago

It is definitely faster than before, but there is still a non-zero cost to performance when making the suggested change.

I have a few possible concerns here.

  1. What about customers that only implement ICollection<T> and not IReadOnlyCollection<T>?
  2. If we mitigate concern #1 by having checks for both ICollection<T> and IReadOnlyCollection<T> how much of a penalty does making the LINQ functions larger have?
  3. Do we have any concerns around customers who may have implemented IReadOnlyCollection<T> in such a way that it does not match with the behavior of IEnumerable<T>? My guess is that we would not treat such scenarios specially, but it is a real possibility that customers with custom written collections may have incorrect implementations of code that hasn't been tested.
  4. As @VSadov notes, the performance impact is now much less severe, but its not nothing.

A good example is Linq's Count: https://github.com/dotnet/runtime/blob/master/src/libraries/System.Linq/src/System/Linq/Count.cs#L11-L46

IReadOnlyCollection<T> check used to apply a penalty for the O(N) foreach-based fallback (note other fast paths), but now that penalty ~15 times smaller. But of course it depends on how often users have IROC there - I am not in charge to answer 🙂 Currently Count for IReadOnlyCollection<T> input leads to O(N) loop which can be quite slow for large collections.

AndyAyersMS commented 4 years ago

There were also perf issues with limited numbers of "fast" dictionary slots (see #11971) that should now be (largely) mitigated by the dynamic dictionary expansion added in 5.0.

EgorBo commented 4 years ago

I found quite a few complains or rejected attempts to optimize LINQ for IReadOnlyCollection<T>:

https://github.com/dotnet/runtime/issues/28651 - LINQ results implicit support for IReadOnlyCollection https://github.com/dotnet/runtime/issues/27517 - Performance: Make constructor List(IEnumerable collection) know about IReadOnlyCollection https://github.com/dotnet/runtime/issues/26679 - Linq ToDictionary() should presize for IReadOnlyCollection https://github.com/dotnet/runtime/issues/14366 - System.Linq performance improvement suggestions (mentions IROC<>) https://github.com/dotnet/runtime/issues/24793 - Respect IReadOnlyList in the BCL https://github.com/dotnet/runtime/issues/23910 - Add optimized path for IReadOnlyCollection/IReadOnlyList in System.Linq https://github.com/dotnet/runtime/issues/18714 - Consider checking for IReadOnlyCollection in Enumerable.ToArray https://github.com/dotnet/runtime/issues/27516 - Performance of LINQ .Any() - type check to leverage .Count property? (mentions IROC) https://github.com/dotnet/runtime/issues/27517 - Performance: Make constructor List(IEnumerable collection) know about IReadOnlyCollection https://github.com/dotnet/corefx/pull/28472 - Check for IReadOnlyCollection https://github.com/dotnet/runtime/issues/43001 - LINQ IEnumerable extension methods should add special case IReadOnlyCollection<T>

huoyaoyuan commented 4 years ago
  1. What about customers that only implement ICollection<T> and not IReadOnlyCollection<T>?

This can be expanded to different scenarios:

  1. The collection is implemented by ICollection<T>, but exposed as IReadOnlyCollection<T> in public surface. This should be very common. Linq will not be impact here.
  2. The collection is immutable. It only implements IReadOnlyCollection<T>. This depends on the actual implementation type:
    • ReadOnlyCollection<T> (including ReadOnlyObservableCollection<T>): while it's designed to be read-only, it still implement the non-readonly interfaces. Not the case.
    • ImmutableArray<T>: it has it's own extension methods of linq to avoid boxing. Won't worry about the default linq implementation.
    • Custom readonly collection: Though this would be definitely impact, it should be a relative uncommon scenario.
  3. The collection is implemented by ICollection<TDerived>, but exposed as IReadOnlyCollection<TBase>, and gets linq called with <TBase>. This should be the scenario that's most probably get performance impact. Covariant interface check are slower, but it powers this scenario.
eiriktsarpalis commented 4 years ago

There are a few other interfaces used to determine IEnumerable counts, also not in a subtype relationship with either ICollection<T> or IReadOnlyCollection<T>. Would it make sense to include those as well?

Next step should be to enumerate a list of methods that could benefit from specialization.

ghost commented 4 years ago

Tagging subscribers to this area: @eiriktsarpalis, @jeffhandley See info in area-owners.md if you want to be subscribed.

adamsitnik commented 4 years ago

Related to https://github.com/dotnet/runtime/issues/31001

weitzhandler commented 3 years ago

Related: #23337.

BlinD-HuNTeR commented 2 years ago

Hello everyone! I'm not sure if someone else thought about this before, but I just had an idea that could solve this problem. Why not introduce a new, non-generic interface to the BCL named "ICountable", with nothing more than a "Count" property? Then just make ICollection, ICollection\<T> and IReadOnlyCollection\<T> all implement this interface. That would easily solve the problem with covariant casts, since we don't even have type parameters anymore. And we could even simplify all the code paths with just a test for "ICountable".

huoyaoyuan commented 2 years ago

Why not introduce a new, non-generic interface

Adding more interfaces can make things a mess and worse. Not all classes will implement the new interface, so an additional interface check may be required.

elgonzo commented 2 years ago

Is there any progress on the issue?

Even with .NET 6.0, Linq functions which should be able to take advantage of indexed random-access collections, such as Skip(int), don't seem to be able to handle custom read-only collections that for example implement IReadOnlyList<T> but not IList<T> (...and why should they?) without unnecessarily poor performance.

davidwrighton commented 2 years ago

No, there hasn't been any progress. The general conclusion is that we can't add new interface checks here without changing the interface checking mechanism. We've been kicking around the idea of an optimized type switch operation for a few years, but it would almost certainly make the most common case a little bit slower in exchange for allowing more scenarios to have roughly equivalent performance. However, we haven't built out that low level feature enough to see the practical impact on changing the common patterns in the Linq codebase.

eiriktsarpalis commented 2 years ago

Which is why we intentionally skipped checks for IReadOnlyCollection<T> in the new TryGetNonEnumeratedCount method (see #54764).

One possible alternative avenue to explore is introducing a common base interface for exposing the count, which should be possible using DIMs. Here's a sketch of that idea. We've generally resisted retrofitting old interfaces with DIMs so far though, since they can be susceptible to both source and runtime breaking changes.

Ultrafeel commented 10 months ago

@elgonzo consider this: ICollection<TSource>.IsReadOnly. Why not simply implement ICollection<TSource>?

custom read-only collections that for example implement IReadOnlyList but not IList (...and why should they?)

P.S. I must admit this.

CodeSetting commented 3 months ago

I'm joining the discussion regarding support for IReadOnlyCollection in TryGetNonEnumeratedCount. I favor adding support.

I came here after first writing my own version of this method. Then, IntelliSense alerted me to your version. The method names were similar.

At first, I was confused. Your code is similar to mine. Then, I was excited. I thought I could delete mine. Then, I was disappointed. I discovered that IReadOnlyCollection is not supported. It was an emotional rollercoaster 😄

While I remain in favor of adding support, I respect the opposing opinion. If the decision remains not to support it, I'd like to offer some alternate suggestions:

I understand, given its history, that its natural to consider this purely from the perspective of how the method fits into the larger LINQ ecosystem. However, not every use-case is LINQ-related. Mine was not.

That said, let me be clear, I am a huge fan of LINQ. It has probably saved me thousands of hours of coding over the years!

Thank you for your time and consideration.

stephentoub commented 3 months ago

This was done in https://github.com/dotnet/runtime/pull/101469 and then reverted in https://github.com/dotnet/runtime/pull/101644 as part of undoing the change to have the mutable collection interfaces inherit the readonly ones. When that revert is itself reverted, hopefully in .NET 10, this should naturally be addressed.

CodeSetting commented 3 months ago

@stephentoub this is very exciting news. Thank you for sharing! I hope the reversion does indeed get reverted 😄

The whole IReadOnlyCollection<T> versus ICollection<T> issue has been a real pain for quite some time!

However, it might be worth updating the comments/documentation until this (hopefully?) becomes a reality. I assume .NET 10 is at least a year away.

In the meantime, I wrote my own overload method/extension with a supportReadOnly parameter. It invokes your version, which was superior to mine. I forgot about IIListProvider<T>. Also, I (shamefully) missed the non-generic ICollection.

For reference, in case it helps anyone else, here's my new method:

public static bool TryGetNonEnumeratedCount<T>(this IEnumerable<T> source,
   out int count, bool supportReadOnly)
{
  if (source.TryGetNonEnumeratedCount(out count))
    return true;

  if (!supportReadOnly || source is not IReadOnlyCollection<T> readable)
    return false;

  count = readable.Count;
  return true;
}