dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.19k stars 4.72k forks source link

Why does LINQ not optimize for read only collections? #30756

Closed mikernet closed 4 years ago

mikernet commented 5 years ago

Why don't LINQ methods check for IReadOnlyCollection<T> instead of ICollection<T>, IReadOnlyList<T> instead of IList<T>, etc. for optimizations?

This is part of the reason read-only collections are effectively useless and unusable in .NET unless they also implement the writeable versions of the interfaces with a bunch of throw NotSupportedException() interface method implementations, but then what's the point?

Wraith2 commented 5 years ago

Doing so causes a performance degradation for everyone and is only useful in a small number of cases.

See https://github.com/dotnet/corefx/issues/26579 https://github.com/dotnet/corefx/issues/32569 https://github.com/dotnet/corefx/issues/32570 https://github.com/dotnet/corefx/issues/24773 https://github.com/dotnet/corefx/issues/17629

mikernet commented 5 years ago

In that case the utility provided by the read-only interfaces being variant is completely overshadowed by this issue. It would have been way better to leave them as invariant so that they could actually be used. How often do people actually use collection interface variance? I've yet to see it used anywhere. This effectively makes them useless.

This makes me sad...very sad. Ah well.

scalablecory commented 5 years ago

perf impact isn't actually that large considering it only hits the initial LINQ extension method, not iteration.

Can you demonstrate the benefit of this? What collections are there that don't implement the writable interface?

mikernet commented 5 years ago

According to the above linked issues, CoreRT solved the invariant casting performance issue with some caching. There are many other places that follow this pattern of optimizing for IList<T> and ICollection<T> when all they do is read and it's a royal pain, especially because I would like to program against read only interfaces but it's entirely impractical at the moment. I end up having to cast up to IList and ICollection all over the place to ensure proper optimizations take place so I gave up. It's also just bad form to have a library method which takes an IReadOnlyList<T> parameter and fails when a user provides one because it won't upcast to IList<T> (i.e. because they wrote a custom collection that implements IReadOnlyList<T> and understandably thought that met the requirements for your method). Guarding against the failed upcast possibility everywhere so you can fallback to less optimized code or lowering performance because now their code is enumerating over the whole list with unoptimized LINQ methods sucks when IReadOnlyList<T> satisfies the actual API surface they should need to provide.

This is just another nail in the coffin of practically using read only collection interfaces among many other problems that permeate through the entire framework in this regard and demonstrates why design compromises of this magnitude for the sake of performance in situations that can be optimized by the CLR in the future is a bad idea...we can effectively never go back now...there are tons of methods that take IList<T> instead of IReadOnlyList<T> and you can't add another method side-by-side with a read only signature because IList<T> doesn't implement IReadOnlyList<T> so you get ambiguos method call errors when you do that. It's just a huge steamy pile of doo doo that completely violates the Interface Segregation Principle of good code design and forces you to do the same.

We have several custom collection types that are indexable but can't be modified through the IList<T> interface, or which can count efficiently but can't add or remove items via ICollection<T> methods....but we are forced to implement the full writable interfaces every time so that LINQ can use them efficiently, as well as collection constructors that take source collections and WPF indexer bindings.

Wraith2 commented 5 years ago

What stops you providing your own linq function implementations?

mikernet commented 5 years ago

With DIMs there's now a possibility that a small set of optimizable LINQ methods can call a DIM which can be overridden to provide a more optimized version of the algorithms but otherwise it's too finicky...you just end up with side-by-side LINQ methods that behave differently depending on whether your object is stored as an IReadOnlyCollection<T> or an IEnumerable<T>. Part of the beauty of LINQ is that it (should) just work on all IEnumerable<T>s without worrying about such details. I can do this, but it's less ideal than just having it work properly with standard LINQ methods...hence this issue.

mikernet commented 5 years ago

You also end up with "ambiguous method call" errors if you use a class that explicitly specifies that it implements both IReadOnlyCollection<T> and IEnumerable<T>, which I have run into before. There are workarounds but it just ends up being annoying and/or error prone. I'm trying to remember what the exact situation was that caused this but I don't remember, it was a long time ago. I think it had something to do with overriding GetEnumerator() behavior in a subclass so it had to explicitly declare that it implements IEnumerable<T> directly on the subclass or something like that. I can't remember to be honest but I do remember that having mirrored LINQ methods for other collection types created several issues.

In the case of me exposing a property as a custom read only collection or an IReadOnlyCollection<T>, the user of my library now has to also know to use the optimized LINQ methods instead of the normal ones, and if they pass the collection to any methods that consume it as an IEnumerable<T> then it uses unoptimized LINQ instead. Internally I can use my optimized calls but it isn't a good user experience.

stephentoub commented 5 years ago

Closing as a dup of all the issues listed by @Wraith2 above.