dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
19k stars 4.03k forks source link

Proposal: Return IreadOnlyCollection or IReEnumerable from Select(IReEnumerable) #8271

Closed davjs closed 7 years ago

davjs commented 8 years ago

As mentioned by Paul Stovell's stack overflow answer "The goal of taking the highest object is noble, but it leaves room for too many assumptions". : http://stackoverflow.com/questions/8240844/handling-warning-for-possible-multiple-enumeration-of-ienumerable

Currently Select() returns an IEnumerable regardless of what has been passed. Because of the fact that we often want to iterate our collections multiple times (by using almost any combination of Linq operations on the Enumerable more than once) we end up having to call .ToList() on the result (if we want to avoid warnings and follow design principles). This is something i see everywhere, code cluttered with .ToList() calls.

By changing the return type of Select() when called with a collection that is "Re Enumerable" to return something that is also "Re Enumerable" we can get rid of all these cluttered .ToList() calls.

Possibly by returning IReadOnlyCollection when passed a value implementing IReadOnlyCollection or by adding a new interface called "IReEnumerable".

Because both of these would implement / are implementing IEnumerable this change would not break any previous code. One could argue the same thing for OfType(), SelectMany() etc..

benjamingr commented 8 years ago

ping @headinthebox

svick commented 8 years ago

Are you proposing to not make LINQ lazy in those cases?

For example, since collection.Select(x => expensiveFunction(x)) will return a collection (not just a lazy enumerable), does that mean collection.Select(x => expensiveFunction(x)).First() will compute expensiveFunction() for all items in the collection?

Or are you just proposing something similar to F# Seq.cache, only done automatically?

davjs commented 8 years ago

I believe having an overload that suddenly is non lazy would be inconsistent and make things very confusing. Thanks for pointing it out.

if the enumerator/collection would cache the results from the first iteration it could actually break code that depends on side effects executed while iterating. For example: foo.Where(x => x.member > i++). So it would have to be both lazy and recomputed (i.e not using any cache) in order to be backwards-compatible. (Unless there is some hack to check if a Func has side effects or really only is a property accessory).

Possibly there could be another function called SelectCachedbut then one could argue how much is gained by writing foo.SelectCached( x => x.member) instead of foo.Select( x => x.member).ToList() except that the former is lazy evaluated.

benjamingr commented 8 years ago

I thought what you want is a separate intermediate interface that illustrates that an IEnumerable can be reset.

svick commented 8 years ago

So it would have to be both lazy and recomputed i.e not using any cache in order to be backwards-compatible.

I agree. But then I don't understand what would be the improvement.

alrz commented 8 years ago

Similar F#'s flexible types, you might define something like this,

T F<T, U>(T arg) where T : IEnumerabe<U> 

So you will know that the type of the input and the output wouldn't be changed and it's IEnumerable but this won't help, because there are cases that you actually want to change the type and at best it becomes a matrix of constraints over T.

As for IReadOnlyCollection perhaps you should define all linq operators for it?

davjs commented 8 years ago

benjamingr Yes, i just got a bit confused thinking about how it would be implemented. But There isn't really anything to implement, its just as you say the specification of an intermediate interface that illustrates that the enumeration can be reset. In fact the iterator currently returned by Linq functions already supports re-iteration if passed a collection that supports it, its just a matter of showing it with an interface. Thanks for the wording.

svick The improvement is getting rid of ToList calls that are made to ensure the result can be re-iterated.

HaloFour commented 8 years ago

LINQ was very specifically designed to defer execution until enumeration. If you want materialization or memorization you should have to explicitly opt into that via ToList, ToArray or through any number of other extension methods offered by the BCL or third parties. These operations add computational and memory expense.

That said, this is all a BCL/CoreFX proposal. C# knows about extension methods and lambdas, it doesn't know or care about how any of these methods are actually implemented. The language has no clue that OrderBy internally materializes the entire sequence nor does it care.

davjs commented 8 years ago

@HaloFour returning an explicit interface showing that reiteration is possible on the enumerable / collection does not violate the principles of lazy loading. Nor would it have to be memorized / cached.

I am sorry for writing this in the wrong section though, Linq has become so associated with the language itself so I mixed them up. What happens now, can i move this?

HaloFour commented 8 years ago

@davidkron Worth proposing at the CoreFX repo. Adding a new extension method for such an interface should be trivial (you can prototype one yourself easily).

aluanhaddad commented 8 years ago

@HaloFour is correct. You can implement LINQ operators for any type with any semantics you wish.

That said, I think this proposal is more of a request for higher kinded polymorphism.

gafter commented 7 years ago

We are now taking language feature discussion on https://github.com/dotnet/csharplang for C# specific issues, https://github.com/dotnet/vblang for VB-specific features, and https://github.com/dotnet/csharplang for features that affect both languages.