dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
18.9k stars 4.01k forks source link

Feature request: Better way of treating null as empty enumerable in foreach loops #6563

Closed rickardp closed 7 years ago

rickardp commented 8 years ago

Consider the following loop:

IEnumerable<SomeType> myEnumerable = ...

foreach(var item in myEnumerable ?? Enumerable.Empty<SomeType>) {
    ...
}

which will default to an empty enumerable in case myEnumerable above is null. This becomes quite awkward when SomeType is a rather complex type (Like Dictionary<string, Func<Tuple<string, int>>>), or even impossible if the type is anonymous.

One way this can be solved right now is to create an extension method, e.g.

public static IEnumerable<T> EmptyIfNull(this IEnumerable<T> enumerable) {
   return enumerable ?? Enumerable.Empty<T>();
}

which is not bad at all actually (though still not built into LINQ AFAIK), but I think that it would be better still if there was a way similar to the null-propagation operators (?. and ?[), e.g. (hopefully with a better syntax)

foreach?(var item in myEnumerable) 
    ...
}

I understand that constructing the syntax for a solution to this problem can be tricky, but I think it is at least worth considering since I believe the case is quite common (I find it a lot more common than the null-propagation indexer operator in my code).

Why using null at all and not empty enumerables everywhere? One example where I find the scenario common is in serialization of objects, where empty collections are omitted and should therefore semantically be identical to null. Since default(IEnumerable<>) is null, this means that null will be returned when the key is not found. This would also add symmetry with null propagating index operators.

dsaf commented 8 years ago

The question mark should probably be next to enumerable rather than the keyword? Although there might be some ambiguity I guess...

foreach (var item in myEnumerable?)
{
}
vladd commented 8 years ago

Well, in my opinion the empty enumerable should be distinguishable from null enumerable object, so the two should not behave in the same way. In particular, the whole LINQ world (.Select. .Where, .Concat etc.) throws on null, but works correctly with empty sequence. Making enumerating null a special case would create inconsistency between the language and the standard library.

(If you encounter often nulls returned from some API, meaning empty sequences, maybe it's API what should be corrected?)

rickardp commented 8 years ago

@vladd I think your point is very valid, and in this way C# differs from, for example, Objective-C where a nil array is essentially the same as an empty array.

The reason that I think this would be helpful is not helping to deal with broken APIs but rather interaction with dynamic languages and objects, generics where default(T) is returned e.g. when a key is not found. To give a concrete example, consider some JSON parser generated from C# classes, parsing configuration data.

public class Configuration 
{
    public List<string> LogFiles { get; set; }
}

Then, to handle the case where the user might have omitted the LogFiles key entirely, or provided an empty array, you are doing something like this:

if (config.LogFiles != null)
{
    foreach (var logFile in config.LogFiles)
    {
        ....
    }
}

I tend to see this pattern a lot (in my code and others') especially when it comes to dealing with data serialization and interaction with external programs.

For LINQ and index operators we already have null propagation that helps with null checking. I think it is still worth considering some kind of symmetry with foreach since that is such a useful construct.

bbarry commented 8 years ago

I don't know if it is particularly useful but you can always do:

myEnumerable?.ToList().ForEach(item => ...); // .ToList() may be unnecessary

or:

myEnumerable?.AsParallel().ForAll(item => ...);
aluanhaddad commented 8 years ago

@rickardp This is interesting. Most of the time, returning null for collection types likely indicates a badly designed API. The serialization point however, is worth examining in more detail. I also run into this issue/question regularly. I think the correct way to handle it is often to customize the serialization/deserialization to impose the semantics you desire. If you are writing the serialized classes, you can initialize collection properties to empty.

That said, the question of whether empty and null enumerables should be treated as semantically equivalent is just as relevant in the context of serialization as it is in any other. That an object has been serialized does not imply that these semantics should change. For example, as someone who does both frontend and backend development on a daily basis, I would rather receive an empty array than nothing when I am deserializing JSON in JavaScript.

rickardp commented 8 years ago

@aluanhaddad Good points! Though I often find myself in a situation where this design is out of my control. Some code may be generated, or some tradeoffs might have been done for performance reasons (e.g. large data sets on mobile devices), and in any case that piece of code might not be maintained by me. IMHO one of the greatest strengths with C# is its ability to deal with real-world situations while still keeping your own code clean and maintainable.

Even as we are embracing the semantic difference between null and empty (e.g. where null means "not specified"), the sensible default behavior might still be to not traverse the collection when it is null in many (not all!) situations. After all, I suppose this is why the null-propagating operators were so much requested in the first place.

aluanhaddad commented 8 years ago

While the ?. operator is wonderful, I worry that providing a syntactic sugar specifically for handling possibly null enumerables, will encourage people to return null instead of empty, and thus also encourage people to use the new null safe enumeration syntax everywhere.

I think that the combination of the ?. operator and #5032, if adopted in its current form, will effectively make nullability the safe and syntactically supported way to represent optional values. If we imagine a C# where values are (assumed to be) non-nullable by default, then nullability and the ?. operator effectively become C#'s Maybe/Option monad. However, while this makes sense for scalar values, I don't think it makes sense to use it for optional aggregate values because we can think of a nullable type as a conceptual Option<T>. Then, if we think of Option<T> as essentially a trivial collection which contains either 0 or 1 elements, it suggests that aggregate values do not likely need to be wrapped further. You can implement Option<T> as a library, providing an extension method to wrap possible null values, and then write queries over the optionals, but I digress.

At any rate, your extension method

public static IEnumerable<T> EmptyIfNull(this IEnumerable<T> enumerable) {
   return enumerable ?? Enumerable.Empty<T>();
}

actually handles the uncertainty quite well without the need for additional syntax.

yaakov-h commented 8 years ago

Considering how often something like this is needed, what's the value of returning null? i.e., what's the semantic difference between null (there is no collection of items) versus an empty enumerable (there is no items in the collection)? In what case would a function or property that returns both be valid?

paulomorgado commented 8 years ago

@yaakov-h, the difference is the difference between and empty box (empty collection) and no box (null).

If you have no box, you can assume you have no items. But that's a particular interpretation of what you having no box means. There might be a box but hasn't been handed to you yet. You cannot tell the color of a box if you have no box. Nor its size/capacity. You can fill an empty box, but you can't fill no box.

NickCraver commented 8 years ago

The confusion here I see is for new devs hitting null reference exceptions here. It's just not expected, they don't know that .GetEnumerator() is called underneath and it's a compiler implementation detail that they're hitting which isn't very obvious. I don't want to allocate a new collection of X every time just to not throw, and the extension method can only be applied to generic collections, not those of older IEnumerator (non-generic) types.

In almost all null cases, I simply don't want to loop if there's nothing to loop over. Today, that's a wrapper:

if (myCollection != null) 
{
    foreach (var i in myCollection) { }
}

It'd be nice to have a cleaner syntax for this all around, as we have tons of these if statements. I strongly disagree this is automatically evidence of a bad API design, it often happens completely inside a method and with serialization as noted above. On the serialization front: we use protocol buffers which makes no distinction between empty and null, so you get a null.

I think we can do something here to save a lot of wrapper code, though where the best place for a ? (or alternative) is, I'm not picky about. Eliminating an extra 3 lines of code around many foreach statements would be nice, though. To be fair, I don't think this will help the new developer case much at all. They're still going to write a foreach and get a null ref before figuring it out and learning alternative approaches. I can't think of any new-syntax way to help, it still has to be discovered.

gafter commented 8 years ago

I don't want to allocate a new collection of X every time just to not throw, and the extension method can only be applied to generic collections, not those of older IEnumerator (non-generic) types.

It sounds like Enumerator.Empty<object>() is exactly what you need. It returns something that implements the non-generic older interface IEnumerator too, and it doesn't allocate something on each call.

paulomorgado commented 8 years ago

@gafter, I think @NickCraver is saying that the foreach statement should, instead of myCollection.GetEnumerator() it should be something functionally equivalent to myCollection == null ? Enumerator.Empty<TItem>() : myCollection.GetEnumerator().

But that would be a breaking change.

NickCraver commented 8 years ago

@paulomorgado Oh no - I'm saying foreach? (or other syntax) should behave that way. I am in no way arguing for a breaking change - that simply won't happen. Do I think foreach should have behaved this way? Yeah, probably...but that ship has sailed, we can't change it now.

HaloFour commented 8 years ago
public static class EnumerableExtensions {
    public static IEnumerable<T> EmptyIfNull<T>(this IEnumerable<T> enumerable) {
        return (enumerable != null) ? enumerable : Enumerable.Empty<T>();
    }
}
paulomorgado commented 8 years ago

@NickCraver, that was my understanding, but didn't want to open the can of worms of ? on statements. What's next? while?? lock??

@HaloFour, I still operate under the premisse that extension methods should look and behave like instance methods.

NickCraver commented 8 years ago

@paulomorgado none of those constructs suffer from this issue, nor does using, switch, etc. - I don't buy the slippery slope argument here. I can't think of even one other block this applies. This is directly and specifically a result of the compiler implementation of foreach calling .GetEnumerator() underneath and unexpectedly throwing a null reference. The argument is simply for more terse handling of such a common case.

aluanhaddad commented 8 years ago

the extension method can only be applied to generic collections, not those of older IEnumerator (non-generic) types.

You can define extension methods for non generic types.

The confusion here I see is for new devs hitting null reference exceptions here. It's just not expected, they don't know that .GetEnumerator() is called underneath and it's a compiler implementation detail that they're hitting which isn't very obvious.

I think the problem is that the error message is expressed in terms of the compiler implementation, not that there is an error. Imagine if the error message read "Unable to enumerate null", wouldn't that clear things up?

If methods ignore the nullness of the enumerables they receive, merging the concept of null with empty, they are likely to propagate null in their return values. I think that would be harmful. Everyone would use the hypothetical new nullsafe foreach everywhere.

NickCraver commented 8 years ago

You can define extension methods for non generic types.

Yes, if we want to define hundreds of extension methods. I was specifically talking about the generic extension methods proposed earlier, the only reasonable thing that may be added in almost all cases.

I agree the error message isn't obvious, and clearing it up would help, but it still doesn't address the usability and verbosity issue of wrapping everything in if (enumerable != null) { } today. It has little to do with propagating null, you choose whether to do that today and you'd choose whether to do it with a new syntax. I don't buy the argument of it being a bad thing because it more easily enables you to do what you were going to do anyway. By using the new syntax, intent is clear. The same way it's clear (and explicitly opted-into) with ?. in C# 6.

I don't believe this enables any better or worse behavior than today's verbose method. It simply allows developers to do what they want with less syntax, exactly as ?. does already.

I think that would be harmful. Everyone would use the hypothetical new nullsafe foreach everywhere.

Two things here:

  1. This assumes that it's harmful. Some people prefer to return null. We certainly do at Stack Overflow, we don't want the allocation of an empty thing just to do nothing with it.
  2. That's a huge assumption with no backing. People don't use ?. everywhere either - not even close. I think this is a very bad assumption to base any decisions on.
HaloFour commented 8 years ago

@paulomorgado

I still operate under the premisse that extension methods should look and behave like instance methods.

That's your decision, but my code is perfectly legal. It's pretty convenient to be able to invoke extension methods off null specifically to handle easy/fluent checks like that. Note that throwing NullReferenceException when trying to invoke an instance method on null is specifically a C# limitation as a part of the language, the CLR is perfectly happy to allow it via the call opcode.

yaakov-h commented 8 years ago

Note that throwing NullReferenceException when trying to invoke an instance method on null is specifically a C# limitation

Please explain which IL opcodes the C# compiler emits that causes this.

My understanding is that NullReferenceException is the CLR's reinterpretation of a low-address access violation, and is not part of C#.

On 31 May 2016, at 8:27 AM, HaloFour notifications@github.com wrote:

Note that throwing NullReferenceException when trying to invoke an instance method on null is specifically a C# limitation

HaloFour commented 8 years ago

@yaakov-h

C# always uses callvirt when invoking instance methods, even if the method isn't virtual. That opcode has the side effect of always throwing NullReferenceException if the instance is null, which satisfies C#'s language specification requirement:

https://blogs.msdn.microsoft.com/ericgu/2008/07/02/why-does-c-always-use-callvirt/

If C# were to instead use the call opcode for instance methods then the CLR would allow them to be called and the this pointer would be null. You'd lose virtual dispatch on virtual methods in those cases, though.

aluanhaddad commented 8 years ago

@NickCraver First of all I did not mean to come off as dismissing the value of returning null from a method which returns a collection, if it means improved performance in a critical path, then that is a perfectly valid reason to use it. My point is that I do not want to see the convenience of this foreach? style construct encourage conflation of null with empty.

With regard to performance, I think it is valid but, as @gafter pointed out, methods that return System.Collections.IEnumerable can delegate to System.Linq.Enumerable.Empty<Object>, so there is no allocation penalty.

With regard to extension methods, I'd be curious to see how many collection types you would need to define an extension method for. Hundreds does sounds very painful, but that is a lot of very specific types to be returning, especially if they are mostly used with foreach. Again, I don't know your use case.

That's a huge assumption with no backing. People don't use ?. everywhere either - not even close. I think this is a very bad assumption to base any decisions on.

Well, all of my evidence is anecdotal, and thus should be disregarded, but I know many programmers who advocate defensive programming at the point of use. In other words, they are not confident in the API contract or it is insufficiently specified. In fact, it's been suggested to me that I should check IEnumerables for null even when returned by internal types which I initialize with Enumerable.Empty

Anyway, if your code base is full of

if(values != null) 
{
    foreach (var value in values) { ... }    
}

I would assume you want to replace them all with foreach? (or whatever syntax it would be). That would certainly mean propagating use of the new syntax. Additionally, as you bring up the learning curve for new programmers, they will now have to ask themselves, why are there 2 foreach constructs? Which one do I use? And I predict, again without evidence, that the answer would often be "the safe one". Which is safe for some definition of safe. And it would likely be the best choice when working in a code base full of collection returning methods that propagate nulls.

Edit: fixed formatting.

NickCraver commented 8 years ago

@aluanhaddad How do you see this as any different than ?.Max() or ?.Where(), etc.? All of these debates applied to the safe navigation (?.) operator, and the community saw benefit over any perceived downsides. Have we seen devs go crazy with that and use it everywhere? I certainly haven't, on the contrary: from what I've seen it's very rarely used and almost always where appropriate. In fact the use cases I've seen like deserialized structures are very much in line with the use cases in this issue.

Would I use foreach? in the current if case? Absolutely. It's a shorter way to get the same effect. It's the same as ?. instead of nested ifs, { get; set; } instead of a full property with a declared backer, etc. I don't see this as perpetrating anything "wrong". Everything it does exists today, we just have to write more code to do it. Many additions over the past few versions of C# are exactly that: new syntax or constructs to make common cases more terse.

As for IEnumerable, there are tons of types in the BCL that are IEnumerable and not a generic version. Each would need its own extension method. There's a lot of old business code started before generics also follows this pattern of a specific collection type as well, but luckily I don't have that problem anymore. I have at past jobs, though.

I think the "programmers will do X" argument is moot in either case. If the argument is "they'll just use foreach?", then wouldn't that same group just be adding an if wrapper today? That's exactly what I'd do. Why can't we make life easier? That's what all new language features aim to do.

aluanhaddad commented 8 years ago

I don't see this as perpetrating anything "wrong". Everything it does exists today, we just have to write more code to do it.

@NickCraver I certainly did not mean to come off as accusatory. I apologize.

I do think in general that it's a poor practice to return null for values of type IEnumerable, but there are certainly exceptions, and I do not mean to judge. What I would like to avoid however, is the introduction of a new foreach construct.

NickCraver commented 8 years ago

For IEnumerable? Yes sure, it's easy to make the argument for not being null and there are zero allocations ways to achieve this. However, very often our return types are List<T>, Dictionary<TKey,TValye>, etc. - not the base IEnumerable<T>. Do you consider null invalid in those cases? We'd certainly rather not new up a collection just to throw it away, especially inside the same assembly and not on any public API. The same applies to null lists from deserialization as discussed earlier.

Also what about types that aren't returned at all? In razor views for example you're rendering the output and we have to put these if constructs wrapping everywhere as well. I think the "null return" is a narrow view of the much wider range of use cases. All of these arguments against (and reasoning for) mirror ?., which the community and C# team approved and serves as a tremendous time and code saver today.

To be clear: I'm not advocating specifically for foreach?, I've seen foreach(var i in? myEnumerable) and foreach(var i in myEnumerable?) proposed as well. Or, something else. I'm not all all picky about the syntax; there are many ways to improve over the extra 3 lines we require today.

markrendle commented 8 years ago

foreach (var i in myEnumerable?) would seem to fit very nicely with the semantics of the existing null-safe operator, since it is applied to the variable rather than a keyword.

aluanhaddad commented 8 years ago

However, very often our return types are List, Dictionary<TKey,TValye>, etc. - not the base IEnumerable. Do you consider null invalid in those cases?

I generally return these types through interfaces like IReadOnlyList<T>. If you are doing that, you can return a static field like the one returned by Enumerable.Empty<T> for the extension method on that type. If the collections are exposed via mutable interfaces, then you will need to check for null regardless if you want to make use of that interface. However, if you are only reading from them, why not enjoy the benefits, like IReadOnlyList<T>'s covariance, and generally easier to reason about code, that you can get from an immutable interface?

@markrendle I do find that the syntax foreach (var i in myEnumerable?) makes this more palatable. It is acting on the value, not appearing to introduce a special null aware keyword.

NickCraver commented 8 years ago

@aluanhaddad The "why?" for most of those is you're making a lot of assumptions about a codebase based on personal experience, which we all do. We only know what we've seen. However, you must realize many people don't use those interfaces, don't want to cast as those interfaces, and all of that complicates code in other ways.

If the collections are exposed via mutable interfaces, then you will need to check for null regardless if you want to make use of that interface.

No...I wouldn't. That's the entire point. I want to say "if null, skip it", that's what this entire issue is about. Also don't assume read-only behavior, that's a bad assumption. Language features and constructs cannot make assumptions like this, they have to handle the widest variety of cases and throw warning or errors for all the rest.

In general most comments here ignore performance. They miss the point of not wanting to call .GetEnumerator() at all if there's nothing there. Using an empty collection is a loss in performance. You'll not convince me to use empty collections everywhere - that's just not how we do things here, and many places I've been. It's not the fastest code you can write, a null check is. The assumption of using empty collections everywhere also assumes you always control both the input and output of every method. This is very rarely the case everywhere.

Why not use IReadOnlyList<T>? Because now I've arbitrarily (and drastically) restricted the methods available down to satisfy some requirement that was arbitrarily imposed. It also totally rules out the read/write case, of which we have many. The answer here should not be "rewrite all of your perfectly valid code and create more allocations and restrictions". We can do better. This issue is about doing better.

ghost commented 8 years ago

in? coalesces the pretty syntax + user intent, and it's more C#-y than the other proposed options.

gafter commented 7 years ago

Issue moved to dotnet/csharplang #342 via ZenHub