dotnet / roslyn


Proposal for non-nullable references (and safe nullable references) #227

Closed Neil65 closed 7 years ago

Neil65 commented 9 years ago

1. Overview

This is my concept for non-nullable references (and safe nullable references) in C#. I have tried to keep my points brief and clear so I hope you will be interested in having a look through my proposal.

I will begin with an extract from the C# Design Meeting Notes for Jan 21, 2015 (https://github.com/dotnet/roslyn/issues/98):

There's a long-standing request for non-nullable reference types, where the type system helps you ensure that a value can't be null, and therefore is safe to access. Importantly such a feature might go along well with proper safe nullable reference types, where you simply cannot access the members until you've checked for null.

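This is my proposal for how this could be designed. The types of references in the language would be general references (Dog, the references C# has always had), mandatory references (Dog!, which can never be null) and nullable references (Dog?, which can be null but cannot be dereferenced without a null check).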

Important points about this proposal:

  1. There are no language syntax changes other than the addition of the '!' and '?' syntax when declaring (or casting) references.
  2. Null reference exceptions are impossible if the new style references are used throughout the code.
  3. There are no changes to the actual code compilation, by which I mean we are only adding compiler checks - we are not changing anything about the way that the compiled code is generated. The compiled IL code will be identical whether traditional (general) references or the new types of references are used.
  4. It follows from this last point that the runtime will not need to know anything about the new types of references. Once the code is compiled, references are references.
  5. All existing code will continue to compile, and the new types of references can interact reasonably easily with existing code.
  6. The '!' and '?' can be added to existing code and, if that existing code is 'null safe' already, the code will probably just compile and work as it is. If there are compiler errors, these will indicate where the code is not 'null safe' (or possibly where the 'null safe-ness' of the code is expressed in a way that is too obscure). The compiler errors will be able to be fixed using the same 'plain old C#' constructs that we have always used to enforce 'null safe-ness'. Conversely, code will continue to behave identically if the '!' and '?' are removed (but the code will not be protected against any future code changes that are not 'null safe').
  7. No doubt there are ideas in here that have been said by others, but I haven't seen this exact concept anywhere. However if I have reproduced someone else's concept it was not intentional! (Edit: I now realise that I have unintentionally stolen the core concept from Kotlin - see http://kotlinlang.org/docs/reference/null-safety.html).

The Design Meeting Notes cite a blog post by Eric Lippert (http://blog.coverity.com/2013/11/20/c-non-nullable-reference-types/#.VM_yZmiUe2E) which points out some of the thorny issues that arise when considering non-nullable reference types. I respond to some of his points in this post.

Here is the Dog class that is used in the examples:

public class Dog
{
    public string Name { get; private set; }

    public Dog(string name)
    {
        Name = name;
    }

    public void Bark()
    {
    }
}

2. Background

I will add a bit of context that will hopefully make the intention of the idea clearer.

I have thought about this topic on and off over the years and my thinking has been along the lines of this type of construct (with a new 'check' keyword):

Dog? nullableDog = new Dog("Nullable");

nullableDog.Bark(); // Compiler Error - cannot dereference nullable reference (yet).

check (nullableDog)
{
    // This code branch is executed if the reference is non-null. The compiler will allow methods to be called and properties to be accessed.
    nullableDog.Bark(); // OK.
}
else
{
    nullableDog.Bark(); // Compiler Error - we know the reference is null in this context.
}

The 'check' keyword does two things:

  1. It checks whether the reference is null and then switches the control flow just like an 'if' statement.
  2. It signals to the compiler to apply certain rules within the code blocks that follow it (most importantly, rules about whether or not nullable references can be dereferenced).

It then occurred to me that since it is easy to achieve the first objective using the existing C# language, why invent a new syntax and/or keyword just for the sake of the second objective? We can achieve the second objective by teaching the compiler to apply its rules wherever it detects this common construct:

if (nullableDog != null)

Furthermore it occurred to me that we could extend the idea by teaching the compiler to detect other simple ways of doing null checks that already exist in the language, such as the ternary (?:) operator.

This line of thinking is developed in the explanation below.

3. Mandatory References

As the name suggests, mandatory references can never be null:

Dog! mandatoryDog = null; // Compiler Error.

However the good thing about mandatory references is that the compiler lets us dereference them (i.e. use their methods and properties) any time we want, because it knows at compile time that a null reference exception is impossible:

Dog! mandatoryDog = new Dog("Mandatory");
mandatoryDog.Bark(); // OK - can call method on mandatory reference.
string name = mandatoryDog.Name; // OK - can access property on mandatory reference.

(See my additional post for more details.)

4. Nullable References

As the name suggests, nullable references can be null:

Dog? nullableDog = null; // OK.

However the compiler will not allow us (except in circumstances described later) to dereference nullable references, as it can't guarantee that the reference won't be null at runtime:

Dog? nullableDog = new Dog("Nullable");
nullableDog.Bark(); // Compiler Error - cannot call method on nullable reference.
string name = nullableDog.Name; // Compiler Error - cannot access property on nullable reference.

This may make nullable references sound pretty useless, but there are further details to follow.

5. General References

General references are the references that C# has always had. Nothing is changed about them.

Dog generalDog1 = null; // OK.
Dog generalDog2 = new Dog("General"); // OK.

generalDog2.Bark(); // OK at compile time, fingers crossed at runtime.

6. Using Nullable References

So if you can't call methods or access properties on a nullable reference, what's the use of them?

Well, if you do the appropriate null reference check (I mean just an ordinary null reference check using traditional C# syntax), the compiler will detect that the reference can be safely used, and the nullable reference will then behave (within the scope of the check) as if it were a mandatory reference.

In the example below the compiler detects the null check and this affects the way that the nullable reference can be used within the 'if' block and 'else' block:

Dog? nullableDog = new Dog("Nullable");

nullableDog.Bark(); // Compiler Error - cannot dereference nullable reference (yet).

if (nullableDog != null)
{
    // The compiler knows that the reference cannot be null within this scope.
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
}
else
{
    // The compiler knows that the reference is null within this scope.
    nullableDog.Bark(); // Compiler Error - the reference still behaves as a nullable reference.
}

The compiler will also recognise this sort of null check:

if (nullableDog == null)
{
    return;
}

// The compiler knows that if the reference was null, this code would never be reached.
nullableDog.Bark(); // OK - reference behaves like a mandatory reference.

And this:

if (nullableDog == null)
{
    throw new Exception("Where is my dog?");
}

// The compiler knows that if the reference was null, this code would never be reached.
nullableDog.Bark(); // OK - reference behaves like a mandatory reference.

The compiler will also recognise when you do the null check using other language features:

string name1 = (nullableDog != null ? nullableDog.Name : null); // OK
string name2 = nullableDog?.Name; // OK

Hopefully it is now clear that if the new style references are used throughout the code, null reference exceptions are actually impossible. However once the effort has been made to convert the code to the new style references, it is important to guard against the accidental use of general references, as this compromises null safety. There needs to be an attribute such as this to tell the compiler to prevent the use of general references:

[assembly: AllowGeneralReferences(false)] // Defaults to true

This attribute could also be applied at the class level, so you could for example forbid general references for the assembly but then allow them for a class (if the class has not yet been converted to use the new style references):

[AllowGeneralReferences(true)]
public class MyClass
{
}
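For illustration only, the attribute itself could be declared as an ordinary custom attribute. The sketch below is an assumption about its shape, not part of any existing library: the 'Allowed' property and the 'GeneralReferencePolicy' helper are hypothetical names. The lookup uses runtime reflection purely to demonstrate the rule 'class setting overrides assembly setting, default is true'; in the proposal the compiler would apply this rule at compile time.

```csharp
using System;

// Hypothetical attribute, sketched from the usage shown above.
[AttributeUsage(AttributeTargets.Assembly | AttributeTargets.Class)]
public sealed class AllowGeneralReferencesAttribute : Attribute
{
    public bool Allowed { get; private set; }

    public AllowGeneralReferencesAttribute(bool allowed)
    {
        Allowed = allowed;
    }
}

[AllowGeneralReferences(true)]
public class MyClass
{
}

public class UnmarkedClass
{
}

public static class GeneralReferencePolicy
{
    // A class-level attribute wins; otherwise fall back to the assembly-level
    // attribute; otherwise general references are allowed (the default).
    public static bool IsAllowed(Type type)
    {
        var onClass = (AllowGeneralReferencesAttribute)Attribute.GetCustomAttribute(
            type, typeof(AllowGeneralReferencesAttribute));
        if (onClass != null)
        {
            return onClass.Allowed;
        }

        var onAssembly = (AllowGeneralReferencesAttribute)Attribute.GetCustomAttribute(
            type.Assembly, typeof(AllowGeneralReferencesAttribute));
        return onAssembly == null || onAssembly.Allowed;
    }
}
```

With no assembly-level attribute present, both GeneralReferencePolicy.IsAllowed(typeof(MyClass)) and GeneralReferencePolicy.IsAllowed(typeof(UnmarkedClass)) return true.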

(See my additional post for more details.)

7. Can we develop a reasonable list of null check patterns that the compiler can recognise?

I have not listed every possible way that a developer could do a null check; there are any number of complex and obscure ways of doing it. The compiler can't be expected to handle cases like this:

if (MyMethodForCheckingNonNull(nullableDog))
{
}

However the fact that the compiler will not handle every case is a feature, not a bug. We don't want the compiler to detect every obscure type of null check construct. We want it to detect a finite list of null checking patterns that reflect clear coding practices and appropriate use of the C# language. If the programmer steps outside this list, it will be very clear to them because the compiler will not let them dereference their nullable references, and the compiler will in effect be telling them to express their intention more simply and clearly in their code.
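To make the point concrete, here is a minimal sketch in plain C# (general references only, so it compiles today). The names 'NameLookup', 'NameViaHelper' and 'NameViaDirectCheck' are hypothetical. Both methods behave identically at runtime, but only the second expresses the null check in a form that the proposed compiler rules could recognise if 'dog' were declared as Dog?.

```csharp
public class Dog
{
    public string Name { get; private set; }

    public Dog(string name)
    {
        Name = name;
    }
}

public static class NameLookup
{
    // Opaque to the compiler: the null check is hidden inside another method.
    private static bool MyMethodForCheckingNonNull(Dog dog)
    {
        return dog != null;
    }

    // Under the proposal this would NOT compile for a Dog? parameter,
    // because the check is not one of the recognised patterns.
    public static string NameViaHelper(Dog dog)
    {
        if (MyMethodForCheckingNonNull(dog))
        {
            return dog.Name;
        }
        return "(no dog)";
    }

    // The same intent written as a recognised pattern: compare to null, exit early.
    public static string NameViaDirectCheck(Dog dog)
    {
        if (dog == null)
        {
            return "(no dog)";
        }
        return dog.Name;
    }
}
```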

So is it possible to develop a reasonable list of null checking constructs that the compiler can enforce? Characteristics of such a list would be:

  1. It must be possible for compiler writers to implement.
  2. It must be intuitive, i.e. a reasonable programmer should never have to even think about the list, because any sensible code will 'just work'.
  3. It must not seem arbitrary, i.e. there must not be situations where a certain null check construct is detected and another that seems just as reasonable is not detected.

I think the list of null check patterns in the previous section, combined with some variations that I am going to put in a more advanced post, is an appropriate and intuitive list. But I am interested to hear what others have to say.

Am I expecting compiler writers to perform impossible magic here? I hope not - I think that the patterns here are reasonably clear, and the logic is hopefully of the same order of difficulty as the logic in existing compiler warnings and in code checking tools such as ReSharper.

8. Converting Between Mandatory, Nullable and General References

The principles presented so far lead on to rules about conversions between the three types of references. You don't have to take in every detail of this section to get the general idea of what I'm saying - just skim over it if you want.

Let's define some references to use in the examples that follow.

Dog! myMandatoryDog = new Dog("Mandatory");
Dog? myNullableDog = new Dog("Nullable");
Dog myGeneralDog = new Dog("General");

Firstly, any reference can be assigned to another reference if it is the same type of reference:

Dog! yourMandatoryDog = myMandatoryDog; // OK.
Dog? yourNullableDog = myNullableDog; // OK.
Dog yourGeneralDog = myGeneralDog; // OK.

Here are all the other possible conversions. Note that when I talk about 'intent' I mean the idea that a traditional (general) reference is conceptually either mandatory or nullable at any given point in the code. This intent is explicit and self-documenting in the new style references, but it still exists implicitly in general references (e.g. "I know this reference can't be null because I wrote a null check", or "I know that this reference can't or at least shouldn't be null from my knowledge of the business domain").

Dog! mandatoryDog1 = myNullableDog; // Compiler Error - the nullable reference may be null.
Dog! mandatoryDog2 = myGeneralDog; // Compiler Error - the general reference may be null.
Dog? nullableDog1 = myMandatoryDog; // OK.
Dog? nullableDog2 = myGeneralDog; // Compiler Error - makes an assumption about the intent of the general reference (maybe it is conceptually mandatory, rather than conceptually nullable as assumed here).
Dog generalDog1 = myMandatoryDog; // Compiler Error - loses information about the intent of the mandatory reference (the general reference may be conceptually mandatory, or may be conceptually nullable if the intent is that it could later be made null).
Dog generalDog2 = myNullableDog; // Compiler Error - loses the safety of the nullable reference.

There has to be some compromise in the last three cases as our code has to interact with existing code that uses general references. These three cases are allowed if an explicit cast is used to make the compromise visible (and perhaps there should also be a compiler warning).

Dog? nullableDog2 = (Dog?)myGeneralDog; // OK (perhaps with compiler warning).
Dog generalDog1 = (Dog)myMandatoryDog; // OK (perhaps with compiler warning).
Dog generalDog2 = (Dog)myNullableDog; // OK (perhaps with compiler warning).

Some of the conversions that were not possible by direct assignment can be achieved slightly less directly using existing language features:

Dog! mandatoryDog1 = myNullableDog ?? new Dog("Mandatory"); // OK.
Dog! mandatoryDog2 = (myNullableDog != null ? myNullableDog : new Dog("Mandatory")); // OK.

Dog! mandatoryDog3 = (Dog!)myGeneralDog ?? new Dog("Mandatory"); // OK, but requires a cast to indicate that we are making an assumption about the intent of the general reference.
Dog! mandatoryDog4 = (myGeneralDog != null ? (Dog!)myGeneralDog : new Dog("Mandatory")); // OK, but requires a cast for the same reason as above.

9. Class Libraries

As mentioned previously, the compiled IL code will be the same whether you use the new style references or not. If you compile an assembly, the resulting binary will not know what type of references were used in its source code.

This is fine for executables, but in the case of a class library, where the goal is obviously re-use, the compiler will need a way of knowing the types of references used in the public method and public property signatures of the library.

I don't know much about the internal structure of DLLs, but maybe there could be some metadata embedded in the class library which provides this information.

Or even better, maybe reflection could be used - an enum property indicating the type of reference could be added to the ParameterInfo class. Note that the reflection would be used by the compiler to get the information it needs to do its checks - there would be no reflection imposed at runtime. At runtime everything would be exactly the same as if traditional (general) references were used.
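As a rough sketch of the metadata idea, the reference kind could be carried by a custom attribute on each parameter and read back through the existing reflection API. Everything named here ('ReferenceKind', 'ReferenceKindAttribute', 'ReferenceKindReader', 'Kennel') is hypothetical; the proposal itself suggests a new property on ParameterInfo, which does not exist, so an attribute stands in for it. Parameters without the attribute are treated as general references.

```csharp
using System;
using System.Reflection;

// Hypothetical enum naming the three kinds of reference in the proposal.
public enum ReferenceKind
{
    General,
    Mandatory,
    Nullable
}

// Hypothetical attribute a compiler could emit for Dog! and Dog? parameters.
[AttributeUsage(AttributeTargets.Parameter)]
public sealed class ReferenceKindAttribute : Attribute
{
    public ReferenceKind Kind { get; private set; }

    public ReferenceKindAttribute(ReferenceKind kind)
    {
        Kind = kind;
    }
}

// Stand-in for a library method declared as Admit(Dog! dog, Owner? owner, string notes).
public class Kennel
{
    public void Admit(
        [ReferenceKind(ReferenceKind.Mandatory)] object dog,
        [ReferenceKind(ReferenceKind.Nullable)] object owner,
        string notes)
    {
    }
}

public static class ReferenceKindReader
{
    // Reads the kind recorded for a parameter; no attribute means a general reference.
    public static ReferenceKind KindOf(ParameterInfo parameter)
    {
        ReferenceKindAttribute attribute = parameter.GetCustomAttribute<ReferenceKindAttribute>();
        return attribute == null ? ReferenceKind.General : attribute.Kind;
    }
}
```

A consuming compiler (or here, a test) would call ReferenceKindReader.KindOf on each entry of typeof(Kennel).GetMethod("Admit").GetParameters() to decide which checks to apply at the call site.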

Now say we have an assembly that has not yet been converted to use the new style references, but which needs to use a library that does use the new style references. There needs to be a way of turning off the mechanism described above so that the library appears as a traditional library with only general references. This could be achieved with an attribute like this:

[assembly: IgnoreNewStyleReferences("SomeThirdPartyLibrary")]

Perhaps this attribute could also be applied at a class level. The class could remain completely unchanged except for the addition of the attribute, but still be able to make use of a library which uses the new style references.

(See my additional post for more details.)

10. Constructors

Eric Lippert's post (see reference in the introduction to this post) also raises thorny issues about constructors. Eric points out that "the type system absolutely guarantees that ...[class] fields always contain a valid string reference or null".

A simple (but compromised) way of addressing this may be for mandatory references to behave like nullable references within the scope of a constructor. It is the programmer's responsibility to ensure safety within the constructor, as has always been the case. This is a significant compromise but may be worth it if the thorny constructor issues would otherwise kill off the idea of the new style references altogether.

It could be argued that there is a similar compromise for readonly fields which can be set multiple times in a constructor.
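The readonly behaviour referred to here can be seen in current C#. In this small sketch (the 'Collar' class is a made-up example) the readonly field is assigned twice inside the constructor, which the compiler accepts; the same assignment anywhere outside a constructor or field initialiser would be a compile error.

```csharp
public class Collar
{
    private readonly string _label;

    public Collar(string label)
    {
        _label = "unset"; // First assignment - allowed inside the constructor.
        _label = label;   // Second assignment - also allowed inside the constructor.
    }

    public string Label
    {
        get { return _label; }
    }
}
```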

A better option would be to prevent any access to the mandatory field (and to the 'this' reference, which can be used to access it) until the field is initialised:

public class Car
{
    public Engine! Engine { get; private set; }

    public Car(Engine! engine)
    {
        Engine.Start(); // Compiler Error
        CarInitializer.Initialize(this); // Compiler Error - the 'this' reference could be used to access Engine methods and properties
        Engine = engine;
        // Can now use Engine and 'this' at will
    }
}

Note that it is not an issue if this forces adjustment of existing code - the programmer has chosen to introduce the new style references and thus will inevitably be adjusting the code in various ways as described earlier in this post.

And what if the programmer initializes the property in some way that still makes everything safe but is a bit more obscure and thus more difficult for the compiler to recognise? Well, the general philosophy of this entire proposal is that the compiler recognises a finite list of sensible constructs, and if you step outside of these you will get a compiler error and you will have to make your code simpler and clearer.

11. Generics

Using mandatory and nullable references in generics seems to be generally ok if we are prepared to have a class constraint on the generic class:

class GenericClass<T>
    where T : class // Need class constraint to use mandatory and nullable references
{
    public void TestMethod(T? nullableRef)
    {
        T! mandatoryRef = null; // Compiler Error - mandatory reference cannot be null
        string s = nullableRef.ToString(); // Compiler Error - cannot dereference nullable reference
    }
}

However there is more to think about with generics - see comments below.

12. Var

This is the way that I think var would work:

var dog1 = new Dog("Sam"); // var is Dog! (the compiler will keep things as 'tight' as possible unless we tell it otherwise).
var! dog2 = new Dog("Sam"); // var is Dog!
var? dog3 = new Dog("Sam"); // var is Dog?
var dog4 = (Dog)new Dog("Sam"); // var is Dog (see conversion rules - needs cast)

var dog1 = MethodReturningMandatoryRef(); // var is Dog!
var! dog2 = MethodReturningMandatoryRef(); // var is Dog!
var? dog3 = MethodReturningMandatoryRef(); // var is Dog? (see conversion rules)
var dog4 = (Dog)MethodReturningMandatoryRef(); // var is Dog (see conversion rules - needs cast)

var dog1 = MethodReturningNullableRef(); // var is Dog?
var! dog2 = MethodReturningNullableRef(); // Compiler Error (see conversion rules)
var? dog3 = MethodReturningNullableRef(); // var is Dog?
var dog4 = (Dog)MethodReturningNullableRef(); // var is Dog (see conversion rules - needs cast)

var dog1 = MethodReturningGeneralRef(); // var is Dog
var! dog2 = MethodReturningGeneralRef(); // Compiler Error (see conversion rules)
var? dog3 = (Dog)MethodReturningGeneralRef(); // var is Dog? (see conversion rules - needs cast)

The first case in each group would be clearer if we had a suffix to indicate a general reference (say #), rather than having no suffix due to the need for backwards compatibility. This would make it clear that 'var#' would be a general reference whereas 'var' can be mandatory, nullable or general depending on the context.

13. More Cases

In the process of thinking through this idea as thoroughly as possible, I have come up with some other cases that are mostly variations on what is presented above, and which would just have cluttered up this post if I had put them all in. I'll put these in a separate post in case anyone is keen enough to read them.

Neil65 commented 9 years ago

Follow-on Post

1. Introduction

This follows on from my previous post, which contained the main body of the proposal. This post lists some other cases which are mostly variations on what is presented in the original post, and which would just have cluttered up the original post if I had put them all in. Section numbering is not contiguous because it matches the numbering for the equivalent topics in the original post.

3. Mandatory References

Should an uninitialised mandatory reference trigger an error? No, because there are situations where you need more complex initialisation. But the reference can't be used until it is initialised.

Dog! mandatoryDog; // OK, but the compiler is keeping a close eye on you. It wants the variable initialised asap.

mandatoryDog.Bark(); // Compiler Error - you can't do anything with the reference until it is initialised.
anotherMandatoryDog = mandatoryDog; // Compiler Error - you can't do anything with the reference until it is initialised.

// There is some complexity in how the variable is initialised (which is why it wasn't initialised when it was declared).
if (getNameFromFile)
{
    using (var stream = new StreamReader("DogName.txt"))
    {
        string name = stream.ReadLine();
        mandatoryDog = new Dog(name);
    }
}
else
{
    mandatoryDog = new Dog("Mandatory");
}

mandatoryDog.Bark(); // OK - compiler knows that the reference has definitely been initialised

See also the Constructors section of my original post which attempts to address similar issues in the context of constructors.
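It may help to note that C# already performs this kind of 'definite assignment' analysis for ordinary local variables, so the rule above would be an extension of existing behaviour rather than something new. A minimal sketch using a general reference (the 'DefiniteAssignmentDemo' class is a made-up example):

```csharp
public static class DefiniteAssignmentDemo
{
    public static string PickName(bool useDefault)
    {
        string name; // Declared but not yet assigned.

        // Uncommenting the next line gives error CS0165 (use of unassigned local variable):
        // int length = name.Length;

        if (useDefault)
        {
            name = "Mandatory";
        }
        else
        {
            name = "From elsewhere";
        }

        // OK - the compiler can see that every path above assigns the variable.
        return name;
    }
}
```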

6. Using Nullable References

The original post showed how to use an 'if' / 'else' statement to apply a null check to a nullable reference so that the compiler would let us use that reference inside the 'if' block. Note that when you are in the 'else' block, there is no point actually using the nullable reference because you know it is null in this context. You might as well just use the constant 'null' as this is clearer. I would like to see this as a compiler error:

if (nullableDog != null)
{
    // Can do stuff here with nullableDog
}
else
{
    Dog? myNullableDog1 = nullableDog; // Compiler Error - it is pointless and misleading to use the variable when it is definitely null.
    Dog? myNullableDog2 = null; // OK - achieves the same thing but is clearer.
}

Note that even though the same reasoning applies to traditional (general) references, we can't enforce this rule or we would break existing code:

if (generalDog != null)
{
    // Can do stuff here with generalDog (actually we can do stuff anywhere because it is a general reference).
}
else
{
    Dog myGeneralDog1 = generalDog; // OK - otherwise would break existing code.
}

Now, here are some common variations on the use of the 'if' / 'else' statement that the compiler recognises:

Firstly, you do not have to have the 'else' block if you don't need to handle the null case:

if (nullableDog != null)
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
}

Also you can check for null rather than non-null:

if (nullableDog == null)
{
    nullableDog.Bark(); // Compiler Error - the reference still behaves as a nullable reference.
}
else
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
}

You can also have 'else if' blocks in which case the reference behaves the same in each 'else if' block as it would in a plain 'else' block:

if (nullableDog != null)
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
}
else if (someOtherCondition)
{
    nullableDog.Bark(); // Compiler Error - the reference still behaves as a nullable reference.
}
else
{
    nullableDog.Bark(); // Compiler Error - the reference still behaves as a nullable reference.
}

You can also have 'else if' with a check for null rather than a check for non-null:

if (nullableDog == null)
{
    nullableDog.Bark(); // Compiler Error - the reference still behaves as a nullable reference.
}
else if (someOtherCondition)
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
}
else
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
}

You can also have additional conditions in the 'if' statement ('AND' or 'OR'):

if (nullableDog != null && thereIsSomethingToBarkAt)
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference in this scope.
}
else
{
    nullableDog.Bark(); // Compiler Error - reference still behaves as a nullable reference in this scope (we don't know whether it is null or not, as we don't know which condition made us reach here).
    Dog? myNullableDog = nullableDog; // OK - unlike the example at the top of this section, it does make sense to use the nullableDog reference here because it could be non-null.
}

if (nullableDog != null || someOtherCondition)
{
    nullableDog.Bark(); // Compiler Error - reference still behaves as a nullable reference in this scope (we don't know whether it is null or not, as we don't know which condition made us reach here).
    Dog? myNullableDog = nullableDog; // OK - unlike the example at the top of this section, it does make sense to use the nullableDog reference here because it could be non-null.
}
else
{
    nullableDog.Bark(); // Compiler Error - reference still behaves as a nullable reference in this scope (in fact we know for certain it is null).
    Dog? myNullableDog = nullableDog; // Compiler Error - as in the example at the top of this section, it doesn't make sense to use the nullableDog reference here because we know it is null.
}

You can also have multiple checks in the same 'if' statement:

if (nullableDog1 != null && nullableDog2 != null)
{
    nullableDog1.Bark(); // OK - the reference behaves like a mandatory reference in this scope.
    nullableDog2.Bark(); // OK - the reference behaves like a mandatory reference in this scope.
}

Note that when you are in the context of a null check, you can do anything with your nullable reference that you would be able to do with a mandatory reference (not only accessing methods and properties, but anything else that a mandatory reference can do):

if (nullableDog != null)
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
    Dog! mandatoryDog = nullableDog; // OK - the reference behaves like a mandatory reference.
}

On a slightly different note - we have established that we can use the following language features to allow a nullable reference to be dereferenced:

string name1 = (nullableDog != null ? nullableDog.Name : null); // OK
string name2 = nullableDog?.Name; // OK

But it is pointless to apply these constructs to a mandatory reference, so the following will generate compiler errors:

string name3 = (mandatoryDog != null ? mandatoryDog.Name : null); // Compiler Error - it is a mandatory reference so it can't be null.
string name4 = mandatoryDog?.Name; // Compiler Error - it is a mandatory reference so it can't be null.

In fact a mandatory reference cannot be compared to null in any circumstances.

9. Class Libraries

What about if you have an existing assembly, compiled with an older version of the C# compiler, and you want it to use a class library which has new style references? There should be no issue here as the older compiler will not look at the new property on ParameterInfo (because it doesn't even know that the new property exists), and in a state of blissful ignorance will treat the library as if it only had traditional (general) references.

On another note, in order to facilitate rapid adoption of the new style references an attribute like this could be introduced:

[assembly: IgnoreNewStyleReferencesInternally]

This would mean that the ParameterInfo properties would be generated, but the new style references would be ignored internally within the library. This would mean that the library writers could get a version of their library with the new style references to market more rapidly. The code within the library would of course not be null reference safe, but would be no less safe than it already was. They could then make their library null safe internally for a later release.

erik-kallen commented 9 years ago

This: http://twistedoakstudios.com/blog/Post330_non-nullable-types-vs-c-fixing-the-billion-dollar-mistake is also interesting reading.

Miista commented 9 years ago

I'm all for this. However, in example 3 you declare a mandatory reference but then do not initialise it. Wouldn't it be better to require a mandatory reference to be initialised the moment it is declared, the way that Kotlin does it?

Neil65 commented 9 years ago

Hi Miista, I hadn't heard of Kotlin before, but having now read its documentation at http://kotlinlang.org/docs/reference/null-safety.html, I realise that I have (unintentionally) pretty much stolen its null safety paradigm :-)

Regarding initialisation, I have tried to allow programmers a bit of flexibility to do the sort of initialisation that cannot be done on a single line. It would be possible to be stricter and say that if they do want to do this they have to wrap their initialisation code in a method:

Dog! mandatoryDog = MyInitialisationMethod(); // The method does all the complex stuff and returns a mandatory reference

This may be seen as being too dictatorial about coding style - but it's something worthy of discussion.

Neil65 commented 9 years ago

Having read the article by Craig Gidney (http://twistedoakstudios.com/blog/Post330_non-nullable-types-vs-c-fixing-the-billion-dollar-mistake), I now realise that I was on the wrong track saying that "the different types of references are not different 'types' in the way that int and int? are different types". I have amended my original post to remove this statement and I also re-wrote the section on 'var' due to this realisation.

Neil65 commented 9 years ago

By the way you can vote for my proposal on UserVoice if you want: https://visualstudio.uservoice.com/forums/121579-visual-studio/suggestions/7049944-consider-my-detailed-proposal-for-non-nullable-ref

As well as voting for my specific proposal you can also vote for the general idea of adding non-nullable references (this has quite a lot of votes): https://visualstudio.uservoice.com/forums/121579-visual-studio/suggestions/2320188-add-non-nullable-reference-types-in-c

Neil65 commented 9 years ago

Craig Gidney's article (mentioned above) raises the very valid question - what is the compiler meant to do when asked to create an array of mandatory references?

var nullableDogs = new Dog?[10]; // OK.
var mandatoryDogs = new Dog![10]; // Not OK - what does the compiler initially fill the array with?

He explains: "The fundamental problem here is an assumption deeply ingrained into C#: the assumption that every type has a default value".
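The assumption is easy to observe in current C#: a freshly allocated array of a reference type has every slot set to that type's default value, which is null. A minimal sketch (the 'DefaultValueDemo' class is a made-up example):

```csharp
public class Dog
{
    public string Name { get; private set; }

    public Dog(string name)
    {
        Name = name;
    }
}

public static class DefaultValueDemo
{
    // Counts how many slots of a freshly allocated Dog[] are null.
    public static int CountNullSlots(int length)
    {
        Dog[] dogs = new Dog[length];
        int nulls = 0;
        foreach (Dog dog in dogs)
        {
            if (dog == null)
            {
                nulls++;
            }
        }
        return nulls;
    }
}
```

Every slot is null, so CountNullSlots(10) returns 10. This is exactly the state that an array of mandatory references must never be observed in.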

This problem can be dealt with using the same principle that has been used previously in this proposal - teaching the compiler to detect a finite list of clear and intuitive 'null-safe' code structures, and having the compiler generate a compiler error if the programmer steps outside that list.

So what would the list look like in this situation?

Obviously the compiler will be happy if the array is declared and populated on the same line (as long as no elements are set to null):

Dog![] dogs1 = { new Dog("Ben"), new Dog("Sophie"), new Dog("Rex") }; // OK.
Dog![] dogs2 = { new Dog("Ben"), null, null }; // Compiler Error - nulls not allowed.

The following syntax variations are also ok:

var dogs3 = new Dog![] { new Dog("Ben"), new Dog("Sophie"), new Dog("Rex") }; // OK.
Dog![] dogs4 = new [] { new Dog("Ben"), new Dog("Sophie"), new Dog("Rex") }; // OK.

The compiler will also be happy if we populate the array using a loop, but the loop must be of the exact structure shown below (because the compiler needs to know at compile time that all elements will be populated):

int numberOfDogs = 3;
Dog![] dogs5 = new Dog[numberOfDogs];
for (int i = 0; i < dogs5.Length; i++)
{
    dogs5[i] = new Dog("Dog " + i);
}

The compiler won't let you use the array in between the declaration and the loop:

int numberOfDogs = 3;
Dog![] dogs5 = new Dog[numberOfDogs];
Dog![] myDogs = dogs5; // Compiler Error - cannot use the array in any way.
for (int i = 0; i < dogs5.Length; i++)
{
    dogs5[i] = new Dog("Dog " + i);
}

The compiler will also be ok if we copy from an existing array of mandatory references:

Dog![] dogs6 = new Dog[numberOfDogs];
Array.Copy(dogs5, dogs6, dogs6.Length);

Similarly to the previous case, the array cannot be used in between being declared and being populated. Also note that the above code could throw an exception if the source array is not long enough, but this has nothing to do with the subject of mandatory references.

The compiler will also allow us to clone an existing array of mandatory references:

Dog![] dogs7 = (Dog![])dogs6.Clone();

This seems to me like a reasonable list of recognised safe code structures but people may be able to think of others.

MadsTorgersen commented 9 years ago

It is great to see some thinking on non-nullable and safely nullable reference types. This gist is another take on it - adding only non-nullable reference types, not safely nullable ones.

Not only Kotlin but also Swift has an approach to this. Of course, many functional languages, such as F#, don't even have the issue in the first place. Indeed their approach of using T (never nullable) and Option<T> (where the T can only be gotten at through a matching operation that checks for null) is probably the best inspiration we can get for how to address the problem.

I want to point out a few difficulties and possible solutions.

Guards and mutability

The proposal above uses "guards" to establish non-nullness; i.e. it recognizes checks for null and remembers that a given variable was not null. This does have benefits, such as relying on existing language constructs, but it also has limitations. First of all, variables are mutable by default, and in order to trust that they don't change between the null check and the access the compiler would need to also make sure the variable isn't assigned to. That is only really feasible to do for local variables, so for anything more complex, say a field access (this.myDog) the value would need to first be captured in a local before the test in order for it to be recognized.

I think a better approach is to follow the functional languages and use simple matching techniques that test and simultaneously introduce a new variable, guaranteed to contain a non-null value. Following the syntax proposed in #206, something like:

if (o is Dog! d) { ... d.Bark(); ... }
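As a rough illustration of "test and simultaneously introduce a new variable", here is a sketch using the `is Dog d` type pattern that C# 7 and later support. The `Dog! d` form above is proposed syntax and is not used here; `Dog`, `Kennel` and `BarkIfPresent` are hypothetical names.

```csharp
using System;

public class Dog
{
    public string Bark()
    {
        return "Woof";
    }
}

public class Kennel
{
    public Dog MyDog; // a general (possibly null) reference

    // The pattern tests the field and binds the result to a fresh local 'd'
    // in one step, so later writes to the field cannot invalidate 'd'.
    public string BarkIfPresent()
    {
        if (MyDog is Dog d)
        {
            MyDog = null;    // the field changes...
            return d.Bark(); // ...but 'd' still refers to the dog that was tested
        }
        return "(no dog)";
    }
}
```

The pattern does not match when `MyDog` is null, so `d` is only ever in scope with a non-null value.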

Default values

The fact that every type has a default value is really fundamental to the CLR, and it is an uphill battle to deal with that. Eric Lippert's blog post points to some surprisingly deep issues around ensuring that a field is always definitely assigned. But the real kicker is arrays. How do you ensure that the contents of an array are never observed before they are assigned? You can look for code patterns, as proposed above. But it will be too restrictive.

Say I'm building a List<T> type wrapping a T[]. Say it gets instantiated with Dog!. The discipline of the methods on List<T> will likely ensure that no element in the array is ever accessed without having been assigned to at some point earlier. But no reasonable compile time analysis can ensure this.

Say that the same List<T> type has a TryGet method with an out parameter out T value. The method needs to definitely assign the out parameter. What value should it assign to value when T is Dog!?
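A minimal sketch of the situation, assuming a hypothetical `SimpleList<T>` (not the real `List<T>`), shows both places where `default(T)` sneaks in: the unused tail of the backing array and the `out` parameter on the failure path.

```csharp
using System;

// Hypothetical minimal list wrapper, for illustration only.
public class SimpleList<T>
{
    private T[] items = new T[4]; // unused slots hold default(T)
    private int count;

    public void Add(T item)
    {
        if (count == items.Length)
        {
            Array.Resize(ref items, count * 2); // new slots also hold default(T)
        }
        items[count++] = item;
    }

    public bool TryGet(int index, out T value)
    {
        if (index >= 0 && index < count)
        {
            value = items[index];
            return true;
        }
        // The out parameter must be definitely assigned; for a reference
        // type argument this is null, whatever the caller intended.
        value = default(T);
        return false;
    }
}
```

The class never hands out an unassigned slot on the success path, but that discipline lives in the method bodies, not in anything a compiler could check from the array type alone.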

One option here is to just not allow arrays of non-nullable reference types - people will have to use Dog[] instead of Dog![] and just check every time they get a value. Similarly, maybe we just shouldn't allow List<Dog!>. After all, even unconstrained type parameters allow default(T) and T[] today. Or we need to come up with an "anti-constraint" that is even more permissive than no constraint, where you can say that you take all type arguments - even nullable reference types.

Library compatibility

Think of a library method

public string GetName(Dog d);

In the light of this feature you might want to edit the API. It may throw on a null argument, so you want to annotate the parameter with a !:

public string GetName(Dog! d);

Depending on your conversion rules, this may or may not be a breaking change for a consumer of the library:

Dog dog = ...;
var name = GetName(dog);

If we use the "safe" rule that Dog doesn't implicitly convert to Dog!, this code will now turn into an error. The potential for that break would in turn mean that a responsible API owner would not be able to strengthen their API in this way, which is a shame.

Instead we could consider allowing an implicit conversion from Dog to Dog!. After all Dog is the "unsafe" type already, and when you can implicitly access members of it at the risk of a runtime exception, maybe you should be allowed to implicitly convert it to a non-nullable reference type at the risk of a runtime exception?

On the other end of the API there's also a problem. Assume that the method never returns null, it should be completely safe to add ! to the return type, right?

Not quite. Notice that the consumer stores the result in a var name. Does that now infer name to be string! instead of string? That would be breaking for the subsequent line I didn't tell you about, that does name = null;. Again, we may have to consider suboptimal rules for type inference in order to protect against breaks.

Miista commented 9 years ago

The way I see it Dog! dog = ...; simply means that dog can never be null. This doesn't mean that the fields of dog cannot be null. In other words public string GetName(Dog! d); can still return a string that is null. Like so:

    public string GetName(Dog! dog) { ... }

    Dog! dog = ...;
    var name = GetName(dog); // May return null

If you wanted to return a non-nullable string you would have to say that GetName returns string! instead.

    public string! GetName(Dog! dog) { ... }

    Dog! dog = ...;
    var name = GetName(dog); // Will never return null

The non-nullability in the last example could even be enforced by the compiler (to some lengths – there may be some edge cases I can't think of).

Miista commented 9 years ago

In order to maintain backwards compatibility I believe the type should be inferred to the loosest (may be the wrong term) possible scope e.g. public string GetName(Dog! dog) would be inferred to string and public string! GetName(Dog! dog) would be inferred to string!.

Trying to set a non-nullable reference to null should not compile.

dotnetchris commented 9 years ago

Yes yes yes, my god yes.

The billion dollar mistake infuriates me. It's absolutely insane that references allow null by default. Since I know C# will never be willing to fix the billion dollar mistake, this is at least a viable alternative. And it removes the need to use the stupid ?. operator.

gafter commented 9 years ago

The "billion dollar mistake" was having a type system with null at all. This would not fix that, it just makes it slightly less painful.

dotnetchris commented 9 years ago

@gafter what I want most is for C# to drop nulls entirely unless a reference is specifically marked nullable, but i know that will never happen

gafter commented 9 years ago

@dotnetchris There is no way to shoehorn that into the existing IL or C# semantics.

HaloFour commented 9 years ago

I think there is value in stepping back and watching the Swift and Obj-C communities battle it out over this issue. Apparently despite the slick appearance of optionals in Swift it creates a number of severe nuisance scenarios, particularly in writing the initialization of a class:

Swift Initialization and the Pain of Optionals

My concern has always been that without null you end up with sentinels which, in the reference world, often make absolutely no sense. Sure, the developers can further declare their intent but then everyone is required to do the additional dance to unwrap the optional. Compiler heuristics could help there but I'm sure that there will always be those corner cases.

Ultimately, in my opinion, non-nullable references feels a little like Java checked exceptions. Sure, it seems great on paper, and even better with perfectly idiomatic examples, but it also creates obnoxious barriers in practice which encourage developers to take the easy/lazy way out thus defeating the entire purpose. It feels like erecting guard-rails along a hairpin curve on the edge of a cliff. Sure, the purpose is safety, but perceived safety can encourage recklessness, and I think that developers should be learning how to code more defensively (not just for simple validation but to also never trust your inputs) not assuming that someone else will relieve them of that burden.

Just a devil's advocate rebuttal by someone who would probably welcome them to the language if done well. :smile:

dotnetchris commented 9 years ago

@HaloFour checked exceptions was the only positive statement I ever have to say about Java, other than ICloneable actually being you know, cloning.

paulomorgado commented 9 years ago

I really can't understand what the problem is about null!

Look at a string as a box of char. I can either have a box or not (null). If I have a box, it can either be empty ("") or not ("something").

I don't know that much about F# but I can't see Option doing anything better here. It's still about either having a box or not. But what guarantee does F# give me that I still have a box on the table just because when I asked before there was one?

Sure it's a pain to have to be looking out for null all the time or be bitten when not doing that, but hiding the problem is not solving it.

The ?. operator introduced in C# 6 solves a lot of issues and the proposed pattern matching for C# 7 (#206) will solve a lot of others.

@Neil65

So far, the compiler does everything it can to generate code that will behave as intended at run time. For that, it relies on the CLR.

What you are proposing goes more on the way of "looks good on paper, hope it goes well at run time.".

Having the compiler yield warnings just because your intention is not verifiable, even at compile time, is a very bad idea. Compiler warnings should not be issued for something that the developer cannot do anything about.

There has to be some compromise in the last three cases as our code has to interact with existing code that uses general references. These three cases are allowed if an explicit cast is used to make the compromise visible (and perhaps there should also be a compiler warning).

Dog? nullableDog2 = (Dog?)myGeneralDog; // OK (perhaps with compiler warning).
Dog generalDog1 = (Dog)myMandatoryDog; // OK (perhaps with compiler warning).
Dog generalDog2 = (Dog)myNullableDog; // OK (perhaps with compiler warning) .

The cast should be either possible or not.

Regarding var, why do mandatory and nullable references need to qualify var when the same is not needed for nullable value types?

Is Dog![] a mandatory array of Dog or an array of mandatory Dog?

dotnetchris commented 9 years ago

@paulomorgado what it fundamentally boils down to is that I, as the author of the code, should have the authority to determine whether null is valid or absolutely invalid. If I state that this code may never allow null, the compiler should within all reason attempt to enforce this statically.

While the ?. operator is something, it still doesn't eliminate throw if(x==null) because there is no valid response ever. Null reference exceptions are likely the number one unhandled exception of all time both in .NET and programming as a whole. Compiler enforced static analysis would easily prevent this problem from existing.

It being labeled "The Billion Dollar mistake" is not hyperbole, I actually expect it to have cost multiple billions if not tens of billions at this point.

paulomorgado commented 9 years ago

What costs billions of dollars are bad programmers and bad practices, not null by itself.

The great thing with null reference exceptions is that you can always point where it was and fix it.

Changing the compiler without having runtime guarantees will be just fooling yourself and, when bitten by the mistake, you might not be able to know where it is or fix it.

Sure I'd like to have non nullable reference types as much as I wanted nullable value types. But I want it done right. And I don't see how that will ever be possible without changing the runtime, like it happened with nullable value types.

Neil65 commented 9 years ago

Thanks everyone for engaging in discussion on this topic. I have some responses to what people have said but I haven't had time yet to write them down due to working day and night to meet a deadline. I'll try and post something over the weekend.

GeirGrusom commented 9 years ago

On Generics:

I don't think generics should be allowed to be invoked with non-nullable references unless there is a constraint on the generic method.

public static void InvalidGeneric<T>(out T result) { result = default(T); }

public static void OkGeneric<T>(out T result) where T : class! { result = Arbitrary(); }

public static void Bar()
{
    string! input;
    InvalidGeneric(out input); // Illegal as it would return a mandatory with a null reference
    OkGeneric(out input); // OK.
}

ssylvan commented 9 years ago

IMO converting a mandatory reference to a "weaker" one should always be allowed and implicit. I.e. if a function takes a nullable reference as an argument you should be able to pass in a mandatory reference. Same if the function takes a legacy reference. You're not losing anything here, the code is expecting weaker guarantees than you can provide. If your code works with a nullable or general reference, then clearly it wouldn't break if I pass it a mandatory reference (it will just always go down any non-null checks inside).

I also think nullable to general and vice versa should be allowed. They're effectively the same except for compile time vs runtime checking. So dealing with older code would be painful if you couldn't use good practices (nullable references) in YOUR code without having to make the interface to legacy code ugly. Chances are people will just keep using general references if you add a bunch of friction to that. Make it easy and encouraged to move to safer styles little by little, IMO.

This last case may warrant a warning ("this will turn compile time check into runtime check"). The first two cases (mandatory/nullable implicitly converts to general) seems like perfectly reasonable code that you would encourage people to do in order to transition to a safer style. You don't want to penalize that.

MikeyBurkman commented 9 years ago

@paulomorgado As much as I hate to bring Java up, its lack of reified types means that generic type information is not around at runtime, and yet in 10 years I've never once accidentally added an integer to a list of strings. (Don't get me wrong, not having reified types causes other issues, usually around reflection, but reflection can cause all sorts of bad things if you don't know what you're doing.)

While runtime checking may sound like a good sanity check, it comes at a cost, and it's by no means required to make your system verifiably correct. (Assuming of course you aren't hacking the innards of your classes through reflection.)

Re: empty string vs non empty string: Those are two different types, and should be treated as such. You couldn't do any compile-time verification that you didn't pass an empty string to the constructor of NonEmptyString, but you'd at least catch it at the place of instantiation, rather than later classes doing the check and making it difficult to trace back to where the illegal value originated. The same theory goes for converting nullable types to non-null types.

By the way, Ceylon does something very similar to this proposal. Might be worthwhile looking at them.

kalleguld commented 9 years ago

Is the var handling backwards compatible? Seems like this would be valid c# code now, but shouldn't compile with the proposed rules.

var dog1 = new Dog("Sam"); //dog1 is Dog!
dog1 = null; //dog1 cannot be null

olmobrutall commented 9 years ago

Nice that the TOP 1 C# requested feature, Non-Nullable reference types, is alive again. We discussed the topic in depth some months ago.

It's a hard topic, with many different alternatives and implications. Consequently it is easier to write a comment with a naive proposal than to read and understand the other solutions already proposed.

In order to work in the same direction, I think it is important to share a common basis about the problem.

On top of the current proposal, I think these links are important.

Back to the topic. I think the concept explained here lacks a solution for two hard problems:

Generics

It's explained how to use the feature inside a generic type (using T? inside a List<T>), but the most important problem to solve is how to let client code use the feature with arbitrary generic types (List<string!>).

This problem is really important to solve because generics are often used for collections, and in 99% of cases you don't want nulls in your collection.

It's also challenging because it must work transparently on non-nullable references and nullable value types, even if they are implemented in a completely different way at the binary level. We already have too many collections to choose from (List<T>, Collection<T>, ReadonlyCollection<T>, InmutableCollection<T>...) to multiply the options for the different constraints on the type (ListValue<T>, ListReference<T>, ListNonNullReference<T>).

I think unifying the type system is really important, but this has the consequence that Nullable<T> should allow nesting and class references, making string? mean something like a nullable reference to a string with a HasValue boolean.

Library compatibility

This solution focuses on local variables, but the challenge is in integrating with other libraries, legacy or not, written in C# or other languages.

It's important that the tooling is strong in these cases, and that safety is preserved. Unfortunately this requires run-time checks.

Also, it is important that library writers (including the BCL) can choose to use the features without fear of undesired consequences for their client code. I propose three compilation flags: strict, transitory and legacy (similar to /warnaserror). This allows teams to graduate how strict they want to be.

As Lucian made me see, this solution is better than branching C# into two different languages: one where string is nullable (legacy) and one where string is non-nullable (strict), with a #pragma option or something like this (similar to Option Strict in Visual Basic).

danieljohnson2 commented 9 years ago

It seems to me that mandatory types could be useful even if they weren't statically checked at all. After all, today's .NET does null checking at runtime. It throws exceptions, rather than exhibiting undefined behavior.

For a variable of mandatory reference type, this runtime check could be done sooner. Assigning null to such a variable might throw an exception. Reading from it (if null) might do the same, even if you are only copying the reference to another mandatory variable.

An array of mandatory references would indeed be created full of nulls, but you could not see them- you'd get exceptions if you try. This checking would have a runtime cost, but it would detect errors earlier, and document the programmer's intent better than what we have now.

I don't think the runtime cost is a big worry- this must all be opt-in for compatibility reasons anyway.

If we are saying that a field or array element shouldn't be null, we want to know if it actually is null as soon as possible. If it can't be as soon as compile time, it can still be sooner than it is today.

Of course, you could still have static checking on top of this, too!

What you can't have is erasure. You'd have to have Object! and Object as really different types. This might be done in with a wrapper struct like Nullable<T>, but it's not going to be easy to retrofit existing libraries with that, and without CLR changes it will have unpleasant corner cases.
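To make the wrapper-struct idea and its corner cases a little more concrete, here is a rough sketch. `NonNull<T>` is a hypothetical type written for this illustration, not an existing library type, and it only approximates the runtime checks described above.

```csharp
using System;

// Hypothetical wrapper struct, for illustration only.
public struct NonNull<T> where T : class
{
    private readonly T inner;

    // Assigning null fails immediately, at the point of construction.
    public NonNull(T value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        inner = value;
    }

    // Reading checks again, because default(NonNull<T>) and array slots
    // bypass the constructor and leave 'inner' as null.
    public T Value
    {
        get
        {
            if (inner == null)
            {
                throw new InvalidOperationException("NonNull<T> was never initialized.");
            }
            return inner;
        }
    }
}
```

An array such as `new NonNull<string>[3]` is still created full of nulls internally, but any attempt to read a slot's `Value` throws instead of handing the null onwards. That is the "created full of nulls, but you could not see them" behaviour, paid for with a check on every read.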

vkhorikov commented 9 years ago

@Neil65 An important note: the .NET team should rewrite most (if not all) of the libraries using this approach so that developers can take full advantage of this new feature, just as happened with async/await.

HaloFour commented 9 years ago

Just to reference the more recent design notes on this subject:

#1648 C# Design Meeting Notes for Mar 10 and 17, 2015

#1677 C# Design Notes for Mar 18, 2015

#2119 C# Design Notes for Apr 1 and 8, 2015

It seems that the direction is looking like attribute-based annotation of parameters with analyzers (built-in or otherwise) used to track how the variable is used in order to ensure that it shouldn't be null. It's unlikely that the compiler could ever properly guarantee non-nullability for anything being passed from an outside source, particularly in containers. If given a syntax to express non-nullability it would probably be possible to designate type parameters as non-nullable in the same manner that they are currently designated as dynamic, e.g. Dictionary<String, dynamic> -> [Dynamic(false, false, true)] Dictionary<String, object>.

Of course this is just my summarization of the situation for the loads of new folks contributing here today. I could be (and probably am) wrong on some of the details, which are likely in flux anyway.

Note that doing non-nullability right cannot be accomplished by Roslyn itself. It would involve fairly invasive changes to the underlying runtime. Those suggestions should probably be taken up with CoreCLR.

NightElfik commented 9 years ago

This is an awesome proposal! Does this mean that normal types should be obsolete because everything should be either a mandatory reference or a nullable reference?

Also, if the compiler knows that a mandatory reference is not null, will automatic null checks on access be omitted too? This could bring some speed improvements in certain scenarios.

tec-goblin commented 9 years ago

It would be very interesting to see how this interacts with the [Required] DataAnnotation. A field marked with ! should behave like having a [Required] with the default message. The inverse isn't necessarily true, because an Entity might not have to be valid all the time, and in some scenarios it is normal for a [Required] member to be temporarily null. This is an important issue, because, particularly in web scenarios, about half of the objects we operate on are Entities, and the transition to using this new feature should be fluent, without too many casts.

It also highlights issues with the constructor proposition. First, it imposes a specific way of initialising entities on all ORMs that have to respect this notation, and secondly, it breaks the common way of initialising entities in code (new Entity{MyReference = something}), because the compiler interprets the above as a = new Entity(); a.MyReference = something; so MyReference is temporarily null. I would therefore suggest that the compiler treat the above notation differently in the context of non-nullable references.
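The "temporarily null" window can be observed in today's C#. In this sketch `Entity`, `ReferenceWasNullInConstructor` and `InitializerDemo` are hypothetical names; the only language behaviour relied on is that an object initializer runs the constructor first and assigns the members afterwards.

```csharp
using System;

public class Entity
{
    public string MyReference;

    // Records what the constructor itself could observe.
    public readonly bool ReferenceWasNullInConstructor;

    public Entity()
    {
        ReferenceWasNullInConstructor = MyReference == null;
    }
}

public static class InitializerDemo
{
    // Object initializer form.
    public static Entity WithInitializer(string something)
    {
        return new Entity { MyReference = something };
    }

    // Roughly what the initializer form expands to.
    public static Entity Expanded(string something)
    {
        var temp = new Entity();
        temp.MyReference = something;
        return temp;
    }
}
```

Both methods return an entity whose `MyReference` is set, yet in both cases the constructor saw it as null, which is the window a mandatory-field rule would have to tolerate or special-case.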

jibal commented 9 years ago

"I really can't understand what the problem is about null!"

Inability to understand is not a virtue.

"What costs billions of dollars are bad programmers and bad practices, not null by itself."

No, the cost is due to the conjunction of null and humans. Humans can't be eliminated but null can be through properly designed type systems. Early C compilers didn't even enforce the distinction between ints and pointers; they could be assigned back and forth without casts. This led to many many bugs and crashes. Of course all of these could have been eliminated by avoiding the "bad practices" of writing code with bugs in it ... but this is a view that is deeply ignorant of type systems, humans, costs, and the whole software development enterprise.

"By the way, Ceylon does something very similar to this proposal. Might be worthwhile looking at them."

Indeed ... Ceylon, not Swift or Kotlin, is the most important language to look at in this regard. Unfortunately, C# or any other language that tacks this stuff on as an afterthought will look bad in comparison to a language like Ceylon where it was designed in from the beginning.

dmitriyse commented 9 years ago

I have spent some time researching non-nullable reference types, and I remember the ugly CodeContracts implementation. I have found a more general variant of a solution (maybe just a way of thinking about it) for both problems: 1) nullability and 2) contracts. Imagine that we have a new compiler-only construct:

public constraint class NonZeroLengthString: String
{
    public bool InvariantConstraints()
    {
         return this != null && this.Length > 0; // Analyzed by compile-time inference engine..
    }
}

then we can define C# 7.0 signature:

public delegate NonZeroLengthString DoSomething(object obj);

It is compiled to CLR code:

[return: CS7Constraint("NonZeroLengthString")]
public delegate string DoSomething(object obj);

More examples:

public constraint class NonNullReference<T>: class T
{
    public bool InvariantConstraints()
    {
            return this != null;
    }
}

public class Foo
{
      public  Foo(NonNullReference<string> arg1)
      {
      }
      // The same signature, but with C# 7.0 syntax sugar.
      public void Bar(string! arg1)
      {
      }
}

// C# 7.0 strict mode (all reference types are non nullable by default)
strict public class Foo
{
      public  Foo(string arg1, string? canBeNullVariable)
      {
      }
}

This approach may be a first step towards introducing powerful static code checks and a built-in language inference engine. All variables can be enriched with additional semantics, and the language inference engine can help protect developers from their mistakes.

In short, this approach can be called: variable semantic markup (semantic tracking)

kumpera commented 9 years ago

Having this information available to the VM and make it part of the verification model would be greatly helpful when AOT compiling to targets that don't allow hardware traps for NREs.

pchalamet commented 9 years ago

Strange proposal. Null is not a problem by itself. This should be implemented at a low level, down in the bowels of the CLR, as a new reference type. Letting the compiler guess whether a non-null reference is not null is stupid. I have a bunch of code using extension methods to check for null that would not be detected by the compiler.

Please implement this in the CLR and just expose the ! reference in the language. That would be compatible with legacy code and ensure smooth migration to safer constructs.

danieljohnson2 commented 9 years ago

The problem with pushing this down into the CLR is that it then becomes impossible to upgrade the base class library to use mandatory types without breaking every existing program.

This is why erasure is so desirable- if 'nullability' is compiled down into attributes and runtime checks, then existing code can smoothly ignore it and still work just fine. But then you need runtime checks to protect against such code injecting nulls into your non-nullable stuff. Maybe you only need checks at places visible outside the assembly, but you do need some checks.

You could manage this for parameters, locals, and return values without a lot of trouble. Things start to get dicey for fields of classes, and are pretty much impossible for array elements and structure fields.

I think a worthwhile feature is possible within those limits; you can probably get away with allowing non-nullable private or internal fields, but not public fields. Properties can include implicit null checks, so you'd use those instead of public fields.

Finalizers still break this, but I think that's tolerable.

danieljohnson2 commented 9 years ago

I hate to reply to myself, but it has occurred to me that the finalizer problem I mentioned (described at http://blog.coverity.com/2013/11/20/c-non-nullable-reference-types) is actually fixable.

You would need to avoid registering objects for finalization until all the field initializers have run, which is to say, after we've chained up to Object.ctor(). Once we get there, all bets are off and all mandatory fields must have been initialized anyway, because we're about to run user-defined constructors, which can in turn call virtual methods.

The compiler could inject GC.SuppressFinalize() at the top of each constructor, before any field initialization, and GC.ReRegisterForFinalize() after the base class constructor returns. All constructors would have to do this, just in case a base class has a mandatory field, or a subclass introduces a finalizer.

Actual CLR support for this would be very desirable; you'd have to do an awful lot of suppress/reregister pairs without it.

But this way the finalizer could only run if the field initializers had completed (without throwing). That is not quite 100% compatible with C# 6 semantics, but it is very close, and a finalizer could never see a null in a non-nullable field.

HaloFour commented 9 years ago

@danieljohnson2

That won't help. Finalization is less of a problem due to incomplete initialization and more of a problem due to the fact that you're in the middle of a collection and the odds are quite high that one or more of the references contained in those fields have already been collected and as such will be null.

danieljohnson2 commented 9 years ago

@HaloFour

I don't think that's right; my understanding is that as long as the object is being finalized, it's still live and everything reachable from it is live too; none of that stuff is actually collected until the next GC cycle, and even then only if you didn't resurrect it (by saving a reference to it somewhere).

HaloFour commented 9 years ago

@danieljohnson2 I swear I've been bitten by this before but I could be mis-remembering.

Thorarin commented 9 years ago

For symmetry, I think that if the language is going to support nullable reference types (for which the compiler throws an error if no null check is being done), there should be an equivalent compiler error for nullable value types accessing the Value property without checking HasValue in some way as well.

Unfortunately, for compatibility there should be a newly introduced syntax to express that as well, which will never look completely symmetrical because "?" is already used to merely declare a Nullable<T>. Perhaps a double question mark would do, because that isn't currently valid syntax in C# 5/6. That is, outside the use as null-coalescing operator of course, but ? is also used for the conditional operator and I cannot think of any ambiguities.

Neil65 commented 9 years ago

As the original author of this thread I would like to respond to some points in the post by @MadsTorgersen on 12 February (yes I know that was a long time ago now!!). I am aware that there have been design meetings and prototyping since then, and that Mads may not necessarily hold exactly the same views as he did in February, but what I have to say makes the discussion in this thread a bit more complete and is also relevant to the "flow-based nullability checking" that the design team has been looking into.

1. Mutability of variables

Mads stated that "variables are mutable by default, and in order to trust that they don't change between the null check and the access the compiler would need to also make sure the variable isn't assigned to".

I don't think it is actually a problem if the variable is assigned to. This is because of the following principles from my proposal (which is at the start of this thread):

This means that even if a mandatory reference changes its value, its new value is still guaranteed to be non-null; therefore in terms of null safety the fact that it changed is not a problem.

In other words the compiler will stop any 'dangerous' assignments:

if (myDog != null) // myDog is nullable but once it has passed the null check it behaves as mandatory.
{
    myDog = nullableDog; // Compiler error - cannot assign nullable to mandatory.
    myDog.Bark();
}

But this assignment is ok:

if (myDog != null) // myDog is nullable but once it has passed the null check it behaves as mandatory.
{
    myDog = mandatoryDog; // OK.
    myDog.Bark(); // myDog is now a different dog, but it is still guaranteed to be non-null.
}

2. Field access

Mads stated that "for anything more complex, say a field access (this.myDog) the value would need to first be captured in a local before the test in order for it to be recognized."

There is a valid concern here: in between the null check and the use of the variable there might be a method call that has the potential to change the value of the field to null.

Capturing the value in a local seems to solve one problem and create another. It solves the 'null' problem by capturing the value of the field before other code has the potential to change it to null. However it is possible that this other code might, by design, change the field to a different non-null value. Hence capturing the value of the field in a local variable could actually change the functional behaviour of the code by effectively bypassing the assignment of the new value.

Say we have the following class:

public class Person
{
    public Dog? Dog { get; set; } // For the purposes of this exercise assume that a person owns either 0 or 1 dogs.
    public void ChangePetOwnership() { } // Could involve buying, selling or swapping a dog.
}

Say we have the following code, which is not null-safe:

if (person.Dog != null)
{
    person.ChangePetOwnership(); // The person could sell their dog.
    person.Dog.Bark(); // Not null-safe.
}

(Note that the programmer has chosen to put the method call inside the null check instead of before it, so presumably the business requirement is that you don't change pet ownership unless you already have a dog.)

If we use a local variable, we make the code null-safe but change the functionality:

Dog? dog = person.Dog;
if (dog != null)
{
    person.ChangePetOwnership(); // The person could swap their dog.
    dog.Bark(); // Null-safe, but the logic of the code is changed as it is the old dog doing the barking.
}

The only way to make the code null safe while preserving the original functionality is to put in a second null check:

if (person.Dog != null) // This null check preserves the original functionality.
{
    person.ChangePetOwnership();
    if (person.Dog != null) // This null check makes the code null-safe.
    {
        person.Dog.Bark();
    }
}

The implication of all this is that the compiler must not allow method calls such as this in between the null check and the use of the field value. (It must also not allow any method call into which person is passed as a parameter). This may result in the programmer being compelled to put in an additional null check.
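The difference between the local-capture version and the double-check version can be run in plain C# today. This is only a sketch: it uses ordinary references (the proposed `?` and `!` suffixes are left out so that it compiles), and the dog names and the "swap" behaviour of `ChangePetOwnership` are assumptions made for illustration.

```csharp
using System;

public class Dog
{
    public string Name { get; }
    public Dog(string name) { Name = name; }
    public string Bark() { return Name + " barks"; }
}

public class Person
{
    // Ordinary reference; null means the person owns no dog.
    public Dog Dog { get; set; }

    // Assumed behaviour for this sketch: the person swaps their dog for a new one.
    public void ChangePetOwnership() { Dog = new Dog("Rex"); }
}

public static class FieldAccessDemo
{
    // Captures the property in a local before the call: null-safe, but stale.
    public static string BarkWithLocalCapture(Person person)
    {
        Dog dog = person.Dog;
        if (dog == null) return "no dog";
        person.ChangePetOwnership();
        return dog.Bark(); // the old dog barks
    }

    // Re-checks the property after the call: null-safe and up to date.
    public static string BarkWithSecondCheck(Person person)
    {
        if (person.Dog == null) return "no dog";
        person.ChangePetOwnership();
        if (person.Dog == null) return "no dog";
        return person.Dog.Bark(); // the current dog barks
    }

    public static void Main()
    {
        Console.WriteLine(BarkWithLocalCapture(new Person { Dog = new Dog("Fido") })); // Fido barks
        Console.WriteLine(BarkWithSecondCheck(new Person { Dog = new Dog("Fido") }));  // Rex barks
    }
}
```

Both methods are null-safe, but only the second one preserves the behaviour of the original code.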

3. Syntax for null checking

Mads suggested the following syntax (in line with the pattern matching proposal):

if (myDog is Dog! myDogVal)
{
    myDogVal.Bark();
}

I would make the following points about this:

  1. This differs from my proposal in that it introduces a temporary variable, but as discussed above this doesn't seem to offer much advantage in terms of what has to be done to make code null-safe.
  2. Therefore this choice of syntax is mostly just an issue of mechanics and usability. The big, problematic areas that we have to grapple with will exist no matter what syntax is used; thus many parts of my proposal (e.g. assignment rules, var, class library considerations) are relevant in either case.
  3. Having said that, usability considerations of the syntax are still important. I see the following advantages for each option:
    • The pattern-matching syntax is very explicit; it is immediately obvious that a null-safe scope has been created.
    • The use of existing language features provides a lot of flexibility; in my proposal there are six different well-defined ways of providing a null-safe scope:
      • if/else
      • if null return
      • if null throw exception
      • conditional operator (?:)
      • null-conditional operators (?.) and (?[])
      • null-coalescing operator (??)

Although I do like the flexibility of using existing language features, I would be extremely happy to see either syntax option make it into the C# language.

It would be possible to use the pattern-matching syntax while also allowing the use of the null-conditional and null-coalescing operators, as shown below (in fact it would be inconsistent not to allow this since the use of these operators is already allowed for nullable value types).

string name = nullableDog?.Name;
var nullableDog = nullableArrayOfDogs?[0];

var mandatoryDog = nullableDog ?? new Dog("Fred");

svick commented 9 years ago

In other words the compiler will stop any 'dangerous' assignments:

if (myDog != null) // myDog is nullable but once it has passed the null check it behaves as mandatory.
{
    myDog = nullableDog; // Compiler error - cannot assign nullable to mandatory.
    myDog.Bark();
}

I think this means that some useful coding patterns would be impossible (or would at least require some amount of fighting with the language). For example, consider manually iterating a linked list:

Node? currentNode = firstNode;

while (true)
{
    if (currentNode != null)
    {
        // process current node here
        currentNode = currentNode.NextNode; // error here, because NextNode is nullable
    }
    else
    {
        break;
    }
}

HaloFour commented 9 years ago

@svick Apart from being a breaking change that would also be quite an annoying language "feature" to have to fight. I like the idea of the compiler being able to determine that currentNode wouldn't be null immediately after the condition but to then enforce the variable to be mandatory I think goes too far.

I think that the ideal situation is if the compiler flow analysis can determine that the variable may be treated as mandatory after the conditional but that the variable then becomes suspect again after assignment. This is how null analysis works in IntelliJ (and I imagine ReSharper).

For fields/properties I think variable capture is the way to go.

Neil65 commented 9 years ago

@HaloFour It wouldn't actually be a breaking change because the behaviour would only apply if you use the new nullable types (i.e. with the question mark suffix). Existing code using 'general' (i.e. traditional) references would compile and run as before.

@svick It's not too hard to modify your example so that the compiler would accept it (see below). I would argue that this makes it a bit clearer because the iteration logic is separated from the node processing logic (and also the structure reflects the fact that only the node processing logic actually depends on a null check to be valid).

Node? currentNode = firstNode;

while (true)
{
    if (currentNode != null)
    {
        // process current node here
    }
    else
    {
        break;
    }

    currentNode = currentNode.NextNode; // OK because we are outside the null check.
}

Regarding the 'annoyance' factor, a comparison could be made with compile time type checking - I'm more than happy for the compiler to 'annoy' me at compile time rather than keeping quiet and letting errors occur at run time. However I'm interested in seeing examples showing that there would be too much annoyance.

I'm not completely against variable capture but as I have argued in my previous post it introduces its own problems.

Neil65 commented 9 years ago

Sorry, I realised one of the points in my previous post was incorrect so I have deleted that point.

HaloFour commented 9 years ago

@Neil65 Perhaps, but why should string? behave appreciably differently than string? Granted the ? provides an extra hint but the compiler for all intents and purposes should be treating any such reference as nullable anyway.

I think your code example is exactly the kind of syntax gymnastics I'd prefer to avoid. You have to invert your logic to make sense and it doesn't feel natural. We're already discussing these variables changing meaning by virtue of string? being treated as string! due to a conditional or other statements that would eliminate the possibility of the reference being null, why couldn't an assignment convert that back? This is something that analyzers offer today.

Granted, if pattern matching for null checking becomes the norm it's kind of a moot point.

Neil65 commented 9 years ago

Actually the logic can be either way around depending on your preference (I've changed it in my post to match the order in the example by @svick).

danieljohnson2 commented 9 years ago

I may be in the minority on this, but I feel this implicit null-elimination syntax is too implicit.

Making variable types change depending on which branch of a seemingly-ordinary 'if' you are in is cute, but it'll have to be full of corner cases to maintain compatibility. We'd be better off with some new syntax that can be used with ordinary reference variables too, which will certainly want it.

I believe Swift has something like this: if let not_nullable_var = nullable_var { /* not_nullable_var in scope here */ }

You'd need a few parens to get it to work in C# where braces are optional: if let nnv = (nv) /* nnv in scope here */;

That should be unambiguous, and if the 'nv' part can be an expression you can have stuff like this: if let nnv = (nv as int?) /* nnv is an int here */;

You could have while let in an analogous way.

This way, instead of having the language mysteriously not recognizing certain null tests, it has explicit rules about where the 'let' keyword can go, so you'll get much clearer error messages.

The downside I see here is that you get a proliferation of variable names, as each if-let introduces one. But I think you were going to get quite a bit of that anyway just working around the corner cases.

gbworld commented 9 years ago

I understand the mandatory types, as it would be nice to make sure a reference type cannot be nulled out. I am also not against nullable reference types, per se, but wonder why the compiler should not just treat a reference type so it can catch issues without the question mark. Your example here:

if (nullableDog == null)
{
    nullableDog.Bark(); // Compiler Error - the reference still behaves as a nullable reference.
}
else
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
}

Shouldn't the compiler be able to be smart enough to say "he is asking for behavior on a null object"? If the idea is to make intent explicit, I am all for it. If it is to solve this type of problem with the compiler, I think a non-mandatory, non-explicit nullable type should still cause the compiler to complain if you use the above.

lukasf commented 9 years ago

I think this is a great concept but I have two issues with it.

1. Symmetry between value types and reference types + Remove clutter

With the original proposal, we will have asymmetric behavior.

Value type:
Dog - non-nullable
Dog? - nullable

Reference type:
Dog - nullable
Dog? - nullable
Dog! - non-nullable

Plus, I agree that for transition and migration of legacy code, it is necessary to not modify the default behavior of reference types. But I think in the long run, if using this feature, it would be desirable to have mandatory references for almost all declarations, and only a very small number of nullable references. In that scenario, having to write "!" behind each mandatory reference declaration would add a lot of clutter.

So my proposal is:

Instead of having a compiler switch to disallow use of unspecified references (as OP proposed), a compiler switch could instead be added to treat unspecified references as mandatory. So now a declaration is always mandatory, unless a "?" is specified. Just like it is with value types!

Benefits:

For migrating code to the new behavior, the legacy behavior can be used and all "safe" declarations can be tagged with "!". Once all declarations have been explicitly specified nullable or non-nullable, then the compiler switch can be enabled, and after that, all "!" can be removed to clean up the code.

Proposed Legacy Behavior (same as OP):
Dog - nullable
Dog? - nullable
Dog! - non-nullable

Proposed New Behavior (after compiler switch):
Dog - non-nullable
Dog? - nullable
Dog! - non-nullable (the "!" is redundant with the new behavior and could be omitted)

I am aware that enabling this switch would mean a breaking change to a lot of code. So maybe it would be best to always stick to the legacy behavior, and let people opt in to the new behavior. But maybe change is good and this should be the new default, to encourage people to use the new, much safer way to code?! I don't know. But at least having the choice would definitely be great. I would definitely use it for all new projects!

2. Thread safety

It has been mentioned before, but this behavior will cause issues in multi threading scenarios:

if (nullableDog != null)
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference. BUT this could break if nullableDog is changed to null on a different thread!!
}

It could be used on local variables, if they have not already been ref'd to someone. But for all member variables, it will be necessary to capture the variable into a new copy, so changes to the member on a different thread will not cause the code to break. It could be done implicitly (code inside the block would implicitly see the copy and not the original member), but I don't know if that is a good idea. A syntax to explicitly give it a new name would probably be better. As said, for locals, the compiler could be clever enough to figure out if using it is thread safe, so the simplified syntax above could be used...
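A rough sketch of the explicit-copy idea for a member, written in plain C# with ordinary references. `Kennel`, `_dog` and the method names are hypothetical, and the sketch only shows the shape of the pattern; it does not attempt to demonstrate the race itself.

```csharp
using System;

public class Dog
{
    public string Name { get; }
    public Dog(string name) { Name = name; }
    public string Bark() { return Name + " barks"; }
}

public class Kennel
{
    // A member that another thread may set to null at any time.
    private Dog _dog;

    public Kennel(Dog dog) { _dog = dog; }

    public void Release() { _dog = null; }

    // Reads the member twice: the check and the call can observe different values.
    public string BarkUnsafe()
    {
        if (_dog != null)
        {
            return _dog.Bark(); // may throw if _dog was cleared after the check
        }
        return "silence";
    }

    // Copies the member into a local once; the check and the call use the same value.
    public string BarkSafe()
    {
        Dog dog = _dog;
        if (dog != null)
        {
            return dog.Bark();
        }
        return "silence";
    }
}

public static class ThreadSafetyDemo
{
    public static void Main()
    {
        var kennel = new Kennel(new Dog("Fido"));
        Console.WriteLine(kennel.BarkSafe()); // Fido barks
        kennel.Release();
        Console.WriteLine(kennel.BarkSafe()); // silence
    }
}
```

An explicit naming syntax would amount to the compiler requiring the `BarkSafe` shape for members, while still allowing the shorter form for locals that it can prove are not shared.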