dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/

Proposal for non-nullable references (and safe nullable references) #227

Closed Neil65 closed 7 years ago

Neil65 commented 9 years ago

1. Overview

This is my concept for non-nullable references (and safe nullable references) in C#. I have tried to keep my points brief and clear so I hope you will be interested in having a look through my proposal.

I will begin with an extract from the C# Design Meeting Notes for Jan 21, 2015 (https://github.com/dotnet/roslyn/issues/98):

There's a long-standing request for non-nullable reference types, where the type system helps you ensure that a value can't be null, and therefore is safe to access. Importantly such a feature might go along well with proper safe nullable reference types, where you simply cannot access the members until you've checked for null.

This is my proposal for how this could be designed. The types of references in the language would be:

  1. Mandatory references (e.g. Dog!), which can never be null.
  2. Nullable references (e.g. Dog?), which can be null but cannot be dereferenced until they have been null-checked.
  3. General references (e.g. Dog), the references C# has always had, unchanged.

Important points about this proposal:

  1. There are no language syntax changes other than the addition of the '!' and '?' syntax when declaring (or casting) references.
  2. Null reference exceptions are impossible if the new style references are used throughout the code.
  3. There are no changes to the actual code compilation, by which I mean we are only adding compiler checks - we are not changing anything about the way that the compiled code is generated. The compiled IL code will be identical whether traditional (general) references or the new types of references are used.
  4. It follows from this last point that the runtime will not need to know anything about the new types of references. Once the code is compiled, references are references.
  5. All existing code will continue to compile, and the new types of references can interact reasonably easily with existing code.
  6. The '!' and '?' can be added to existing code and, if that existing code is 'null safe' already, the code will probably just compile and work as it is. If there are compiler errors, these will indicate where the code is not 'null safe' (or possibly where the 'null safe-ness' of the code is expressed in a way that is too obscure). The compiler errors will be able to be fixed using the same 'plain old C#' constructs that we have always used to enforce 'null safe-ness'. Conversely, code will continue to behave identically if the '!' and '?' are removed (but the code will not be protected against any future code changes that are not 'null safe').
  7. No doubt there are ideas in here that have been said by others, but I haven't seen this exact concept anywhere. However if I have reproduced someone else's concept it was not intentional! (Edit: I now realise that I have unintentionally stolen the core concept from Kotlin - see http://kotlinlang.org/docs/reference/null-safety.html).

The Design Meeting Notes cite a blog post by Eric Lippert (http://blog.coverity.com/2013/11/20/c-non-nullable-reference-types/#.VM_yZmiUe2E) which points out some of the thorny issues that arise when considering non-nullable reference types. I respond to some of his points in this post.

Here is the Dog class that is used in the examples:

public class Dog
{
    public string Name { get; private set; }

    public Dog(string name)
    {
        Name = name;
    }

    public void Bark()
    {
    }
}

2. Background

I will add a bit of context that will hopefully make the intention of the idea clearer.

I have thought about this topic on and off over the years and my thinking has been along the lines of this type of construct (with a new 'check' keyword):

Dog? nullableDog = new Dog("Nullable");

nullableDog.Bark(); // Compiler Error - cannot dereference nullable reference (yet).

check (nullableDog)
{
    // This code branch is executed if the reference is non-null. The compiler will allow methods to be called and properties to be accessed.
    nullableDog.Bark(); // OK.
}
else
{
    nullableDog.Bark(); // Compiler Error - we know the reference is null in this context.
}

The 'check' keyword does two things:

  1. It checks whether the reference is null and then switches the control flow just like an 'if' statement.
  2. It signals to the compiler to apply certain rules within the code blocks that follow it (most importantly, rules about whether or not nullable references can be dereferenced).

It then occurred to me that since it is easy to achieve the first objective using the existing C# language, why invent a new syntax and/or keyword just for the sake of the second objective? We can achieve the second objective by teaching the compiler to apply its rules wherever it detects this common construct:

if (nullableDog != null)

Furthermore it occurred to me that we could extend the idea by teaching the compiler to detect other simple ways of doing null checks that already exist in the language, such as the ternary (?:) operator.
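
For example, the compiler could be taught to recognise the ternary form as well (a sketch; FindDog is a hypothetical method returning Dog?):

Dog? nullableDog = FindDog();

// The null check in the condition means the 'true' branch can safely dereference:
string name = (nullableDog != null ? nullableDog.Name : "(no dog)");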

This line of thinking is developed in the explanation below.

3. Mandatory References

As the name suggests, mandatory references can never be null:

Dog! mandatoryDog = null; // Compiler Error.

However the good thing about mandatory references is that the compiler lets us dereference them (i.e. use their methods and properties) any time we want, because it knows at compile time that a null reference exception is impossible:

Dog! mandatoryDog = new Dog("Mandatory");
mandatoryDog.Bark(); // OK - can call method on mandatory reference.
string name = mandatoryDog.Name; // OK - can access property on mandatory reference.

(See my additional post for more details.)

4. Nullable References

As the name suggests, nullable references can be null:

Dog? nullableDog = null; // OK.

However the compiler will not allow us (except in circumstances described later) to dereference nullable references, as it can't guarantee that the reference won't be null at runtime:

Dog? nullableDog = new Dog("Nullable");
nullableDog.Bark(); // Compiler Error - cannot call method on nullable reference.
string name = nullableDog.Name; // Compiler Error - cannot access property on nullable reference.

This may make nullable references sound pretty useless, but there are further details to follow.

5. General References

General references are the references that C# has always had. Nothing is changed about them.

Dog generalDog1 = null; // OK.
Dog generalDog2 = new Dog("General"); // OK.

generalDog1.Bark(); // OK at compile time, fingers crossed at runtime.

6. Using Nullable References

So if you can't call methods or access properties on a nullable reference, what's the use of them?

Well, if you do the appropriate null reference check (I mean just an ordinary null reference check using traditional C# syntax), the compiler will detect that the reference can be safely used, and the nullable reference will then behave (within the scope of the check) as if it were a mandatory reference.

In the example below the compiler detects the null check and this affects the way that the nullable reference can be used within the 'if' block and 'else' block:

Dog? nullableDog = new Dog("Nullable");

nullableDog.Bark(); // Compiler Error - cannot dereference nullable reference (yet).

if (nullableDog != null)
{
    // The compiler knows that the reference cannot be null within this scope.
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
}
else
{
    // The compiler knows that the reference is null within this scope.
    nullableDog.Bark(); // Compiler Error - the reference still behaves as a nullable reference.
}

The compiler will also recognise this sort of null check:

if (nullableDog == null)
{
    return;
}

// The compiler knows that if the reference was null, this code would never be reached.
nullableDog.Bark(); // OK - reference behaves like a mandatory reference.

And this:

if (nullableDog == null)
{
    throw new Exception("Where is my dog?");
}

// The compiler knows that if the reference was null, this code would never be reached.
nullableDog.Bark(); // OK - reference behaves like a mandatory reference.

The compiler will also recognise when you do the null check using other language features:

string name1 = (nullableDog != null ? nullableDog.Name : null); // OK
string name2 = nullableDog?.Name; // OK

Hopefully it is now clear that if the new style references are used throughout the code, null reference exceptions are actually impossible. However once the effort has been made to convert the code to the new style references, it is important to guard against the accidental use of general references, as this compromises null safety. There needs to be an attribute such as this to tell the compiler to prevent the use of general references:

[assembly: AllowGeneralReferences(false)] // Defaults to true

This attribute could also be applied at the class level, so you could for example forbid general references for the assembly but then allow them for a class (if the class has not yet been converted to use the new style references):

[AllowGeneralReferences(true)]
public class MyClass
{
}

(See my additional post for more details.)

7. Can we develop a reasonable list of null check patterns that the compiler can recognise?

I have not listed every possible way that a developer could do a null check; there are any number of complex and obscure ways of doing it. The compiler can't be expected to handle cases like this:

if (MyMethodForCheckingNonNull(nullableDog))
{
}

However the fact that the compiler will not handle every case is a feature, not a bug. We don't want the compiler to detect every obscure type of null check construct. We want it to detect a finite list of null checking patterns that reflect clear coding practices and appropriate use of the C# language. If the programmer steps outside this list, it will be very clear to them because the compiler will not let them dereference their nullable references, and the compiler will in effect be telling them to express their intention more simply and clearly in their code.

So is it possible to develop a reasonable list of null checking constructs that the compiler can enforce? Characteristics of such a list would be:

  1. It must be possible for compiler writers to implement.
  2. It must be intuitive, i.e. a reasonable programmer should never have to even think about the list, because any sensible code will 'just work'.
  3. It must not seem arbitrary, i.e. there must not be situations where a certain null check construct is detected and another that seems just as reasonable is not detected.

I think the list of null check patterns in the previous section, combined with some variations that I am going to put in a more advanced post, is an appropriate and intuitive list. But I am interested to hear what others have to say.

Am I expecting compiler writers to perform impossible magic here? I hope not - I think that the patterns here are reasonably clear, and the logic is hopefully of the same order of difficulty as the logic in existing compiler warnings and in code checking tools such as ReSharper.

8. Converting Between Mandatory, Nullable and General References

The principles presented so far lead on to rules about conversions between the three types of references. You don't have to take in every detail of this section to get the general idea of what I'm saying - just skim over it if you want.

Let's define some references to use in the examples that follow.

Dog! myMandatoryDog = new Dog("Mandatory");
Dog? myNullableDog = new Dog("Nullable");
Dog myGeneralDog = new Dog("General");

Firstly, any reference can be assigned to another reference if it is the same type of reference:

Dog! yourMandatoryDog = myMandatoryDog; // OK.
Dog? yourNullableDog = myNullableDog; // OK.
Dog yourGeneralDog = myGeneralDog; // OK.

Here are all the other possible conversions. Note that when I talk about 'intent' I mean the idea that a traditional (general) reference is conceptually either mandatory or nullable at any given point in the code. This intent is explicit and self-documenting in the new style references, but it still exists implicitly in general references (e.g. "I know this reference can't be null because I wrote a null check", or "I know that this reference can't, or at least shouldn't, be null from my knowledge of the business domain").

Dog! mandatoryDog1 = myNullableDog; // Compiler Error - the nullable reference may be null.
Dog! mandatoryDog2 = myGeneralDog; // Compiler Error - the general reference may be null.
Dog? nullableDog1 = myMandatoryDog; // OK.
Dog? nullableDog2 = myGeneralDog; // Compiler Error - makes an assumption about the intent of the general reference (maybe it is conceptually mandatory, rather than conceptually nullable as assumed here).
Dog generalDog1 = myMandatoryDog; // Compiler Error - loses information about the intent of the mandatory reference (the general reference may be conceptually mandatory, or may be conceptually nullable if the intent is that it could later be made null).
Dog generalDog2 = myNullableDog; // Compiler Error - loses the safety of the nullable reference.

There has to be some compromise in the last three cases as our code has to interact with existing code that uses general references. These three cases are allowed if an explicit cast is used to make the compromise visible (and perhaps there should also be a compiler warning).

Dog? nullableDog2 = (Dog?)myGeneralDog; // OK (perhaps with compiler warning).
Dog generalDog1 = (Dog)myMandatoryDog; // OK (perhaps with compiler warning).
Dog generalDog2 = (Dog)myNullableDog; // OK (perhaps with compiler warning).

Some of the conversions that were not possible by direct assignment can be achieved slightly less directly using existing language features:

Dog! mandatoryDog1 = myNullableDog ?? new Dog("Mandatory"); // OK.
Dog! mandatoryDog2 = (myNullableDog != null ? myNullableDog : new Dog("Mandatory")); // OK.

Dog! mandatoryDog3 = (Dog!)myGeneralDog ?? new Dog("Mandatory"); // OK, but requires a cast to indicate that we are making an assumption about the intent of the general reference.
Dog! mandatoryDog4 = (myGeneralDog != null ? (Dog!)myGeneralDog : new Dog("Mandatory")); // OK, but requires a cast for the same reason as above.

9. Class Libraries

As mentioned previously, the compiled IL code will be the same whether you use the new style references or not. If you compile an assembly, the resulting binary will not know what type of references were used in its source code.

This is fine for executables, but in the case of a class library, where the goal is obviously re-use, the compiler will need a way of knowing the types of references used in the public method and public property signatures of the library.

I don't know much about the internal structure of DLLs, but maybe there could be some metadata embedded in the class library which provides this information.

Or even better, maybe reflection could be used - an enum property indicating the type of reference could be added to the ParameterInfo class. Note that the reflection would be used by the compiler to get the information it needs to do its checks - there would be no reflection imposed at runtime. At runtime everything would be exactly the same as if traditional (general) references were used.
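
For illustration, the reflection surface might look something like this (purely hypothetical; no such enum or ParameterInfo member exists today):

public enum ReferenceKind
{
    General,
    Mandatory,
    Nullable
}

// Hypothetical member on System.Reflection.ParameterInfo:
//     public ReferenceKind ReferenceKind { get; }
//
// The compiler would consult it while checking a call site, for example:
//     if (method.GetParameters()[0].ReferenceKind == ReferenceKind.Mandatory)
//     {
//         // reject arguments that are nullable or general references
//     }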

Now say we have an assembly that has not yet been converted to use the new style references, but which needs to use a library that does use the new style references. There needs to be a way of turning off the mechanism described above so that the library appears as a traditional library with only general references. This could be achieved with an attribute like this:

[assembly: IgnoreNewStyleReferences("SomeThirdPartyLibrary")]

Perhaps this attribute could also be applied at a class level. The class could remain completely unchanged except for the addition of the attribute, but still be able to make use of a library which uses the new style references.

(See my additional post for more details.)

10. Constructors

Eric Lippert's post (see reference in the introduction to this post) also raises thorny issues about constructors. Eric points out that "the type system absolutely guarantees that ...[class] fields always contain a valid string reference or null".

A simple (but compromised) way of addressing this may be for mandatory references to behave like nullable references within the scope of a constructor. It is the programmer's responsibility to ensure safety within the constructor, as has always been the case. This is a significant compromise but may be worth it if the thorny constructor issues would otherwise kill off the idea of the new style references altogether.

It could be argued that there is a similar compromise for readonly fields which can be set multiple times in a constructor.
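
For example, under this compromise (a sketch; the Kennel class is hypothetical):

public class Kennel
{
    public Dog! Resident { get; private set; }

    public Kennel(Dog! dog)
    {
        Resident.Bark(); // Compiler Error - inside the constructor, Resident
                         // behaves like a nullable reference until assigned.
        Resident = dog;
        Resident.Bark(); // OK - the compiler has seen the assignment.
    }
}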

A better option would be to prevent any access to the mandatory field (and to the 'this' reference, which can be used to access it) until the field is initialised:

public class Car
{
    public Engine! Engine { get; private set; }

    public Car(Engine! engine)
    {
        Engine.Start(); // Compiler Error
        CarInitializer.Initialize(this); // Compiler Error - the 'this' reference could be used to access Engine methods and properties
        Engine = engine;
        // Can now use Engine and 'this' at will
    }
}

Note that it is not an issue if this forces adjustment of existing code - the programmer has chosen to introduce the new style references and thus will inevitably be adjusting the code in various ways as described earlier in this post.

And what if the programmer initializes the property in some way that still makes everything safe but is a bit more obscure and thus more difficult for the compiler to recognise? Well, the general philosophy of this entire proposal is that the compiler recognises a finite list of sensible constructs, and if you step outside of these you will get a compiler error and you will have to make your code simpler and clearer.

11. Generics

Using mandatory and nullable references in generics seems to be generally ok if we are prepared to have a class constraint on the generic class:

class GenericClass<T>
    where T : class // Need class constraint to use mandatory and nullable references
{
    public void TestMethod(T? nullableRef)
    {
        T! mandatoryRef = null; // Compiler Error - mandatory reference cannot be null
        string s = nullableRef.ToString(); // Compiler Error - cannot dereference nullable reference
    }
}

However there is more to think about with generics - see the comments below.

12. Var

This is the way that I think var would work:

var dog1 = new Dog("Sam"); // var is Dog! (the compiler will keep things as 'tight' as possible unless we tell it otherwise).
var! dog2 = new Dog("Sam"); // var is Dog!
var? dog3 = new Dog("Sam"); // var is Dog?
var dog4 = (Dog)new Dog("Sam"); // var is Dog (see conversion rules - needs cast)

var dog1 = MethodReturningMandatoryRef(); // var is Dog!
var! dog2 = MethodReturningMandatoryRef(); // var is Dog!
var? dog3 = MethodReturningMandatoryRef(); // var is Dog? (see conversion rules)
var dog4 = (Dog)MethodReturningMandatoryRef(); // var is Dog (see conversion rules - needs cast)

var dog1 = MethodReturningNullableRef(); // var is Dog?
var! dog2 = MethodReturningNullableRef(); // Compiler Error (see conversion rules)
var? dog3 = MethodReturningNullableRef(); // var is Dog?
var dog4 = (Dog)MethodReturningNullableRef(); // var is Dog (see conversion rules - needs cast)

var dog1 = MethodReturningGeneralRef(); // var is Dog
var! dog2 = MethodReturningGeneralRef(); // Compiler Error (see conversion rules)
var? dog3 = (Dog)MethodReturningGeneralRef(); // var is Dog? (see conversion rules - needs cast)

The first case in each group would be clearer if we had a suffix to indicate a general reference (say #), rather than having no suffix due to the need for backwards compatibility. This would make it clear that 'var#' would be a general reference whereas 'var' can be mandatory, nullable or general depending on the context.
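
With that hypothetical suffix, the examples above would read:

var# dog5 = MethodReturningGeneralRef(); // var# is explicitly Dog (general).
var# dog6 = MethodReturningNullableRef(); // Compiler Error (see conversion rules - needs cast).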

13. More Cases

In the process of thinking through this idea as thoroughly as possible, I have come up with some other cases that are mostly variations on what is presented above, and which would just have cluttered up this post if I had put them all in. I'll put these in a separate post in case anyone is keen enough to read them.

HaloFour commented 9 years ago

@lukasf I believe that's already been proposed a couple of times. #3330

This has two problems. One, it is a breaking change, even if only eventually. Once a project is converted no new code could be added from outside sources without having to go through a migration process.

Second, this creates a new dialect of the language, which can only be a source of confusion. For a long time the vast, vast majority of public information about C# will note that any non-decorated reference type is nullable.

Symmetry is nice but I don't think it's worth overhauling one of the core aspects of the language.

lukasf commented 9 years ago

@HaloFour Maybe you are right about this creating a new dialect of the language. Unfortunately. I would still love to see this feature. I guess I will have to wait for a next gen language to bring this as a core feature (maybe the experimental M# language we heard about).

bunsen32 commented 9 years ago

I have also had some thoughts about nullability & C#/.NET. I’m honestly not sure whether this should be a comment to this existing thread or a brand new enhancement request, but if anyone’s interested, I’ve written it up here:

There are a few areas around backward-compatibility which I still need to write down.

My basic design (adding non-nullable and explicitly-nullable references) is the same as @Neil65’s. The main differences are:

  1. Initialisation/default. My design doesn’t even attempt to require that all values are initialised. I don’t think it’s possible within the .NET VM, and it massively complicates the language, so I propose the idea of throwing an exception if an uninitialised non-nullable reference field is accessed.
  2. Syntax for guards to test for null values. I’ve avoided this area completely, since I think it’s a separate concern, and there are some promising proposals already for pattern matching which could work for matching nulls too.
  3. Generic types/methods. I’ve written more about this :) There’s loads of weird corner cases around generic code, particularly in dealing with existing code, so I’ve proposed another couple of type constraints, and some rules around generics.

danieljohnson2 commented 9 years ago

@bunsen32's proposal makes me rather feel that, even if we could rewrite the CLR itself, we'd still be in a lot of trouble combining mandatory references with generics freely. Not knowing whether a type variable T has a default value is going to create a lot of weird corner cases.

It might be better to have a stricter rule: no generic type variable can ever be realized with a mandatory reference type, so List<string!> is not allowed at all, ever.

Instead, you would need to create a 'mandatory reference list' type, like this:

public class NotNullList<T> : IList<T>
    where T : class
{
   public T! this[int index] { .... }
   T IList<T>.this[int index] { .... }
}

So you can apply ! to a type variable to get a mandatory version of it, but we would know that any unadorned type variable is nullable (or at least has a default: default(T) is always OK, default(T!) is not).

bunsen32 commented 9 years ago

Yeah, I proposed a new generic type constraint (which uses the ‘default’ keyword) to indicate that a type parameter has a default value… then made it the (ahem) default, since it’s what existing generic code expects.

It would be a great waste not to allow mandatory references as type parameters in the general case, though. We must forbid them as type parameters to legacy generic classes, but new generic code can be written without the assumption that there is a ‘default(T)’.

olmobrutall commented 9 years ago

I think non-nullable reference types and generics should work just fine in practice. List<T>, Dictionary<K,V>, etc. do not return default values and have other mechanisms to track which cells are empty and which are not, so getting a FieldUninitializedException in this case should be OK, without any static checking.

The main problem that I see is the need to deprecate FirstOrDefault/SingleOrDefault/LastOrDefault.

GeirGrusom commented 9 years ago

I think non-nullable reference types and generics should work just fine in practice

Generic code has to be written to handle mandatory references, and in some cases CLR support is required.

Consider the following:

public T Foo<T>(T input) where T : ICloneable
{
  return (T)input.Clone();
}

Existing code with a new() constraint will break as well, since it returns a new instance of the wrapper, which in all cases will be an invalid mandatory reference rather than a new instance of the wrapped type.

danieljohnson2 commented 9 years ago

The problem with generic type variables inhabited by mandatory references is that existing framework classes would break. Consider List<T>; this contains a T[], and when you shrink the list it clears the trailing elements of that array. It must do this so the objects formerly in those elements can be garbage collected.

But if T is, say, Object!, what then? If Array.Clear is a no-op, it will leak memory. If it is actually going to null-out the array elements, those elements are not very mandatory after all.
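
The pattern in question looks roughly like this (a simplified sketch of a list's internals, not the actual BCL source):

class SimplifiedList<T>
{
    private T[] _items = new T[4];
    private int _size;

    public void RemoveAt(int index)
    {
        _size--;
        if (index < _size)
            Array.Copy(_items, index + 1, _items, index, _size - index);
        // Clear the vacated slot so the old object can be garbage collected.
        // If T were Object!, there is no valid 'empty' value to write here.
        _items[_size] = default(T);
    }
}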

paulomorgado commented 9 years ago

@olmobrutall, why do FirstOrDefault/SingleOrDefault/LastOrDefault need to be deprecated?

lukasf commented 9 years ago

I think we would need two List<T> implementations, one for nullables (List<T> where T : default) and one for mandatory types (MandatoryList<T> where T : mandatory). Generic type constraints would prevent you from using the wrong list type. The one for mandatory types would need a different implementation. Internally it would probably use an array of T? (a nullable array of the type) to allow fast grow and shrink of the list. On access it would get the real (non-null) value from the nullable array. Both list types would implement IList<T>, since the interface does not have any methods that would break on mandatory types.

The current FirstOrDefault,... methods would get a type constraint so they can only be used on enumerations with nullable types (e.g. "where T : default"). They don't need to be deprecated, but obviously they can't work with mandatory types. But we could add a new FirstOrDefault() for mandatory types (e.g. "public T? FirstOrDefault<T>(this IEnumerable<T>) where T : mandatory"). This way you can also use FirstOrDefault() with mandatory Enumerations, which will return a nullable result of course.
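
In code, that split might look like this (a sketch using the hypothetical 'default' and 'mandatory' constraints; note that real C# does not let two overloads differ only by constraint):

static class MandatoryEnumerable
{
    // Usable only where the element type has a default value:
    public static T FirstOrDefault<T>(this IEnumerable<T> source) where T : default
    {
        foreach (T item in source)
            return item;
        return default(T);
    }

    // The variant for mandatory element types returns a nullable result instead:
    public static T? FirstOrDefault<T>(this IEnumerable<T> source) where T : mandatory
    {
        foreach (T item in source)
            return item;
        return null;
    }
}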

GeirGrusom commented 9 years ago

You are focusing on a single case where it breaks down, but mandatory types break generics all over the place.

lukasf commented 9 years ago

Can you elaborate a bit on why exactly they break? On first sight, I do not see how any of these would cause problems. Activator.CreateInstance will invoke the default constructor of T and return the instance. If no default constructor exists, it will throw (as it does now). Maybe I am missing something?

GeirGrusom commented 9 years ago

Activator.CreateInstance<T>() will fail because the default constructor for any mandatory value will return a mandatory wrapper with a null value.

GetDelegateForFunctionPointer<T> will fail because it requires that T is a delegate type; Action!, for example, seems like a reasonable return type for the function, but it is not a delegate type.

where T : new() breaks for the same reason as Activator.CreateInstance<T>(): the result of a new mandatory reference is null.
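
To make the new() case concrete (a sketch assuming the wrapper implementation described above):

public T Create<T>() where T : new()
{
    // For T = Dog!, 'new T()' constructs the wrapper, not a Dog,
    // leaving the wrapped reference null - an invalid mandatory reference.
    return new T();
}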

where T : ICloneable fails because mandatory types will never implement ICloneable - it's the contained type that can implement ICloneable, and in that case the runtime has to be able to delegate the call to the underlying reference rather than the mandatory wrapper.

So we end up with a feature (mandatory types as generic type arguments) that breaks in so many cases that it seems completely ridiculous to allow it.

lukasf commented 9 years ago

You assume that mandatory types will be implemented by some kind of wrapper around a non-mandatory type. Only based on that assumption, these cases will fail. But I don't think that this is how mandatory types would be implemented. Especially because using wrappers would put a considerable negative performance impact on usage of mandatory types.

This will be implemented as a language / compiler feature: The language just won't allow you to write code where a mandatory type is null. And due to that, there is no need for "real" null checks on these values. They can never be null, so they can always be accessed directly, without the need for a null check through some wrapper. Internally, they will be treated like normal nullable references. Runtime changes might be needed as well, but I see this mainly as a compiler / language feature. And I am pretty sure that it can be accomplished without the need of any kind of wrapper object. The generated code probably won't differentiate between nullable and non-nullable references.

GeirGrusom commented 9 years ago

The language just won't allow you to write code where a mandatory type is null.

Except when it does:

public T Foo<T>()
{
    return default(T);
}

Foo<string!>() here will return null and there is nothing the compiler can do about it (except not allow mandatory types in generics).

GeirGrusom commented 9 years ago

The issue here is that generic code is written with the assumption that reference types can be null. By using mandatory types you are applying a constraint to generic code that was assumed non-existent at the time of writing. If a mandatory constraint for generics should exist, it must be applied by the generic code site, not the caller.

lukasf commented 9 years ago

The compiler knows that T is mandatory, so this will just not compile. Much like List<T> won't work. You need to use MandatoryList<T> instead. Not all generic code breaks, but all generics that use default(T) will not work with mandatory types. FirstOrDefault<T>() will not work with mandatory values, but FirstOrDefault<T>(T defaultValue) will. All generics in the framework should contain new constraints to reflect the limited use.

Please remember that when you use a generic class with a specific type, the compiler will compile code for that specific type. So Foo<string> will have a different implementation than Foo<string!>. Any problems with generics and mandatory types will be caught at compile time. And if you dynamically create generic types at runtime, the JIT compiler will compile and emit new code, so misuse of dynamically created generics would surface as errors at runtime. Even if you use old generic implementations (without constraints) with new mandatory types, you would get errors, often at compile time and at the latest at runtime.

bunsen32 commented 9 years ago

@GeirGrusom, you’re right: existing generic code is written to assume that all types have a default value. If we invent non-nullable reference types, ‘has a default value’ becomes a constraint on generic type arguments (and one which newer generic code probably wants to be able to relax). I can’t understand your other “broken by mandatory types” examples though! (You seem to be introducing a wrapper struct in order to break things!)

@lukasf, There’s no reason that the .NET framework type List<T> shouldn’t be rewritten if non-nullable references are introduced, rewritten in such a way that it can allow nullable and non-nullable type parameters alike. It would be profoundly disruptive if client code had to use one type of list for nullable references and structs, and another type of list for non-nullable references. I’ve outlined in a blog post how List<T> could be modified to no longer require ‘default’: https://dysphoria.net/2015/06/16/nullable-reference-types-in-c-generics/

bunsen32 commented 9 years ago

@danieljohnson2 Yes, there needs to be some kind of way of dealing with mutable arrays of non-nullable references… and the collection classes, for example, need to be able to ‘unset’ them in order to drop references so that they don’t leak memory. I think the only way for the CLR to allow that—and also to deal with the issue of non-nullable fields in structs whose constructor has never been run—is to allow fields and array elements (of non-nullable references) to be ‘uninitialised’.

If you try to access an uninitialised field/array-element, you get an exception (so collection classes would need to be designed not to read from uninitialised array elements). Array.Clear would set elements back to ‘uninitialised’.

HaloFour commented 9 years ago

@bunsen32

Making List<T> capable of understanding and enforcing non-nullable types is quite impractical for a number of reasons.

For starters, non-nullable types aren't going to be a new form of type according to the runtime. A List<string!> (assuming a ! syntax) is the same as a List<string> and the runtime is none the wiser. As the IL needs to be identical for both, the container cannot selectively enforce different rules. Even if a new generic constraint were added to enforce non-nullability, generic constraints aren't selectively enforced by the consumer of the generic type, so List<T> couldn't be both nullable and non-nullable.

A lot of this discussion has been obsoleted by #3910 where the proposal is to provide little more than attribute-based decoration to denote the nullability of parameters and static/flow analysis to provide enforcement. The CLR enhancements necessary to make it possible to prevent null array elements or to enforce non-nullability in a generic container aren't on the horizon.

danieljohnson2 commented 9 years ago

@bunsen32, I think I understand your proposal better now. You would leave a hole in the type system: default(S) for a struct S (that contains a mandatory field) would generate a mandatory reference that is null; assigning this to something would not involve a runtime null check on this field (I expected a runtime check there!).

That'll do the trick, but it seems to me to be too large a hole in the type system. It will be very easy to accidentally introduce nulls in fields that are apparently "not nullable". I still think that structs can't reasonably contain mandatory fields, unless mandatory just means 'null-checked on read'. If it does mean only that, that check needs to be present at every relevant assignment. If null values in mandatory fields are allowed to propagate unchecked, the feature doesn't really add much to what we have now.

Anyway, if CLR enhancements are off the table your proposal is not implementable, and it might be wiser to wait to see how this stuff plays out with Apple's Swift language.

bunsen32 commented 9 years ago

@HaloFour, yes, my proposal would require a change to the runtime, and non-nullable (and Nullable<>) would have to be separate runtime types. (It would be a change of the order of the change to introduce generic types in .NET 2.) I do hope the .NET team consider something similar for a future release.

@danieljohnson2, Yeeees, it's not quite a hole in the type system: it's still type safe, but if you attempt to read a non-null reference from an unassigned slot, it would throw an exception. I think it's an acceptably-sized hole!

danieljohnson2 commented 9 years ago

@bunsen32, I think it is a way to read a null out of an unassigned slot without a check. Consider this:

struct S { public Object! Value; }
S containsNull = default(S);
S dest;
dest.Value = containsNull.Value; // checked: this throws an exception
dest = containsNull; // unchecked: this copies a null

The behavior I had expected here was that the two assignments would be equivalent (and both checked!). I think this would be a nasty trap for programmers; the difference between the two cases is not obvious at the point of use.

I'm not a big fan of warnings: just make it an error to have a mandatory field in a struct and you've closed this hole entirely. If we must have a way to de-initialize a mandatory field, that should be an explicit syntax; or perhaps a method like this:

Mandatory.Deinitialize(out victim);

Which is implemented in IL, ignores the fact that victim is supposed to be mandatory, and nulls it out. This has the advantage that you can search for it, and review all uses. The trick with a default-valued structure is really not something you can search for.

GeirGrusom commented 9 years ago

@bunsen32 what you are suggesting will be covered by #119. No need to add special casing for null values.

The advantage of building it into the type system is that null checks can be omitted by the CLR if the CLR adds support for mandatory types since not-null validation is preserved across a value copy. Copy from an annotated field provides no such guarantee.

In either version old style generics cannot support mandatory types unless we want a compiler that can easily contradict itself, and in my opinion adding it to the type system actually solves something that code contracts do not.

bunsen32 commented 9 years ago

Heh, @danieljohnson2, that Deinitialize method opens up another hole :)

It does seem quite appealingly symmetrical to allow deinitializing memory slots (and you could define it to work on any type, so for types where default is defined, it would reset the value to default). However, is there any way to limit an out parameter to only apply to fields and array elements? Because we don't want to allow 'deinitializing' parameters and local variables!

Disallowing fields of structs to be mandatory references seems unduly restrictive to me. Especially since you might be writing generic code and not know the exact type of your field. It would make the definition of Nullable<> itself quite tricky! (The compiler could potentially disallow mandatory fields where the type is explicit (not generic) though—we'd assume that authors of generic code know what they're doing.)

lukasf commented 9 years ago

I don't like the idea of uninitialized fields. It opens up a huge loophole. If code that uses mandatory references cannot guarantee that the code is really safe from NullReferenceExceptions, then the concept fails and you could as well continue working with normal nullable references instead...

bunsen32 commented 9 years ago

@lukasf, You can’t avoid uninitialized fields completely without changing the C# language and .NET runtime quite radically. ‘readonly’ fields have a very similar problem when you’re talking about class fields. If constructors allow their object to escape into a global variable, or another thread, for example, all bets are off as to whether the object has been correctly initialised/constructed. This is a general ‘issue’ with .NET. And then there’s arrays…

However, I disagree that this is a show-stopper. Allowing some mandatory/non-nullable references in some cases (cases which can be warned about by static analysis tools, and avoided by reasonable coding practices), is the trade-off to allow non-nullable references to be enforced in the great majority of (sanely-written) C#.

In addition, an ‘unassigned field’ doesn’t return a null; it throws an exception whenever it’s directly accessed. This is an improvement upon a NullReferenceException since it fails faster: whereas nulls propagate through the program and can cause an exception later, unassigned fields cause an exception there and then when accessed. They’re amenable to easier static analysis, and easier debugging.

Antaris commented 9 years ago

Sorry if this has already been mentioned, but can these nullable/non-nullable semantics also be applied to method return types and method parameters, e.g.:

public Dog! GetDog(string! name)
{
  // return SomeMethodThatReturnsDog(); // Could fail if general reference or nullable reference?
  // return null; // Will fail to compile
  return new Dog(name); // Ok
}

//var dog = GetDog(null); // Will fail, can't pass null reference
var dog = GetDog("Banjo");

dog.Bark(); // Ok because we know the method returned Dog!

danieljohnson2 commented 9 years ago

@Antaris I should expect parameters and return types to be covered in any proposal- those are the easy cases, because you can always enforce them at runtime by compiling extra null checks into method bodies. Local variables are similarly straightforward.
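
For example, a mandatory parameter could be lowered to an ordinary parameter plus a compiled-in guard (a sketch; the Walk method is illustrative):

// What the programmer writes:
public void Walk(Dog! dog)
{
    dog.Bark();
}

// Roughly what the compiler could emit, since the IL has no notion of Dog!:
public void Walk(Dog dog)
{
    if (dog == null) throw new ArgumentNullException(nameof(dog));
    dog.Bark();
}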

Array elements and class fields are the trouble - they can be altered without any obvious place to stick a runtime null check. You'd need to inject checks in any code that accesses them, even if you aren't compiling that code. This is why building this feature into the CLR would be more effective - it can apply checks to all code.

Struct fields are the worst; they have all the problems class fields do, plus default(S) gives you a struct full of nulls. Even if we get to rewrite the CLR itself, it's not obvious how to deal with this.

rausm commented 9 years ago

Perhaps look at how Eiffel [the Bertrand Meyer design-by-contract(trademark-or-whatever) language] solved it. Look up Eiffel and either void-safety or CAP (certified attachment patterns). I think they proved that their way of doing it is sound (i.e. their [enforced] safe/attached references are safe, and after any applicable check the [possibly] detached reference is also considered safe). Of course, it being Eiffel, they have a much smaller language and some slight limitations that come with it.

IS4Code commented 8 years ago

Just want to add that adding a "!" type modifier would probably be best done via an optional modifier (modopt in CIL). This would need better reflection support for optional and required modifiers, which .NET kinda lacks.

Pzixel commented 8 years ago

AFAIK non-nullable types now exist in the new C# 7.0. So my question is: suppose I wrote a library using C# 7.0 with non-nullable types, and then published it. My friend downloads it and wants to pass a null using C# 6.0, for example. What happens? Does it just fail to compile for no apparent reason, because C# 6.0 knows nothing about this keyword? Or does it insert tons of if (arg == null) throw new ArgumentNullException(nameof(arg)) checks? The second approach leads to a significant performance impact. I'd like this feature to be a compile-time one, without extra runtime checks. Of course, we have branch prediction, but it looks ugly anyway if that's how it works.

HaloFour commented 8 years ago

@Pzixel

Non-nullable reference types is being delayed to C# 8.0 or later. Any answer could potentially change.

However, based on the proposals for this feature at this time, to a down-level compiler (or one that simply didn't understand non-nullable references) the arguments would appear as normal nullable reference types. That compiler could generate code that passed null. There would also be no automatically-inserted null checks, so performance would not be affected, but neither would there be any guarantees that the value is not null within the method.
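
In other words (an illustrative sketch; Register is a hypothetical library method):

// Library compiled with non-nullable support:
public static void Register(Dog! dog)
{
    dog.Bark(); // may still throw if a down-level caller passed null
}

// A caller using an older compiler sees the parameter as plain 'Dog':
Register(null); // compiles without warning; no null check is emitted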

Pzixel commented 8 years ago

@HaloFour but it's weird. I'm absolutely sure that I don't want to check whether the passed value is null when I said that it's not null. Imagine the code

public static void Foo(IBar! bar)
{
   bar.Bark();
}

it would be VERY strange if I get a NullReferenceException here.

I see only two possibilities here: the compiler automatically adds not-null checks everywhere in the code, or it becomes a CLR feature, so it won't be possible to reference C# 8.0 (or whatever version) code from a lower version.

I think they will use the first approach because, as I said, there is a branch predictor, so the extra not-null check will be predicted and skipped most of the time. It also makes it easier to implement: for example, if we leave it as a compiler feature, it's hard to make reflection work well with it. And if we have runtime checks in methods, there is nothing extra to do for reflection.

HaloFour commented 8 years ago

@Pzixel

I am only relaying the proposal as it currently stands. Both of those approaches have already been discussed and, as of now, neither is being implemented. This will be purely a compiler/analyzer feature. It won't even result in compiler errors, just warnings, which can be disabled and intentionally worked around.

I believe the latest version of this proposal is here: #5032

As mentioned, this is at least an additional C# version out, so it's all subject to change.

MikeyBurkman commented 8 years ago

@Pzixel I assume (hope) that IBar! would be implemented as a different underlying type than IBar, and so it would never even be an issue. (Kind of like how int and Nullable<int> are different underlying types, and the compiler just allows for nice syntactic sugar.) Putting a null check in that method would be akin to adding a check that an argument of type int is not actually a string.

HaloFour commented 8 years ago

@MikeyBurkman

Actually, the non-nullable version would be IBar, and the nullable version would be IBar?. The only difference between the two would be an attribute, they would be the same underlying type.

Pzixel commented 8 years ago

@HaloFour I don't know if T? is good syntax, because it breaks all existing code. No, I totally agree that it's more consistent than mixing ?, ! and so on, but if we are looking for backward compatibility, it will break everything in the code. And of course it should be an error, not a warning. Why? Because it's a type mismatch, and that is clearly an error. We should get CS1503 and that's all. It's weird to get a null when I said I can't get null. If I want a warning I can use a [NotNull] attribute rather than introducing a whole new type. And that makes sense.

HaloFour commented 8 years ago

@Pzixel

I don't know if T? is good syntax, because it breaks all existing code. No, I totally agree that it's more consistent than mixing ?, ! and so on, but if we are looking for backward compatibility, it will break everything in the code

I've already made that argument, but it seems that this is the direction that the team wants to go anyway. I believe that the justification is that the vast majority case is wanting a non-nullable type, so having to explicitly decorate them would lead to a lot of unnecessary noise. Pull the band-aid off once.

And of course it should be an error, not a warning. Why? Because it's a type mismatch, and that is clearly an error. We should get CS1503 and that's all. It's weird to get a null when I said I can't get null.

Primarily because of how much of a change it is to the language and because it can never be achieved with 100% certainty. I'm not particularly interested in rehashing all of the comments already on these proposals, but justifications are listed there.

MikeyBurkman commented 8 years ago

I'm pretty sure you're going to have to break backwards compatibility anyway, or make type inference useless.

// Pretend this is some legacy code
var x = new Bar(); // Line A (assume Bar implements IBar)
...
x = null; // Line B

What is the inferred type of x on Line A? If it's inferred as non-null (the expected type), then our code at line B will no longer compile. If we infer on Line A that x is nullable, then everything compiles as it used to, but now your type inference is inferring a less useful type.

Either devs won't use non-null types, or devs will stop using type inference. I can imagine which of those two options will win out...

Pzixel commented 8 years ago

@HaloFour

Primarily because of how much of a change it is to the language and because it can never be achieved with 100% certainty. I'm not particularly interested in rehashing all of the comments already on these proposals, but justifications are listed there.

You don't need to rehash anything; basically I just want to get a type mismatch error when there is one, instead of warnings and so on. Whether it emits checks or is purely a compiler feature is a topic for discussion, but if we are talking about the interface - a type mismatch - it should definitely be an error.

gafter commented 8 years ago

@MikeyBurkman re

What is the inferred type of x on Line A? If it's inferred as non-null (the expected type), then our code at line B will no longer compile. If we infer on Line A that x is nullable, then everything compiles as it used to, but now your type inference is inferring a less useful type.

Local variables will have a nullable type state that can be different from one point in the program to another, based on the flow of the code. The state at any given point can be computed by flow analysis. You won't need to use nullable annotations on local variables, because it can be inferred.

Pzixel commented 8 years ago

@gafter var is used to infer the type at the point of declaration; we shouldn't have to analyze any flow after it.

HaloFour commented 8 years ago

@Pzixel "Nullability" isn't being treated as a separate type, it's a hint to the compiler. The flow analysis is intentional to prevent the need for extraneous casts when the compiler can be sure that the value would not be null, e.g.:

public int? GetLength(string? s) {
    if (s == null) {
        return null;
    }
    // because of the previous null check the compiler knows
    // that the variable s cannot be null here so it will not
    // warn about the dereference
    return s.Length;
}

MikeyBurkman commented 8 years ago

So I'm still a bit confused. @gafter's comment insinuated that flow analysis would go upwards, while @HaloFour's example demonstrates it going downwards. Downwards flow analysis would be pretty much required in any implementation, and in fact R# already does that sort of analysis with the [NotNull] attributes. However, without upwards flow analysis, I don't think type inference would be able to provide much benefit, unless breaking backwards compatibility was an option.

Pzixel commented 8 years ago

@HaloFour int and int? are completely different types. I really want the same UX for reference types. I can use attributes like [NotNull], [Pure] and so on for a warning. I want to be absolutely sure that I CAN'T receive null if it is marked as not null. So in the provided example:

public int? GetLength(string? s) {
    string notNullS = s; // compiler error: cannot implicitly cast `string?` to `string`. 
    return GetLength(notNullS); 
}
public int GetLength(string s) {    
    return s.Length;
}

Of course, ideally I'd like to see something like unwrap from Rust, but an explicit cast is good enough.

HaloFour commented 8 years ago

@Pzixel

I want to be absolutely sure that I CAN'T receive null if it is marked as not null

Simply put, that wouldn't be possible. Even if massive CLR changes were on the table it probably couldn't be done. The notion of a default value is too baked in. Generics, arrays, etc., there's no way to get around the fact that null can sneak in by virtue of being the default value.

Flow analysis is a compromise, one that can be fitted onto the existing runtime and one that can work with a language that has 15 years of legacy to support. It follows the Eiffel route: know where you can't make your guarantees and solve it through flow analysis. Even then, sometimes the developer can (and should) override.

HaloFour commented 8 years ago

@MikeyBurkman

IIRC the type inferred by var is neither necessarily nullable nor non-nullable; it's a superposition of both potential states depending on how the variable is used. From @gafter's comment it sounds like that applies to any local even if the type is explicitly stated, e.g.:

string s1 = null; // no warning?
int i1 = s1.Length; // warning of potential null dereference

string? s2 = "foo";
int i2 = s2.Length; // no warning

Pzixel commented 8 years ago

Simply put, that wouldn't be possible. Even if massive CLR changes were on the table it probably couldn't be done. The notion of a default value is too baked in. Generics, arrays, etc., there's no way to get around the fact that null can sneak in by virtue of being the default value.

Generics were introduced once; another major change is possible too. Nobody says that it's easy, but they have to do it to implement this properly. It's the only way to make a strong type system. All hints and warnings are just nothing. It can be internally null; I don't think significant changes are required to accomplish these requirements. Just another type; it's not even the CLR's concern. The compiler just checks that the type is not-null, and the only way to pass a null value is reflection. Thus, we need to change reflection, but that's easy too - since string and string? are different types, there will be a type mismatch.

Now I see that it's even simpler than I thought. Just treat them as other types and that's all. Reflection throws a mismatch error at runtime, the compiler does it at compile time, and everyone is happy. And it would even still be a compile-time feature. The only problem is with older versions of C#, but changes in reflection mean changes in the runtime, so it would be a feature for the new .NET.

We could make a compatible version with runtime checks: for example, with .NET 4.6 and below we use runtime checks (if blabla != null), and with .NET 4.7 we assume that reflection does its job at runtime and remove them from the code. Elegant solution.

HaloFour commented 8 years ago

Generics were introduced once; another major change is possible too.

Generics was additive and worked entirely within the existing semantics of the run time.

Just treat them as other types and that's all.

That "other" type can't prevent the reference type that it contains from being null. Either it is a reference type that itself can be null (and wastes an allocation when it's not) or it's a struct container that contains a reference type where the default value of the struct is that the reference is null. Either way, you're back to square one. Furthermore, since the majority of methods within the BCL accept reference types that should be null you're talking about a massive breaking change to all existing programs. This solution has already been proposed.