dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License

Proposal: Nullable reference types and nullability checking #5032

Closed MadsTorgersen closed 7 years ago

MadsTorgersen commented 9 years ago

Null reference exceptions are rampant in languages like C#, where any reference type can reference a null value. Some type systems separate types into a T that cannot be null and an Option<T> (or similar) that can be null but cannot be dereferenced without an explicit null check that unpacks the non-null value.

This approach is attractive, but is difficult to add to a language where every type has a default value. For instance, a newly created array will contain all nulls. Also, such systems are notorious for problems around initialization and cyclic data structures, unless non-null types are permitted to at least temporarily contain null.

On top of these issues come challenges stemming from the fact that C# already has null-unsafe types, that allow both null values and dereferencing. How can a safer approach be added to the language without breaking changes, without leaving the language more complex than necessary and with a natural experience for hardening existing code against null errors in your own time?

Approach: Nullable reference types plus nullability warnings

The approach suggested here consists of a couple of elements:

The feature will not provide airtight guarantees, but should help find most nullability errors in code. It can be assumed that most values are intended not to be null, so this scheme provides the minimal annotation overhead, leaving current code to be assumed non-nullable everywhere.

Nullable reference types are helpful in that they help find code that may dereference null and guard it with null checks. Making current reference types non-nullable is helpful in that it helps prevent variables from inadvertently containing a null value.

Nullable and non-nullable reference types

Every existing reference type T is now "non-nullable", and there is a corresponding nullable reference type T?.

Syntactically speaking, nullable reference types don't add much, since nullable value types have the same syntax. However, a few syntactic corner cases are new, like T[]?.

From a semantic viewpoint, T and T? are mostly equivalent: they are the same type in that they have the same domain of values. Specifically, because the system is far from watertight, non-nullable reference types can contain null. The only way in which they differ is in the warnings caused by their use.

For type inference purposes T and T? are considered the same type, except that nullability is propagated to the outcome. If a reference type T is the result of type inference, and at least one of the candidate expressions is nullable, then the result is nullable too.
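
For example, a minimal sketch of how that inference might play out (the variables are illustrative, not from the proposal):

string s = "abc";
string t = "def";
string? ns = null;
bool flag = DateTime.Now.Second % 2 == 0;

var x = flag ? s : ns; // inferred as string?, because one candidate expression is nullable
var y = flag ? s : t;  // inferred as string: no nullable candidates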

If an expression is by declaration of type T?, it will still be considered to be of type T if it occurs in a context where by flow analysis we consider it known that it is not null. Thus we don't need new language features to "unpack" a nullable value; the existing idioms in the language should suffice.

string s;
string? ns = "hello";

s = ns; // warning
if (ns != null) { s = ns; } // ok

WriteLine(ns.Length); // warning
WriteLine(ns?.Length); // ok

Warnings for nullable reference types

Values of nullable reference types should not be used in a context where a) they are not known to (probably) contain a non-null value and b) the use would require them to be non-null. Such uses will be flagged with a warning.

a) means that a flow analysis has determined that they are very likely to not be null. There will be specific rules for this flow analysis, similar to those for definite assignment. It is an open question which variables are tracked by this analysis. Just locals and parameters? All dotted names?

b) means dereferencing (e.g. with dot or invocation) or implicitly converting to a non-nullable reference type.

Warnings for non-nullable reference types

Variables of non-nullable reference types should not be assigned the literal null or default(T); nor should nullable value types be boxed to them. Such uses will result in a warning.
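
For instance (illustrative):

string s1 = null;            // warning: assigning the null literal to a non-nullable reference
string s2 = default(string); // warning: default(string) is null
int? maybe = 5;
object o = maybe;            // warning: boxing a nullable value type to a non-nullable reference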

Additionally, fields with a non-nullable reference type must be protected by their constructor so that they are a) not used before they are assigned, and b) assigned before the constructor returns. Otherwise a warning is issued. (As an alternative to (a) we can consider allowing use before assignment, but in that case treating the variable as nullable.)
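
A rough sketch of how the constructor rules above might surface (the class is made up; the comments describe the intended warnings, not actual compiler output):

class Person
{
    string name; // non-nullable field

    public Person(string? input)
    {
        var len = name.Length; // warning: 'name' is used before it is assigned
        name = input ?? "";    // ok: 'name' is definitely assigned before the constructor returns
    }
}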

Note that there is no warning to prevent new arrays of non-nullable reference type from keeping the null elements they are initially created with. There is no good static way to ensure this. We could consider a requirement that something must be done to such an array before it can be read from, assigned or returned; e.g. there must be at least one element assignment to it, or it must be passed as an argument to something that could potentially initialize it. That would at least catch the situation where you simply forgot to initialize it. But it is doubtful that this has much value.
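
For example (illustrative):

string[] names = new string[10]; // all ten elements start out as null, yet no warning is given
string first = names[0];         // no warning either, but 'first' is in fact null here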

Generics

Constraints can be both nullable and non-nullable reference types. The default constraint for an unconstrained type parameter is object?.

A warning is issued if a type parameter with at least one non-nullable reference constraint is instantiated with a nullable reference type.

A type parameter with at least one non-nullable constraint is treated as a non-nullable type in terms of warnings given.

A type parameter with no non-nullable reference constraints is treated as both a nullable and a non-nullable reference type in terms of warnings given (since it could without warning have been instantiated with either). This means that both sets of warnings apply.
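
A sketch of how these rules might look in code, assuming an ordinary class constraint (here Exception) counts as a non-nullable reference constraint (illustrative only):

void WithConstraint<T>(T value) where T : Exception { } // constraint is a non-nullable reference type
void Unconstrained<T>(T value) { }                      // default constraint is object?

WithConstraint<ArgumentException?>(null); // warning: nullable type argument for a non-nullable-constrained T
Unconstrained<ArgumentException?>(null);  // no warning from the constraint; inside the method both sets of warnings apply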

? is allowed to be applied to any type parameter T. For type parameters with the struct constraint it has the usual meaning. For all other type parameters it has this meaning, where S is the type with which T is instantiated: if S is a non-nullable reference type, then T? refers to S?; otherwise, T? refers to S itself.

Note: This rule is not elegant - in particular it is bad that the introduction of a struct constraint changes the meaning of ?. But we believe we need it to faithfully express the type of common APIs such as FirstOrDefault().
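
For instance, FirstOrDefault would presumably end up annotated along these lines (a sketch of the signature, not the actual annotated BCL):

public static TSource? FirstOrDefault<TSource>(this IEnumerable<TSource> source);

string? s = new[] { "a", "b" }.FirstOrDefault(); // TSource is string, so TSource? means string?
int i = new int[0].FirstOrDefault();             // TSource is int, so TSource? just means int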

Opting in and opting out

Some of the nullability warnings warn on code that exists without warnings today. There should be a way of opting out of those nullability warnings for compatibility purposes.

When opting in, assemblies generated should contain a module-level attribute with the purpose of signaling that nullable and non-nullable types in signatures should generate appropriate warnings in consuming code.

When consuming code references an assembly that does not have such a top-level attribute, the types in that assembly should be treated as neither nullable nor non-nullable. That is, neither set of warnings should apply to those types.

This mechanism exists such that code that was not written to work with nullability warnings, e.g. code from a previous version of C#, does indeed not trigger such warnings. Only assemblies that opt in by having the compiler-produced attribute, will cause the nullability warnings to happen in consuming code accessing their signatures.
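
Purely as an illustration of the shape of such a marker (the attribute name below is made up and not part of the proposal):

// declared somewhere in the framework, or synthesized by the compiler:
[AttributeUsage(AttributeTargets.Module)]
public sealed class NullableAnnotationsAttribute : Attribute { }

// stamped onto an opted-in assembly:
[module: NullableAnnotations]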

When warnings haven't been opted in to, the compiler should give some indication that there are likely bugs one would find by opting in. For instance, it could give (as an informational message, not a warning) a count of how many nullability warnings it would have given.

Even when a library has opted in, consuming code may be written with an earlier version of C#, and may not recognize the nullability annotations. Such code will work without warning. To facilitate smooth upgrade of the consuming code, it should probably be possible to opt out of the warnings from a given library that will now start to occur. Again, such per-assembly opt-out could be accompanied by an informational message reminding that nullability bugs may be going unnoticed.

Libraries and compatibility

An example: In my C# client code, I use libraries A and B:

// library A
public class A
{
  public static string M(string s1, string s2);
}

// library B
public class B
{
  public static object N(object o1, object o2);
}

// client C, referencing A and B
Console.WriteLine(A.M("a", null).Length);
Console.WriteLine(B.N("b", null).ToString());

Now library B upgrades to C# 7, and starts using nullability annotations:

// upgraded library B
public class B
{
  public static object? N(object o1, object o2); // o1 and o2 not supposed to be null
}

It is clear that my client code probably has a bug: apparently it was not supposed to pass null to B.N. However, the C# 6 compiler knows nothing of all this, and ignores the module-level attribute opting in to it.

Now I upgrade to C# 7 and start getting warnings on my call to B.N: the second argument shouldn't be null, and I shouldn't dot into the return value without checking it for null. It may not be convenient for me to look at those potential bugs right now; I just want a painless upgrade. So I can opt out of getting nullability warnings at all, or for that specific assembly. On compile, I am informed that I may have nullability bugs, so I don't forget to turn it on later.

Eventually I do, I get my warnings and I fix my bugs:

Console.WriteLine(B.N("b", "")?.ToString());

Passing the empty string instead of null, and using the null-conditional operator to test the result for null.

Now the owner of library A decides to add nullability annotations:

// library A
public class A
{
  public static string? M(string s1, string s2); // s1 and s2 shouldn't be null
}

As I compile against this new version, I get new nullability warnings in my code. Again, I may not be ready for this - I may have upgraded to the new version of the library just for bug fixes - and I may temporarily opt out for that assembly.

In my own time, I opt it in, get my warnings and fix my code. I am now completely in the new world. During the whole process I never got "broken" unless I asked for it with some explicit gesture (upgrade compiler or libraries), and was able to opt out if I wasn't ready. When I did opt in, the warnings told me that I used a library against its intentions, so fixing those places probably addressed a bug.

paulomorgado commented 9 years ago

So, @NightElfik, instead of NullReferenceExceptions, what have you been seeing?

This proposal only does dataflow analysis on variables declared to be nullable. Nothing else.

jacekbe commented 9 years ago

This proposal looks very nice and I would really like it implemented. Imho it's much better than introduction of explicit non-nullable types for several reasons:

I can imagine that adoption of the new semantics would take a long time, so users should be able to control whether incorrect usage of references leads to a warning, an error, or is not signalled at all. In the worst case scenario people would just disable nullability tracking, but I doubt it, as this feature is going to be very helpful.

or150 commented 9 years ago

I wonder what will happen with the default clause. Value types have a default ctor and therefore a default value. Reference types, on the other hand, don't. It will break lots of generic types. Also, what will happen to all the TryGetX methods? Will there be a wrapper type for mandatory references (Mandatory<>)? I think that the mandatory types (T!) proposal will fit better with existing constructs.

HaloFour commented 9 years ago

@or150 Actually the ctor of a value type is entirely optional. C# 6.0 allows defining a default parameterless ctor for a value type and if you then use default(type) that ctor is never called:

public struct Foo {
    public int X { get; }

    public Foo() { X = 1; }
}

static void Main() {
    Foo foo1 = new Foo();
    Foo foo2 = default(Foo);
    Debug.Assert(foo1.X == 1);
    Debug.Assert(foo2.X == 0);
}

It doesn't appear that the behavior of default() is mentioned in the proposal. The question I think would be whether or not it could be applied to a non-null reference type. Would default(string) be legal or would you be expected to use default(string?)? If default(string) is legal would it also result in a string? of null? Could type x = default(type) resolve to type x = new type()? Probably not.

This gets much more complicated when generics come into play. I'd love to hear how the compiler might be able to handle default(T) in a method where T has no generic type constraints but the consumer wishes to use a non-nullable reference type for T.

NightElfik commented 9 years ago

@or150 In an ideal world without backwards compatibility and with strict nullability rules, I can imagine that default(T) is not valid for non-nullable T. In this world you would not need TryGet methods at all because Get (or the indexer) would return T? - a nullable reference/struct. When you need default(T) it is a sign that you actually need a nullable type anyway, and default(T?) should give you what you need.

In the real world, this is a problem, and having T be a strictly non-nullable type breaks backwards compatibility. If the rules are not strict, then it is not as useful as it could be. I am still hoping for some kind of strict-nullability-checking solution though. The T! might work better?

danieljohnson2 commented 9 years ago

Java has already done something similar to this warningtastic proposal. Java's erased generics are enforced through warnings; if you can avoid all such warnings, your program is statically type-safe and runtime checks are not needed.

But you can disable the warnings, and then Java's generics become something else. They become more like Objective-C's 'id' type: they imply invisible, automatic, and unchecked conversions. This is actually very convenient, if not very safe.

It's a good example of the 'dialect' thing: you can write useful programs that work but do not type-check in Java generics. You can just disable the warnings and go on with your day. But you've written a program that is not legal in the other, strictly typed, dialect of Java.

The ugly part is what happens when these worlds collide. The safe, statically typed java world comes crashing down, because the static guarantees only work when everyone plays along. I believe they call this "heap pollution"- you can get data structures whose dynamic content does not agree with the static types originally written for them, so that code that is strictly correct will fail in surprising ways.

I think we should avoid introducing this "heap pollution" into .NET.

BTW, @HaloFour I think that the parameterless constructors in structs got dropped. My VS 2015 won't do them.

HaloFour commented 9 years ago

@or150 Ah, you're right. That must've been cut prior to release. But default(type) doesn't invoke any ctors (default or otherwise), it only emits the IL opcode initobj, which just assures that the memory location is zeroed out.

As for the whole dialect thing, there's a lot about it that leaves something to be desired. For example, the entire "opt-in" for your project thing seems way too clumsy and coarse. Turn it on, and not only are you forced to fix potentially huge numbers of warnings in otherwise perfectly legit code, but you're also immediately cut off from 15 years of existing sample code on the Internet. I think most projects won't adopt it simply because it's too daunting to do so. Yes, it makes sense to pull off the bandaid and get it over with. But there are also countless examples of projects remaining on older versions of languages/frameworks specifically because of breaking/dialect changes like this. My company still has projects compiling with VC++ 6.0 because nobody wants to expend the time and energy updating them for the sake of updating them.

I'm not saying that there is a better answer; frankly there isn't. We're 15 years and 6 versions too late to make such a fundamental paradigm shift. I'd honestly rather see it play out in a forked experimental language in the C# 7.0 timeframe, discover the problems and get real feedback. Perhaps it's not worth it without CLR enhancements.

qrli commented 9 years ago

@NightElfik @HaloFour I think the generic dilemma is already mentioned in the OP. There is one paragraph talking about how to solve the generic type issue. In my understanding, generic T is treated the same as before: it means T? for reference types and T! for value types. And it allows you to use T? and T! to enforce it. So default(T) and default(T?) are null for reference types, while default(T!) is illegal.

To me, it is an acceptable solution.

Another solution I can imagine is to add generic constraints:

void F<T>() {} // same behavior as now
void F<T>() where T : nullable {} // T is nullable
void F<T>() where T : nonnullable {} // T is non-nullable, so default(T) is illegal.
scottsteesy commented 9 years ago

The concept of having a non-nullable reference type is very interesting, but it seems like most of this proposal is guaranteed to cause all kinds of confusion and problems when mixing old and new code.

Just a thought, but wouldn't it be easier to just add an attribute to the class that says it is supposed to be non-nullable, and define a special (static?) constructor ( MyClass!() ) to define the "default" value for the class? Then have the compiler see the attribute and always use the result of the special constructor as the default value for the class. It could also raise errors if null is assigned to a variable of the class type. Wouldn't this make it so that new code can have non-nullable reference types? Granted, I suppose that if the new code (assembly) is consumed by old code with an older compiler it wouldn't be enforced, but if they were using a new compiler, even with old code and framework, shouldn't it work?

HaloFour commented 9 years ago

@qrli

New generic constraints would require CLR changes, although perhaps they could be expressed through the same attribute/erasure method.

Let's take the following method:

public T Foo<T>() => default(T);

It would seem that the compiler would be required to treat consumption of this method as follows:

int i = Foo<int>(); // legal
int? ni = Foo<int?>(); // legal
string? ns1 = Foo<string?>(); // legal
string? ns2 = Foo<string>(); // ?

string s = Foo<string>(); // can't imagine this could be legal, caller cannot enforce this
qrli commented 9 years ago

@HaloFour Your example is interesting. Generalize the question a bit: Since X is a subset of X?, should X be allowed to be used as T??

string? ns2 = Foo<string>();

I would like to see it allowed, because it would provide better backward compatibility with old code. But the default(T) inside Foo() is indeed a problem. Moreover, Foo() may simply assign null to a variable of T where T : class. So it seems this cannot be allowed.

However, it means T without any constraint can only represent value types, nullable value types, reference types, but not non-nullable reference types. This is not good...

One idea is: in the generic case, only enforce non-nullability on parameters and return value, so default(T) is still allowed as long as its value is not passed outside.
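
A rough sketch of that last idea (the method is invented for illustration): default(T) stays legal inside the body, and only values that cross the signature boundary are checked:

static T FirstOrFallback<T>(IEnumerable<T> items, T fallback)
{
    T current = default(T); // allowed: this default value never escapes the method
    bool found = false;
    foreach (var item in items)
    {
        current = item;
        found = true;
        break;
    }
    return found ? current : fallback; // only values assigned from parameters are returned
}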

fschmied commented 9 years ago

See also #227.

fschmied commented 9 years ago

There is a ReSharper plugin that causes ReSharper's nullability inspections to treat unannotated type references in parameters, properties, etc. as non-nullable by default: https://resharper-plugins.jetbrains.com/packages/ReSharper.ImplicitNullability/. For example, this generates warnings if you try to pass nullable values to non-nullable parameters. Our team has been using the plugin for the past few months, and the experience of having reference types be non-nullable by default is great.

Of course, adding this to the language (and compiler) would be very much superior to the plugin's approach.

dsaf commented 9 years ago

@fschmied

Of course, adding this to the language (and compiler) would be very much superior to the plugin's approach.

but

The feature will not provide airtight guarantees...

Is it really that much better than a set of analyzers? I guess at least it will be "standard" and come for free.

jonathanmarston commented 9 years ago

The more I think about this proposal the more I think that it's a bad idea.

What bothers me is that it suggests changing the language spec to make non-nullable the default, but then only enforcing the change through warnings.

C# is a type-safe language, and this change implies that string and string? are different types. If I have a method parameter typed as a string (not string?) I should be able to know, without a doubt, that I will always get a string, not a null. No question. No caveats.

If all it is going to do is produce warnings then it would be better implemented as an attribute and an analyzer.

AdamSpeight2008 commented 9 years ago

I think this should be combined with the ability to specify a specialised non-null <T>.Default (note default(T) remains unaffected)

AdamSpeight2008 commented 9 years ago

Can constructors and initialisation of a type be made "out-of-phase" until fully constructed and initialised, after which it "instantly" becomes usable? The only way I think it could be done is by locking globally while doing so.

danieljohnson2 commented 9 years ago

@AdamSpeight2008, it's really not generally possible to prevent a class from 'leaking' and being observed before it has finished initialization. You can come close by doing your initialization stuff before you chain to your base class (which can then leak a reference safely), but if your class is not sealed, a subclass can do as it likes before you get to run.

The most obvious thing a subclass could do to ruin your non-nullable day is to have a finalizer. If you take an exception during construction, the finalizer will still run. And perhaps crash on an 'impossible' null reference.

But if we can't change the CLR itself, we can't get this to be perfectly airtight no matter what we do. Finalizers are a bad idea in the first place; maybe problems with them are not so terrible.

Nevertheless, while some flaws are tolerable, I think the warning-only approach is too leaky and makes the 'mandatory' references too unreliable.

vladd commented 9 years ago

@danieljohnson2 One thing I don't understand is why it's not possible to change the CLR to add support for "perfectly non-nullable" reference types. I don't think it's actually impossible, it's just believed to be quite complicated. A bigger problem which I see is the BCL: some of the code will silently no longer be valid (e.g. default(T) for non-nullable T, if we allow an arbitrary T for generic parameters), so this implies a big redesign/rewrite of the BCL. (Example: for List<T>, the actual storage would have to be done not with T[] but rather with T?[].)
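
A sketch of the List<T> point under the proposed rules (hand-written illustration, not actual BCL code):

class TinyList<T>
{
    private T?[] items = new T?[4]; // freshly allocated slots are null, so the store must be T?[]
    private int count;

    public void Add(T item)
    {
        if (count == items.Length) Array.Resize(ref items, count * 2);
        items[count++] = item;
    }

    public T Get(int index)
    {
        if (index >= count) throw new ArgumentOutOfRangeException(nameof(index));
        return (T)items[index]; // needs an unwrap from T? back to T
    }
}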

danieljohnson2 commented 9 years ago

@vladd, the MS compiler guys here seem very reluctant to consider CLR changes, especially big ones. I think it's partly an organizational thing- they aren't in the same department or something. But CLR changes are also scarier from a compatibility point of view.

In particular, if they change the CLR so that unmarked reference types are really not nullable, that will break compatibility with all existing .NET binaries. That's not a goer.

I think the alternative to CLR changes (and a much more complex and ambitious proposal) is really a much less ambitious one: restrict the feature to places where the compiler can do this reliably. But this means restricting it quite a lot.

The middle ground here seems to me to be a proposal that doesn't work, a C# divided into dialects, and compatibility problems anyway.

olmobrutall commented 9 years ago

As someone who has spent some time thinking about non-nullable reference types (https://gist.github.com/olmobrutall/31d2abafe0b21b017d56) I have to say I really like this solution.

I see many developers having problems telling value and reference types apart, and having a symmetric solution is much better, even if it is based on warnings. We can always opt in to turn them into errors, or even change the CLR in the far future when everybody has done their homework :).

I have two issues, however:

Comparing non-nullable reference types with null literal

Today if you write this code you get a warning:

int a = 2;
if(a == null) //warning CS0472: The result of the expression is always 'false' since a value of type 'int' is never equal to 'null' of type 'int?'
{

}

Should the same code generate a warning for non-nullable reference types?

string s = "";
if(s == null) //<--- warning CS0472 as well?
{    
}

The benefit of adding this warning is that it will be easier for a developer migrating code to C# 7 to spot variables/fields/parameters/return types that should be made explicitly nullable based on how they are used, or to remove redundant code.

On the other hand, since the whole proposal is not watertight, we need a way to check for null on non-nullable reference types without triggering a warning. Maybe by casting first?

string s = "";
if(((string?)s) == null) //<--- no warning
{    
}

One typical scenario is argument checking:

public static string Reverse(string s)
{
    if(s == null) //warning CS0472?
        throw new ArgumentNullException(nameof(s));
    // ... reversal logic elided ...
} 

Should we warn on code that does compile-time AND run-time checking? Or complicate the code with a cast?

Being too strict with definite assignment on fields

Also typical are objects whose initialization is not finished after construction, like objects with cycles.

class Head 
{ 
    public Body Body; // warning: non-nullable field not definitely assigned?
}

class Body 
{ 
    public Head Head;
}

public Body CreateBody()
{
    Head head = new Head(); 
    Body body = new Body{ Head = head };
    //head.Body is null!!!
    head.Body = body;
    //world-order is restored
    return body;
}

One particular case is ORM entities, which are not really 'created' until they are saved in the database:

class Invoice 
{
     Customer Customer { get; set;} 
     DateTime CreationDate { get; set; } = DateTime.Now;

     DateTime? CancellationDate { get; set; }
     User? CancelledBy { get; set; }
}

In this example, it is convenient to write just Customer to mean that the CustomerID column in the Invoice table is non-nullable, even though when the Invoice is created it is temporarily null, until the user picks a customer and saves the entity. Having to create a constructor with one parameter for each non-nullable referenced entity would be a pain.

NightElfik commented 9 years ago

@olmobrutall I really like your write-up, great work! However, I have to disagree with what you suggested, mostly the second part.

The "exceptions" from the rule to achieve convenience are only making the whole rule less useful.

When there are two classes with cyclic dependencies, at least one of them has to have a nullable reference. There should be no way around that. If you want a nice interface, just build it using non-nullable properties.

class Head { 
    private Body? m_body;
    public Body Body {
        get {
            if (m_body == null) { throw new InvalidOperationException(); }
            return (Body)m_body;
        }
        set { m_body = value; } 
    }
}

class Body { 
    private Head? m_head;
    public Head Head {
        get {
            if (m_head == null) { throw new InvalidOperationException(); }
            return (Head)m_head;
        }
        set { m_head = value; } 
    }
}

public Body CreateBody() {
    Head head = new Head(); 
    Body body = new Body();
    body.Head = head;
    head.Body = body;
    return body;
}

In fact, this could be the default implementation of properties with non-nullable types :).

And in case you want to make the classes "immutable", it's simple extension:

class ImmutableHead { 
    private Body? m_body;
    public Body Body {
        get {
            if (m_body == null) { throw new InvalidOperationException(); }
            return (Body)m_body;
        }
        set {
            if (m_body != null) { throw new InvalidOperationException(); }
            m_body = value;
        }
    }
}

The second example is "broken" is a sense that you can freely have "invalid" instance and pass it around. And if you want to check at some point that your instance is in fact "invalid", you will be bombarded by warnings CS0472? That is not right...

I do understand that without CLR support it will never be watertight, but making exceptions "for convenience" will never bring it close enough to make the eventual switch.

danieljohnson2 commented 9 years ago

@olmobrutall, I think your first section highlights this proposal's big problem: its warnings are just wrong, because the "non-nullable references" are really entirely nullable.

But your second issue, about initialization, illustrates one path out of the mess.

We could say that it is not an error for an ordinary field to be null, but it is an error to observe the null. Null then means 'uninitialized', but that's okay as long as you do initialize before use.

We could enforce this with runtime checks. Since the 'T!' syntax seems to have been rejected, let us consider a [NullSafe] attribute that can apply at method, class, or assembly level. The marked body of code would get additional null checks compiled in: not just when dereferencing a null, but also when reading or writing one from any field not marked nullable.

You could then have this:

[NullSafe] Object example()
{
  Object[] notNullable = null; // warning here, of course
  notNullable = new Object[47]; // but this fixes it at runtime
  for(int i=1;i<47;++i) notNullable[i]=new Object(); // oops, missed one!
  return notNullable[0]; // throws exception *reading* a null
}

The warnings here are not sufficient, but the runtime check means that no caller will ever see a null come back from this method. It's runtime checking, but it does check at an earlier point than what you get today.

The thing about this approach is that Object and Object? are no longer actually different types; they are just notations for how to deal with nulls. In practice it becomes like C++ const; you can cast it away and cheat, thus:

Object[] notNullable = { 1, 2, 3 }; // safe, no nulls here!
Object?[] nowItIsNullable = (Object?[])notNullable; // allowed with cast!
nowItIsNullable[0]=null; // allowed!
return nowItIsNullable[0]; // no runtime check, returns null

but

return notNullable[0]; // runtime check, throws instead

This is all weaksauce compared to CLR-enforced mandatory reference types, but as long as we won't change the CLR, weaksauce is the sauce we get. In the CLR these things are all nullable as heck, and we should mitigate this without pretending it isn't so.

olmobrutall commented 9 years ago

Hi @NightElfik

Thanks for your feedback. Actually my example can also be implemented with all non-nullable references:

class Head
{
    public Body Body { get; set; }

    public Head (Body body)
    {
        this.Body = body;  //body.Head is null at this point!!
    }
}

class Body
{
    public Head Head { get; set; }

    public Body()
    {
        this.Head = new Head(this);
    }
}

Still, this solution has two problems:

  1. Even if it looks watertight, in the Head constructor you could observe a head-less body.
  2. In general, you have to add a constructor parameter for each non-nullable member, making POCOs so much more complicated.

You can choose to complicate the code using custom property setters (your example) or custom constructors (my example), but in the end I'm not sure it's worth it.

Watertight languages are usually functional languages with algebraic data types (F#, Haskell), or else very complicated constructor rules are necessary (Swift).

In C# you are usually using POCOs, DataBinding and ORMs, and making non-nullability too strict could create more problems than it solves, backwards compatibility aside.

I think our experience with generics (where Java uses type erasure and C# gives us strong guarantees backed by the CLR, with better results) is misleading us: there is no good reason to store an int in a List<string>, and it's easy to enforce, but there are lots of reasons to have an initial null in a variable/array/property that, in the long run, should be expected to be non-null.

Take TypeScript as an example. The type system is a fantasy and can easily be cheated, but the type annotations help programmers express intent, and that's enough to remove most of the errors. Maybe this time being strict is not what C# needs.

fschmied commented 9 years ago

As I mentioned earlier, I'm on a team that has been using "implicitly non-nullable reference types" for method parameters via a ReSharper plugin for some time. We're currently considering enabling that feature for return values as well, and I'd like to share our experience with this matter.

ReSharper has nullability control flow analysis and nullability annotations: it allows parameters, methods (i.e., return values), and fields to be annotated as "can be null" and "not null" via custom attributes. This information is then used to warn when a (provable) nullable value is used in a place where "not null" is expected. And it is used to suggest removing null checks when a value is provable not null based on these annotations.

There are two drawbacks with how this is implemented by ReSharper:

In my opinion, this second point - that you cannot rely on the non-nullability, i.e., it's not truly a contract - is a real problem. When non-nullability is only a 95% thing, you still need to check every non-nullable value for null before you do anything with it if you want to avoid errors caused by null values. And it's worse than before: ReSharper will suggest you remove those null checks! If you do remove them, you can get null-related errors - maybe near the caller who supplied null, maybe not. Maybe down the callstack, maybe up the callstack. Maybe two months later, because the invalid null value was written into a database and caused an error only when it was read again.

That's why, in our case, we haven't considered implicit non-nullability for fields yet, and have combined non-nullability of parameters and method return values with automatically generated runtime checks via https://github.com/Fody/NullGuard. This makes the otherwise heuristic feature safe, because if some code breaks the contracts, e.g. due to unannotated code or a null value escaping the incomplete control flow analysis, it is stopped at a defined point in the code. And in contrast to performing runtime checks whenever a non-nullable item is read, it's much less expensive and often nearer to the place that caused the null value to leak into the non-nullable context.

So, in essence, I'd highly suggest combining the feature in this proposal with automatically generated runtime checks in defined places. For example, when a method is called (for arguments), when a method returns (for return values and non-nullable fields assigned in the method), and when a constructor returns (for field assignments).
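
For illustration, the injected checks amount to something like the following (a hand-written sketch; the code NullGuard actually weaves in differs in detail):

public string Combine(string first, string second)
{
    if (first == null) throw new ArgumentNullException(nameof(first));
    if (second == null) throw new ArgumentNullException(nameof(second));

    string result = first + second;

    if (result == null) throw new InvalidOperationException("Return value of Combine must not be null.");
    return result;
}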

Of course, I know that this is neither easy, nor perfect, especially with fields of non-nullable types:

Despite runtime checks having their own set of problems, I think that reliability is one of the most important aspects of non-nullable reference types. Without reliability, the feature could actually cause more bugs than before.

@dsaf:

Is it really that much better than a set of analyzers? I guess at least it will be "standard" and come for free.

I think it's better if (and only if) implemented in a reliable way :) In that case, it's better because it has a standard syntax, it's used by many library authors (including the BCL), and it can express (non-)nullability in places such as variable declarations or cast expressions, where it can't be expressed without language support.

olmobrutall commented 9 years ago

@danieljohnson2 Thanks for your feedback!

We could say that it is not an error for an ordinary field to be null, but it is an error to observe the null. Null then means 'uninitialized', but that's okay as long as you do initialize before use.

I'm more and more inclined towards relaxing the warnings for non-nullable reference types. Let's face it: arrays, databinding, POCOs and object initializers, ORMs... all are fundamental use cases of C#, and they would benefit from a soft solution, but if you want a strong solution you have to write code the Haskell way.

Bare Minimum :+1: :+1:

Just being able to annotate types with ? to denote intended nullability, and to store this information in the metadata using a NullableAttribute, just as DynamicAttribute does, would already provide some real value without creating any new problems:

Warnings

Let's see what warnings we can think of that will create more solutions than problems.

To not-nullable

string a = null; //implicit conversion from null to string (not-nullable)
string a = (string)null; //OK

public string Bla()
{
     return null; //implicit conversion from null to string (not-nullable)
}

string? b = "hi"; 
string a = b;  //implicit conversion from string? to string (not-nullable)
string a = (string)b; //OK

void Print(string s) { }
string? b = "hi";
Print(b); //implicit conversion from string? to string (not-nullable)

string? b = "hi";
b.Length //implicit conversion from string? to string (not-nullable)

While the first ones look more appealing, as you move down they become more annoying. I'm worried that using explicit nullable types will be so tiresome that people will just use non-nullables, since they are more friendly, familiar and can contain nulls anyway. A good casting operator makes a big difference here.

To nullable

string a = "hi";
if(a != null) //warning CS0472: The result of the expression is always 'true' since a value of type 'string' is never equal to 'null' of type 'string?'
{

}

I think this warning should not be given, since the "always 'true'" part is just a lie. :-1:

Conditional operator

string? c = Rand()? "hi" : null //CS0173 Type of conditional expression cannot be determined because there is no implicit conversion between 'string' and '<null>'

Isn't this warning super-pedantic?

Even for 'real' Nullable<T>, this should have been solved back in C# 2.0:

int? c = Rand()? 1 : null //CS0173 Type of conditional expression cannot be determined because there is no implicit conversion between 'int' and '<null>'

Non-nullable initialization

For local variables it is trivial: just like any other variable.

string a;
var length = a.Length; //CS0165  Use of unassigned local variable 'a'

But for members it is trickier:

class Person
{ 
   public string Name;
}

var p = new Person();
var length = p.Name.Length; // p.Name is still null here

Solving this will be harder, requiring definite-assignment analysis, and it will still have a lot of holes. It will also require users to create constructors or provide absurd default values. POCOs, object initializers, DataBinding and ORMs will be severely constrained if this is implemented.

It is better to have a soft and useful solution that communicates intent without creating too many problems than a Haskell wannabe that is not going to fit the language, and has holes anyway.

Finally, for arrays the problem is impossible to solve statically.

string[] str = new string[100];
//stuff happens
var item = str[2]; //No way we can know if this value has been initialized.

Casting operator

A simpler operator to cast to non-nullable could be helpful, especially for big generic types, but it's hard to come up with a good syntax.

!Customers!.Any(c => !c.Name!.Length != 0)
!((!)Customers).Any(c => !((!)c.Name).Length != 0)
!notnull(Customers).Any(c => !notnull(c.Name).Length != 0)
!Customers.Value.Any(c => !c.Name.Value.Length != 0)
!((List<CustomerEntity>)Customers).Any(c => !((string)c.Name).Length != 0)
!Customers.Any(c => !c.Name.Length != 0)

as operator

I've just come up with one observation about the as operator:

object obj = "string";
string s = (string)obj; //OK
string s = obj as string; //Warning! now as returns string?
olmobrutall commented 9 years ago

@fschmied, how are you encoding POCOs with your Resharper solution in practice?

Also, when you enabled NullGuard, how many real bugs did you find and how many did you create?

I assume that you and your team take a lot of pride in your code and take the necessary time to improve it, but other shops with a more... client/provider relationship with MS will be disappointed if VS2017 introduces 400 new warnings, and 5 new bugs of its own, even if it is helping to prevent dozens of current bugs and hundreds of future ones.

fschmied commented 9 years ago

@olmobrutall We don't, since we use implicit not-null and NullGuard only for parameter and return types.

Also, when you enabled NullGuard, how many real bugs you found and how many you create?

For non-nullable parameter types, it's hard to say, as we already had manual argument checks in place. I can say, though, that there are often situations while writing code where the nullability analysis warns us of a potential bug. How many of those would've escaped our attention - I don't know. We haven't seen a new bug introduced by this tooling during the last 10 months.

By enabling non-nullable return types on our existing code base, we've found a few actual issues where we didn't handle null values correctly. We don't know if we created any new bugs yet - our experience with implicitly non-nullable parameter types causes me to hope that we didn't.

In general, the experience was very smooth - we had to add a number of annotations and assertions, but it wasn't that much work actually. But yes, you're right, we have a very high test coverage and put a lot of thought into the quality of our code, so we're likely not a representative team for the whole customer base of Visual Studio. Without coverage, I'd see turning those features on for an existing code base as quite risky. I don't see as much risk on a new code base, though.

ndykman commented 9 years ago

While I get the demand for this feature, the myriad complications just don't seem worthwhile versus a CodeContracts approach or going wholesale and introducing records (à la F#). It seems that for interoperability with existing code libraries you have to treat a T! as a T at some point, and you are back to static analysis to determine whether an instance of T is never null.

Sure, nulls are annoying, but there are plenty of special cases that are also annoying. Take collections that don't allow indexes < 0. There's a lot of those. Do we need to introduce an int+, long+ syntax for those cases? And so on. Sure, dependent types are great, but we know that's an unsolvable problem in general. I'd much rather the emphasis be placed on static analysis and contract rewriting. While I know that the CodeContracts project comes with its own issues, it's really a more general way of addressing these issues in a standard way.

yaakov-h commented 9 years ago

Take collections that don't allow indexes < 0. There's a lot of those. Do we need to introduce a int+, long+ syntax for those cases?

Well, we do have unsigned integer types, which - ignoring their C-based legacy - are effectively a way of specifying a very particular constraint on a number.

While I know that the CodeContracts project comes with it's own issues, it's really a more general way of addressing these issues in a standard way.

Code Contracts is non-standard, not always enforceable, and requires third-party tooling for both the producer and the consumer. It's effectively an optional add-on for .NET.

Integrating this into the language and compiler toolchain would make it much simpler to declare and comply with a particular API's nullability rules, and would likely result in safer code for everyone.

Joe4evr commented 9 years ago

how are you encoding POCOs with your Resharper solution in practice?

  • You add the constructor?
  • Make the properties nullable?

This is really the crux of the matter, IMO. I'm relatively sure static null analysis is gonna break down around POCOs, and thus many application/database models will not co-operate very well. A constructor is near useless, because these POCOs are pretty much always instantiated via object initialization or reflection. Making all properties nullable would work, but then you get into the same Optional hell I've read about in Swift. Plus, this will just look incredibly silly if you ask me:

[Required]
public string? FirstName { get; set; } //Even if you get what it means, it still looks silly

The main reason given for this feature is "to express developer intent", but I'd say that this example does not express developer intent very well, if at all.

And the size of these models could make it a real chore. I've recently been working on a new project, and the design documents list almost three dozen tables, many of them having more than a dozen or two dozen fields. And this might even be considered an average-sized model.

If you ask me, >90% of the value of this compile-time nullability checking will be at locals and method parameters. On class properties, it would also be a very good fit on classes that do something and have a constructor that sets these properties up (like a Controller), but not on POCOs which just need to hold data.

Because of this, I'd like to suggest adding a new attribute, perhaps named [ImplicitlyNullableProperties] or something similar, and the compiler would behave as if all reference-type properties of the class it's applied to were declared with nullable behavior when nullability checking is enabled.

[ImplicitlyNullableProperties]
public class Person
{
    [Key]
    public int Id { get; set; }
    //nothing happens around struct properties,
    //since they have a default value to begin with

    [Required]
    public string FirstName { get; set; }
    //compiler behaves as if this was declared as
    //a string? and gives the appropriate 'could be null'
    //info/warning to the dev at call-sites
}

If you're talking about developer intent, this would display the intent very clearly, while still having the compiler co-operate with the dev (instead of working against them) and not looking as silly as the "required-nullable property" example.

yaakov-h commented 9 years ago

I don't see any advantage in an [ImplicitlyNullableProperties] attribute over simply declaring each property as nullable.

I also don't see the problem with required-nullable. That simply states that whilst the property can be null at any arbitrary point during the execution of the program, at a certain point (i.e. whatever validates the POCO based on attributes) that value must be non-null. Attributes are not part of the class, they're just annotations. Thus, it's exactly the same as it is today where all reference types are implicitly nullable, except the developer (or an auto-generating tool) has explicitly marked it as nullable.

ndepend commented 9 years ago

@ndykman While I know that the CodeContracts project comes with its own issues, it's really a more general way of addressing these issues in a standard way.

I'd say forget about the Microsoft Code Contracts library. We made the mistake of relying on this library, and now we are stuck with a buggy library that just doesn't work with Visual Studio 2015, more than 3 months after it went RTM!

Have a look at the reviews; we are not the only ones in this sad situation: https://visualstudiogallery.msdn.microsoft.com/1ec7db13-3363-46c9-851f-1ce455f66970

roji commented 9 years ago

@psmacchia, CodeContracts does have VS2015 support and works well, check out their github releases page. Whether it's a good idea to start off a new project with CC is another question, as the project doesn't seem to be a very high priority for Microsoft.

yaakov-h commented 9 years ago

@psmacchia I had the same issue, and had to maintain a private fork to get a proper VS2015-compatible release.

@roji It claims to have VS2015 support, but it has quite a few VS2015- and Roslyn-related bugs. Some have been fixed in master, but there's no new release yet.

Either way, I don't think it makes sense from any perspective to fob this issue off to Code Contracts or another external party. This ought to be a core feature.

ndepend commented 9 years ago

@roji

https://github.com/Microsoft/CodeContracts/issues/169#issuecomment-149185713

7 days ago I wrote

Nevertheless, installing this fixed version still ends up in VS2015 with the error:

The command ""C:\Program Files (x86)\Microsoft\Contracts\Bin\ccdocgen.exe" "@obj\Debug\MyDllThatUsesMSCCccdocgen.rsp"" exited with code 255.    MyDllThatUsesMSCC

@tom-englert wrote: there is no fix committed yet for this issue...

@yaakov-h "This ought to be a core feature." I 100% agree.

The 28/07/2015 public version should not claim it supports VS2015, because it just does not; this would save users a lot of time and disappointment.

https://visualstudiogallery.msdn.microsoft.com/1ec7db13-3363-46c9-851f-1ce455f66970

tom-englert commented 9 years ago

@psmacchia If you'd like to support the project, you could help with testing: https://github.com/tom-englert/CodeContracts/tree/Issue169-3 This should now contain the fix.

fschmied commented 9 years ago

See also #119, a proposal to bring method contracts into the language.

olmobrutall commented 9 years ago

@yaakov-h, I also don't see any point in [ImplicitlyNullablePropertiesAttribute]; it's a bad solution, but the problem remains:

If it is encouraged to create POCOs with non-nullable reference types, then a non-nullable reference type will mean just "I promise I'll fill this property... if I have time".

It is a very weak promise, but it is probably the best solution nevertheless; a stronger promise, even one based on warnings, could be annoying to use.

ndepend commented 9 years ago

@tom-englert I tried again without success, see my answer on the thread https://github.com/Microsoft/CodeContracts/issues/169

yaakov-h commented 9 years ago

@olmobrutall the issue I have with enforcing non-null on POCO properties is that they can often be in an invalid state when new, or in the middle of a calculation. It's one of the few places I find the "implicitly unwrapped optional" in Swift to be a good construct.

olmobrutall commented 9 years ago

Hey! I didn't know that Swift has the concept of implicitly unwrapped optionals. Of course the guarantees for optionals in Swift are stronger, so they need the concept more.

I see three options for C#:

Follow Swift's lead:

Invert the Swift solution:

Weak and simple solution: take advantage of the fact that the solution is not going to be watertight anyway to simplify it: T will be a permissive non-nullable, and the number of warnings will be reduced globally.

yaakov-h commented 9 years ago

There is an implicit contract between the library and the consumer that the thing should be null

I think you meant either "should not be null", or "can be null".

The issue I have with the "Invert the Swift solution" option is that as more and more C# code becomes null-safe over time, ! will be absolutely everywhere.

danieljohnson2 commented 9 years ago

@olmobrutall, I don't think there's going to be much support for the Swift way; that adds a new type distinction (and probably requires CLR changes because of it), yet it remains incompatible with essentially all existing C# code: the worst of both worlds, I'm afraid.

The 'inverted Swift' way is more viable, but there are serious technical obstacles. Honestly, @MadsTorgersen's proposal at the top is problematic too: T? for reference types will probably require some CLR support. Even Nullable<S> required CLR support, after all. But T! would be worse, since that type has no natural default value.

yaakov-h commented 9 years ago

@danieljohnson2 the "Swift way" was already discussed in #5033 as an option, including "This seems like a good idea." There's some support already.

T! has a natural default - null, the same as T?. T has no natural default though, as best I can tell - what would that be? I dare say you would be required to set it in a constructor or an inline assignment (readonly T thing = new Thing()), or it would follow the same initialisation rules as a struct, which may break compatibility.

scottsteesy commented 9 years ago

I support the idea of a non-nullable type, and of variables of that type, but I do not think that being able to say you want a variable of just any type to be non-nullable is realistic. For a non-nullable type all we need is a static constructor MyType!() which will define the default value for the non-nullable type. And just doing so could conceivably automatically create a property named Default that returns that value. Then, because the type is now non-nullable compatible, if you define a variable of MyType it would have an initial value of null, but a variable of MyType! would get an initial value equal to the reference returned by the special constructor. This will obviously require CLR support to have the special constructor called, just as the class's static constructor is called now. Code that is non-nullable unaware would consume all code the same as today, because it will just use nullable variables. Conversely, with non-nullable aware code consuming a regular old nullable class, we could either have something like NonNullable<T>, which could work on any class that has a default constructor, to wrap it in a lightweight non-nullable class, or the developer would have to write their own wrapping class in order to make a non-nullable compatible type. See the sketch below.
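
A minimal sketch of what such a lightweight wrapper might look like (illustrative only; not part of the proposal):

public struct NonNullable<T> where T : class, new()
{
    private readonly T value;

    public NonNullable(T value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        this.value = value;
    }

    // falls back to a default instance if the struct itself was zero-initialized
    public T Value => value ?? new T();
}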

danieljohnson2 commented 9 years ago

I think we're in for a lot of confusion with the T! syntax; Swift uses this for a "nullable T with dynamic checks", which is just T in current C#. But previous discussion has used T! for "mandatory T".

The big problem with a real "mandatory T", however you spell it, is that it has no reasonable default. The CLR thinks it can zero-initialize things like structures and arrays and, by doing so, set everything to its default value.

It would be a big thing to make the CLR initialize certain arrays and fields to some other value, and it would mean that mandatory references can exist only for special types that define these default values. Many classes couldn't really do that. This would really limit the usefulness of the feature, while adding a big pile of complexity.

If you allow "mandatory T" for any reference type T, you have the hard problems- how do you ensure initialization for arrays, structures, and finalizers? If the answer is "you don't", then your "mandatory T" isn't really mandatory anymore.

I still favor an approach that restricts these things to places where we could really enforce it, but that's a pretty big restriction in its own right.

qrli commented 9 years ago

I think non-nullable fields/variables are not a must. The top usage of non-nullable is method/property signatures and the code analysis on top of them.

As long as we have that, the nullability of related local variables and fields can be deduced by code analysis, so we don't have to add special syntax for them, e.g.:

string! GetFoo() { ... }
...
var str = GetFoo();
var parts = str.Split(','); // compiler/analysis-tool knows str is not null.

This could be a small and easier step to start with. And it also keeps most of C# code as it is.

yaakov-h commented 9 years ago

@qrli That already exists in the form of Code Contracts, but it's nowhere near as neat a solution as the proposition detailed here to allow variables/parameters to be declared as "definitely some sort of value", as opposed to the traditional "some sort of value, or maybe nothing. Who knows?"

It's a smaller step, it's an easier step, it's been done before, and it's not good enough for massively-large systems.

qrli commented 9 years ago

@yaakov-h IMO, the issues with Code Contracts are mostly with its library approach and the tools, and it has many more complicated features which are not yet mature because of its larger goal.

At the same time, ReSharper's simpler annotation approach gets more practical usage. So it is one way that we can get the relatively mature part into production, leaving other features for future improvement.

The similarity to Code Contracts is also a good thing, because the BCL is already annotated with contracts, which can be used directly, instead of modifying the whole BCL to support non-nullability.