Closed: @Neil65 closed this issue 7 years ago.
@lukasf I believe that's already been proposed a couple of times. #3330
This has two problems. One, it is a breaking change, even if only eventually. Once a project is converted no new code could be added from outside sources without having to go through a migration process.
Second, this creates a new dialect of the language, which can only be a source of confusion. For a long time the vast, vast majority of public information about C# will note that any non-decorated reference type is nullable.
Symmetry is nice but I don't think it's worth overhauling one of the core aspects of the language.
@HaloFour Maybe you are right about this creating a new dialect of the language, unfortunately. I would still love to see this feature. I guess I will have to wait for a next-gen language to bring this as a core feature (maybe the experimental M# language we heard about).
I have also had some thoughts about nullability & C#/.NET. I’m honestly not sure whether this should be a comment to this existing thread or a brand new enhancement request, but if anyone’s interested, I’ve written it up here:
There are a few areas around backward compatibility which I still need to write down.
My basic design (adding non-nullable and explicitly-nullable references) is the same as @Neil65’s. The main differences are:
@bunsen32's proposal makes me rather feel that, even we could rewrite the CLR itself, we'd still be in a lot of trouble combining mandatory references with generics freely. Not knowing if a type variable T has a default value is going to create a lot of weird corner cases.
It might be better to have a stricter rule: no generic type variable can ever be realized with a mandatory reference type, so List<string!> is not allowed at all, ever.
Instead, you would need to create a 'mandatory reference list' type, like this:
public class NotNullList<T> : IList<T>
    where T : class
{
    public T! this[int index] { ... }
    T IList<T>.this[int index] { ... }
}
So you can apply ! to a type variable to get a mandatory version of it, but we would know that any unadorned type variable is nullable (or at least has a default: default(T) is always OK, default(T!) is not).
Yeah, I proposed a new generic type constraint (which uses the ‘default’ keyword) to indicate that a type parameter has a default value… then made it the (ahem) default, since it’s what existing generic code expects.
It would be a great waste not to allow mandatory references as type parameters in the general case, though. We must forbid them as type parameters to legacy generic classes, but new generic code can be written without the assumption that there is a ‘default(T)’.
I think non-nullable reference types and generics should work just fine in practice.
The main problem that I see is the need to deprecate FirstOrDefault/SingleOrDefault/LastOrDefault.
I think non-nullable reference types and generics should work just fine in practice
Generic code has to be written to handle mandatory references, and in some cases CLR support is required.
Consider the following:
public T Foo<T>(T input) where T : ICloneable
{
return (T)input.Clone();
}
Existing code with a new() constraint will break as well, since it returns a new instance of the wrapper, which in all cases will be an invalid mandatory reference rather than a new instance of the wrapped type.
The problem with generic type variables inhabited by mandatory references is that existing framework classes would break. Consider List<T>; this contains a T[], and when you shrink the list it clears the trailing elements of that array. It must do this so the objects formerly in those elements can be garbage collected.
But if T is, say, Object!, what then? If Array.Clear is a no-op, it will leak memory. If it is actually going to null out the array elements, those elements are not very mandatory after all.
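To make the clearing behaviour concrete, the pattern inside List<T>.RemoveAt looks roughly like this (a simplified sketch of the well-known pattern, not the exact BCL code; _items and _size are the list's internal fields):

```
// Simplified sketch of List<T>.RemoveAt (illustrative, not the exact BCL source).
public void RemoveAt(int index)
{
    _size--;
    if (index < _size)
        Array.Copy(_items, index + 1, _items, index, _size - index);

    // Clear the vacated trailing slot so the object it held can be
    // garbage collected. For T = Object! this line is the problem:
    // default(T) is null, which a mandatory slot must never hold.
    _items[_size] = default(T);
}
```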
@olmobrutall, why do FirstOrDefault/SingleOrDefault/LastOrDefault need to be deprecated?
I think we would need two List<T> implementations, one for nullable types (List<T> where T : default) and one for mandatory types (MandatoryList<T> where T : mandatory). Generic type constraints would prevent you from using the wrong list type. The one for mandatory types would need a different implementation. Internally it would probably use an array of T? (a nullable array of the type) to allow fast grow and shrink of the list. On access it would get the real (non-null) value from the nullable array. Both list types would implement IList<T>, since the interface does not have any methods that would break on mandatory types.
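A sketch of what such a MandatoryList<T> might look like (the `mandatory` constraint and the `?`/`!` annotations on reference types are hypothetical proposed syntax, not current C#):

```
// Hypothetical syntax: 'mandatory' constraint, 'T?' nullable backing store.
public class MandatoryList<T> : IList<T> where T : mandatory
{
    // The backing store is nullable so the list can grow and shrink
    // cheaply; vacated trailing slots are simply set back to null.
    private T?[] items = new T?[4];
    private int size;

    public T this[int index]
    {
        // On read, the value is known to be non-null for index < size,
        // so it can be converted back to the mandatory type.
        get { return items[index]!; }
        set { items[index] = value; }
    }

    // ... remaining IList<T> members omitted for brevity
}
```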
The current FirstOrDefault, ... methods would get a type constraint so they can only be used on enumerations of nullable types (e.g. "where T : default"). They don't need to be deprecated, but obviously they can't work with mandatory types. But we could add a new FirstOrDefault() for mandatory types (e.g. "public T? FirstOrDefault<T>(this IEnumerable<T> source) where T : mandatory"). This way you can also use FirstOrDefault() with mandatory enumerations, which will of course return a nullable result.
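That suggested overload might be sketched like this (again hypothetical syntax: `mandatory` and a nullable-reference `T?` return type do not exist in current C#):

```
// Hypothetical: FirstOrDefault for mandatory element types returns a
// nullable result instead of relying on default(T).
public static T? FirstOrDefault<T>(this IEnumerable<T> source)
    where T : mandatory
{
    foreach (T item in source)
        return item;  // first element, guaranteed non-null

    return null;      // empty sequence: the nullable 'no value' result
}
```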
You are focusing on a single case where it breaks down, but mandatory types break generics all over the place:

Activator.CreateInstance<T>() - broken by mandatory types.
System.Runtime.InteropServices.Marshal.GetDelegateForFunctionPointer<T>(IntPtr) - broken by mandatory types.
T Foo<T>() where T : new() - broken by mandatory types.
T Foo<T>(T v) where T : ICloneable { return (T)v.Clone(); } - broken by mandatory types.

Can you elaborate a bit on why exactly they break? On first sight, I do not see how any of these would cause problems.
Activator.CreateInstance<T>() will fail because the default constructor for any mandatory value will return a mandatory wrapper with a null value.
GetDelegateForFunctionPointer<T> will fail because it requires that T is a delegate type; Action!, for example, seems like a reasonable return type for the function, but it is not itself a delegate type.
where T : new() breaks for the same reason as Activator.CreateInstance<T>(): the result of a new mandatory reference is null.
where T : ICloneable fails because mandatory types will never have ICloneable; it's the contained type that can have ICloneable, and in that case the runtime has to be able to delegate that to the reference rather than the mandatory wrapper.
So we end up with a feature (mandatory types as generic type arguments) that breaks in so many cases that it seems completely ridiculous to allow it.
You assume that mandatory types will be implemented by some kind of wrapper around a non-mandatory type. Only based on that assumption, these cases will fail. But I don't think that this is how mandatory types would be implemented. Especially because using wrappers would put a considerable negative performance impact on usage of mandatory types.
This will be implemented as a language / compiler feature: The language just won't allow you to write code where a mandatory type is null. And due to that, there is no need for "real" null checks on these values. They can never be null, so they can always be accessed directly, without the need for a null check through some wrapper. Internally, they will be treated like normal nullable references. Runtime changes might be needed as well, but I see this mainly as a compiler / language feature. And I am pretty sure that it can be accomplished without the need of any kind of wrapper object. The generated code probably won't differentiate between nullable and non-nullable references.
The language just won't allow you to write code where a mandatory type is null.
Except when it does:
public T Foo<T>()
{
return default(T);
}
Foo<string!>() here will return null and there is nothing the compiler can do about it (except not allowing mandatory types in generics).
The issue here is that generic code is written with the assumption that reference types can be null. By using mandatory types you are applying a constraint to generic code that was assumed non-existent at the time of writing. If a mandatory constraint for generics should exist, it must be applied by the generic code site, not the caller.
The compiler knows that T is mandatory, so this will just not compile. Much like List<T> won't work; you need to use MandatoryList<T> instead. Not all generic code breaks, but all generics that use default(T) will not work with mandatory types. FirstOrDefault<T>() will not work with mandatory values, but FirstOrDefault<T>(T defaultValue) will. All generics in the framework should get new constraints to reflect the limited use.
Please remember that when you use a generic class with a specific type, the compiler will compile code for that specific type. So Foo<string> will have a different implementation than Foo<string!>. Any problems with generics and mandatory types will be caught at compile time. And if you dynamically create generic types at runtime, then the JIT compiler will compile and emit new code, so errors when using generics dynamically would surface at runtime. Even if you use old generic implementations (without constraints) with new mandatory types, you would get errors when using them, often at compile time and at the latest at runtime.
@GeirGrusom, you’re right: existing generic code is written to assume that all types have a default value. If we invent non-nullable reference types, ‘has a default value’ becomes a constraint on generic type arguments (and one which newer generic code probably wants to be able to relax). I can’t understand your other “broken by mandatory types” examples though! (You seem to be introducing a wrapper struct in order to break things!)
@lukasf, There’s no reason that the .NET framework type List<T> shouldn’t be rewritten if non-nullable references are introduced, rewritten in such a way that it can allow nullable and non-nullable type parameters alike. It would be profoundly disruptive if client code had to use one type of list for nullable references and structs, and another type of list for non-nullable references. I’ve outlined in a blog post how List<T> could be modified to no longer require ‘default’: https://dysphoria.net/2015/06/16/nullable-reference-types-in-c-generics/
@danieljohnson2 Yes, there needs to be some kind of way of dealing with mutable arrays of non-nullable references… and the collection classes, for example, need to be able to ‘unset’ them in order to drop references so that they don’t leak memory. I think the only way for the CLR to allow that—and also to deal with the issue of non-nullable fields in structs whose constructor has never been run—is to allow fields and array elements (of non-nullable references) to be ‘uninitialised’.
If you try to access an uninitialised field/array-element, you get an exception (so collection classes would need to be designed not to read from uninitialised array elements). Array.Clear would set elements back to ‘uninitialised’.
@bunsen32
Making List<T> capable of understanding and enforcing non-nullable types is quite impractical for a number of reasons.
For starters, non-nullable types aren't going to be a new form of type according to the runtime. A List<string!> (assuming a ! syntax) is the same as a List<string>, and the runtime is none the wiser. As the IL needs to be identical for both, the container cannot selectively enforce different rules. Even assuming that a new generic constraint were added to enforce non-nullability, generic constraints aren't selectively enforced by the consumer of the generic type, so List<T> couldn't be both nullable and non-nullable.
A lot of this discussion has been obsoleted by #3910, where the proposal is to provide little more than attribute-based decoration to denote the nullability of parameters and static/flow analysis to provide enforcement. The CLR enhancements necessary to prevent null array elements or to enforce non-nullability in a generic container aren't on the horizon.
@bunsen32, I think I understand your proposal better now. You would leave a hole in the type system: default(S) for a struct S (that contains a mandatory field) would generate a mandatory reference that is null; assigning this to something would not involve a runtime null check on this field (I expected a runtime check there!).
That'll do the trick, but it seems to me to be too large a hole in the type system. It will be very easy to accidentally introduce nulls into fields that are apparently "not nullable". I still think that structs can't reasonably contain mandatory fields, unless mandatory just means 'null-checked on read'. If it does mean only that, that check needs to be present at every relevant assignment. If null values in mandatory fields are allowed to propagate unchecked, the feature doesn't really add much to what we have now.
Anyway, if CLR enhancements are off the table your proposal is not implementable, and it might be wiser to wait to see how this stuff plays out with Apple's Swift language.
@HaloFour, yes, my proposal would require a change to the runtime, and non-nullable (and Nullable<>) would have to be separate runtime types. (It would be a change of the order of the change to introduce generic types in .NET 2.) I do hope the .NET team consider something similar for a future release.
@danieljohnson2, Yeeees, it's not quite a hole in the type system: it's still type safe, but if you attempt to read a non-null reference from an unassigned slot, it would throw an exception. I think it's an acceptably-sized hole!
@bunsen32, I think it is a way to read a null out of an unassigned slot without a check. Consider this:
struct S { public Object! Value; }
S containsNull = default(S);
S dest;
dest.Value = containsNull.Value; // checked: this throws an exception
dest = containsNull;             // unchecked: this copies a null
The behavior I had expected here was that the two assignments would be equivalent (and both checked!). I think this would be a nasty trap for programmers; the difference between the two cases is not obvious at the point of use.
I'm not a big fan of warnings: just make it an error to have a mandatory field in a struct and you've closed this hole entirely. If we must have a way to de-initialize a mandatory field, that should be an explicit syntax; or perhaps a method like this:
Mandatory.Deinitialize(out victim);
This would be implemented in IL, would ignore the fact that victim is supposed to be mandatory, and would null it out. It has the advantage that you can search for it and review all uses. The trick with a default-valued structure is really not something you can search for.
@bunsen32 what you are suggesting will be covered by #119. No need to add special casing for null values.
The advantage of building it into the type system is that null checks can be omitted by the CLR if the CLR adds support for mandatory types since not-null validation is preserved across a value copy. Copy from an annotated field provides no such guarantee.
In either version old style generics cannot support mandatory types unless we want a compiler that can easily contradict itself, and in my opinion adding it to the type system actually solves something that code contracts do not.
Heh, @danieljohnson2, that Deinitialize method opens up another hole :)
It does seem quite appealingly symmetrical to allow deinitializing memory slots (and you could define it to work on any type, so for types where default is defined, it would reset the value to default). However, is there any way to limit an out parameter to only apply to fields and array elements? Because we don't want to allow 'deinitializing' parameters and local variables!
Disallowing fields of structs to be mandatory references seems unduly restrictive to me, especially since you might be writing generic code and not know the exact type of your field. It would make the definition of Nullable<> itself quite tricky! (The compiler could potentially disallow mandatory fields where the type is explicit (not generic), though; we'd assume that authors of generic code know what they're doing.)
I don't like the idea of uninitialized fields. It opens up a huge loophole. If code that uses mandatory references cannot guarantee that the code is really safe from NullReferenceExceptions, then the concept fails and you could as well continue working with normal nullable references instead...
@lukasf, You can’t avoid uninitialized fields completely without changing the C# language and .NET runtime quite radically. ‘readonly’ fields have a very similar problem when you’re talking about class fields. If constructors allow their object to escape into a global variable, or another thread, for example, all bets are off as to whether the object has been correctly initialised/constructed. This is a general ‘issue’ with .NET. And then there’s arrays…
However, I disagree that this is a show-stopper. Allowing some mandatory/non-nullable references in some cases (cases which can be warned about by static analysis tools, and avoided by reasonable coding practices), is the trade-off to allow non-nullable references to be enforced in the great majority of (sanely-written) C#.
In addition, an 'unassigned field' doesn't return a null; it throws an exception whenever it's directly accessed. This is an improvement on a NullReferenceException since it fails faster: whereas nulls propagate through the program and can cause an exception later, unassigned fields cause an exception there and then. They're amenable to easier static analysis, and easier debugging.
Sorry if this has already been mentioned, but can these nullable/non-nullable semantics also be applied to method return types and method parameters, e.g.:
public Dog! GetDog(string! name)
{
// return SomeMethodThatReturnsDog(); // Could fail if general reference or nullable reference?
// return null; // Will fail to compile
return new Dog(name); // Ok
}
//var dog = GetDog(null); // Will fail, can't pass null reference
var dog = GetDog("Banjo");
dog.Bark(); // Ok because we know the method returned Dog!
@Antaris I should expect parameters and return types to be covered in any proposal- those are the easy cases, because you can always enforce them at runtime by compiling extra null checks into method bodies. Local variables are similarly straightforward.
Array elements and class fields are the trouble - they can be altered without any obvious place to stick a runtime null check. You'd need to inject checks in any code that accesses them, even if you aren't compiling that code. This is why building this feature in the CLR would be more effective- it can apply checks to all code.
Struct fields are the worst; they have all the problems class fields do, plus default(S) gives you a struct full of nulls. Even if we get to rewrite the CLR itself, it's not obvious how to deal with this.
Perhaps look at how Eiffel (Bertrand Meyer's design-by-contract language) solved it. Look up Eiffel and either void-safety or CAPs (certified attachment patterns). I think they proved that their way of doing it is sound (i.e. their [enforced] safe/attached references are safe, and after any applicable check the [possibly] detached reference is also considered safe). Of course, it being Eiffel, they have a much smaller language and some slight limitations that come with it.
Just want to add that adding a "!" type modifier would probably be best done via an optional modifier (modopt in CIL). This needs better reflection for optional and required modifiers, which .NET kinda lacks.
AFAIK non-nullable types now exist in the new C# 7.0. So my question is: say I wrote a library using C# 7.0 with non-nullable types, and then published it. My friend downloads it and wants to pass a null using C# 6.0, for example. What happens? Does it just not compile, for no apparent reason, because C# 6.0 knows nothing about this keyword? Or does it insert tons of if (arg == null) throw new ArgumentNullException(nameof(arg))? Because the second approach leads to a significant performance impact. I'd like this feature to be a compile-time one, without extra runtime checks. Of course, we have branch prediction, but it looks ugly anyway.
@Pzixel
Non-nullable reference types is being delayed to C# 8.0 or later. Any answer could potentially change.
However, based on the proposals for this feature at this time, to a down-level compiler (or one that simply didn't understand non-nullable references) the arguments would appear as normal nullable reference types. That compiler could generate code that passed null. There would also be no automatically-inserted null checks, so performance would not be affected, but neither would there be any guarantees that the value is not null within the method.
@HaloFour but it's weird. I am absolutely sure that I don't want to check whether the passed value is null when I said that it's not null. Imagine the code
public static void Foo(IBar! bar)
{
bar.Bark();
}
it would be VERY strange if I got a NullReferenceException here.
I see only two possibilities here: the compiler automatically adds not-null checks everywhere in the code, or it will just be a CLR feature, in which case it won't be possible to reference C# 8.0 (or whatever version) code from a lower version.
I think they will use the first approach because, as I said, there is a branch predictor, so the extra not-null check will be predicted and skipped most of the time, and it also makes the feature easier to implement: for example, if we leave it as a compiler feature, it's hard to make reflection work with it. And if we have runtime checks in methods, nothing special needs to be done for reflection.
@Pzixel
I am only relaying the proposal as it currently stands. Both of those approaches have already been discussed and, as of now, neither is being implemented. This will be purely a compiler/analyzer feature. It won't even result in compiler errors, just warnings, which can be disabled and intentionally worked around.
I believe the latest version of this proposal is here: #5032
As mentioned, this is at least an additional C# version out, so it's all subject to change.
@Pzixel I assume (hope) that IBar! would be implemented as a different underlying type than IBar, and so it would never even be an issue. (Kind of like how int and Nullable<int> are different underlying types, and the compiler just allows for nice syntactic sugar.) Putting a null check in that method would be akin to adding a check that an argument of type int is not actually a string.
@MikeyBurkman
Actually, the non-nullable version would be IBar, and the nullable version would be IBar?. The only difference between the two would be an attribute; they would be the same underlying type.
@HaloFour I don't know if T? is good syntax, because it breaks all existing code. I totally agree that it's more consistent than mixing ?, ! and so on, but if we are looking for backward compatibility, it will break everything in existing code. And of course it should be an error, not a warning. Why? Because it's a type mismatch, and that is clearly an error. We should get CS1503 and that's all. It's weird to get a null when I said that I can't get null. If I want a warning I can use a [NotNull] attribute instead of introducing a whole new type. And that makes sense.
@Pzixel
I don't think T? is good syntax because it breaks all existing code. I totally agree that it's more consistent than ?, ! and so on, but if we are looking for backward compatibility, it will break everything in existing code
I've already made that argument, but it seems that this is the direction that the team wants to go anyway. I believe that the justification is that the vast majority case is wanting a non-nullable type so having to explicitly decorate them would lead to a lot of unnecessary noise. Pull the band-aid once.
And of course it should be an error, not warning. Why? But it's types mismatch, and it is clearly an error. We should get CS1503 and that's all. It's weird to get a null when i said that i can't get null.
Primarily because of how much of a change it is to the language and because it can never be achieved with 100% certainty. I'm not particularly interested in rehashing all of the comments already on these proposals, but justifications are listed there.
I'm pretty sure you're going to have to break backwards compatibility anyways, or make type inference useless.
// Pretend this is some legacy code
var x = new Bar(); // Line A (Bar is some concrete reference type)
...
x = null; // Line B
What is the inferred type of x on Line A? If it's inferred as non-null (the expected type), then our code at line B will no longer compile. If we infer on Line A that x is nullable, then everything compiles as it used to, but now your type inference is inferring a less useful type.
Either devs won't use non-null types, or devs will stop using type inference. I can imagine which of those two options will win out...
@HaloFour
Primarily because of how much of a change it is to the language and because it can never be achieved with 100% certainty. I'm not particularly interested in rehashing all of the comments already on these proposals, but justifications are listed there.
You don't need to rehash anything; basically I just want to get a type mismatch error when there is one, instead of warnings and so on. Whether it emits checks or is purely a compiler feature is a topic for discussion, but if we are talking about the interface, a type mismatch should definitely be an error.
@MikeyBurkman re
What is the inferred type of x on Line A? If it's inferred as non-null (the expected type), then our code at line B will no longer compile. If we infer on Line A that x is nullable, then everything compiles as it used to, but now your type inference is inferring a less useful type.
Local variables will have a nullable type state that can be different from one point in the program to another, based on the flow of the code. The state at any given point can be computed by flow analysis. You won't need to use nullable annotations on local variables, because it can be inferred.
@gafter var is used to infer the type at the point of declaration; we shouldn't have to analyze any flow after it.
@Pzixel "Nullability" isn't being treated as a separate type; it's a hint to the compiler. The flow analysis is intentional, to prevent the need for extraneous casts when the compiler can be sure that the value would not be null, e.g.:
public int? GetLength(string? s) {
    if (s == null) {
        return null;
    }
    // because of the previous null check the compiler knows
    // that the variable s cannot be null here so it will not
    // warn about the dereference
    return s.Length;
}
So I'm still a bit confused. @gafter's comment insinuated that flow analysis would go upwards, while @HaloFour's example demonstrates it going downwards. Downwards flow analysis would be pretty much required in any implementation, and in fact R# already does that sort of analysis with the [NotNull] attributes. However, without the upwards flow analysis, I don't think type inference would be able to provide much benefit, unless breaking backwards compatibility was an option.
@HaloFour int and int? are completely different types. I really want the same UX for reference types. I can use attributes like [NotNull], [Pure] and so on for a warning. I want to be absolutely sure that I CAN'T receive null if it is marked as not null. So in the provided example:
public int? GetLength(string? s) {
    string notNullS = s; // compiler error: cannot implicitly cast `string?` to `string`.
    return GetLength(notNullS);
}

public int GetLength(string s) {
    return s.Length;
}
Of course, ideally I'd like to see something like unwrap from Rust, but an explicit cast is good enough.
@Pzixel
I want to be absolutely sure that I CAN'T receive null if it is marked as not null
Simply put, that wouldn't be possible. Even if massive CLR changes were on the table, it probably couldn't be done. The notion of a default value is too baked in. Generics, arrays, etc.: there's no way to get around the fact that null can sneak in by virtue of being the default value.
Flow analysis is a compromise, one that can be fitted onto the existing runtime and that can work with a language that has 15 years of legacy to support. It follows the Eiffel route: know where you can't make your guarantees and solve that through flow analysis. Even then, sometimes the developer can (and should) override.
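For the record, this is roughly the shape the feature eventually took in C# 8.0: flow analysis plus an explicit developer override, the null-forgiving operator `!`:

```
#nullable enable
using System;

class Demo
{
    static void Main()
    {
        string? s = Console.ReadLine(); // ReadLine may return null

        // Flow analysis: after this check the compiler treats s as non-null.
        if (s != null)
        {
            Console.WriteLine(s.Length); // no warning here
        }

        // Developer override: the null-forgiving operator '!' suppresses
        // the warning when the programmer knows better than the analysis
        // (and accepts the risk of a runtime NullReferenceException).
        Console.WriteLine(s!.Length);
    }
}
```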
@MikeyBurkman
IIRC the type inferred by var is neither necessarily nullable nor non-nullable; it's a superposition of both potential states depending on how the variable is used. From @gafter's comment it sounds like that applies to any local, even if the type is explicitly stated, e.g.:
string s1 = null; // no warning?
int i1 = s1.Length; // warning of potential null dereference
string? s2 = "foo";
int i2 = s2.Length; // no warning
Simply put, that wouldn't be possible. Even if massive CLR changes were on the table it probably couldn't be done. The notion of a default value is too baked in. Generics, arrays, etc., there's no way to get around the fact that null can sneak in by virtue of being the default value.
Generics were introduced once; another major change is possible too. Nobody says that it's easy, but they have to do it to implement this properly. It's the only way to make a strong type system. All hints and warnings are just nothing if the value can still be null internally. I don't think significant changes are required to accomplish this: it's just another type, and it isn't even the CLR's concern. The compiler just checks that the type is not-null, and the only way to pass a null value is reflection. Thus we need to change reflection, but that's easy too: since string and string? are different types, there will be a type mismatch.
Now I see that it's even simpler than I thought. Just treat them as other types and that's all. Reflection throws a mismatch error at runtime, the compiler does it at compile time, and everyone is happy. And it would even still be a compile-time feature. The only problem is with older versions of C#, but changes in reflection would be changes in the runtime, so it would be a feature for the next .NET.
We can do a compatible version with runtime checks: for example, on .NET 4.6 and below we use runtime checks (if blabla != null); with .NET 4.7 we assume that reflection does its job at runtime and remove them from the code. Elegant solution.
Generics were introduced once; another major change is possible too.
Generics were additive and worked entirely within the existing semantics of the runtime.
Just treat them as other types and that's all.
That "other" type can't prevent the reference type it contains from being null. Either it is a reference type that can itself be null (and wastes an allocation when it's not), or it's a struct container holding a reference type, where the default value of the struct is that the reference is null. Either way, you're back to square one. Furthermore, since the majority of methods within the BCL accept reference types that may be null, you're talking about a massive breaking change to all existing programs. This solution has already been proposed.
1. Overview
This is my concept for non-nullable references (and safe nullable references) in C#. I have tried to keep my points brief and clear so I hope you will be interested in having a look through my proposal.
I will begin with an extract from the C# Design Meeting Notes for Jan 21, 2015 (https://github.com/dotnet/roslyn/issues/98):
There's a long-standing request for non-nullable reference types, where the type system helps you ensure that a value can't be null, and therefore is safe to access. Importantly such a feature might go along well with proper safe nullable reference types, where you simply cannot access the members until you've checked for null.
This is my proposal for how this could be designed. The types of references in the language would be:
Important points about this proposal:
The Design Meeting Notes cite a blog post by Eric Lippert (http://blog.coverity.com/2013/11/20/c-non-nullable-reference-types/#.VM_yZmiUe2E) which points out some of the thorny issues that arise when considering non-nullable reference types. I respond to some of his points in this post.
Here is the Dog class that is used in the examples:
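The class definition isn't reproduced above; a minimal sketch of the kind of class the examples assume might look like this (members invented for illustration):

```csharp
// Hypothetical example class assumed by the examples in this proposal.
public class Dog
{
    public string Name { get; set; }

    public void Bark()
    {
        Console.WriteLine(Name + " says woof!");
    }
}
```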
2. Background
I will add a bit of context that will hopefully make the intention of the idea clearer.
I have thought about this topic on and off over the years and my thinking has been along the lines of this type of construct (with a new 'check' keyword):
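The snippet itself isn't shown above; based on the description, the hypothetical 'check' construct might have looked something like this (syntax invented for illustration):

```csharp
Dog dog = FindDog("Rex");   // FindDog is a hypothetical method that may return null

check (dog)
{
    // Inside this block the compiler would know 'dog' is not null,
    // so dereferencing it is statically safe.
    dog.Bark();
}
```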
The 'check' keyword does two things:
It then occurred to me that since it is easy to achieve the first objective using the existing C# language, why invent a new syntax and/or keyword just for the sake of the second objective? We can achieve the second objective by teaching the compiler to apply its rules wherever it detects this common construct:
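That common construct is presumably the ordinary null test, with the body of the 'if' treated as a null-safe region (a sketch of my reading):

```csharp
Dog dog = FindDog("Rex");   // hypothetical method that may return null

if (dog != null)
{
    // The compiler can prove 'dog' is not null here,
    // so this dereference cannot throw.
    dog.Bark();
}
```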
Furthermore it occurred to me that we could extend the idea by teaching the compiler to detect other simple ways of doing null checks that already exist in the language, such as the ternary (?:) operator.
This line of thinking is developed in the explanation below.
3. Mandatory References
As the name suggests, mandatory references can never be null:
However the good thing about mandatory references is that the compiler lets us dereference them (i.e. use their methods and properties) any time we want, because it knows at compile time that a null reference exception is impossible:
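A sketch in the proposal's syntax, where `!` marks a mandatory reference (variable names are illustrative):

```csharp
Dog! dogM = new Dog();   // OK: initialised with a real instance
dogM = null;             // Compile error: a mandatory reference can never be null

dogM.Bark();             // Always allowed: a null reference exception is impossible
```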
(See my additional post for more details.)
4. Nullable References
As the name suggests, nullable references can be null:
However the compiler will not allow us (except in circumstances described later) to dereference nullable references, as it can't guarantee that the reference won't be null at runtime:
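A sketch of both points, with `?` marking a nullable reference (names illustrative):

```csharp
Dog? dogN = FindDog("Rex");   // hypothetical method; may legitimately return null
dogN = null;                  // OK: nullable references may hold null

dogN.Bark();                  // Compile error: dogN might be null here
```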
This may make nullable references sound pretty useless, but there are further details to follow.
5. General References
General references are the references that C# has always had. Nothing is changed about them.
6. Using Nullable References
So if you can't call methods or access properties on a nullable reference, what's the use of them?
Well, if you do the appropriate null reference check (I mean just an ordinary null reference check using traditional C# syntax), the compiler will detect that the reference can be safely used, and the nullable reference will then behave (within the scope of the check) as if it were a mandatory reference.
In the example below the compiler detects the null check and this affects the way that the nullable reference can be used within the 'if' block and 'else' block:
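The example isn't reproduced above; something along these lines seems to be intended (a sketch, names invented):

```csharp
Dog? dogN = FindDog("Rex");   // hypothetical method; may return null

if (dogN != null)
{
    dogN.Bark();       // OK: within this block dogN behaves like a mandatory reference
}
else
{
    // dogN is known to be null here; dereferencing it is still a compile error.
    Console.WriteLine("No dog found.");
}
```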
The compiler will also recognise this sort of null check:
And this:
The compiler will also recognise when you do the null check using other language features:
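The concrete examples aren't reproduced above; patterns along these lines seem to be what is meant (my guesses, not the author's exact list):

```csharp
Dog? dogN = FindDog("Rex");   // hypothetical method; may return null

// Early return: after the 'if', the compiler knows dogN is not null.
if (dogN == null)
    return;
dogN.Bark();   // OK from here on

// Ternary operator: the null test narrows each branch separately.
Dog? other = FindDog("Fido");
string name = (other != null) ? other.Name : "(none)";
```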
Hopefully it is now clear that if the new style references are used throughout the code, null reference exceptions are actually impossible. However once the effort has been made to convert the code to the new style references, it is important to guard against the accidental use of general references, as this compromises null safety. There needs to be an attribute such as this to tell the compiler to prevent the use of general references:
This attribute could also be applied at the class level, so you could for example forbid general references for the assembly but then allow them for a class (if the class has not yet been converted to use the new style references):
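The attribute itself isn't shown above; a hypothetical shape might be (the attribute names here are invented, not part of the proposal):

```csharp
// Hypothetical assembly-level attribute: forbid general references everywhere.
[assembly: ForbidGeneralReferences]

// Hypothetical class-level opt-out for code not yet converted.
[AllowGeneralReferences]
public class LegacyDogRegistry
{
    // General references are still permitted inside this class.
}
```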
(See my additional post for more details.)
7. Can we develop a reasonable list of null check patterns that the compiler can recognise?
I have not listed every possible way that a developer could do a null check; there are any number of complex and obscure ways of doing it. The compiler can't be expected to handle cases like this:
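For instance, a null test laundered through an intermediate variable is the kind of case the compiler would not be expected to follow (an illustrative example, not from the original post):

```csharp
Dog? dogN = FindDog("Rex");        // hypothetical method; may return null

bool haveDog = dogN != null;       // the check is hidden behind a local

if (haveDog)
{
    dogN.Bark();   // Still a compile error: the compiler does not track 'haveDog'
}
```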
However the fact that the compiler will not handle every case is a feature, not a bug. We don't want the compiler to detect every obscure type of null check construct. We want it to detect a finite list of null checking patterns that reflect clear coding practices and appropriate use of the C# language. If the programmer steps outside this list, it will be very clear to them because the compiler will not let them dereference their nullable references, and the compiler will in effect be telling them to express their intention more simply and clearly in their code.
So is it possible to develop a reasonable list of null checking constructs that the compiler can enforce? Characteristics of such a list would be:
I think the list of null check patterns in the previous section, combined with some variations that I am going to put in a more advanced post, is an appropriate and intuitive list. But I am interested to hear what others have to say.
Am I expecting compiler writers to perform impossible magic here? I hope not - I think that the patterns here are reasonably clear, and the logic is hopefully of the same order of difficulty as the logic in existing compiler warnings and in code checking tools such as ReSharper.
8. Converting Between Mandatory, Nullable and General References
The principles presented so far lead on to rules about conversions between the three types of references. You don't have to take in every detail of this section to get the general idea of what I'm saying - just skim over it if you want.
Let's define some references to use in the examples that follow.
Firstly, any reference can be assigned to another reference if it is the same type of reference:
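The definitions aren't reproduced above; something like this is presumably intended (names invented):

```csharp
Dog! dogM = new Dog();          // mandatory reference
Dog? dogN = FindDog("Rex");     // nullable reference (hypothetical method)
Dog  dogG = FindDog("Fido");    // general (traditional) reference

// Same kind to same kind is always fine:
Dog! dogM2 = dogM;
Dog? dogN2 = dogN;
Dog  dogG2 = dogG;
```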
Here are all the other possible conversions. Note that when I talk about 'intent' I am meaning the idea that a traditional (general) reference is conceptually either mandatory or nullable at any given point in the code. This intent is explicit and self-documenting in the new style references, but it still exists implicitly in general references (e.g. "I know this reference can't be null because I wrote a null check", or "I know that this reference can't or at least shouldn't be null from my knowledge of the business domain").
There has to be some compromise in the last three cases as our code has to interact with existing code that uses general references. These three cases are allowed if an explicit cast is used to make the compromise visible (and perhaps there should also be a compiler warning).
Some of the conversions that were not possible by direct assignment can be achieved slightly less directly using existing language features:
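For example, the null-coalescing operator gives a nullable-to-mandatory conversion without a cast, and an ordinary null check narrows a nullable reference so it can be assigned to a mandatory one (a sketch of what seems intended):

```csharp
Dog? dogN = FindDog("Rex");     // hypothetical method; may return null

// Nullable -> mandatory via ??: the right-hand side supplies a guaranteed value.
Dog! dogM = dogN ?? new Dog();

// Nullable -> mandatory via a null check: inside the 'if', dogN acts as mandatory.
if (dogN != null)
{
    Dog! dogM2 = dogN;
}
```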
9. Class Libraries
As mentioned previously, the compiled IL code will be the same whether you use the new style references or not. If you compile an assembly, the resulting binary will not know what type of references were used in its source code.
This is fine for executables, but in the case of a class library, where the goal is obviously re-use, the compiler will need a way of knowing the types of references used in the public method and public property signatures of the library.
I don't know much about the internal structure of DLLs, but maybe there could be some metadata embedded in the class library which provides this information.
Or even better, maybe reflection could be used - an enum property indicating the type of reference could be added to the ParameterInfo class. Note that the reflection would be used by the compiler to get the information it needs to do its checks - there would be no reflection imposed at runtime. At runtime everything would be exactly the same as if traditional (general) references were used.
Now say we have an assembly that has not yet been converted to use the new style references, but which needs to use a library that does use the new style references. There needs to be a way of turning off the mechanism described above so that the library appears as a traditional library with only general references. This could be achieved with an attribute like this:
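A hypothetical shape for such an attribute (the name is invented for illustration):

```csharp
// Applied by the consuming assembly: all reference-kind metadata in referenced
// libraries is ignored, and every reference appears as a general reference.
[assembly: TreatLibraryReferencesAsGeneral]
```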
Perhaps this attribute could also be applied at a class level. The class could remain completely unchanged except for the addition of the attribute, but still be able to make use of a library which uses the new style references.
(See my additional post for more details.)
10. Constructors
Eric Lippert's post (see reference in the introduction to this post) also raises thorny issues about constructors. Eric points out that "the type system absolutely guarantees that ...[class] fields always contain a valid string reference or null".
A simple (but compromised) way of addressing this may be for mandatory references to behave like nullable references within the scope of a constructor. It is the programmer's responsibility to ensure safety within the constructor, as has always been the case. This is a significant compromise but may be worth it if the thorny constructor issues would otherwise kill off the idea of the new style references altogether.
It could be argued that there is a similar compromise for readonly fields which can be set multiple times in a constructor.
A better option would be to prevent any access to the mandatory field (and to the 'this' reference, which can be used to access it) until the field is initialised:
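A sketch of how that rule might read in practice (proposal syntax, names invented):

```csharp
public class Kennel
{
    private Dog! resident;     // mandatory field

    public Kennel(Dog! dog)
    {
        resident.Bark();       // Compile error: 'resident' is not yet initialised
        Register(this);        // Compile error: 'this' may not escape before initialisation

        resident = dog;        // field initialised here

        resident.Bark();       // OK: every path to this point has assigned the field
    }

    private static void Register(Kennel k) { }
}
```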
Note that it is not an issue if this forces adjustment of existing code - the programmer has chosen to introduce the new style references and thus will inevitably be adjusting the code in various ways as described earlier in this post.
And what if the programmer initializes the property in some way that still makes everything safe but is a bit more obscure and thus more difficult for the compiler to recognise? Well, the general philosophy of this entire proposal is that the compiler recognises a finite list of sensible constructs, and if you step outside of these you will get a compiler error and you will have to make your code simpler and clearer.
11. Generics
Using mandatory and nullable references in generics seems to be generally ok if we are prepared to have a class constraint on the generic class:
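A sketch of the kind of generic class that seems to be intended, with a class constraint so that `T!` and `T?` are meaningful:

```csharp
public class Shelter<T> where T : class
{
    private readonly List<T!> residents = new List<T!>();

    // Callers must pass a non-null reference.
    public void Admit(T! animal)
    {
        residents.Add(animal);
    }

    // May return null if nothing matches, so the return type is nullable.
    public T? TryFind(Predicate<T!> match)
    {
        foreach (T! animal in residents)
        {
            if (match(animal))
                return animal;
        }
        return null;
    }
}
```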
However there is more to think about with generics - see the comments below.
12. Var
This is the way that I think var would work:
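The original table isn't reproduced above; the inference rule described would presumably look like this (my reading, names invented):

```csharp
Dog! dogM = new Dog();
Dog? dogN = FindDog("Rex");   // hypothetical method; may return null
Dog  dogG = FindDog("Fido");

var a = dogM;   // a is Dog!  (mandatory)
var b = dogN;   // b is Dog?  (nullable)
var c = dogG;   // c is Dog   (general)
```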
The first case in each group would be clearer if we had a suffix to indicate a general reference (say #), rather than having no suffix due to the need for backwards compatibility. This would make it clear that 'var#' would be a general reference whereas 'var' can be mandatory, nullable or general depending on the context.
13. More Cases
In the process of thinking through this idea as thoroughly as possible, I have come up with some other cases that are mostly variations on what is presented above, and which would just have cluttered up this post if I had put them all in. I'll put these in a separate post in case anyone is keen enough to read them.