dotnet / csharplang

The official repo for the design of the C# programming language
11.53k stars 1.03k forks source link

Champion "Nullable reference types" (16.3, Core 3) #36

Open MadsTorgersen opened 7 years ago

MadsTorgersen commented 7 years ago

Note this includes the object constraint: where T : object.

LDM:

gulshan commented 7 years ago

Will default constructor also generate warnings for all non-initialized reference type members?

iam3yal commented 7 years ago

In the following case:

var y = b ? "Hello" : null; // string? or error
var z = b ? 7 : null; // Error today, could be int?

I think that in both cases it should be <type>?; more and more people are using var in their codebase and switching to use a type just to evaluate an expression seems odd, however, in the case you really want it to be an error, using a type actually make more sense.

yaakov-h commented 7 years ago

Nullability adornments should be represented in metadata as attributes. This means that downlevel compilers will ignore them.

Can we have a serious discussion about using modopts (or even modreqs) to allow for nullability overloads, i.e. two methods who only differ by nullability?

This would be useful for functions where null input = null output, and non-null input = non-null output, or even vice-versa.

HaloFour commented 7 years ago

@yaakov-h

I believe that conversation has happened a couple of times already. modreq is not-CLS. modopt does allow for overloads but requires specific understanding on the part of all consuming compilers since the modifier must be at the very least copied into the call signature. Both break compatibility with existing method signatures. For something that will hopefully proliferate across the entire BCL very quickly using modopt would create a big hurdle to doing that.

gulshan commented 7 years ago

I think this feature is about objects and not types as it does not change anything about the actual type system. When nullability was introduced for value types, it was about the types. But that's not the case here. Quoting the proposal (emphasis mine)-

This feature is intended to:

  • Allow developers to express whether a variable, parameter or result of a reference type is intended to be null or not.
  • Provide optional warnings when such variables, parameters and results are not used according to their intent.

I think the "variables, parameters and results" (are the members of a class being excluded here?) can be easily replaced by "objects". The proposal is introducing non-nullable objects(even including value types) and putting them to be default with separate notation for nullable objects. So, the proposal can be renamed to-
"Default non-nullable objects" or something like that IMHO.

gafter commented 7 years ago

@gulshan Objects exist at runtime. Types exist at compile-time. This is about changing what happens at compile-time, not at runtime.

gulshan commented 7 years ago

@gafter I have two questions-

For nullable value types, both the answers are "no". But, I guess for nullable references, the answer is "yes". Because there is nothing changed about types. Only thing changed is, whether a reference should hold null or not. If I am wrong here, please correct me.

BTW, now I propose the title "Default to non-nullable references" or simply "Non-nullable references".

gafter commented 7 years ago

@gulshan Yes, and yes. The first because System.Type objects only exist at runtime. The second because the compiler knows that default(SomeReferenceType) is the constant null.

Thaina commented 7 years ago

Forgot to ask that. Is this feature would allow generic nullable type?

I mean code like this should be able to compile with this feature enabled?

public T? GET<T>(string obj)
{
    try { return HttpGet(url).Convert<T>(); }
    catch { return null; }
}

int n = GET<int>(url0) ?? -1;
var m = GET<MyCustomClass>(url1);
gafter commented 7 years ago

@Thaina Not as currently envisioned, no. There is no way to translate that into something expressible in the CLR (except by duplicating the method for each "nullable" unconstrained type parameter).

orthoxerox commented 7 years ago

@gafter how would duplicating work if C# can't support methods that have mutually exclusive constraint flags (gpReferenceTypeConstraint and gpNotNullableValueTypeConstraint) but have otherwise identical signatures from the viewpoint of overload resolution?

yaakov-h commented 7 years ago

The proposal doesn't take into considerations any of the discussion from the roslyn repo about telling the compiler "I know at this point in time that the value is not null. Do not warn me.", for example using a ! postfix operator.

Can we take that into consideration, or will #pragma warning disable be the only way around false positives?

Richiban commented 7 years ago

@gafter In @Thaina 's example, would it be appropriate to lift the return into a Nullable type at that point? I'm not sure of the feasibility of a compiler feature that sometimes works with Nullable<T> and sometimes works with plain T and hides the difference between the two to the compiler.

I was thinking that the original example:

    public T? GET<T>(string obj)
    {
        try { return HttpGet(url).Convert<T>(); }
        catch { return null; }
    }

would like this if you decompiled it:

    public Nullable<T> GET<T>(string obj)
    {
        try { return HttpGet(url).Convert<T>(); }
        catch { return null; }
    }

But in C# vNext you wouldn't know. The type just appears to be T? still.

MikeyBurkman commented 7 years ago

I'm still a bit curious how this could possibly be implemented with var without possibly breaking a lot of existing code:

var s = "abc";
...
s = null;

This is perfectly valid currently. However, what type should s be inferred to? If it's inferred to just String, then it'll break existing valid code. If it's inferred to String?, then it'll essentially force devs to make a choice between type inference and strict nullability checks.

Pzixel commented 7 years ago

@MikeyBurkman it's may be inferred in different ways based on project checkbox (like it's done for unsafe)

mlidbom commented 7 years ago

To my mind the most important issue, by far, to fix is the current need to manually validate that every single reference type parameter is not null at runtime. This latest proposal seems to be de-emphasize that to the point where it is just mentioned as a possible "tweak" at the end. The way I understand it you would get warnings but zero guarantees.

I believe I am far from alone in feeling that without fixing that this feature would be of limited value. Here is a vote for ensuring that runtime not-null guarantees are prioritized. It would probably remove something like 90% of the current parameter validation code that that C# developers write.

I would suggest going a step further than the "!" syntax and supplying a compiler switch that would automatically generate such checks for ALL reference type parameters not explicitly allowed to be null by marking them with "?". This would be a huge time saver and if implemented in the compiler it should be possible to implement optimizations that would make the cost negligible.

I would also suggest extending the "!" syntax to check for default(TValue) in struct parameters. So that you could, as one example, validate that a Guid is not Guid.Empty with just a single "!" character.

mattwar commented 7 years ago

@MikeyBurkman the var can infer the type to be string? with a known non-null value. This means the variable will pass all the tests and be able to be used as not-null until it is changed to the null value or a value not known to be not null.

mattwar commented 7 years ago

@Richiban you could only write that if T was either constrained to class or struct.

MikeyBurkman commented 7 years ago

@mattwar The issue with inferring it to the nullable value isn't that it's necessarily wrong, it's that it's extremely inconvenient, to the point that it becomes a nuisance. I don't recall seeing any suggestion about flow typing (where a variable's type can be narrowed by, for instance, a non-null assertion), so in order to pass that variable to a function not expecting a null, I'd either have to assign it some default value (foo(s || "")) which would be quite redundant in most cases, or I'd have to just assign it the non-null type explicitly, making var useless.

HaloFour commented 7 years ago

@MikeyBurkman

There have been multiple conversations on the Roslyn forums about this subject. The "nullability" isn't part of the type of the local, it's how it's used. So the following would produce zero warnings:

string? s = "1234";
int len = s.Length; // the compiler knows that s is not null here

So in most circumstances you wouldn't be forced to do anything special with the variable in order to placate the compiler, as long as the variable was definitely assigned to something that wasn't null.

Pzixel commented 7 years ago

@mlidbom I'm against the ! syntax becuase it provides a lot of mess. Syntax should be consistent thus we need ? for nullables and just a type for common variables, but not ?/type for value types and type/! for reference ones. It's better to have a little breaking change than open this bottle with djinns. All constants must be of not-null types (it's obvious, because otherwise you just cannot have non-null constants), thus var x = some_constant should be of non-null type even if in previous language version it is inferred as nullable.

Another question if non-null type will be a csc part without CLR support or we will have it? I mean readonly is a csc hack and reflection allows to modify readonly field. Are we going to do the same with non-null references when you cannot pass nulls in C# but fully able to do it from reflection?

orthoxerox commented 7 years ago

Another question if non-null type will be a csc part without CLR support or we will have it? I mean readonly is a csc hack and reflection allows to modify readonly field. Are we going to do the same with non-null references when you cannot pass nulls in C# but fully able to do it from reflection?

Yes, like readonly.

jnm2 commented 7 years ago

@Pzixel

I mean readonly is a csc hack

Not exactly; it translates directly to the CLR's initonly and is taken very seriously in IL verification. Reflection is the only hack here. But it's a reasonable hack. I mean, on a purely physical level, the memory is mutable if you can get a pointer to it. People would be able to mutate it anyway, so disallowing reflection to mutate initonly memory isn't really protection.

mlidbom commented 7 years ago

@Pzixel I don't really care about the syntax as long as I can get a guarantee that at runtime non-null references are not null. Regardless of which language calls the method, or if it is called through reflection I need to know that it will be non-null or an exception will be thrown. That guarantee not being present is the "billion dollar mistake" in my view and any "fix" that does not adress that is fatally flawed in my opinion.

Pzixel commented 7 years ago

@mlidbom but I care about both. Of course, it's better to have non-null types with ugly syntax than nothing but it's not a question. And as you can see, people think that it should behave like readonly so you can easily pass null through reflection. And I really don't understand how syntax question makes it "fix" that does not adress that. I said that I dislike !, not the whole idea.

iam3yal commented 7 years ago

Just to add to @jnm2 by default reflection requires full-trust, specifically, when accessing non-public members.

mattwar commented 7 years ago

@mlidbom unfortunately, so far all proposals that attempt to solve the null problem by making guarantees have been the ones to be too flawed to pursue. That's why we have this proposal. It doesn't make a guarantee, but it does help you find your bugs by letting you declare your intent.

mlidbom commented 7 years ago

@mattwar Ah, that is unfortunate indeed. I'm afraid I missed the information about those flaws. Any links to information would be greatly appreciated.

@Pzixel In no way were any of my comments meant as criticism of your thoughts on any of this. From what I can tell we are in pretty much 100% agreement :)

shmuelie commented 7 years ago

Figured I throw my 2 cents in:

As the proposal stands it's more like an analyzer than a compiler feature to me. Since in the end it would have to be stored as attributes anyways why not sure them to start? While easy to use is great, easy to use and easy to misunderstand what you get is bad. Using attributes makes it clearer that there is no true guarantee here.

HaloFour commented 7 years ago

@SamuelEnglard

Probably because attributes are very limited as to where they can be applied. That and the amount of flow analysis would be quite a bit beyond what a bog-standard analyzer could be expected to perform.

Yes, it ain't perfect. Yes, you can defeat it by trying hard enough. But that really shouldn't matter. It should still catch the majority of cases, particularly the accidental cases.

I look at it like Java generics. They aren't perfect and it's not hard to intentionally trip up the compiler. But they're still immensely helpful and do what they're expected to do the vast majority of the time, particularly when the developer isn't trying to be "clever".

shmuelie commented 7 years ago

@HaloFour I do see that. I'd rather have this than not 😄

mattwar commented 7 years ago

I have a prototype of an analyzer that uses attributes here: https://github.com/mattwar/nullaby

This proposal is superior to attributes because the null annotations can be encoded in and flow through the type system (in and out of generic type parameters).

In an attribute based system I can declare parameters and return types to have non-null or null intent easily enough, but I can't easily describe a parameter or return type having the type List<string?>.

Pzixel commented 7 years ago

Yes, i'd defenitly like it to be a part of type system because otherwise we have no way to express List of non-null references. It complicates everything, but makes this feature much more powerful.

oxfordemma commented 7 years ago

I ended up taking a stab at this as well with attributes. https://github.com/FUR10N/NullContracts

This analyzer is definitely trick-able, but I think I'd prefer it that way. I went super strict with my first attempt, but it ended up being too frustrating to use.

A common scenario for me was checking for null on a field that could be null, and then using that value in some method that required a non-null parameter. If you go really strict, then the analysis probably has to throw away that null-check info since some other method call could change the value after your code checked it (example). This resulted in far too many false-positive errors to actually be useful, so I had to relax it a bit.

I also hit a wall with the whole List of non-null references idea, but that was one of the assumptions I ended up giving it - iterators and list indexers are assumed not-null.

It's not perfect, but my team's pretty happy with the issues it's caught so far.

jeffanders commented 7 years ago

The current proposal only briefly touches on generics and has some open questions. I have added a proposal the attempts to go into detail regarding how non-nullable reference types would interact with generics and generic type parameters. See #403

rorymurphy commented 7 years ago

I tend to see this as something better handled as an analyzer function better handled with attributes, especially if there are edge cases it does not fully cover.

The comparison was made favorably to Java generics, but those are somewhat acknowledged to be a syntactical sugar kludge (and have never really worked as well as they should). Hoping .NET does not head down a similar path with this.

Would like to see something that provides a guarantee and also solves the runtime aspect. I see that the current proposal nullability being viewed as a usage restriction and therefore does not affect the underlying type of the variable. However, it seems like the issue could be handled by wrapping the reference type in a NonNullable<T> struct. I took a shot in a gist, including implicit conversion back to T. And certainly, it could be assigned a shorthand operator similar to T? for Nullable<T>.

Seems like that would allow the guarantee and address the runtime issues @mlidbom raised. An analyzer could key off references to the struct just as easily as the proposed new syntax for compile-time support. Trying to understand, given that the new syntax does not provide any runtime guarantee, what's the major advantage over a wrapper based approach (similar in spirit to Nullable<T>) that includes a runtime guarantee?

Pzixel commented 7 years ago

@rorymurphy I had proposed NotNullable<T> struct solution some time ago but then we found some drawbacks. You can find it in linked topics. However with CLR and language support they can be workarounded, I guess.

shmuelie commented 7 years ago

@rorymurphy I've spent days trying to get a NotNullable<T> to work. It works in very limited cases making it a bad fit for the language or framework. For an individual development team maybe...

rorymurphy commented 7 years ago

@SamuelEnglard - can you provide any specific challenges you had with it? Tough to respond without any specifics. Offhand, I don't see how the approach is any more limited than Nullable<T>.

shmuelie commented 7 years ago

@rorymurphy the primary issue is that it only works with concrete types. T cannot be interfaces or abstract classes without it starting to crack

Richiban commented 7 years ago

@rorymurphy @SamuelEnglard The big problem for me (as someone who also, some time ago, tried to write a NotNull type) is that you can't stop someone from writing default(NotNull<string>) or new NotNull<string>(). Which results in a NotNull instance that contains null. Due to the very, very deeply ingrained concept in the CLR that every type has a default value I'm not sure there's much you can do about it.

yaakov-h commented 7 years ago

Also, FormatterServices.GetUninitializedObject(typeof(NotNull<string>)).

rorymurphy commented 7 years ago

@Richiban - good catch, hadn't remembered how deeply embedded in the CLR value type initialization to 0 was embedded until I looked it up. Although since all these proposals are talking about compiler changes, couldn't this be overcome by a compile-time check that NonNullable is not used as a type parameter to any class/method that does a default initialization (either as you described, or simply by declaring a variable without explicitly initializing it)? Granted, that still leaves reflection as a gap to be solved.

Richiban commented 7 years ago

I'm sure this is exactly the sort of issue that the C# compiler team is dealing with right now... I'm sure that what they've come up with is some form of advanced null-tracking combined with additions to the type system, rather than any fundamental changes to the CLR re: nullability.

What they should be able to offer you is that any method written in C# vNext (let's say that's the version of the language that supports non-nullable reference types) will, if written in the correct manner, get decorated with a compiler attribute that the return result is guaranteed non-null (obviously not the case if anyone has used default(T)).

I imagine it will also be necessary to have a where T : null or where T : default generic constraint available for those occasions when you do want to be able to work with default values.

benjamin-hodgson commented 7 years ago

I have a question regarding generics, which I haven't seen (or at least couldn't find) addressed explicitly in this proposal or in @jeffanders's discussion in #403.

class Foo<T>
{
    private T? _value;
    public T Value
    {
        get
        {
            if (!ValueWasSet)
            {
                throw new InvalidOperationException();
            }
            return _value!;
        }
        set
        {
            ValueWasSet = true;
            Value = value;
        }
    }
    public bool ValueWasSet { get; private set; } = false;
}

In this example I've typed _value as a nullable T, because in between creating a Foo and setting its Value the _value field will be default(T). In order not to get a warning about _value being uninitialised I have to give it a nullable type. Of course there's no way to observe a null Value from outside this class. (I can see this coming up in classes like List<T>, which maintains a partially-initialised array of values but maintains an invariant that the uninitialised portion is inaccessible from outside.)

But this raises the possibility of some sort of doubly-nullable type. If I instantiate T to be a nullable reference type,

var foo = new Foo<string?>();

then what is the type of foo._value? Is it...

  1. string??, allowing the ?s to build up? What does a type like string?? mean, exactly? Can I construct types with hundreds of ?s?
  2. string?, collapsing all the ?s down into one? Then the ! in Value's getter would convert from string? to string?, which I find strange. It also doesn't obey the usual rules of type expressions, namely that the type checker doesn't perform any sort of evaluation or reduction of type expressions. (T[][] is not the same as T[]!)
  3. Or would the language require some sort of where T is not nullable constraint on the generic parameter? That would restrict the usefulness of the class. (It clearly wouldn't be acceptable for List<T> to reject nullable Ts.)

More concerning is the question of representation. If I instantiate T to a value type,

var foo = new Foo<int>();

then how would _value be typed?

  1. Nullable<int>? I find this strange: when T is a reference type then foo._value is represented in memory in the same way as T (a possibly null pointer) but when T is a value type then foo._value is represented differently than T (a struct containing a T and a bool). This would presumably require a change to the JIT, or a crazy system wherein the C# compiler generates two CLR classes for every C# class.
    1. int? I find this strange as well: when instantiating T to a non-nullable reference type then _value is typed as a nullable T, but when it's a non-nullable value type then it's a non-nullable T.
    2. Some sort of crazy system wherein it's typed as an int? for the purposes of Intellisense and type-checking but it's represented as an int?

Similar (although even more mind-bending) concerns apply when T is a nullable value type,

var foo = new Foo<int?>();

or itself a nullable generic type,

class Bar<T>
{
    private Foo<T?> _foo;
}

and so on...

Anyway the point is that my original Foo seems like perfectly reasonable code but could quite easily lead to surprising behaviour. Keen to hear feedback on my concerns!

Richiban commented 7 years ago

@benjamin-hodgson In your first example I would expect (2): the ?s collapse.

If we take the view that, since the value of T is not actually wrapped in any way to make the type T? then we can view T? as a type union with T and null (other languages represent this as T | null, so that's what I'll use from now on).

In these cases the types follow the rules of set unions, and T | null | null is equivalent to T | null.

This works in your example because (I assume) T is assignable to T?, so the line return _value!; is a T? that is asserted to be a T and the result is then assigned to a T?.

mattwar commented 7 years ago

Because the issues related to having actual guaranteed non-nullable types in C# historically have scuttled any attempt to add such a feature ever, we've opted to take a different approach with this proposal. This proposal is not about making guarantees, its just about allowing you to declare your intent so the compiler can find most of your potential bugs. TypeScript has been doing this for a few years now.

MikeyBurkman commented 7 years ago

@mattwar The TypeScript compiler pretty much guarantees (assuming TS 2.x and you have the compiler check on) that anything that is not a Null type is in fact not null. In TS, Null is its own type, just like string and number are.

The only time the TS compiler is not going to be correct in this is in interop with JS and with libraries that were written without the compiler flag. To say that the TS compiler is not making null-safe guarantees is no different than saying that it doesn't guarantee that your string object is not actually a string at runtime. The compiler does I think the best job imaginable for nullability, if your codebase is all strictly TS 2.x code.

shmuelie commented 7 years ago

@MikeyBurkman I think what @mattwar is getting at though is that the runtime for TypeScript has support for null and nothing is stopping a user of your code or something at runtime from creating a null value. Theoretically this ability in C# should also would if ALL your code is C# vNext.

Pzixel commented 7 years ago

@SamuelEnglard Theoretically you have an ability to modify readonly field in C#, but it's not a concern. We always have some tradeoffs between security and possibility to implement. If compiler guarantees that there is no way to pass a null value in not-null function without reflection and using C# vNext it still be very useful. And don't forget, nowadays we have nothing to help us with nulls, and any help would be nice.