dotnet / csharplang

The official repo for the design of the C# programming language

Indicate that a non-nullable Field must be initialized where the type is constructed #2328

Closed YairHalberstadt closed 4 years ago

YairHalberstadt commented 5 years ago

Problem

Consider the following three outstanding issues with NRTs:

1. Object Initializers

Object initializers are greatly loved by many C# programmers. However, they don't interact very well with NRTs.

Consider this for example:

public class C
{
    public string Str {get; set;}
}
...
var c = new C { Str = "Hello World" };

If Str is to be non-nullable, then when enabling nullability we are forced to completely change all this code:

public class C
{
    public C(string str) => Str = str;
    public string Str {get;}
}
...
var c = new C("Hello World");

This makes object initializers usable only with nullable properties and fields.

2. DTOs

Many DTOs are initialized by ORM/Serialization frameworks. As a result they often contain only property getters and setters, without any constructors.

This of course causes warnings when NRTs are enabled, as the compiler doesn't know the properties are always initialized by the ORM/Serialization framework.

3. Structs

Structs always have a default constructor, which cannot be overridden.

As a result whenever a struct containing a field of a non-nullable reference type is constructed using the default constructor, the field will initially be null. This leaves a big hole in the nullable-type-system:

public struct S
{
    public string Str { get; set; } //no warning
}
...
var s = new S(); //no warning
var l = s.Str.Length; // kabam! NullReferenceException! - but no warning

Solution

Based on a suggestion by @Joe4evr https://github.com/dotnet/csharplang/issues/36#issuecomment-471470105

The real issue here is that present mechanisms only allow us to guarantee a field is initialized inside the constructor, but currently many types rely on their invariants being upheld by whoever constructs the type.

It should be possible to mark that a field/auto-property must be initialized at the point it is constructed, before it is used. This could be done via an attribute for example:

public class C
{
    [MustInitialise]
    public string Str {get; set;}
}
...
public struct S
{
    [MustInitialise]
    public string Str { get; set; }
}

Then when an instance of the type is constructed, it should be a warning to use the type before all such fields are initialised:

public class C
{
    [MustInitialise]
    public string Str {get; set;}
}

...

var c = new C { Str = "Hello World" };
var l = c.Str.Length; //no warning
c = new C();
var x = c.Str; //warning
c.Str = "Hello";
l = c.Str.Length;  //no warning

...

public struct S
{
    [MustInitialise]
    public string Str { get; set; }
}

...

var s = new S { Str = "Hello World" };
var l = s.Str.Length; //no warning
s = new S();
var x = s.Str; //warning
s.Str = "Hello";
l = s.Str.Length;  //no warning

All structs should have this attribute on all their fields implicitly.
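
For illustration, the attribute itself would need very little. The following is a minimal sketch, assuming a hypothetical MustInitialiseAttribute type: no such type ships with .NET, and the compiler attaches no meaning to it today, so it only shows what the declaration side of the proposal might look like.

```csharp
#nullable enable
using System;

// Hypothetical marker attribute; the name and targets are assumptions, not an existing API.
[AttributeUsage(AttributeTargets.Field | AttributeTargets.Property)]
public sealed class MustInitialiseAttribute : Attribute
{
}

public class C
{
    // Without compiler support this is only metadata: the declaration
    // still produces the usual CS8618 warning today.
    [MustInitialise]
    public string Str { get; set; }
}
```

The warning at the point of construction would have to come from the compiler or from an analyzer that recognises the attribute.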

Further issues

Consider this struct:

public struct S
{
    public S(string str) => Str = str;

    public string Str { get; set; }
}

Then it would be a warning not to initialize S.Str when the new S(string) constructor was called, even though S.Str was initialized by the constructor!

I think the solution to this would be for the compiler to detect which constructors initialize a field, and implicitly add parameters to the MustInitialise attribute which indicate which constructors did not initialize the field.

i.e. the compiler should generate code equivalent to the following:

public struct S
{
    public S(string str) => Str = str;

    [MustInitialise("DefaultConstructor")]
    public string Str { get; set; }
}

Then, when the default constructor is called, the compiler knows it must make sure S.Str is initialized.

333fred commented 4 years ago

No, init is not intended to solve this issue. Just because something can be set during initialization does not mean it must be set during initialization.

Richiban commented 4 years ago

@333fred

No, init is not intended to solve this issue. Just because something can be set during initialization does not mean it must be set during initialization.

I'm very surprised to hear that... Init means that the member can only be set during initialization, so given this class:

public class Point
{
    public int X { get; init; }
    public int Y { get; init; }
}

Surely, surely, this is not allowed:

var x = new Point { X = 4 };

andre-ss6 commented 4 years ago

@333fred

No, init is not intended to solve this issue. Just because something can be set during initialization does not mean it must be set during initialization.

I'm very surprised to hear that... Init means that the member can only be set during initialization, so given this class:

public class Point
{
    public int X { get; init; }
    public int Y { get; init; }
}

Surely, surely, this is not allowed:

var x = new Point { X = 4 };

Why not? int has a default value of 0. Although for NRTs, yes, it would make less sense.

Richiban commented 4 years ago

I thought we were trying to get away from this default value nonsense. The whole point of the records feature is to make working with immutable data types easier, but if that means that when initialising you're allowed to just leave off some of the values then it's a pretty undercut feature.

HaloFour commented 4 years ago

Even if it's not required to specify those properties it sounds like that would be a good place for NRTs to warn if the non-nullable properties are not initialized.

// no NRT warnings
public class Person {
    public string Name { get; init; }
    public string? Title { get; init; }
}
// no NRT warning
var person1 = new Person { Name = "Someone" };

// NRT warning here
var person2 = new Person { Title = "Something" };

333fred commented 4 years ago

Even if it's not required to specify those properties it sounds like that would be a good place for NRTs to warn if the non-nullable properties are not initialized.

// no NRT warnings
public class Person {
    public string Name { get; init; }
    public string? Title { get; init; }
}
// no NRT warning
var person1 = new Person { Name = "Someone" };

// NRT warning here
var person2 = new Person { Title = "Something" };

I was originally of the opinion we could do this. However, when you break it apart, these really are orthogonal features. Just because something has a default value of null doesn't mean that it's not required to be set, and just because null is not allowed for a property, that doesn't suddenly mean that the property is required to be set. It could have some default value provided by other initialization steps if no value was provided. Positional argument lists have a great way of doing this today, for example:

void M(string? requiredButNullable, string notRequiredButNotNullable = "") { ... }

The init keyword allows developers to be more expressive about the contexts you can provide a value in, but it does nothing to help the issue of required properties.

333fred commented 4 years ago

@333fred

No, init is not intended to solve this issue. Just because something can be set during initialization does not mean it must be set during initialization.

I'm very surprised to hear that... Init means that the member can only be set during initialization, so given this class:

public class Point
{
    public int X { get; init; }
    public int Y { get; init; }
}

Surely, surely, this is not allowed:

var x = new Point { X = 4 };

It's absolutely allowed. For another example, take this person class:

public class Person
{
    public string FirstName { get; init; }
    public string MiddleName { get; init; } = "";
    public string LastName { get; init; }
}

From a public consumption perspective, nothing about that = ""; is exposed. Therefore, if init also meant required, the consumer would be required to set MiddleName. Many people don't have middle names, so it really would be represented by an optional parameter. A constructor signature for this type might look like this:

public Person(string firstName, string lastName, string middleName = "") { ... }

Now, we certainly could make init also mean required, as not providing a default value in a positional parameter does. This doesn't really solve the issue, though. Suddenly, that means you can't have optional, immutable properties that have an initial value. You also can't have required mutable properties. We really need a good solution for this meaning of required. While an attribute as originally proposed by @YairHalberstadt would work, it doesn't feel particularly first-class to me. I don't have any good syntax proposals for it at this point, but I do feel like we need some kind of explicit syntax for it.

markm77 commented 4 years ago

Aren't we in a situation now where we are moving towards use of initializers rather than constructors for initialization due to superior syntax/clarity, except for the fact that we have no way yet to require that properties be initialized in the object initializer?

Non-nullable reference types present a particular problem with "late" initialization due to no default value but don't we in fact need to be able to specify "initializer-required" properties of any type? Value types can be initializer-required too; the default value might be meaningless.

I think a general way to specify initializer-required properties would be the best, e.g. something like this.

yaakov-h commented 4 years ago

That seems like something that could be done with an attribute and an analyzer, at least initially.
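
A real analyzer would be built on Roslyn and report at compile time. As a much simpler stand-in, here is a sketch of the same idea as a run-time reflection check. MustInitialiseAttribute, InitializationCheck and Person are all hypothetical names invented for this example, and the check only catches null references, not value types left at their default.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical marker attribute (an assumption for this sketch, not an existing API).
[AttributeUsage(AttributeTargets.Property)]
public sealed class MustInitialiseAttribute : Attribute { }

public static class InitializationCheck
{
    // Returns the names of marked public properties that are still null on the instance.
    public static List<string> FindUninitialized(object instance)
    {
        var missing = new List<string>();
        foreach (PropertyInfo property in instance.GetType()
                     .GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            // Skip write-only properties and indexers, which cannot be read directly.
            if (!property.CanRead || property.GetIndexParameters().Length != 0)
                continue;

            if (property.IsDefined(typeof(MustInitialiseAttribute), inherit: true)
                && property.GetValue(instance) is null)
            {
                missing.Add(property.Name);
            }
        }
        return missing;
    }
}

public class Person
{
    [MustInitialise]
    public string Name { get; set; } = null!; // null! silences the declaration warning
    public string? Title { get; set; }
}
```

Calling InitializationCheck.FindUninitialized(new Person { Title = "Dr" }) reports Name as missing, while an instance created with Name set reports nothing.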

Richiban commented 4 years ago

Missing off a property that doesn't have an initial value provided by the implementation is an obviously invalid scenario. Since, as 333fred says, the = ""; is not exposed to the outside then I would have the compiler emit some kind of 'optional' marker / metadata on those properties that have been given default values.

I would say that allowing any property to simply not be set is a really dangerous idea. These init properties are supposed to be able to replace constructors; it would be ridiculous if this was allowed:

public class Person
{
    public Person(string firstName, string surname)
    {
    //...
    }
}

var p = new Person("Jo"); // I know you've specified two parameters, but, lol

333fred commented 4 years ago

Missing off a property that doesn't have an initial value provided by the implementation is an obviously invalid scenario. Since, as 333fred says, the = ""; is not exposed to the outside then I would have the compiler emit some kind of 'optional' marker / metadata on those properties that have been given default values.

I would say that allowing any property to simply not be set is a really dangerous idea. These init properties are supposed to be able to replace constructors; it would be ridiculous if this was allowed:

public class Person
{
    public Person(string firstName, string surname)
    {
    //...
    }
}

var p = new Person("Jo"); // I know you've specified two parameters, but, lol

To be clear, this example is the situation today. One of the reasons we're adding init is because many people can and do use mutable properties for this. However, we're not going to specify that this one keyword means "you can only set this during init and you're required to set it" because we really do want that latter ability to be applicable to both set of today and init. If we make init both, that just means that we'll introduce another inconsistency that people have to learn.

pablocar80 commented 4 years ago

Could we have a required init or similar?

333fred commented 4 years ago

Could we have a required init or similar?

That's certainly one possibility. One of the big annoyances with that syntax, to me, is how long required is. It's 8 characters! 3 syllables! It's a big penalty to type imo.

pablocar80 commented 4 years ago

‘required init’ may be verbose, though still definitely shorter and more readable than any alternative boilerplate. It’s similar in length to protected or private and those are quick to type with intellisense in visual studio.

markm77 commented 4 years ago

One of the big annoyances with that syntax, to me, is how long required is. It's 8 characters! 3 syllables! It's a big penalty to type imo.

Perhaps req (for "required init") and req set (for "required set")?

333fred commented 4 years ago

One of the big annoyances with that syntax, to me, is how long required is. It's 8 characters! 3 syllables! It's a big penalty to type imo.

Perhaps req (for "required init") and req set (for "required set")?

I've thought about that as well, and it's currently one of the leading winners in my mind. We have precedent for short abbreviations like var in the language, and it has an advantage that unlike var, I can't think of another English word it could mean in context. var could be variable or value.

YairHalberstadt commented 4 years ago

var could be variable or value.

You have an interesting spelling of value :-)

333fred commented 4 years ago

var could be variable or value.

You have an interesting spelling of value :-)

I'm still waking up 😅

pablocar80 commented 4 years ago

Please add this req init or similar to C# 9.0, that would really tie things together for using nullability.

canton7 commented 4 years ago

Reading through the thread, I'm a little confused: it seems there are two separate issues which keep being conflated then teased back apart.

From the OP, there is an issue specifically about the fact that you get nullable warnings with this code:

public class C
{
    public string Str {get; set;}
}
...
var c = new C { Str = "Hello World" };

It seems that HaloFour's suggestion provides a nice solution to this specific problem: by marking a property as init, you're telling the compiler that you're expecting (but not requiring) that it's set during initialization.

I think it's fairly intuitive that public string Str { get; init; } shouldn't warn, but creating a new C without setting Str in the object initializer would cause a warning.

Note that this warning would be specifically because Str could be null. It's not directly to do with the fact that it hasn't been set. A public string? Str { get; init; } property wouldn't cause such a warning.

This solves the pain of warnings when declaring properties which are intended to be set during initialization, or by a serialization framework. It does so without needing any new syntax, and in a way which is consistent with how I'd expect warnings to work with init anyway.

(There is a bit of a hole here: there's a case where you might expect a non-nullable property to be set at some point after the object is constructed but before it's "used". It's a small hole and it has a workaround in { get; set; } = null!).
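
As a small illustration of that workaround (the Settings class and its property are made-up names for this sketch): the null-forgiving initializer removes the declaration warning, but the property really is null until something assigns it.

```csharp
#nullable enable

public class Settings
{
    // null! (the null-forgiving operator) suppresses CS8618 at the declaration.
    // The property is still null at run time until something assigns it.
    public string ConnectionString { get; set; } = null!;
}
```

This trades compile-time checking for silence, so it only suits properties that are reliably set before first use.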


There's a separate issue about having properties which are required to be set during initialization, which would have an impact on but be relatively independent from nullable warnings. This feels like it has overlaps with records.


Of course, it's entirely possible that I've missed the point...

HaloFour commented 4 years ago

@canton7

Seems the reasoning is that init by itself doesn't indicate that the property must be initialized, and that the consumer is unaware of the possibility that the property has a default value:

public class C
{
    public string Str {get; init;} = "";
}

In this case it would be inappropriate to warn on not setting Str in an initializer. While this is true it feels that there are a number of ways that this could be solved right now, even without tying the feature to a potential future "required init" or "validators".

It is mentioned that NRTs and init-only properties are orthogonal, with which I disagree. NRTs are cross-cutting and already affect virtually every other feature in the language. It is also mentioned that trying to solve this via an analyzer would make it not a first class citizen, which is weird considering that NRT analysis is already driven by a huge number of attributes. In the meantime this leaves a massive gap in NRT analysis.

markm77 commented 4 years ago

@canton7 I understand your point but I also think it's worth considering things from the PoV of the consumer of a class. Don't you think it would be pretty unfriendly to receive nullable warnings relating to a class when trying to initialise it? Rather than more straightforward errors relating to required properties, i.e. clear violation of code contract?

req / req set also solves a similar problem in other cases, e.g. for uses of bools and ints where the default value has no meaning and should not be used but I accept that is not why this thread was started.

HaloFour commented 4 years ago

@markm77

Don't you think it would be pretty unfriendly to receive nullable warnings relating to a class when trying to initialise it?

No more than receiving nullable warnings when trying to call a method on that same class. You'll already get nullable warnings if you try to set one of those properties to null explicitly, allowing them to remain null is just as incorrect and as much a vector for bugs.

bbarry commented 4 years ago

required ...

you could always hang on a bang:

public string Str {get; init!;}

andre-ss6 commented 4 years ago

Seems the reasoning is that init by itself doesn't indicate that the property must be initialized, and that the consumer is unaware of the possibility that the property has a default value

It's also been mentioned that this would make init even more of a replacement for set. The solution to this issue should be one that is applicable for both set and init.

canton7 commented 4 years ago

@markm77

I understand your point but I also think it's worth considering things from the PoV of the consumer of a class. Don't you think it would be pretty unfriendly to receive nullable warnings relating to a class when trying to initialise it?

No? If a property is init, non-nullable, and doesn't have a default value, it's obviously intended that it should be set from the initializer. For consistency with a corresponding set property, you should get a nullable warning somewhere. Your options are:

  1. Warn on the property declaration (as with set)
  2. Warn on initialization

I think option 1 is always going to annoy. If someone has made a non-nullable init property without a default value, they obviously intend that it's set on construction. Warning on the property declaration would be unintended and annoying 100% of the time, I think. That leaves option 2.

Rather than more straightforward errors relating to required properties, i.e. clear violation of code contract?

I'm not against the concept of required properties, but I do think they're orthogonal to this issue. Whether you have required properties or not, you still have to design what happens in the case that someone declares a non-nullable init property without a default value. My argument is that the natural decision in that case covers the specific gripe in the OP, which involves serializers.


@andre-ss6

It's also been mentioned that this would make init even more of a replacement for set. The solution to this issue should be one that is applicable for both set and init.

I don't think init is a replacement for set. set currently covers three cases: "must be initialized during construction", "should be set only during construction" and "can be set at any time". init takes care of one of those. Some sort of required set will take the other. set will still mean "can be set at any time": that's not going to go away.

Coming back to the specific issue in the OP, this issue is about the annoyance of unnecessary warnings when declaring POCOs used by serializers. init + careful design of nullable warnings covers this case, I think. I still think there's a separate issue around "must be initialized during construction" properties, but I think that's a separate issue, which will have its own relationship with NRTs.

EDIT: I got nullable and non-nullable mixed up

markm77 commented 4 years ago

@canton7 @HaloFour

What I believe you both are defending/proposing is effectively an implicit init required for the case of an init non-nullable reference type with no default value. I, on the other hand, am advocating an explicit init required.

I prefer an explicit declaration because it allows better bug catching by requiring intent and offers clearer error messaging to API consumers. I believe it will also make code easier to read (more declarative) and aid automated tooling and analysis. Finally it has the potential to cover additional problem areas such as the value type issue I mentioned.

But no problem to agree to differ. I respect you are looking for a simple solution with minimal change. My concern would be it makes a complicated language even more complicated.

HaloFour commented 4 years ago

@markm77

What I am proposing has nothing to do with additional language features or behavior. It has to do with the flow analysis of NRTs, which already has a major gap when it comes to POCOs. That same flow analysis should also apply to set.

required init is a different ballgame altogether. It does affect the compiler's behavior since it would be an error to not initialize that property. But that doesn't prevent you from initializing it to null thus satisfying that requirement.

andre-ss6 commented 4 years ago

I don't think init is a replacement for set.

@canton7 Precisely, init is not a replacement for set. What you're proposing, however, would steer init further in that direction, as you would only get NRT warnings provided your property has an init accessor. No candy for set.

Please note I don't really have an opinion on this topic yet. I was just explaining to @HaloFour an additional issue with his idea that was exposed by @333fred (correct me if I misinterpreted your point @333fred)

No? If a property is init, non-nullable, and doesn't have a default value, it's obviously intended that it should be set from the initializer.

(Emphasis mine)

@canton7 The consumer today has no way of knowing that a property has a default value. Yes, an attribute could solve that.

markm77 commented 4 years ago

@HaloFour

What I am proposing has nothing to do with additional language features or behavior. It has to do with the flow analysis of NRTs, which already has a major gap when it comes to POCOs. That same flow analysis should also apply to set.

I guess I am nervous of special-casing the flow analysis to solve one problem and in a way I think could be a head-scratcher for many people.

But that doesn't prevent you from initializing it to null thus satisfying that requirement.

Well if you set a non-nullable to null you should obviously get a compiler warning as per normal.

markm77 commented 4 years ago

Just to add @HaloFour , I sympathise. Using NRT with POCOs is hard work. I have in some cases had to make duplicate classes (with and without non-nullables) to work around issues.....

Basically I want to use NRT to tighten my code and be explicit with my types. I want to use initializers rather than constructors to offer friendly and flexible API syntax. I want to use immutable types and records where appropriate for data objects. But, even with C# 9 as announced, doing all these things together will not be fully possible. To my mind the biggest problems are solved via "required init" and "required set".

I suspect in the end you are having similar issues to me. Hopefully we can all find a solution.

I'm signing off so bye for now.

canton7 commented 4 years ago

@markm77

You seem to think that I'm advocating against required set. I am not, as I have said very clearly several times.

I am looking at the issue in the OP. Forget about required properties for a minute, the problem in the OP is not about them. The problem in the OP is that you get unwanted warnings when declaring POCOs meant for serialisation.

Required properties came up because they are a larger feature which might address the problem in the OP. Fine. I'm not arguing against them. But forget about that potential solution for just a minute.

init properties are coming in C#9. They will interact with nullable warnings, like it or not. In my opinion the only sensible way for them to interact also solves the specific problem in the OP. Happy days.

If that is the case, then we can go and design required properties independently of this issue.

andre-ss6 commented 4 years ago

You seem to think that I'm advocating against required set.

Actually I don't even know what it is that you're advocating for and I'm indifferent to it. I was just replying to your message, since you tagged me.

Forget about required properties

The discussion about required properties came up. I've chimed in. End of story.

Required properties came up because they are a larger feature which might address the problem in the OP. Fine.

And I was trying to explain to @HaloFour that his suggestion has another potential obstacle evidenced by @333fred.

canton7 commented 4 years ago

@andre-ss6 my bad, I thought you were part of that thread of conversation.

markm77 commented 4 years ago

@canton7 For the record I don't think you are against required init/required set. My understanding is you want a solution for C# 9 and believe HaloFour's to be realistic and without negative effects. Fair enough position. 😊

pablocar80 commented 4 years ago

Some of the examples discussed (string, booleans) have corresponding trivial values (empty string, false). The required init becomes more interesting for fields that are non-nullable references to classes.

canton7 commented 4 years ago

@markm77 it's more that I think this solution falls naturally out of the only sensible interaction between NRTs and init properties. That it is in the C#9 timeframe and doesn't rely on a tricky bit of design is a bonus 😀 (and it will be tricky to make required properties and reflection-based serializers, which is what the OP references, work nicely together I think).

I'm also trying to point out that this issue and required properties are more or less two different things, and we can consider them separately.

Richiban commented 4 years ago

@canton7 NRT is not going to help you if the property is a value type though, is it?

canton7 commented 4 years ago

@Richiban We're talking about two different things. If the property is a value type, you won't get the nullable warnings that this issue is complaining about, so it's moot.

Required properties are different. It's correct that init properties + NRT don't cover the same cases as required properties.

The point I'm trying to make is that the specific warnings in the OP can, I think, be solved by init + NRT. Required properties are a separate (and worthy) issue, independent of the warnings which the OP is about, even though they were proposed as a solution to the warnings in the OP.

chucker commented 4 years ago

I am looking at the issue in the OP. Forget about required properties for a minute, the problem in the OP is not about them. The problem in the OP is that you get unwanted warnings when declaring POCOs meant for serialisation.

I don't think that's right. The problem in the OP is not that they get unwanted warnings. It's that they want the warnings to appear at the callsite.

So this:

public class C
{
    public string Str { get; set; } // produces warning
}
...
var c = new C { Str = "Hello World" };
var d = new C(); // doesn't warn, but should

Becomes:

public class C
{
    [CallerInitialize]
    public string Str { get; set; } // no longer warns
}
...
var c = new C { Str = "Hello World" }; 
var d = new C(); // warns

Notice that C might be in a completely different assembly that you don't have control over.

Suggestions to add the ! suffix or #nullable disable miss the point. Of course that's possible, but you lose compile-time safety. Instead, we're hoping for a way to signal to the compiler that the warning should surface when constructing an instance of C, not when implementing C itself.

canton7 commented 4 years ago

@chucker you didn't bold the words "When declaring POCOs", but they're important.

I agree with you: if you have a non-nullable init property, I'd expect a warning at the point of initialization if that property isn't set. Not at the point it's declared.

Please read my original post where I make this clear.

chucker commented 4 years ago

you didn't bold the words "When declaring POCOs", but they're important.

Why? POCOs are just one example. Razor Components are another. Settings/options types (I guess those are arguably a form of POCO?) are yet another.

I agree with you: if you have a non-nullable init property, I'd expect a warning at the point of initialization if that property isn't set. Not at the point it's declared.

I don't agree with that.

A property doesn't have to be immutable (init;) for me to care that it will be set at initialization. While those two might have overlap, they're not the same. Taking Yair's example again:

public class C
{
    [CallerInitialize]
    public string Str { get; set; } // no longer warns
}

var c = new C { Str = "Hello World" }; 
c.Str = "Hello again!!"; // is fine; mutability is a largely separate issue

olmobrutall commented 4 years ago

I'm using Signum Framework, and we can declare an entity once and it serves as DB entity model, DTO, and view model at the same time.

We abuse using not-nullable properties to mean mandatory in the UI and not-nullable in the database but really they could be null while you are filling the form. Example: https://github.com/signumsoftware/extensions/blob/master/Signum.Entities.Extensions/Notes/Note.cs

We also use a lot of object initializers, not only because the ORM requires a default constructor (it could be private), but because when an entity has more than say three or four properties a constructor becomes harder to read (what was the parameter number 4???) and also is more code that has to be written and maintained.

I love NRTs but CS8618 is annoying. So much that we have disabled it https://github.com/signumsoftware/extensions/blob/87e9470a970307339926a478c2ee1b97578fcbdb/Signum.Entities.Extensions/Signum.Entities.Extensions.csproj#L13

I agree that the new init; property in C# 9 doesn't really fix this problem. init means you can only set it in the initialiser, not that you must set it.

I like the CallerInitialize per property (or field) solution, but it would be even better if we could write an attribute in the class with a general rule and make it inheritable:

[InitializeOptions(memberWarning=InitWarning.None,  callerWarning=InitWarning.NotNullableReferenceType)]
public class C
{
    public C(string name)
    {
       this.Name = name; 
    }
    public string Name { get; set; } // No warning because it is set in the constructor
    public string Str1 { get; set; } // Warns in caller because non-nullable reference type
    public string? Str2 { get; set; } // No warning because nullable
}

new C("John"); // Warning because Str1 is a non-nullable reference type and is not initialized in the constructor

The options that I can see are:

With this feature we could have finer control over what type-safety we want for different hierarchies of objects and enjoy object initializers without feeling guilty for not writing the constructor.

For example for DTOs I would use [InitializeOptions(memberWarning=InitWarning.None, callerWarning=InitWarning.All)] to never forget setting a property, while for Entities I might use [InitializeOptions(memberWarning=InitWarning.None, callerWarning=InitWarning.All)] because they are very often created incomplete and given to the user to fill the missing values in the UI.

Of course, when using reflection to create the instance (Activator.CreateInstance) all the required properties are empty and up to be set by some run-time magic like a deserializer ignoring InitWarning.
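
To make the reflection case concrete, here is a small sketch (Dto and Demo are hypothetical names for this example). Activator.CreateInstance runs only the parameterless constructor, so no caller-side initializer check can apply and a non-nullable property stays null until a deserializer or similar code fills it in.

```csharp
#nullable enable
using System;

public class Dto
{
    // null! keeps the declaration quiet; the value is still null after construction.
    public string Name { get; set; } = null!;
}

public static class Demo
{
    public static Dto CreateViaReflection()
    {
        // No object initializer is involved here, so nothing forces Name to be set.
        return (Dto)Activator.CreateInstance(typeof(Dto))!;
    }
}
```

Demo.CreateViaReflection().Name is null at this point, which is the state a deserializer is expected to repair.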

Suchiman commented 4 years ago

@YairHalberstadt Should this be closed as championed in #3630?