dotnet / csharplang

The official repo for the design of the C# programming language
11.05k stars 1.01k forks source link

Proposal: Semi-Auto-Properties; `field` keyword #140

Open lachbaer opened 7 years ago

lachbaer commented 7 years ago

Semi-Auto-Properties and field keyword

(Ported from dotnet/roslyn#8364)

Specification: https://github.com/dotnet/csharplang/blob/main/proposals/semi-auto-properties.md

LDM history:

scottdorman commented 7 years ago

This could also be useful/part of the implementation for https://github.com/dotnet/csharplang/issues/133.

HaloFour commented 7 years ago

My opinion is that this proposal makes auto-implemented properties more difficult to read by introducing these silent backing fields despite the use of logic. With expression-bodied accessor members a chunk of the boilerplate for standard properties already disappears. The IDE eliminates much of the rest via standard code snippets.

lachbaer commented 7 years ago

Suggestion:

And the following really looks better than with #133

public T PropertyInitialized
{
    get;
    set => value ?? new ArgumentNullException();
} = default(T);

instead of (with #133) - a bit much of backingField.

public T PropertyInitialized
{
    T backingField = default(T);
    get => backingField;
    set => backingField = value ?? new ArgumentNullException();
}
lachbaer commented 7 years ago

Instead of using a field keyword, I also suggest following syntax borrowed from C struct:

public string FirstName
{
    get;
    set => _name = value;
} _name;

_name is by definition private to it's type (maybe unless directly prepended by an access modifier?). OR just scoped to the corresponding property, what would even be better.

In case expression-bodied set of semi-auto-properties expect a value being returnd to assign to the backing-field (here with additional initializer):

public string FirstName
{
    get;
    set => value;
} _name = "(not set)";

This completely eradicates the need for a field keyword or yield return as suggested above, because now the backing field's identifier can be used

public string FirstName
{
    get;
    set {
        var pcea = new PropertyChangeEventArgs(_name, value));
        OnPropertyChanging( pcea );
        _name = value;
        OnPropertyChanged( pcea );
    }
} _name = "(not set)";

This would also go hand in hand with #133.

HaloFour commented 7 years ago

@lachbaer

With the declaration of the field being outside of the property block that would imply that the scope of that field would be for the entire class, not for just that property. That's also consistent with the scoping of that C syntax.

lachbaer commented 7 years ago

@HaloFour Sure, but in this case we could probably rely on IntelliSense not to spoil the rest of the module. Also without any prepending modifier it would imply internal access. (corrected, of course class members are private by default) πŸ™„ But nevertheless, I think that this is a neat way to achieve at least some of the goals. Also the costs of implementing it could be reasonable.

lachbaer commented 7 years ago

Of course specifying a getter only works, too:

public string FirstName
{
    get {
        Debug.Assert( _name != null, "Getting name before setting it?");
        return _name;
    }
    set;
} _name = null;

or combined:

public string FirstName
{
    get {
        Debug.Assert( _name != null, "Getting name before setting it?");
        return _name;
    }
    set => value ?? throw new ArgumentNullException();
} _name;

In the latter example, just specifying the _name identifier qualifies for a semi-auto-property and therefore for set => value .

jnm2 commented 7 years ago

@HaloFour

My opinion is that this proposal makes auto-implemented properties more difficult to read by introducing these silent backing fields despite the use of logic.

I don't disagree with your whole comment, but I wanted to point out that the difficulty reading is only if you're accustomed to thinking of logic and automatic backing fields as mutually exclusive. I don't think they should be. It's not hard to learn to look for a field keyword. It'll be right where you'll look if you're looking for the backing field anyway.

The reason I really like this proposal is not because it enables me to write less code, but because it allows me to scope a field to a property. A top source of bugs in the past has been inadequate control over direct field access and enforcing consistent property access. Refactoring into multiple types to add that desirable scope is quite often not worth it.

To that end, a field keyword makes perfect sense. I should not care or have to care what the backing field's name is. Ideally, it doesn't have one. This adds the convenience that renaming properties gets a lot easier.

Not infrequently I'm renaming constructs such as this, where Set handles INotifyPropertyChanged.PropertyChanged:

private bool someProperty;
public bool SomeProperty { get { return someProperty; } private set { Set(ref someProperty, value); } }

This kills two birds with one stone: 1) scope safety, 2) more DRY, less maintenence:

public bool SomeProperty { get; private set { Set(ref field, value); } }
gafter commented 7 years ago

/cc @CyrusNajmabadi

lachbaer commented 7 years ago

Meanwhile some time went by. I'd like to state my opionion on the questions from the inital post.

Allow both accessors to be defined simultaneously?

Yes, definitely. A nice example is the sample at the end of this comment. A semi-auto-property with an implicit field should be constructed under either or both these circumstances:

Assing expression bodied setters? and Assign block body with return?

I'd like to have that, because simply it looks nice and would totally fit into how "assign-to"-return expressions look.

But introducing to much new behaviour and especially having the compiler and reading user to differentiate between return and non-returning bodies can be confusing. Therefore I'd go with "no" on this currently.

Prohibit field keyword if not semi-auto?

No, but it must not be highlighted by the IDE in that case, because it that context it is no keyword anymore. I think it is very unlikely that somebody converts a semi-auto-property to an ordinary property and simultaneously has a 'field' named field in scope.

If property-scoped fields become availble shall that feature be available for semi-auto-properties as well?

Yes, if any possible. SAPs allow both, getter and setter, to be defined. It would make sense to make no difference versus normal properties to restrict that feature.

jamesqo commented 7 years ago

Taken from @quinmars at https://github.com/dotnet/csharplang/issues/681#issuecomment-308539756, this feature would allow devs to write a 1-liner to implement lazy initialization:

public T Foo => LazyInitializer.EnsureInitialized(ref field, InitializeFoo);

private T InitializeFoo() { ... }
mattwar commented 7 years ago

@jamesqo Except using this LazyInitializer method has a downside (unless the initializer method is static) because you'll be creating a delegate instance each time the property is accessed.

What you want to write is:

public T Foo => 
{
   if (field == null)
   {
        System.Threading.Interlocked.CompareExchange(ref field, InitializeFoo(), null);
   }

   return field;
};

And now its not really a one-liner.

jamesqo commented 7 years ago

@mattwar Well they said they intend to cache point-free delegates eventually. Also what if people don't care about extra allocations because the initializer is doing something like network I/O which completely dwarfs allocating a few extra objects?

lachbaer commented 7 years ago

public T Foo => LazyInitializer.EnsureInitialized(ref field, InitializeFoo);

However the field keyword should not be available to every property per se. It then should be decorated with a modifier, like e.g. public field T Foo ....

yaakov-h commented 7 years ago

@lachbaer why?

lachbaer commented 7 years ago

@yaakov-h For two reasons. First, so that you can see from the signature whether a (hidden) backing field is produced by the property. Second, to help the compiler finding out whether an automatic backing field shall be produced or not.

yaakov-h commented 7 years ago

First reason... I can see the value on a getter-only autoproperty like above, but on a get/set one I see little point.

Second reason I don't think is necessary. If the property's syntax tree contains any reference to an undefined variable named field, then you know it needs to emit an automatic backing field.

lachbaer commented 7 years ago

The alternative is to start thinking about properties a bit differently. Every property has an explicit (synthesized) backing field, unless it is opt'ed out, because it is not needed (no get; or set; and no field used). This little change in philosophy is even backwards compatible in (emmited) code.

jnm2 commented 7 years ago

I think a field property modifier is unnecessary verbosity. The keyword should just be available for use the same place you look at to see the backing field anyway.

mattwar commented 7 years ago

@jamesqo That delegate caching is for static delegates where they are not being cached in some circumstances today. It is unlikely that the InitializeFoo method is static.

jamesqo commented 7 years ago

@mattwar Then we can simply open a corefx proposal to add EnsureInitialized overloads that accept state? e.g.

public class LazyInitializer
{
    public static T EnsureInitialized<T, TState>(ref T value, TState state, Func<TState, T> factory);
}

public T Foo => EnsureInitialized(ref field, this, self => self.InitializeFoo());

And again, the extra allocations may be dwarfed by the initialization logic in some cases.

sonnemaf commented 6 years ago

Instead of introducing a new 'field' keyword you could maybe use the _ (discard) symbol. This would avoid conflicts in old code.

class Employee {

    public string Name {
        get { return _; }
        set { _ = value?.ToUpper(); }
    }

    public decimal Salary {
        get;
        set { _ = value >= 0 ? value : throw new ArgumentOutOfRangeException(); }
    } = 100;
}

I think this could work.

lachbaer commented 6 years ago

@sonnemaf _ = value; already is valid code (it evaluates the right hand expression and discards its result). Therefore it does not work.

lachbaer commented 6 years ago

Still there are some more questions to answer.

  1. What happens when an [inherited] field with the name field exists?
  2. What if #133 arrives and a property-scoped variable named field is declared?
  3. What if field was a delegate and a member method named field is available?
  4. If a local named field is declared, what about highlighting that keyword within the accessor?

When thinking about it some time, you can come up with quite a long ruleset that must be followed. By always adding the modifier field to the property, this feature is switched on by demand and any occurance of the identifier field within the accessors references the backing-field, shadowing all other members with that name.

yaakov-h commented 6 years ago

@lachbaer: I'd expect an existing variable, field or property named field would take precedence, otherwise backwards compatibility dies. field as a keyword would have to be conditional on nothing else having that name.

lachbaer commented 6 years ago

@yaakov-h It's not that easy. Just as one example imagine you have a class that inherits from a class that has a field member.

class A {
    protected int field;
    [...]
}

class B : A {
    public int Foo {
        get => field;
        set => field = value;
    }
}

Just by the looks of class B (without pure knowledge of A) you don't know what field refers to.

As a second example imagine that class B wasn't derived from class A in the first step and a refactoring made the programmer do so. Before Foo was a semi-auto-property, now no compiling error occurs, but A.field is referenced by the compiler.

As I already said, a mandatory field modifier eliminates all those cases that can break or at least complicate existing code. You can always target the member in a public field int Property by using this.field.

jamesqo commented 6 years ago

@lachbaer In that case, perhaps the compiler could refuse to make the changes and tell the user (s)he has to rename A.field first.

jnm2 commented 6 years ago

@lachbaer That is going to be such a rare scenario, it's a shame it has to make the common case worse for everyone.

Joe4evr commented 6 years ago

tell the user (s)he has to rename A.field first.

And what if A was in another assembly altogether? πŸ™ƒ

jamesqo commented 6 years ago

@Joe4evr Then the user wouldn't be able to use this feature in B.

Joe4evr commented 6 years ago

And that's exactly why @lachbaer is pleading the case for using a keyword public field T Foo.

Literally the same thing happened with async: Code before async/await could have had an identifier named await in scope, so in order to not break that, it's required to add async to the method declaration because that would've been unrecognized (and therefor illegal) code for prior compiler versions.

That said, I'm not entirely sure if this is the best way forward, but that is the case that he's making.

jamesqo commented 6 years ago

@Joe4evr I don't see how it would be worth adding field to every property declaration. As @jnm2 said, having a field identifier in scope will be extremely rare.

jnm2 commented 6 years ago

Maybe you could add field to the declaraion but you'd only have to if there was an accessible field named field.

jamesqo commented 6 years ago

@jnm2 That would be very confusing and would require more work in the compiler for such a rare case.

jnm2 commented 6 years ago

@jamesqo I tend to agree, but then I'm back to being sad that we have to have two field keywords with different meanings for the 99.999% (for real) case where there is no accessible field field. It'll be pure cognitive overhead IMO.

jamesqo commented 6 years ago

@jnm2 No, I meant there will be no field modifier. People will simply not be able to use this language feature if a variable field is in scope.

jnm2 commented 6 years ago

I'm happy with that.

lachbaer commented 6 years ago

That won't work either. If there already was an existing class that makes use of a field named field in a property then that won't compile any more, because the compiler cannot distinguish between an old property that has already worked and an updated code, where the property was added later and where field should reference the backing field.

I have another idea. When leaving expression-bodied properties aside, the property must have an initializer in order to support the field keyword. Otherwise field would simply be an ordinary identifier. In case of a further field member, a new warning is created.

class A<T> {
    T field;

    // old-style property
    public T Foo {
        get => field;
        set => field = value;
    }

    // semi-auto-property with getter and setter (either could be bodyless, e.g. 'get;')
    public T Bar {
        get => field;           // Warning: Member with name 'field' in scope
        set => field = value;   // Warning: Member with name 'field' in scope
    } = default;  // ( Initializer is mandatory for a generated backing field )

    // auto backing field NOT available for expression-bodied properties
    public T Krauts => field;
}

This, too, should mean no harm to existing code and don't have any ambiguities.

I'd be more than happy if somebody had an idea for getting the last 'Krauts' property to work with a generated 'field'.

jnm2 commented 6 years ago

That won't work either. If there already was an existing class that makes use of a field named field in a property then that won't compile any more, because the compiler cannot distinguish between an old property that has already worked and an updated code, where the property was added later and where field should reference the backing field.

I'm not sure what you're saying. I would expect it to compile and act the same as usual. If there is an accessible field named field, no backing field will be generated. Otherwise, a backing field will be used. Previous or later compilations of this class won't affect this logic because the only thing that is relevant is what the compiler can see at compile time.

lachbaer commented 6 years ago

@jamesqo wrote:

@jnm2 No, I meant there will be no field modifier. People will simply not be able to use this language feature if a variable field is in scope.

I was infering that an error would occur, that would be breaking. In the other case we go back to the beginning of this day to https://github.com/dotnet/csharplang/issues/140#issuecomment-309744338. It would be desastrous to have field one time working this way, under different conditions some other way.

I like the idea with the mandatory initializer best by now 😊

For expression-bodied props the IDE could help by suggesting to put this. (or base.) before 'field' in case there is such a member in scope. As @jnm2 said, in 99.999% (I slightly doubt this exact figure ;-) there would probably be no ambiguities at all.

jamesqo commented 6 years ago

@lachbaer

It would be desastrous to have field one time working this way, under different conditions some other way.

Why? This is how the new discard feature works: if a variable _ is in scope, then it is assigned to. Otherwise, _ is interpreted as a keyword. I'm proposing the exact same thing for field.

I'm not really comfortable with a mandatory initializer. I rarely initialize my fields. Forcing me to do so with = default each time would make this language feature awkward to use.

jnm2 commented 6 years ago

I slightly doubt this exact figure ;-)

Sure 😈 You know what would be terrifically cool? One of us with spare time should make a Roslyn indexing spider that crawls GitHub and other OSS repositories, parses the syntax, and indexes statistics so we have things handy like a list of all field names and their frequencies.

lachbaer commented 6 years ago

@jamesqo In contrast to _, that is in my eyes very, very unlikely to be used as a field name except maybe in volatile test projects, the probability of 'field' as the field name in a small class is much higher and conflicts are as sure as the pope staying a bachelor.

lachbaer commented 6 years ago

I gave it a thought and think that it would indeed be sufficient to let the compiler issue an 'information' that is supported by the IDE. It can be something like "You might want to add 'this.' to 'field', because a known member named 'field' is in scope". (BTW, can informations be turned into warnings by #pragma?)

Also in case of an appended initializer or empty-body-accessor (set;, get;), a usage of field would issue at least a warning that the field keyword is not available. This is not breaking, because those constructs do not exist in existing code and a manual intervention (falling back to plain old properties) is a way to solve it.

BobSammers commented 6 years ago

I think field should be created if there is an automatic get or a defined get that references it. A compiler warning should result when no assignment is made to it in either the setter or as an initialized value (including ctor initialization).

Unbodied accessors should implicitly be defined as:

get => field;
set => field = value;

which is the case currently if both accessors are unbodied.

I'm ambivalent on implicit assignment for an expression-bodied set.

Rationale A major reason to add bodies and backing fields to properties that would otherwise be automatic is to process new values in set (change notification is a huge use in UI code, for example). This proposal allows a body to be added to set only, avoiding boilerplate on get and a boilerplate backing field which is currently scoped outside the property. Reducing backing field scope is, in my opinion, the most compelling part of this proposal.

(Introducing a way of reducing scope for named backing fields, whether it's lachbaer's C-style convention or related feature requests where a normal field is defined inside the property definition, is less compelling for me because I can't see where it's useful to have a limited scope, named identifier vs a standard keyword: there is more to write, code complexity is (slightly) higher and refactoring is more difficult when there are two identifiers involved in one property. However, it does seem like a good way to avoid the potentially breaking change of creating a new keyword that could conflict with existing identifiers. The other proposal - to add a modifier such as field to the property declaration - also seems like it would be an unfortunate requirement but as has already been pointed out, it has history in the way async/await was introduced.)

Initialization Requirement I don't understand the proposition that the property should be initialized before the field keyword can be used. As I see it, the backing field is already there, invisibly, for automatic properties and this proposal makes it visible in situations where that could be useful. There is no assignment requirement on auto properties and therefore there shouldn't be here. I think the "hints" necessary for the compiler to infer that a backing field should be generated are sufficiently covered by explicit or implicit (i.e. unbodied) use in the get accessor.

Lachbaer suggested that an initialization requirement would assist in cases where field is an existing name in legacy code. I can't see the mechanism by which this helps, Sure, it reduces possible cases of collision to initialized properties, but I can't see a connection between use of an initializer and whether an automatic backing field is desired: it doesn't seem to me to actually solve the problem. I feel like I may be missing something here?

As an additional consideration, could there be issues with abstract properties where unbodied accessors are placeholders for definitions in child classes?

BobSammers commented 6 years ago

On reflection, I'd vote against implicit assignment for expression-bodied setters.

The common UI idiom of using a SetProperty() method (present, for example, in Prism's BindableBase) is an interesting real-world use-case for two reasons. For non-WPF coders, this method updates the backing field and fires a PropertyChanged event if the new value is different to the old. It would be used like this with a semi-automatic property similar to what has been proposed:

public bool UiProperty
{
  get;
  set => SetProperty(ref field, value);
}

Firstly, as SetProperty() updates the backing field itself and returns true / false to indicate whether the value was different, I think this is a good reason to avoid having the result of the expression assigned to the backing field. I'm sure there are plenty of other examples similar to this.

Secondly, the method signature of SetProperty() requires a ref parameter, which means that field would have to be implemented such that it can be passed in a method call with the ref keyword. Frankly, I expect this to Just Work (at the IL level, I expect the backing field to be an instance variable just like any other), but it is an interesting consideration.

ghost commented 6 years ago

Further extending @BobSammers example, why not just make ref Property to return reference to the backing field? I.e.

public bool UiProperty
{
    get;
    set => SetProperty(ref UiProperty, value);
}
jnm2 commented 6 years ago

@rndfax nice there but awkward if you just want to read the value of the backing field in the setter. You'd have to add var field = ref UiProperty; which would be pure boilerplate.

ghost commented 6 years ago

@jnm2

You'd have to add

Actually, no, you don't have to add anything. Just use system-provided helper methods:

T System.ReadField(ref Property);
T System.WriteField(ref Property, T val);

And usage:

public string Prop
{
    get => ReadField(ref Prop) ?? string.Empty;
    set => WriteField(ref Prop, value) ?? string.Empty;
}

This does not introduce any additional magic at all.

HaloFour commented 6 years ago

@rndfax

ref Prop is already legal syntax where Prop returns a ref type.