dotnet / csharplang

The official repo for the design of the C# programming language
10.95k stars 999 forks source link

[Proposal]: Field and value as contextual keywords #7964

Open MadsTorgersen opened 2 months ago

MadsTorgersen commented 2 months ago

Field and value as contextual keywords

Summary

This proposal explores introducing field as a contextual keyword instead of an implicit parameter, and changing value to similarly be a contextual keyword.

Motivation

As we're adding "Field access in auto-properties" to C# we've been aiming at having field be an implicitly declared parameter in property accessors, just as value already is in property set and init accessors.

Issue #7918 and subsequent discussion explores the breaking change aspects of this design, along with proposed default fixes.

Here we propose instead adding field as a contextual keyword. It would only be a keyword when used as a simple name inside a property accessor. This would result in fewer breaking changes and simpler "default fixes" for those breaks that do occur.

Declarations of field as a local variable would no longer be a breaking change:

public int P
{
    get
    {
        var field = 0; // Allowed today, breaks with implicit parameter, allowed with contextual keyword
    }
}

This would eliminate the arguably most gnarly automated default fix, in that such locals would have had to be renamed, leading to a non-obvious choice of which name to change them to.

For the remaining breaks, the implicit parameter approach involves changing existing occurrences of field as a simple name to a member access, possibly rewriting primary constructor parameters to introduce an explicit field declaration. Instead, with field as a contextual keyword all those occurrences can simply be replaced with @field.

The proposal also suggests changing value to a contextual keyword for consistency. This is a breaking change, but likely a very small one occurring only in niche situations. The default fixes for those are simple, changing value to @value or vice versa.

The idea for this proposal came from this comment on issue #7918.

Detailed design

Contextual keywords

Whenever field or value occurs as a simple_name within a set, init or (for field) get accessor of a property, it is considered a contextual keyword. This allows code to refer both to the special parameters (using field and value) and to the nearest explicit declaration of that name (using @field and @value).

From a specification point of view, we would make the following changes:

Note that this narrowly treats field and value as keywords only when used as simple names. It does not prohibit declarations of local variables and parameters with those names within applicable accessors, but access to those variables and parameters would have to use @.

Some existing contextual keywords are reserved throughout a given region of code, e.g. await expressions, which reserve the await identifier throughout the body of an async function. Others are only keywords in a very specific syntactic position, e.g. set and get. This proposal is a combination of both those two flavors: field and value are only keywords when used as simple_names and even then only within applicable property accessor bodies.

Breaks and fixes

field as a simple name within the bodies of existing property accessors: These occurrences would now be contextual keywords and denote a field_access. This break occurs no more or less frequently than the equivalent break with the implicit-parameter design. The default fix is to change this occurrences to @field.

The fix is significantly simpler than the default fix in the implicit-parameter design, which is to replace with a member access. The receiver of that member access would vary with the specifics - this or a type - and also may lead to generation of a new field, when field is a captured primary constructor parameter.

The proposal avoids breaking changes to declarations of locals or parameters named field within a property body, since those would continue to be allowed with the same meaning. However, see "Alternatives" below.

value as a simple name within the bodies of existing property set or init accessors: As long as value already refers to the implicit parameter of the accessor this is not a break. However, if it refers to a different variable it would now be a contextual keyword denoting a value_access. Current C# rules prevent shadowing of parameters and local values except within nested - local or anonymous - functions, so this break can only occur within a nested function of the accessor. Furthermore, the programmer will have had to deliberately declare a parameter or local variable with the name value, despite the fact that it shadows the implicit parameter to the property and prevents access to it. All in all, while this is certainly possible, it is likely very rare. The fix is to replace with @value.

nameof(value) within the bodies of existing property set or init accessors: This is an exception to the previous category. nameof accepts a simple_name, so value would now be a keyword, but a keyword is not allowed here. Adding an @ wouldn't help, since value may no longer be declared as an identifier. The fix is to replace with the "value" string itself. However, see "Alternatives" below.

@value as a simple name within the bodies of existing property set or init accessors: If @value currently refers to the implicit parameter of the accessor, it will no longer do so. Instead it will denote the nearest explicit declaration of value, yielding an error if not found. Either is a break. It seems unlikely that existing code would prefix value with @, since it does not currently change the meaning in any way, so this is probably quite rare. The fix is to replace @value with value.

Here are representative examples of the breaks and fixes:

class C
{
    int field = 0;
    int value = 1;

    public int P
    {
        set
        {
            field = value;   // Break! Replace 'field' with '@field'
            @field = @value; // Break! Replace '@value' with 'value'

            this.field = this.value; // Not a break;

            var s1 = nameof(field); // Break! Replace `field` with `@field`
            var s2 = nameof(value); // Break! Replace `nameof(value)` with `"value"`

            void F(ref int field, ref int value) // Not a break
            {
                field = 2; // Break! Replace 'field' with '@field'
                value = 3; // Break! Replace 'value' with '@value'
            }
        }
    }
}

Drawbacks

Different breaks: Compared to the field-as-implicit-parameter approach, the proposal avoids a class of breaks around field (local variable declarations within the accessor). On the other hand it introduces new breaks around value, though they are arguably much more rare.

More churn: By subtly changing the semantics of value the proposal arguably introduces more conceptual churn, even if the actual breaks might be fewer. On the other hand, many people probably already think value is a contextual keyword, since it is "just there" without declaration and actually colors as a keyword in popular editors.

Alternatives

Stick with implicit-parameter design: We could choose to keep the current design. We'll need to decide which names to generate for the default fix for local variable declarations of field.

Don't change value: We could choose to adopt only the field part of the proposal. This is a bit less breaking, while introducing a subtle difference between the workings of field and value.

Broaden the context: The fact that field and value are still allowed as identifiers within applicable property bodies except as simple names might allow for confusing scenarios. On the one hand it is probably nice that this.field doesn't break. However, is it weird that you can declare var field = 4; in your accessor body, yet you need to say @field to refer to that variable?

Any broadening of the context would lead to further breaks, so there's a very significant trade-off there. However, a sensible landing point might be for local and parameter declarations of field and value to "refer to the keyword" and thus be an error, similar to how var this = 5; is an error today. This feels like it targets all uses (declare and consume) of local variables/parameters equally, for a more balanced/less confusing experience. It would reintroduce breaks around variable declarations similar to the "implicit-parameter" design, but the fixes remain simple: add @ in front.

Handle nameof gracefully: We could specially allow nameof(value) and nameof(field) on the keywords, producing the obvious strings as a result.

Unresolved questions

No known open questions at this point.

Design meetings

https://github.com/dotnet/csharplang/blob/main/meetings/2024/LDM-2024-03-04.md#breaking-changes-making-field-and-value-contextual-keywords

Issue #7918 was discussed here:

CyrusNajmabadi commented 2 months ago

It would only be a keyword when used as a simple name inside a property accessor.

Does 'simple name' just mean 'pure identifier'? Or does it include type arguments, a-la field<int>?

CyrusNajmabadi commented 2 months ago
var field = 0; // Allowed today, breaks with implicit parameter, allowed with contextual keyword

Note that this narrowly treats field and value as keywords only when used as simple names. It does not prohibit declarations of local variables and parameters with those names within applicable accessors, but access to those variables and parameters would have to use @.

fwiw, i don't love this. We don't allow int await; in an async method. So i'd prefer that within a property we simply disallow this and make field always a keyword. If we did this, an easy workaround would be do to var @field = 0;.

CyrusNajmabadi commented 2 months ago

Whenever field or value occurs as a simple_name within a set, init or (for field) get accessor of a property, it is considered a contextual keyword.

We might want to do this for event accessors with value for consistency.

CyrusNajmabadi commented 2 months ago

are only keywords when used as simple_names and even then only within applicable property accessor bodies.

We should be clear and explicit as to what extent those bodies extend to. For example, do they apply once you're inside a lambda/local funciton within an accessor?

Since the spec allows for int field to be an inner lambda/local-function parameter, we should still be very clear if that means that code within those lambdas/local-functions needs @field to reference that parameter, or if within the lambda you can go back to field.


Note: the hybrid approach again slightly unnerves me. Primarily because i think getting it right will be hard. Here are examples of places we'd need to still allow field (not meaning the keyword).

Foo(field: 0);
new { field = 0 };
// etc.

My compiler hat makes me just want this to always be a keyword (like await). And if you really want to use it as an identifier, you always use @. i think that's simpler, and much more in line with async/await (which i actually like as a model ehre).

CyrusNajmabadi commented 2 months ago

The proposal avoids breaking changes to declarations of locals or parameters named field within a property body, since those would continue to be allowed with the same meaning.

True. But as the fix there is trivial as well, i would ask if that's important enough. Def something to get a read of people on. Personally, i think taht code would now look weird. If field is a contextual keyword, defining something called field would be an anti-pattern/smell. It might be a virtue to push people away from that.

CyrusNajmabadi commented 2 months ago

I'm 100% in favor of taking this approach. My preference is to tweak slightly. But that tweak is def optional, and this proposal is great even without that :)

CyrusNajmabadi commented 2 months ago

Implementation note: We were able to squeeze out a few more bits to store on nodes. This means we can store the contextual information about where value or field was parsed in. That means we dont' have an incremental parsing issue here.

Specifically, we need to make sure that there is no issue transforming:

void Foo()
{
     field++;
}

into

void Foo
{
    get
    {
        field++;
    }
}

That node (and its parents) for the field in the method cannot be reused in the property (and vice versa). It is actually lexed parsed differently, and we must rerun the parser in the new context to get this right.

MadsTorgersen commented 2 months ago

Does 'simple name' just mean 'pure identifier'? Or does it include type arguments, a-la field<int>?

Good point of clarification, thanks. I don't think it has real-world impact, but we do need to decide one way or the other.

MadsTorgersen commented 2 months ago

Note that this narrowly treats field and value as keywords only when used as simple names. It does not prohibit declarations of local variables and parameters with those names within applicable accessors, but access to those variables and parameters would have to use @.

fwiw, i don't love this. We don't allow int await; in an async method. So i'd prefer that within a property we simply disallow this and make field always a keyword. If we did this, an easy workaround would be do to var @field = 0;.

I understand, and the "Broaden the context" alternative was intended to acknowledge that issue. With await we had the luxury of a brand-new context that hadn't existed before, so there was no breaking change impact. Here, every statement or expression form we break is more impact on the user who has existing code.

That said, perhaps the parameter and local declarations wouldn't be so bad to break. Presumably, if you have such a declaration in existing code, you'd also 9 times out of 10 have at least one usage of the declared variable that would also be broken. So it'd be rare for an otherwise "clean" property to break only because of declarations. Once you're in there fixing, you would just fix the declaration too (or the automated fix would).

MadsTorgersen commented 2 months ago

are only keywords when used as simple_names and even then only within applicable property accessor bodies.

We should be clear and explicit as to what extent those bodies extend to. For example, do they apply once you're inside a lambda/local function within an accessor?

Yes they do. If that is not clear enough, it should be! ;-) I don't think the await reasoning applies here; in a nested function in an async method you are genuinely outside the async context. Not so for properties, where field and value continue to be meaningful inside nested functions.

Since the spec allows for int field to be an inner lambda/local-function parameter, we should still be very clear if that means that code within those lambdas/local-functions needs @field to reference that parameter, or if within the lambda you can go back to field.

According to this proposal, field as a primary expression in nested functions continue to be the keyword. You need @ to make it an identifier - all the way down.

Note: the hybrid approach again slightly unnerves me. Primarily because i think getting it right will be hard. Here are examples of places we'd need to still allow field (not meaning the keyword).

Foo(field: 0);
new { field = 0 };
// etc.

My compiler hat makes me just want this to always be a keyword (like await). And if you really want to use it as an identifier, you always use @. i think that's simpler, and much more in line with async/await (which i actually like as a model ehre).

I get the compiler-hat thinking. I just worry it's too breaking.

jnm2 commented 2 months ago

Since args in top-level programs is so similar to value in properties, can that be kept consistent as well?

ufcpp commented 2 months ago

If value is a keyword, how is nameof(value) handled?

    public int X
    {
        set
        {
            var n1 = nameof(value); // no error
            var n2 = nameof(this); // CS8081
            var n3 = nameof(int); // CS1525
        }
    }
jnm2 commented 2 months ago

That's a really good point, since nameof(value) is routinely passed to ArgumentException in setters.

HaloFour commented 2 months ago

That's a really good point, since nameof(value) is routinely passed to ArgumentException in setters.

That feels reasonable to me. But, what should nameof(field) do? I could see that being an error since the underlying field would be unspeakable.

ufcpp commented 2 months ago

Would that problem be solved if it is recommended to use ArgumentNullException.ThrowIfNull? ThrowIfNull accepts the keyword e.g. ArgumentNullException.ThrowIfNull(this).

jnm2 commented 2 months ago

@HaloFour I agree that nameof(field) probably shouldn't work because it's not clear why you'd use it. As a side note, nameof does not return the metadata name of the thing, but the name being used in C# to refer to the thing. An example where this can be seen is generic class names.

@ufcpp There would have to be throw helpers for every overload of every argument exception type, like ArgumentOutOfRange, and potentially user-defined argument exceptions.

KalleOlaviNiemitalo commented 2 months ago

I suppose this change would not affect the processing of cref and paramref attributes in XML documentation comments, because neither C# Annex D nor Roslyn allows documentation comments within a property_body, where field or value could be a contextual keyword.

MadsTorgersen commented 2 months ago

I added a few things to the proposal text ahead of Monday's LDM, primarily based on feedback here on this issue - thanks! ❤️

I added:

jnm2 commented 2 months ago

@MadsTorgersen Could we consider treating args in top-level programs the same as value, in regard to the proposed changes, since they seem to currently inhabit the same bucket?

colejohnson66 commented 2 months ago

I do not like this. In my opinion, the team's obsession with using contextual keywords instead of breaking syntax is wrong. It causes sharp edges (nameof(int) just flat-out doesn't work, and neither would nameof(Int32) if nameof exists in scope). These "contextual keywords" are actually worse because work is hoisted onto the compiler to determine if a word means one thing or another based on the context of the name scope.

Rust shows that breaking changes can be done safely through editions, and we already have such a thing: the LangVersion compilation property. Roslyn is already capable of not allowing certain syntax based on LangVersion. Just add LangVersion=6 and, suddenly, all file-scoped namespaces are illegal.

In addition, there's talk here already about how this "contextual keyword" isn't actually contextual, but would break things. If I have LangVersion=Latest, upgrade to .NET 9, and my code breaks, how is that a good thing? It's not. The old proposal to lock Latest at C# 12 was better, as it forces the user to opt in to new constructs.

CyrusNajmabadi commented 2 months ago

It causes sharp edges (nameof(int) just flat-out doesn't work

That wouldn't work no matter what approach we took. In other words, nameof(int) doesn't work not because nameof is a contextual keyword. But because we think nameof should only operate on names (and int is not a name).

I do not like this. In my opinion, the team's obsession with using contextual keywords instead of breaking syntax is wrong

I don't get how the two clauses relate. You say you do not like what is being proposed here, and that we should break syntax instead. But that's very much what this proposal does. It breaks syntax. Existing code changes meaning (there's a breaking change), and you would have to change your code then to get the old meaning.

colejohnson66 commented 2 months ago

You ignored my point about nameof(<type>). If nameof is in scope at all, I can't use the nameof "operator". Sure, "no one" is writing a method called nameof, but the point remains.

As for how they relate: I also understand that this proposal includes breaking changes. My gripe is that syntax evolution tends to be halted in the name of backwards compatibility. How long has this issue been requested? Eight years (3.5 if you count the age of semi-auto-properties.md). After all this time, breaking changes are decided to be a good thing(tm), but instead of actually solving the problem, rules are created to still keep things contextual instead of going just all the way.

CyrusNajmabadi commented 2 months ago

Sure, "no one" is writing a method called nameof, but the point remains.

I don't know how the point remains, when you're both stating but then discarding the most important part.

My gripe is that syntax evolution tends to be halted in the name of backwards compatibility

The point of this proposal is that that's not the case. We're literally considering syntax evolution, achieved through breaking back-compat.

but instead of actually solving the problem, rules are created to still keep things contextual instead of going just all the way.

I don't know what that means. This proposal literally breaks the syntax here to make this possible. I don't know what you mean by "going all the way" here.

HaloFour commented 2 months ago

@colejohnson66

nameof was designed at a different time, where the team was much less willing to accept breaking changes. For this particular case with field the team is willing to wade into making those breaking changes to better ensure that the feature has better ergonomics and more consistency with the existing value "keyword".

If nameof was being designed today perhaps that discussion would be different. However, that doesn't mean that nameof(int) would just work, as the compiler would have to have explicit rules as to how to handle that case. Beyond the contextual keyword consideration around potential existing nameof methods, the team also didn't want to spin out to a lot of completely new forms of syntax just to support nameof, and making it smell like a method invocation was done more to facilitate keeping the syntax simple than anything to do with keywords or breaking changes.

CyrusNajmabadi commented 2 months ago

@HaloFour Right. To us, nameof takes named things, and keywords are not named things. Neither are other things that typeof is fine with. For example, typeof works fine on arrays (like typeof(int[]). But int[] is not named as far as the language is concerned, and we would not want that to work. Just because they both are xxxof(...) operators doesn't mean they should accept the same set of values.

HaloFour commented 2 months ago

@CyrusNajmabadi

I still want my methodof(M(int, string)) 😸

CyrusNajmabadi commented 2 months ago

I have high confidence that given enough time, we will have infoof.

HaloFour commented 2 months ago

@CyrusNajmabadi

In Foof we Trust.

Youssef1313 commented 2 months ago

If field is only referenced in nameof, should it be disallowed?

public string P { get => nameof(field); set => /*something that doesn't use 'field'*/ }
jnm2 commented 2 months ago

It would make sense to me to disallow nameof(field) since it's a keyword, just like how nameof(class) and nameof(int) don't work. nameof(value) or nameof(@value) are not like nameof(field) in this way. They're differentiated by a specific pattern: what to do when property setters do throw ArgumentException (and they do in the BCL). The existing pattern is to pass a paramName of nameof(value). ParamName refers to the metadata parameter, not to other things like properties.

GeirGrusom commented 1 month ago

It would make sense to me to disallow nameof(field) since it's a keyword, just like how nameof(class) and nameof(int) don't work

Constraints are easier to lift than to add also, but I wouldn't think there's too much use for it.

I seem to remember that nameof(int) did work at one point, but that people were upset that it returned Int32 rather than int. Would it return field or __<field00001>_bishibosh_$$$? First one seems completely useless, and the second could be used in reflection clown show code.

wrexbe commented 3 weeks ago

How about making properties into a scope under the class? Maybe even allow methods in them. It's a lot more flexible. I like the idea of being able to structure a class in a way that says, this stuff is only for this property. It is similar to how we can scope a method to only be for a method.

public int P
{
    int _field;
    get => _field;
    set => _field = value;
}

Pros:

  1. Breaking changes? Nope
  2. What does nameof(_field) do? it returns "_field", or whatever name you gave it
  3. What if you want a different backing field type? You declare the field a different type
  4. What if you want more then one backing field? Go for it
  5. What if you want some kind of conversion, or notify method specific to this property? You add it to the properties scope.
  6. What if you want to copy the property, and rename it? Go for it, the names of everything inside can stay the same.
  7. Fields, and methods inside don't need accessibility modifiers, they are private to the property.
  8. It can still be very small, if that is a goal public int P {int f; get => f; set => f = value;}
  9. If not repeating the type is a goal, var could implicitly become the property type if it's used as a field in the property public int P {var f; get => f; set => f = value;}
CyrusNajmabadi commented 3 weeks ago

@wrexbe that is already proposed at https://github.com/dotnet/csharplang/issues/133.