dotnet / csharplang

The official repo for the design of the C# programming language
11.4k stars 1.02k forks source link

[Proposal]: Extending patterns to "as" #8210

Closed IS4Code closed 1 week ago

IS4Code commented 3 months ago

Extending patterns to "as"

Summary

This proposal seeks to extend pattern matching to the as operator in a manner analogous to is, to allow x as <pattern> achieve an effect similar to x is <pattern> y ? y : null, or to explore ways to arrive at a similar outcome.

Motivation

A lot of the language is built around pattern matching and option-like null propagation or guards (?. and ??), yet these two systems don't go that well together. For example:

Class(string? a, string? b)
{
    this.a = a != null ? a : "<unspecified>";
    if (b == null) throw new ArgumentNullException(nameof(b));
    this.b = b;

    // can be easily simplified as

    this.a = a ?? "<unspecified>";
    this.b = b ?? throw new ArgumentNullException(nameof(b));
}

Yet in situations where null is not the only "exceptional" value, there is no syntax to simplify to:

Class(string? a, int b)
{
    this.a = a is not null or "" ? a : "<unspecified>";
    if (b < 0) throw new ArgumentOutOfRangeException(nameof(b));
    this.b = b;
    // both "a" and "b" still need to be referenced twice
}

In many situations, there are special values outside of null that need to be checked ‒ empty strings or collections, 0, -1, or SomeEnum.None, or "invalid" objects, but there is no easy way of discarding these values other than using conditions.

What if we were instead able to leverage the existing null-centered syntax to "coerce" the unwanted values to be null? Is there any piece of syntax with the meaning "make sure this is X, otherwise be null"? Actually, yes ‒ as!

Detailed design

This proposed syntax aims to extend as, consistently with its original relation to is, to support (non-assigning) patterns. Roughly speaking:

x as P

would be more or less equivalent to

x is P y ? y : null

This is a natural extension for the original syntax x as T, now reinterpreted as x is T y ? y : null. The operation corresponds to "match x to P, or produce null if unmatched".

Concretely, this would parallel is in the following situations:

is syntax is semantics as syntax as semantics
x is T type check x as T unchanged (cast to T?, or null on failure), possibly extended to allow non-nullable value type T
x is T { ... } type and properties check x as T { ... } cast to T?, ensuring the properties, or null on failure
x is { ... } properties check x as { ... } cast to the nullable equivalent of the type of x, ensuring the properties, or null on failure
x is T and P type and P check x as T and P cast to T? (or a more specific applicable type from P), ensuring P, or null on failure
x is P P check x as P cast to the nullable equivalent of the type of x (or a more specific applicable type from P), ensuring P, or null on failure

The pattern may not create any new variables, since otherwise their use would require the check that the result is not null.

Examples of semantics

object? o;
string s;
byte b;

Examples of usage

The code above could be easily simplified to:

Class(string? a, int b)
{
    this.a = a as not "" ?? "<unspecified>";
    this.b = b as >= 0 ?? throw new ArgumentOutOfRangeException(nameof(b));
}

Other possibilities include:

Drawbacks

The new interpretation of as might need some taking used to, to communicate the intended meaning.

Alternatives

There are possibilities for an alternative syntax, not depending on as, with a similar effect:

Class(string? a, int b)
{
    this.a = a when not "" else "<unspecified>";
    this.b = b when >= 0 else throw new ArgumentOutOfRangeException(nameof(b));
}

In this case, one would need to be explicit about the introduction of null, e.g. by writing b when not >= 0 else null.

Unresolved questions

When T is a non-nullable value type, x as T is currently prohibited. While it possible to achieve a similar effect whilst keeping this restriction, it might make sense to allow producing Nullable<T> instead in such a situation. In any case, i as >= 0 for an int i would need to produce int? to be usable, unless an alternative syntax is picked that requires one to specify the "otherwise" value upfront.

Design meetings

julealgon commented 3 months ago

How would you reconcile this with all the guard methods that were recently introduced, such as ArgumentOutOfRangeException.ThrowIfNegative and such?

The reason they exist is to avoid the whole nameof(param) thing (which by itself is error prone), and to centralize the various message variations (there are many for ArgumentOutOfRangeException).

By using the old ?? throw new... throw expression syntax, you just lose all the benefits of the native guard methods.

I don't disagree with what you are proposing in essence (I've built something similar using .As extension methods actually before), but at the same time I don't want to lose the great benefits that the guard methods provide.

I've actually completely replaced all of these in our codebase with the guard methods recently. For example:

this.user = user ?? throw new ArgumentNullException(nameof(user));

With:

ArgumentNullException.ThrowIfNull(user);
this.user = user;

I don't think approving this with the intent of using it for argument validation is a great idea without also reconsidering all the guard methods: an ideal solution would handle both things.

The ultimate approach to me would be to properly add contracts to the language and allow this (which has been discussed a lot in multiple issues by now):

public MyClass(string user, int value)
    where user is not null
    where value is >= 0 and < 100
{
    this.user = user;
    this.value = value;
}
CyrusNajmabadi commented 3 months ago

By using the old ?? throw new... throw expression syntax, you just lose all the benefits of the native guard methods.

That seems like a runtime issue. This pattern woudl be fairly trivial/idiomatic. So it could be detected and converted to the same IL that would be emitted for calling into the helpers.

IS4Code commented 3 months ago

It's a good point that there are these methods to use, but there also situations where you might not want to produce an exception, or the existing guard methods don't cover all patterns you might want.

There is also a potential of another neat way of parameters checking, using the proposed syntax:

this.x = ArgumentOutOfRangeException.ThrowIfUnmatched(x as >= 0 and <= 255);
julealgon commented 3 months ago

By using the old ?? throw new... throw expression syntax, you just lose all the benefits of the native guard methods.

That seems like a runtime issue. This pattern woudl be fairly trivial/idiomatic. So it could be detected and converted to the same IL that would be emitted for calling into the helpers.

I don't follow you here @CyrusNajmabadi . How would code like:

this.b = b as >= 0 ?? throw new ArgumentOutOfRangeException(nameof(b));

Avoid having to pass in the explicit exception type, the nameof(b) and the specific error message for the >= validation (all of which are benefits of using the guard methods)?

Or perhaps you are suggesting something like this (which would be very interesting if possible)?

this.b = b as >= 0 ?? throw;

In this case, the compiler could see the various trivial comparisons and pick the correct exception type and message. Is this what you are alluding to?

CyrusNajmabadi commented 3 months ago

By using the old ?? throw new... throw expression syntax, you just lose all the benefits of the native guard methods.

That presumes that the specific corresponding native guard method exists. And, if they do, that means they have a specific code pattern in them. For example if arg is null throw new ArgNullException(...).

I'm saying, if your callsite has effective the identical pattern that calling the guard method would emit, then the runtime can/should treat them uniformly. There should be no penalization that you happened to call one vs the other.

julealgon commented 3 months ago

By using the old ?? throw new... throw expression syntax, you just lose all the benefits of the native guard methods.

That presumes that the specific corresponding native guard method exists. And, if they do, that means they have a specific code pattern in them. For example if arg is null throw new ArgNullException(...).

I'm saying, if your callsite has effective the identical pattern that calling the guard method would emit, then the runtime can/should treat them uniformly. There should be no penalization that you happened to call one vs the other.

Oh I see what you mean now @CyrusNajmabadi , but that was not my point. I wasn't saying one would miss "functional" benefits by using a different syntax, but maintenance-related ones.

For example, having to provide the nameof(argument) to the exception is something that is not needed when using the guard methods as they rely on [CallerMemberExpression] to automatically grab the argument name. Not having to provide that is good, as it reduces the chance of mistakes happening (using incorrect values, or introducing a typo when using raw strings etc).

Similarly, each guard method has a built-in message for the ArgumentException and ArgumentOutOfRangeException types, based on the method that is being called. By using those exceptions manually, one needs to then provide a message on every call site to get "equivalent" behavior as the guard methods. Maintaining those messages is also an additional burden, and another source for mistakes to happen.

And lastly, the guard methods also take care of choosing the more appropriate exception type and message based on the underlying checks being done. For example, using ArgumentException.ThrowIfNullOrWhiteSpace will throw 2 distinct exceptions each with their own bespoke message, depending on whether the value passed in is null (ArgumentNullException), or empty, whitespace, etc (ArgumentException + custom message).

This kind of validation is harder to capture using a single expression, especially if using the syntax proposed here, as there is no way to throw different exceptions using a single throw expression after a coalescing operator. So one would need to go down into a switch statement instead, which brings its own issues such as exhaustiveness checks.

My overall point is that any new feature that influences how argument validation is done should seriously consider how the existing guard methods will play with them, even if the answer is to deprecate the guard methods altogether.

I would love to have something like I mentioned above though, which would be a combination of both worlds:

this.b = b as >= 0 ?? throw;

If the compiler can understand the validation pattern being applied, it would pick the correct exception and message to throw automatically. If it cannot understand it, it would signal this with a compilation error.

The only caveat with an approach like this of course is how to make it extensible (and if that's even desirable in the first place). This would be a bit less complex (and less flexible) than full-blown contracts, but still much better than what we currently have.

HaloFour commented 3 months ago

Given this feature is not specific to argument validation I don't think it makes sense to try to bake that much "magic" into the language specification for this feature.

julealgon commented 3 months ago

Given this feature is not specific to argument validation I don't think it makes sense to try to bake that much "magic" into the language specification for this feature.

I don't disagree (what I said above would likely become a separate proposal), but at the same time, the main arguments presented in favor of this proposal here are tied directly to argument validation.

I think they should be considered in conjunction to avoid disrupting existing argument validation patterns and creating clashing standards.

For a very long time, argument validation wasn't strongly standardized. With the introduction of the native guard methods, this has now changed, as most common validations are covered there. With the introduction of extension types in .NET9, it will also be possible to introduce extra guard methods using the exact same pattern as the native ones.

Using this as extended proposal for going back to manual throw expressions would be a regression on that effort IMHO.

CyrusNajmabadi commented 3 months ago

I eprsonally disagree. Using throw is normal and idiomatic. If there is something common enough it could benefit from a lib, then put in a lib. But that seems like the very narrow case. The runtime seems to only have a tiny handful of these, just for the cases where they cared about perf enough for thsi to be relevant. I would not extend that out to any sort of pattern beyond that.

hez2010 commented 3 months ago
x is T and P type and P check x as T and P cast to T? (or a more specific applicable type from P), ensuring P, or null on failure

This doesn't follow today's is semantic where it allows you casting x to both T and P:

if (x is T t and P p) {
  // use t and p here
}

Today we are not allowed to do

if (x is (T and P) tp) // error, waiting for union/intersection types coming in future

Allowing as T and P today would block the future of union/intersection types in C#, as ideally x as T and P should result in type T and P (T & P).

IS4Code commented 3 months ago

@hez2010 Good point! It would be better to disable P from changing the type of the result in that case.

IS4Code commented 3 months ago

Actually, to provide more thoughts, the reasoning behind as T and P was intended primarily to make something like as IComparable and > 10 turn into int? (since > 10 in this situation is an implicit int pattern), i.e. the first type should not determine the result alone.

To refine on this, I think it is sufficient if the pattern is restricted (at the moment) to always result in an identifiable type. Practically, this means that as T1 and T2 is fine only if T1T2 or T2T1, and T1 or T2 is fine only if T1 = T2 (so it is pretty much useless, but is allows it too, so why not). In the future, the other cases could be turned into T1 & T2 or T1 | T2.

TahirAhmadov commented 2 months ago

I think these type unions and intersections should be prohibited in the proposed as syntax for now. It's always easier to prohibit now, then allow later once proper design is done for it.

julealgon commented 2 months ago

It's always easier to prohibit now, then allow later once proper design is done for it.

Not only that but it also allows for faster development of the rest of the design and gets some useful additions out of the door without having to develop (and especially discuss/define) the whole thing completely which as we know can take years sometimes.

I particularly like when it is possible to implement something incrementally because of that benefit. It is basically what happened (and is still happening) with pattern matching.

bbarry commented 2 months ago

Is there a discussion open for this item? I feel like most of the comments here would benefit from the formatting and associations that a discussion provides.

IS4Code commented 2 months ago

@bbarry Yes, #8207 is what this started as.

Thaina commented 2 months ago

Interesting idea But I don't like the way that we use as >= 0 and expect int? because the syntax should match float and any object that implement interface IEqualityOperators<int>

IS4Code commented 2 months ago

@Thaina I don't like it either, but that is how patterns work in general ‒ x is >= 0 checks for int.

hez2010 commented 2 months ago

@Thaina I don't like it either, but that is how patterns work in general ‒ x is >= 0 checks for int.

It's using targeted type. If x is float it will match against float as well.

glen-84 commented 2 months ago

My down-vote is because I don't find:

this.a = a as not "" ?? "<unspecified>";

... very readable, and it also feels weird to use the null-coalescing operator in this way.

I prefer the alternative syntax. I even wonder if it's technically possible to shorten the 2nd example to:

this.b = b when >= 0;

... and automatically throw an ArgumentException when the pattern doesn't match.

Although in this case, when might not be the right keyword.

AFatNiBBa commented 1 month ago

I like this proposal a lot, these are my considerations:

glen-84 commented 1 month ago

To elaborate:

this.b = b when >= 0 else throw new ArgumentOutOfRangeException(nameof(b));

Spoken: Assign b to this.b when [b] is greater than or equal to zero, else throw a new ArgumentOutOfRangeException.

this.b = b as >= 0 ?? throw new ArgumentOutOfRangeException(nameof(b));

Spoken: Assign b as greater than or equal to zero, or throw new ArgumentOutOfRangeException, to this.b

or

Spoken: Assign b, when b matches the pattern "greater than or equal to zero" (or throw new ArgumentOutOfRangeException), to this.b.

(TBH, I don't even know how to read the latter.)


I see the as operator as something that works with types.

obj as Animal

Treat object as an Animal if it's convertible. Not:

obj as >= 0

Try to match object against a pattern.

IS4Code commented 1 month ago

@glen-84

obj as >= 0

"Treat obj as something non-negative." There is nothing that says "null on failure" in that sentence, but you don't get that with types either.

viceroypenguin commented 1 month ago

I would also add that regardless of whether as is natural to say in a sentence or not, the relationship between is and as has been established for many versions of C# already. Before pattern matching, there was already a natural equivalence between checking if something is a type and treating something as a type.

I think that equivalence is important for adoption of a new feature, so since we have expanded the check to see if something is a pattern, we should equivalently expand the ability to treat something as as pattern.

I will also point out that using as means that existing code for type-conversion continues to work unchanged. If we extend pattern matching to assignment using when, then there will be two ways to convert a value to a new type: value as NewType and value when NewType. I think this would create confusion and make people question the value of this new feature.

AFatNiBBa commented 1 month ago

What about allowing both

VALUE as PATTERN // else null

And

VALUE as PATTERN else SOMETHING_ELSE

(Although it might be harder to parse)

viceroypenguin commented 1 month ago

@AFatNiBBa Not really necessary, I don't think, because VALUE as PATTERN is intended to return null, so null coalescing (aka ?? SOMETHING_ELSE) already exists. The else would be an unnecessary duplication of existing behavior.

IS4Code commented 1 month ago

Unless null is in the accepted set of values, in which case ?? would filter it out. Something like str as null or "" alone should probably be a warning in itself since null could be removed from that pattern without affecting the result (something that could be checked at compile-time at least).

Such a situation could not happen with as T, so else is actually viable with the extended as (but I suspect not necessary in most situations).

AFatNiBBa commented 1 month ago

@viceroypenguin Exactly as @IS4Code stated, I often find myself considering null a "valid" value (At least for the context in which the match is made)

333fred commented 1 week ago

LDM discussed this here. Ultimately, we do not want to proceed with this proposal, and are closing it as rejected.