dotnet / roslyn

C# Design Notes - catch up edition, Feb 29, 2016 (deconstruction and immutable object creation) #9330

Closed MadsTorgersen closed 8 years ago

MadsTorgersen commented 8 years ago

C# Language Design Notes Feb 29, 2016

Catch up edition (deconstruction and immutable object creation)

Over the past couple of months various design activities took place that weren't documented in design notes. The following is a summary of the state of design regarding positional deconstruction, with-expressions and object initializers for immutable types.

Philosophy

We agree on the following design tenets:

Positional deconstruction, with-expressions and object initializers are separable features, enabled by the presence of certain API patterns on types that can be expressed manually, as well as generated by other language features such as records.

API Patterns

API patterns for a language feature facilitate two things:

It turns out the biggest design challenges are around the second part. Specifically, all these API patterns turn out to need to bridge between positional and name-based expressions of the members of types. How each API pattern does that is a central question of its design.

Assume the following running example:

public class Person
{
  public string FirstName { get; }
  public string LastName { get; }

  public Person(string firstName, string lastName)
  {
    FirstName = firstName;
    LastName = lastName;
  }
}

In the following we'll consider extending and changing this type to expose various API patterns as we examine the individual language features.

Here's an example of using the three language features:

var p = new Person { FirstName = "Mickey", LastName = "Mouse" }; // object initializer
if (p is Person("Mickey", *)) // positional deconstruction
{
  return p with { FirstName = "Minney" }; // with-expression
}

Semantically this corresponds to something like this:

var p = new Person("Mickey", "Mouse"); // constructor call
if (p.FirstName == "Mickey") // property access
{
  return new Person("Minney", p.LastName); // constructor call
}

Notice how the new features that use property names correspond to API calls using positional parameters, whereas the feature that uses positions corresponds to member access by name!

Object initializers for immutable objects

(See e.g. #229)

This feature allows an object initializer for which assignable properties are not found, to fall back to a constructor call taking the properties' new values as arguments.

new Person { FirstName = "Mickey", LastName = "Mouse" }

becomes

new Person("Mickey", "Mouse")

The question then is: how does the compiler decide to pass the given FirstName as the first argument? Somehow it needs clues from the Person type as to which properties correspond to which constructor parameters. These clues cannot just be the constructor body: we need this to work across assemblies, so the clues must be evident from metadata.

Here are some options:

1: The type or constructor explicitly includes metadata for this purpose, e.g. in the form of attributes.

2: The names of the constructor parameters must match exactly the names of the corresponding properties.

The former is unattractive because it requires the type's author to write those attributes. It requires the type to be explicitly edited for the purpose.

The latter is better in that it doesn't require extra API elements. However, API design guidelines stipulate that public properties start with uppercase, and parameters start with lower case. This pattern would break that, and for the same reason is highly unlikely to apply to any existing types.

This leads us to:

3: The names of the constructor parameters must match the names of the corresponding properties, modulo case!

This would allow a large number of existing types to just work (including the example above), but at the cost of introducing case insensitivity to this part of the C# language.
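As a rough illustration of option 3, the sketch below uses reflection to pair each constructor parameter with the public property whose name matches when case is ignored. ParameterMatcher and MatchParametersToProperties are hypothetical names invented for this example. A compiler would read the same information from metadata at compile time rather than through reflection at run time, so this is only a sketch of the matching rule, not of any actual or planned implementation.

```csharp
using System;
using System.Linq;
using System.Reflection;

public class Person
{
    public string FirstName { get; }
    public string LastName { get; }

    public Person(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
    }
}

// Hypothetical helper, for illustration only.
public static class ParameterMatcher
{
    // For each constructor parameter, in order, returns the name of the public
    // instance property whose name equals the parameter name ignoring case.
    // Returns null if any parameter has no matching property.
    public static string[] MatchParametersToProperties(ConstructorInfo ctor)
    {
        var properties = ctor.DeclaringType
            .GetProperties(BindingFlags.Public | BindingFlags.Instance);

        var matched = ctor.GetParameters()
            .Select(parameter => properties
                .FirstOrDefault(property => string.Equals(
                    property.Name, parameter.Name, StringComparison.OrdinalIgnoreCase))
                ?.Name)
            .ToArray();

        return matched.Any(name => name == null) ? null : matched;
    }
}
```

For the running example, position 0 pairs with FirstName and position 1 with LastName, which is the ordering needed to turn the object initializer into a constructor call.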

With-expressions

(see e.g. #5172)

With-expressions are similar to object initializers, except that they provide a source object from which to copy all the properties that aren't specified. Thus it seems reasonable to use a similar strategy for compilation; to call a constructor, this time filling in missing properties by accessing those on the source object.

Thus the same strategies as above would apply to establish the connection between properties and constructor parameters.

p with { FirstName = "Minney" }

becomes

new Person("Minney", p.LastName)

However, there's a hitch: if the runtime source object is actually of a derived type with more properties than are known from its static type, it would typically be expected that those are copied over too. In that case, the static type is also likely to be abstract (most base types are), so it wouldn't offer a callable constructor.
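To make the hitch concrete, here is a small sketch that assumes a non-abstract Person and a hypothetical derived type Student with an extra School property. RenameFirst is written by hand to mimic what a constructor-based translation of the with-expression might look like when only the static type Person is known. SlicingDemo, Student and School are names made up for this illustration.

```csharp
using System;

public class Person
{
    public string FirstName { get; }
    public string LastName { get; }

    public Person(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
    }
}

// Hypothetical derived type that adds a property unknown to the static type Person.
public class Student : Person
{
    public string School { get; }

    public Student(string firstName, string lastName, string school)
        : base(firstName, lastName)
    {
        School = school;
    }
}

public static class SlicingDemo
{
    // Hand-written stand-in for: p with { FirstName = newFirstName }
    // translated into a constructor call on the static type Person.
    public static Person RenameFirst(Person p, string newFirstName)
    {
        return new Person(newFirstName, p.LastName);
    }
}
```

Passing a Student to RenameFirst yields a plain Person, so the School value is silently dropped. That loss is what a virtual method on the base type is meant to avoid.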

For this situation there needs to be a way that an abstract base class can offer "with-ability" that correctly copies over members of derived types. The best way we can think of is to offer a virtual With method, as follows:

public abstract class Person
{
  ...
  public abstract Person With(string firstName, string lastName);
}

In the presence of such a With method we would generate a with expression to call that instead of the constructor:

p.With("Minney", p.LastName)

We can decide whether to make with-expressions require a With method, or fall back to constructor calls in its absence.

If we require a With method, that makes for less interoperability with existing types. However, it gives us new opportunities for how to provide the position/name mapping metadata through the declaration of that With method: for instance, we could introduce a new kind of default parameter that explicitly wires the parameter to a property:

  public abstract Person With(string firstName = this.FirstName, string lastName = this.LastName);

To explicitly facilitate interop with an existing type, a mandatory With method could be allowed to be provided as an extension method. It is unclear how that would work with the default parameter approach, though.
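The virtual With approach can be sketched in today's C# as follows. This assumes a hypothetical derived type Student with an extra School property. The override is what lets the copy preserve members the base type does not know about. RenameFirst shows by hand the call that a with-expression might be translated into. None of these names beyond Person and With come from the notes.

```csharp
using System;

public abstract class Person
{
    public string FirstName { get; }
    public string LastName { get; }

    protected Person(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
    }

    // Each derived type decides how to produce a modified copy of itself.
    public abstract Person With(string firstName, string lastName);
}

// Hypothetical derived type, for illustration only.
public sealed class Student : Person
{
    public string School { get; }

    public Student(string firstName, string lastName, string school)
        : base(firstName, lastName)
    {
        School = school;
    }

    // Copies School along with the two values supplied by the caller.
    public override Person With(string firstName, string lastName)
    {
        return new Student(firstName, lastName, School);
    }
}

public static class WithDemo
{
    // Hand-written stand-in for: p with { FirstName = newFirstName }
    public static Person RenameFirst(Person p, string newFirstName)
    {
        return p.With(newFirstName, p.LastName);
    }
}
```

Because the call is dispatched virtually, a Student passed in as a Person comes back as a Student with its School intact.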

Positional deconstruction

(see e.g. #206)

This feature allows a positional syntax for extracting the property values from an object, for instance in the context of pattern matching, but potentially also elsewhere.

Ideally, a positional deconstruction would simply generate an access of each member whose value is obtained:

p is Person("Mickey", *)

becomes

p.FirstName == "Mickey"

Again, this requires the compiler's understanding of how positions correspond to property names. Again, the same strategies as for object initializers are possible. See e.g. #8415.

Additionally, just as in with-expressions, one might wish to override the default behavior, or provide it if names don't match. Again, an explicit method could be used:

public abstract class Person
{
  ...
  public abstract void GetValues(out string firstName, out string lastName);
}

There are several options as to the shape of such a method. Instead of out-parameters, it might return a tuple. This has pros and cons: there could be only one tuple-returning GetValues method, because there would be no parameters to distinguish signatures. This may be a good or a bad thing.

Just as with the With method, we can decide whether deconstruction should require a GetValues method, or should fall back to metadata or to name matching against the constructor's parameter names.

If the GetValues method is used, the compiler doesn't need to resolve between positions and properties: the deconstruction as well as the method are already positional. We'd generate the code as follows:

string __p1;
string __p2;
p.GetValues(out __p1, out __p2);
...
__p1 == "Mickey"

Somewhat less elegant for sure, and possibly less efficient, since the LastName is obtained for no reason. However, this is compiler generated code that no one has to look at, and it can probably be optimized, so this may not be a big issue.
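A minimal sketch of the out-parameter shape, written entirely by hand: GetValues follows the pattern described in the notes, while DeconstructionDemo and IsMickey are hypothetical names standing in for the code a compiler might generate for the pattern test. The example assumes a non-abstract Person so that it can be instantiated directly.

```csharp
using System;

public class Person
{
    public string FirstName { get; }
    public string LastName { get; }

    public Person(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
    }

    // Positional deconstruction: values come out in the same order
    // in which the constructor takes them in.
    public void GetValues(out string firstName, out string lastName)
    {
        firstName = FirstName;
        lastName = LastName;
    }
}

public static class DeconstructionDemo
{
    // Hand-written stand-in for: o is Person("Mickey", *)
    public static bool IsMickey(object o)
    {
        var p = o as Person;
        if (p == null)
        {
            return false;
        }

        string first;
        string last; // fetched but never compared, like the * wildcard
        p.GetValues(out first, out last);
        return first == "Mickey";
    }
}
```

Note that no mapping between positions and property names is needed at the call site. The method body is the only place where that mapping is expressed.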

Summary

For each of these three features we are grappling with the position-to-property match. Our options:

  1. Require specific metadata
  2. Match property and parameter names, possibly in a case-insensitive way
  3. For deconstruction and with-expressions, allow or require specific methods (GetValues and With respectively) to implement their behavior, and possibly have special syntax in With methods to provide the name-to-position matching.

We continue to work on this.

FunctionalFirst commented 8 years ago

@mythz

The complex match expression looks fragile (and IMO not particularly readable) where I'd expect it would be hard to work out why an arbitrary data graph doesn't match a complex nested match expression. I'd also not expect debugging or tooling/IntelliSense to be able to provide as much assistance as it could with multiple statements.

F# has had positional deconstruction for more than a decade (since v1.0) and I've never once heard these complaints. It's seen, almost universally, as a win in productivity and expressiveness.

MgSam commented 8 years ago

@CyrusNajmabadi I take your point about positional construction vs deconstruction, but let's not forget, positional construction as we know it now has existed in programming languages for 40 years, and when it was designed the size of your code was a very real concern, so named arguments were infeasible. Not to mention IDE tooling was non-existent. C# 1.0 - 3.0 inherited this legacy; it's not like you could have made positional arguments illegal once named arguments were added in v4.

However, positional matching can be very valuable in certain pieces of code as determined by the author of that code. If this is my code, and i know what the shape of a node is, then making me specify all the names makes the code very heavyweight compared to the version where i eschew it.

In a modern programming environment there is no reason to ever not be using named arguments because the tooling can/should make it incredibly easy for you to do so. There's no need to type the names of the arguments; let a tool fill them in for you.

When you design features to save the programmer a few keystrokes but are to the detriment of later readers of the code you end up with a write-only language like Perl. You may have memorized what each argument means in new Node(Red, someVal, oneNode, otherNode) but you shouldn't assume that your co-worker (or future you!) will have it memorized.

CyrusNajmabadi commented 8 years ago

@mgsam > I take your point about positional construction vs deconstruction, but let's not forget, positional construction as we know it now has existed in programming languages for 40 years

So has positional deconstruction :)

MgSam commented 8 years ago

@CyrusNajmabadi Yes but it's not a breaking change to leave it out of C# 7.0.

CyrusNajmabadi commented 8 years ago

In a modern programming environment there is no reason to ever not be using named arguments

And this is where we disagree. I don't think it's my place to prescribe to people that there is no reason to ever not be using named arguments. I think it's totally up to the consumer to make the decision. As you can see from Roslyn itself, the vast majority of our code does not use named arguments. Indeed, we use it sparingly in the places where we think it adds readability. :)

I think the language should provide the tools, and we should allow developers to use these tools as best as possible for their own domains.

When you design features to save the programmer a few keystrokes but are to the detriment of later readers of the code

"detriment" is very ambiguous :)

I can't think of a single feature we've ever added that someone couldn't argue was detrimental for later readability :)

mythz commented 8 years ago

Really? Why?

Because it would force our brains to match the nested data graph with the complex nested match expression in a single pass. I believe source code should be optimized for read/skim-ability where the fewer times we need to use our head as a calculator and memory bank, the better - there are much better places we could be spending our limited complexity budget on. To give more context on this, I believe programming languages and APIs should be optimized for learnability.

That approach is nearly 3x longer than the simple positional form.

Saying something is longer doesn't say anything about its readability (e.g. 1 char vars used in positional example) as what's missing could easily be context that your head needs to invisibly carry as it's trying to parse the expression. What you should be measuring is the amount of unnecessary artificial machinery required, which I'm not seeing much of.

I'm a big believer in giving the language tools to enable both.

So is C++ whereas more enjoyable languages like Python prefer one way of doing things.

CyrusNajmabadi commented 8 years ago

@MgSam That doesn't really contradict what i'm saying :)

Positional construction has existed for decades. So has positional deconstruction.

C# had positional construction and no one seemed to mind. And we even added named construction later (ostensibly for COM, but i actually think it's useful all the time).

"positional" is extremely natural in C#. It's everywhere. Look at 99.9% of all method calls out there. Positional. Look at 99.9% of object construction: Positional. Indexing: Positional. Attributes: Often positional (though often named as well).

I personally would find it extremely unnatural to say: you can construct an object positionally, but you can't deconstruct it positionally. Look back at my actual example. We allow, and primarily see people writing: new Node(Black, val, newLeft, newRight). But to deconstruct, i'd now be forced to name all the values? That seems very inconsistent and onerous. In my own library i'd have to write "Color/Val/Left/Right" over and over and over again. Indeed, in the example i gave where i was doing just one of at least a dozen pattern matches, i needed to write those names 12 times. For my domain that was 12 things i had to write during deconstruction that i didn't have to write during construction. Instead of deconstruction being as natural and easy as construction, it becomes decidedly harder and less pleasant for my domain.

Again, not an exaggeration. An actual use case.

HaloFour commented 8 years ago

@FunctionalFirst

F# has had positional deconstruction for more than a decade (since v1.0) and I've never once heard these complaints. It's seen, almost universally, as a win in productivity and expressiveness.

Notably the identifier pattern is only really applicable to discriminated unions and active patterns which have a positional definition to them. Arguably the record syntax proposed for C# has more in common with an F# active pattern with a single partition so position matching/deconstruction makes sense there.

CyrusNajmabadi commented 8 years ago

You may have memorized what each argument means in new Node(Red, someVal, oneNode, otherNode) but you shouldn't assume that your co-worker (or future you!) will have it memorized.

This is literally the world today for C# en masse :)

The language has survived great to this point despite having positional parameters. Many other languages which only have positional parameters have worked out fine. Many modern recent languages have come out, been quite popular, and still only have positional forms. :)

Positional does not actually seem to be a great detriment to readability or maintainability. And, indeed, the C# development community has not appeared at all to have sided with the idea that they should avoid it and only stick with named parameters. As such, i'm extremely hesitant to use that as the reasoning for why we should avoid positional deconstruction.

MgSam commented 8 years ago

@CyrusNajmabadi

I don't think it's my place to prescribe to people that there is no reason to ever not be using named arguments.

C# makes many choices to restrict what the developer can do. "Pit of quality" and all that. If a developer wants a free-for-all language where every feature under the sun has been added and you can dig yourself into an inescapable hole at the drop of a hat they can use C++.

As you can see from Roslyn itself, the vast majority of our code does not use named arguments.

I also don't think that MS's internal style guidelines allowing or disallowing something makes it the correct decision.

I know nearly every time I write a method call where I've elided named arguments, the next time I look at that code again I've already forgotten what each parameter is. The problem is even worse when all the arguments are of the same type. Take Regex.Replace for example. Can you tell me the order of the 3 string arguments without looking at the docs? I use it almost every day and I still forget.

But I digress. The positional vs named arguments has already sailed.

I just don't think positional deconstruction adds anything besides confusion. You again mentioned that you'd have extra characters in there- so what? A tool should be generating those characters for you. A tool should be formatting it onto separate lines for you so that it's easily readable. The language shouldn't need a dubious feature because the tooling is not robust enough.

CyrusNajmabadi commented 8 years ago

I also don't think that MS's internal style guidelines allowing or disallowing something makes it the correct decision.

It's the decision of the greater community. The coding style you are espousing has not been adopted by the community at large. You seem to be arguing: I want C# to be used in this manner. And because of that, i don't like that you're adding this feature.

I'm arguing: the C# community has shown exactly how they use this language. As such, this feature makes sense given how people actually use the language.

I just don't think positional deconstruction adds anything besides confusion.

The same argument could be made about positional construction. Or positional invocation. But in practice it does not seem to actually be an issue. And this is the case across a huge number of languages.

I'm trying to figure out why positional is fine for all these other cases but suddenly doesn't "add anything but confusion" to the deconstruction case. Especially as we have numerous languages that use positional deconstruction as well.

You again mentioned that you'd have extra characters in there- so what?

I believe extra characters have to justify their worth. Take, for example, anonymous methods versus lambdas. We could have never shipped lambdas, and we could have instead forced people to write things like this:

.Select(delegate (x) { return x.Age > 21; })

We did argue about these "extra characters" and we did have arguments about "so what". And, the result, at the end of the day was that there was enough value in not forcing people to add all that ceremony to do something we thought was very useful. To me, positional fits into the same bucket. I need a lot more ceremony which i would prefer to not have in my example when i deconstruct objects. Just as named parameters would be a lot more ceremony when i construct objects.

I wholeheartedly approve of having the named forms so that people can have the names if they find them useful. I'm just loath to force people to have that ceremony if they don't feel like it adds anything to their code.

At the end of the day, i think the choice should be in the hands of the developer. For you and your team you can absolutely state that you will never use the positional form (just as i assume you do the same for all other positional constructs). That's totally fine! :) But for developers that find the ceremony too excessive, they can always choose the other form. :)

bbarry commented 8 years ago

I think these are comparable samples:

//deconstruction/positional syntax
if (tree is Node(Black, var x, var a, Node(Red, var y, var b, Node(Red, var z, var c, var d)))) {
    return new Node(Red, y, new Node(Black, x, a, b), new Node (Black, z, c, d));
}

//property pattern syntax
if (tree is Node {
    Color is Black,
    Val is var x,
    Left is var a,
    Right is Node {
        Color is Red,
        Val is var y,
        Left is var b,
        Right is Node {
            Color is Red,
            Val is var z,
            Left is var c,
            Right is var d
        }
    }
}) {
    return new Node(Red, y, new Node(Black, x, a, b), new Node (Black, z, c, d));
}

//type pattern syntax
if (tree is Node top && top.Color == Black
    && top.Right is Node right && right.Color == Red
    && right.Right is Node rightChild && rightChild.Color == Red) {
    let x = top.Val;
    let a = top.Left;
    let y = right.Val;
    let b = right.Left;
    let z = rightChild.Val;
    let c = rightChild.Left;
    let d = rightChild.Right;
    return new Node(Red, y, new Node(Black, x, a, b), new Node (Black, z, c, d));
}

//C#6
var top = tree as Node;
var right = top?.Right as Node;
var rightChild = right?.Right as Node;
if (top?.Color == Black && right?.Color == Red && rightChild?.Color == Red) {
    var x = top.Val;
    var a = top.Left;
    var y = right.Val;
    var b = right.Left;
    var z = rightChild.Val;
    var c = rightChild.Left;
    var d = rightChild.Right;
    return new Node(Red, y, new Node(Black, x, a, b), new Node (Black, z, c, d));
}

To me, type pattern syntax seems strictly superior (with the C#6 syntax right behind it). Certainly deconstruction syntax is most terse here, but I fail to see why that deserves to be a goal.

CyrusNajmabadi commented 8 years ago

but I fail to see why that deserves to be a goal.

It's not a goal. But it's a benefit.

To me, the goal is to give users flexible approaches to use patterns in their own code with reasonable knobs for their own domain wrt expressiveness/terseness/etc. Positional patterns happen to be one piece of that goal.

Note: We went through a similar thing (with similar debates) in the linq timeframe for query expressions (i.e. from x in ... where ... select ...). Technically you don't need them. Indeed, as the language spec shows, they're nothing more than sugar over writing the standard method calls yourself. But we felt there was enough value in bringing this more succinct way for working with your data sources vs just the method+delegate argument approach.

It was never a "goal" to get this specific feature in. But it still was felt to be a benefit to get it for all the c# developers we thought would be using linq.

To me, the value of positional deconstruction is definitely there. Even the next closest example in what you listed (the other pattern matching form) is literally twice as long. It's twice as long, and for my domain adds nothing but ceremony. I know in my domain that it's Color/Value/Left/Right. Indeed, that's why i can write:

new Node(Red, y, new Node(Black, x, a, b), new Node (Black, z, c, d)) instead of the much more unwieldy: new Node(color: Red, value: y, left: new Node(color: Black, value: x, left: a, right: b), right: new Node (color: Black, value: z, left: c, right: d))

In some places in code i might find the latter beneficial. But i'd never require anyone to write it. And i'm loath to do the same with deconstruction. Deconstruction would no longer be symmetric with construction, and it would now face a large 'ceremony' tax every time you wanted to use it.

maloo commented 8 years ago

@CyrusNajmabadi

I'm trying to figure out why positional is fine for all these other cases but suddenly doesn't "add anything but confusion" to the deconstruction case. Especially as we have numerous languages that use positional deconstruction as well.

Context, readability and the ability to reason about code. When looking at a piece of code you want to understand what it does quickly. Construction is easy, new Foo(a, b, c) will give you a Foo object and a, b, c is not that important to understand the flow of the code. Similar for invocation, you can DoThatThing(x, y, z) and understand that it will "do that thing". The function name gives context and clarification, even when used in a condition like if (MyConditionHolds(x, y, z)). You only need to look at the function name when browsing the code. But when you now instead get if (p is Person("Mickey", true, *)) there is no longer any context for "Mickey", true which is the important part of the statement. And yet they are the two most important parts of the expression for understanding the condition. if (p is Person p && p.Name == "Mickey" && p.IsAuthorized) Personally I would always prefer the latter when browsing someone else's code.

CyrusNajmabadi commented 8 years ago

type pattern syntax seems strictly superior

Then feel free to use that syntax :)

None of my arguments state that that syntax should not be offered or available to people who want to use it. If it's the best for you, terrific. That's why we have several forms for whatever feels best.

CyrusNajmabadi commented 8 years ago

Context, readability and the ability to reason about code. When looking at a piece of code you want to understand what it does quickly. Construction is easy, new Foo(a, b, c) will give you a Foo object and a, b, c is not that important to understand the flow of the code.

Er... why is it not important? All the same reasons people gave before seem to apply here. What is 'a'? Is it correct as the first argument? How can i tell just from reading the code? What about 'b' and 'c'? As mgsam stated:

"I know nearly every time I write a method call where I've elided named arguments, the next time I look at that code again I've already forgotten what each parameter is."

For him not having parameter names in the code means he forgets nearly every time.

Personally I would always prefer the latter when browsing someone else's code.

And MgSam would personally prefer that everyone uses named parameters for all invocation/construction expressions. :)

(and when we did generics we had people who said they would prefer to never look at code with generics in it).

As i've mentioned already. I have no problem with people being able to choose for themselves what parts of the language they want to use. But i'm loath to insist that everyone else must follow. Mgsam would prefer no one use anything but named parameters. But i don't think others feel the same way.

You'd prefer to use names in your 'Person' case, and i'd have no problem if you did that. But i'd prefer to not have to put in all those names in my code. Giving people these tools allows all the different styles to be satisfied. Keeps things nicely parallel with construction. i.e. if you can understand your code that uses construction without names, then you can understand your code that uses deconstruction without names. Similarly, if you use names with construction, then you'd probably want names with deconstruction.

So, for your example, i would have always constructed it as: new Person(..., authorized: true), i would never write new Person(..., true). And my teammates can back me up on this. Whenever there's just some random constant that doesn't provide enough context, i always ask them for a named parameter. Given that i'd use named parameters on construction, i'd almost certainly use it on deconstruction as well. However, for tons and tons of cases where i didn't bother with names on construction (because it's clear from context), i would have no problem with either approach.

HaloFour commented 8 years ago

It's interesting how much of this seems to be arguing against pattern matching in general and not of positional deconstruction. I wonder if that's due to the unfamiliar syntax or just the limited examples that have been demonstrated which could almost as easily be handled by type checks and null-propagation. I think some of the blame also falls on the limited subset of patterns that have been proposed thus far. No array/list/dictionary patterns, no and/or patterns, no parenthesis/associativity patterns and no active patterns. It's arguing for a half-empty toolbox when you already have a hammer.

My only concern with positional deconstruction is how the properties are related to their positions. I don't like the idea of voodoo applying here. I think that a developer, either the author of the type or the author of an extension method, should actively define the relationship through declaring a method which returns the deconstructed form. I'm (of course) partial to the operator is syntax I proposed in #9005. I don't think that there needs to be a second GetValues form if we plan on having is operators.

MgSam commented 8 years ago

@CyrusNajmabadi

It's the decision of the greater community. The coding style you are espousing has not been adopted by the community at large. You seem to be arguing: I want C# to be used in this manner. And because of that, i don't like that you're adding this feature.

If a feature exists you can be guaranteed people are going to use it. Positional arguments already exist. Let's face the facts. Most developers don't spend a whole lot of time thinking about writing easy to read, provably correct code. That's why you guys have to design C# as a "pit of quality"; to save developers from their own human fallibility. Just because a feature is widely used doesn't mean it would have been the right choice if you were designing the language from scratch today; given experience and 20-20 hindsight.

Look at object initialization syntax. That has no positional form. But why not? The properties are written in some order when you type them into the file, so why not allow objects to be initialized with that ordering? Or maybe alphabetical, like they're displayed in Autocomplete? It would certainly save keystrokes. No one even suggests such an idea though, because it would be a nightmare for readability.

I believe extra characters have to justify their worth. Take, for example, anonymous methods versus lambdas. We could have never shipped lambdas, and we could have instead forced people to write things like this:

But this is not a good comparison, because lambda syntax, though terse, did not introduce any ambiguity. Positional syntax is entirely ambiguous; it requires documentation (either in-tool or external), it is brittle, and it is hard to read. There's no inherent reason it is Regex.Replace(input, pattern, replacement) rather than Regex.Replace(pattern, input, replacement).

CyrusNajmabadi commented 8 years ago

I think some of the blame also falls on the limited subset of patterns that have been proposed thus far. No array/list/dictionary patterns, no and/or patterns, no parenthesis/associativity patterns and no active patterns. It's arguing for a half-empty toolbox when you already have a hammer.

These are just LDM notes. They're just exposing the current stuff we're discussing. Nothing is finalized. It is not intended to completely cover all cases. We find that it's much easier to do language design if we bite things off, work on them a while, and move forward. Trying to solve the entire space all at once is usually not a great way to do things.

My only concern with positional deconstruction is how the properties are related to their positions. I don't like the idea of voodoo applying here.

There are many things we are considering around positional deconstruction. One is some way of having the compiler do the hookup (and i've been tasked with coming up with some of the proposals there). The other is to have things explicitly provided by the API author. We also strongly think that if the API author provides things explicitly, then the compiler never does any sort of hookup on its own. But none of this has been decided on. :)

CyrusNajmabadi commented 8 years ago

Positional syntax is entirely ambiguous;

And yet, pretty universal :)

Indeed, just above you, you have other people claiming that it's totally clear to do "new Foo(a, b, c)".

I get that it contains less information than the named form. I do not question or argue against that. My point is simply that nearly all programming languages have been able to work with that limitation without serious problems, across time and across pretty much the entire development community.

I also find your argument somewhat confusing. You previously said: "A tool should be ...".

Given that you are heavily pushing the idea that people use tools, then the issue is very non-problematic. The tooling can always tell you what properties these positional pieces match against. As such, there is no ambiguity.

Now, if you're arguing about how readable the code is in the absence of tooling, then that's another story altogether. But, if you're arguing that, then we have lots of features that suffer from this today, including (but not limited to):

  1. Invocation. Without tooling, you don't know what parameter each argument maps to.
  2. Construction. Same as above.
  3. var. No way to know the type.
  4. simple lambdas. No way to know the parameter types.
  5. linq queries.
  6. No way to know if an operator is overloaded.
  7. No way to know what the following are: N.N.C.C.P.F. Which of those are namespaces, classes, properties or fields? Without tooling, it's really hard to tell at the language level.
  8. etc. etc.

In the absence of tooling, the language already contains a large swath of useful, widely adopted functionality that can be hard to read. As such, i don't see a problem with adding something else. And, in the presence of tooling, there is no issue because tooling can tell you everything you need to know. Just as sig-help tells you what parameter your argument matches, we could have similar tooling that tells you what property your pattern matches against.

Since around V3, C# has moved far away from the idea that "code must be extremely explicit and unambiguous" without tooling. (i know, i had to do the IDE work to make sure the experience was good around all these features ;-) ). That ship has definitely sailed.

HaloFour commented 8 years ago

@CyrusNajmabadi

These are just LDM notes. They're just exposing the current stuff we're discussing. Nothing is finalized. It is not intended to completely cover all cases. We find that it's much easier to do language design if we bite things off, work on them for a while, and move forward. Trying to solve the entire space all at once is usually not a great way to do things.

Of course, and the spec does drop hints at some of those missing patterns. Although the conversation around and/or patterns hasn't been particularly positive (which is a shame, I honestly think those two would be a massive benefit). But if people aren't that familiar with pattern matching in other languages (and honestly, that includes myself) then not seeing how it applies in more complex cases might not demonstrate their value over imperative C#.

We also strongly think that if the API author provides things explicitly, then the compiler never does any sort of hookup on its own. But none of this has been decided on. :)

Indeed, but I'm in the camp that if the API author doesn't provide things explicitly then the compiler never should. On top of that I think other developers should be able to fill in those gaps. There is precedent in LINQ and in async/await of the compiler allowing extension methods to be resolved. You can make literally anything "awaitable", why not the same for "deconstructable"? And it's a bonus that it's similar to single partition active patterns.
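For reference, a sketch of what extension-based opt-in can look like. It uses the `Deconstruct` method name that later shipped in C# 7, which is not the `GetValues` or `operator is` design under discussion in this thread. `LegacyPerson` is a hypothetical type standing in for existing code that its author never updated.

```csharp
// Hypothetical existing type whose author never opted in to deconstruction.
public class LegacyPerson
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class LegacyPersonExtensions
{
    // A third party supplies the positional order explicitly.
    public static void Deconstruct(this LegacyPerson person,
                                   out string firstName, out string lastName)
    {
        firstName = person.FirstName;
        lastName = person.LastName;
    }
}

public static class DeconstructDemo
{
    public static string Describe(LegacyPerson person)
    {
        // The compiler resolves this to the extension method above.
        var (first, last) = person;
        return last + ", " + first;
    }
}
```

The position-to-property mapping lives in one visible method, so nothing is inferred from constructor shapes.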

CyrusNajmabadi commented 8 years ago

Indeed, but I'm in the camp that if the API author doesn't provide things explicitly then the compiler never should.

That's completely fair, and certainly is something we're considering.

On top of that I think other developers should be able to fill in those gaps. ... You can make literally anything "awaitable", why not the same for "deconstructable"?

So do we. We're definitely considering ideas where this is possible.

That said, the open discussion is how things work for the literally millions of lines of existing code out there. One option is to say "you can't use patterns with it unless the API author opts in, or you go and add those extensions to light it up". Another option is to say "patterns can light up on existing code without people having to do anything."

It is, of course, the age old "is this opt-in or opt-out" discussion. There are good arguments for either side, but there's also questions of pragmatism, value, and whatnot.

Note that we have precedent for lighting stuff up on code that never opted into anything. For example, collection initializers just light up on code that happens to derive from IEnumerable and has an "Add" method. The idea there was that there was enough value in just detecting and supporting this pattern (no pun intended) vs forcing all existing 'collection like' types to have to opt into our collection initializer feature.

In that case, we felt there was just too much value in being able to light this up without needing everyone to have to add "Add" extension methods in their codebases. Personally, i think we made the right choice there. I think it was pragmatic and meant maximal benefit with little downside.
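A small sketch of that collection-initializer behavior. `Basket` is a hypothetical type that implements `IEnumerable` and has a public `Add` method, and does nothing else to opt in:

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection-like type written with no knowledge of initializers.
public class Basket : IEnumerable
{
    private readonly List<string> _items = new List<string>();

    public int Count { get { return _items.Count; } }

    public void Add(string item) { _items.Add(item); }

    public IEnumerator GetEnumerator() { return _items.GetEnumerator(); }
}

public static class BasketDemo
{
    public static Basket Create()
    {
        // The compiler turns each element into a call to Add.
        return new Basket { "apple", "pear", "plum" };
    }
}
```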

HaloFour commented 8 years ago

@CyrusNajmabadi

Note that we have precedent for lighting stuff up on code that never opted into anything.

I don't disagree, but automatically "lighting up" positional deconstruction via case-modulo constructor/property correlation seems quite flimsy and assumes that there may be relationships between them at all. It's quite common for the shape of the constructor and the shape of the properties to be wholly or partially unrelated. It's also common for the constructor arguments to be transformed prior to being exposed through properties, but you're comparing the property values in what otherwise feels like constructor syntax. At what point does the compiler say, "Nope, these constructors don't match the properties enough, so we can't positionally deconstruct this type." And I worry about the false positives a lot more than I worry about the false negatives.

Anyway, these are all known arguments against that aspect of the feature.

KathleenDollard commented 8 years ago

“It's also common for the constructor arguments to be transformed prior to being exposed through properties”

Doesn’t that present a problem for both named and positional deconstruction?

CyrusNajmabadi commented 8 years ago

Anyway, these are all known arguments against that aspect of the feature.

Yup!

We'll definitely only want hookup if we think the amount of true positives is sufficient and the amount of false positives is acceptably low. i.e. the same approach we took for collection initializers.

Personally, i would take a somewhat conservative approach. I.e. the match should be very easy to explain. As a starting point it might just be something like: the matching property must exist and must have the exact same type as the parameter. This could eliminate some cases that we might like, but could still keep enough value with existing codebases out there.
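To make that starting rule concrete, here is a rough sketch written with reflection. It is only an illustration of the rule as stated (same name ignoring case, exact same type); nothing like this has been specified, and `ConservativeHookup`, `Customer` and `Celsius` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class ConservativeHookup
{
    // Succeeds only if every constructor parameter has a public instance
    // property with the same name (ignoring case) and the exact same type.
    public static bool TryMatch(ConstructorInfo ctor, out PropertyInfo[] matched)
    {
        var found = new List<PropertyInfo>();
        foreach (ParameterInfo parameter in ctor.GetParameters())
        {
            // IgnoreCase maps "firstName" to "FirstName". Note that GetProperty
            // throws AmbiguousMatchException if two properties differ only by case.
            PropertyInfo property = ctor.DeclaringType.GetProperty(
                parameter.Name,
                BindingFlags.Public | BindingFlags.Instance | BindingFlags.IgnoreCase);

            if (property == null || property.PropertyType != parameter.ParameterType)
            {
                matched = new PropertyInfo[0];
                return false;
            }
            found.Add(property);
        }
        matched = found.ToArray();
        return true;
    }
}

// Matches: each parameter has a same-named, same-typed property.
public class Customer
{
    public string FirstName { get; }
    public string LastName { get; }
    public Customer(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
    }
}

// Does not match: the constructor argument is transformed and exposed
// under a different name.
public class Celsius
{
    public double Degrees { get; }
    public Celsius(double fahrenheit) { Degrees = (fahrenheit - 32) * 5 / 9; }
}
```

Under this rule `Customer` lights up and `Celsius` is rejected, which is the kind of false negative the conservative approach accepts in exchange for fewer false positives.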

For those types that didn't fit this conservative approach, the API author would either have to opt in, or we'd need some other mechanism (like extension methods) for third parties to light things up there.

HaloFour commented 8 years ago

@KathleenDollard

I don't believe so, property patterns use the property names directly so it should be more clear that the pattern is applied to the property values.

CyrusNajmabadi commented 8 years ago

@KathleenDollard

Doesn’t that present a problem for both named and positional deconstruction?

Not for named deconstruction. With named deconstruction, you are stating which properties you care about.

For positional it may present a problem depending on how you implement positional deconstruction. For positional deconstruction that only involves some sort of "automatic/implicit" matching, then it would be problematic. However, we're of the position that we'd always provide some way to do positional with explicitly defined matching. If you have such a system, and are using it with the API, then there would be no problem and things would work regardless of how the actual class transformed data from the constructor.

In other words, you can always provide some sort of explicit mechanism to support positional deconstruction. In that case, there is never any "problem" when using positional deconstruction. However, in the absence of the code providing that mechanism, we are considering having a way of providing some sort of automatic system that would allow patterns to still light up. That mechanism has not been spec'ed/defined/etc. But the feedback here matches our own internal thoughts both around:

  1. is such a feature desirable?
  2. if we included such a feature, what's the right way to do it?

gafter commented 8 years ago

@HaloFour

My only concern with positional deconstruction is how the properties are related to their positions. I don't like the idea of voodoo applying here. I think that a developer, either the author of the type or the author of an extension method, should actively define the relationship through declaring a method which returns the deconstructed form.

Your concern is the crux of an issue we're struggling with right now. I happen to agree with you, but the issue isn't settled.

I'm (of course) partial to the operator is syntax I proposed in #9005. I don't think that there needs to be a second GetValues form if we plan on having is operators.

We are considering GetValues as an alternative to the currently specified operator is, in part because one can provide it as an extension method without adding too much machinery to the language.

HaloFour commented 8 years ago

@gafter

We are considering GetValues as an alternative to the currently specified operator is, in part because one can provide it as an extension method without adding too much machinery to the language.

While that does make sense it seems a bit redundant to have two very similar bits of syntax, one to deal with just destructuring and the other to deal with matching and destructuring.

gafter commented 8 years ago

@HaloFour Active patterns would be driven by a static GetValues method that has a bool return type and a non-out first parameter. "Ordinary" patterns would be an instance void GetValues method. Both are independently valuable because they have different static semantics for computing whether or not a set of patterns is complete (the former can return false, but the latter cannot). Also, the non-active pattern can be done using an extension method, which is important for retrofitting existing types.
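As a sketch only, the two shapes could look like this when written as ordinary C# methods. The out-parameter signatures are an assumption, since the exact form had not been settled, and `EvenNumber` is a hypothetical example of an active pattern.

```csharp
public class Person
{
    public string FirstName { get; }
    public string LastName { get; }

    public Person(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
    }

    // "Ordinary" shape: instance method with a void return, so it cannot
    // fail and always contributes to completeness checking.
    public void GetValues(out string firstName, out string lastName)
    {
        firstName = FirstName;
        lastName = LastName;
    }
}

public static class EvenNumber
{
    // "Active" shape: static method with a bool return. The non-out first
    // parameter is the input, and returning false means "no match".
    public static bool GetValues(int input, out int half)
    {
        half = input / 2;
        return input % 2 == 0;
    }
}
```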

In any case, the purpose of destructuring is to support pattern-matching. The language won't use it for anything else.

axel-habermaier commented 8 years ago

What I don't get is why all of a sudden all types should be compatible with positional pattern matching automatically even though they were never designed for that. @CyrusNajmabadi's tree example, in particular, does not explain that at all: The tree type would probably be defined as a record anyway, in which case positional deconstruction of course makes sense (as long as you're used to pattern matching, that is). For all other types, just use property match expressions; if you don't like that, define an extension method (active pattern) that does the positional match for you. To me, that seems to be the far more reasonable approach than the compiler magic discussed here. This is also similar to how it works in F#. Edit: As @isaacabraham correctly notes, F# only supports named matches for record types. If not even functional languages such as F# support arbitrary positional matching, why should C#? Then again, both positional and named matches are available for discriminated unions.

@mythz: Have you used pattern matching in some language before? Your objection to positional pattern matching does not seem to be about the positional aspect at all; it is rather a general objection against pattern matching. The arguments you gave against pattern matching are, from my point of view, the advantages of pattern matching over the imperative code you've shown...

isaacabraham commented 8 years ago

Positional matching in F# (if I'm thinking about the same thing everyone else is) only works for tuples. For records i.e. data structures with named fields, you must match by name. So positional matching for tuples looks like this: -

let foo = function
| _, "bloggs", _ -> "loves C#!"
| "isaac", _, _ -> "loves F# and C#!"
| _ -> "loves F#!"

foo ("isaac", "abraham", 36) // "loves F# and C#!"

With records it's somewhat different: -

// Age is part of the record so that the patterns and the call below type-check
type Person = { FirstName : string; LastName : string; Age : int }
let foo = function
| { LastName = "bloggs" } -> "loves C#!"
| { FirstName = "isaac"; Age = 36 } -> "loves F# and C#!"
| _ -> "loves F#!"

foo { FirstName = "isaac"; LastName = "abraham"; Age = 36 } // "loves F# and C#!"

I personally can't imagine how it would look to have positional matching over a named structure - it's too fragile e.g. add another field and your existing matching clauses either break or (even worse) continue to work but against the wrong field? Also - tuples tend to only have a few fields so positional matching isn't too cumbersome. Records often have many fields - you wouldn't want to positionally match against e.g. field 10 and have to leave the others blank I think. It's more readable to simply explicitly mark the fields you're matching against.

Just my two pennies.

JiriZidek commented 8 years ago

add another field and your existing matching clauses either break or (even worse) continue to work but against the wrong field

This would be a nightmare - someone just reorders props in a class in some 3rd party library, and your code compiles, but does not work... Especially in POCOs, where a bunch of "string" props is quite common, this would happen on a regular basis. After reading the discussion I'm getting a feeling that anything "automatic" that does not happen locally (within the current block or current method) would be evil. Everyone knows that reordering params in a method is a dangerous breaking change when producing libs that other (external) people use. This would open another door to such problems.

markrockcity commented 8 years ago

Why not just stuff the properties into the constructor's parameter list? There was a similar proposal during the C# 6.0 design, but for fields (https://roslyn.codeplex.com/discussions/540281).

public class Person
{
    public Person(public string FirstName, public string LastName)
    {
    }
}

The actual, generated parameter names would be normalized so that the first letter is lower-cased, and custom attributes could automatically be added, too:

public class Person
{
      public string FirstName { get; }
      public string LastName { get; }

      public Person([Property("FirstName")] string firstName, [Property("LastName")] string lastName)
      {
           FirstName = firstName;
           LastName = lastName;
      }
}

isaacabraham commented 8 years ago

@JiriZidek That's why in F# it's common to avoid using tuples with "simple" types like strings or ints, particularly when over public contract or as arguments / outputs of a function e.g. imagine the following situation where we have a function returning an (int * int) tuple: -

let data = getCustomerIdAndOrderId()
match data with
| 29, _ -> "Customer 29!"
| customerId, 520 -> sprintf "Order 520 was placed by customer %d" customerId
| data -> sprintf "%A" data

Now imagine the implementer of getCustomerIdAndOrderId() flips the two around: getOrderIdAndCustomerId(). Whoops - now your whole program is kaput, except the compiler can't tell us this.

Better to do something like this: -

// Simple wrapper types over int
type CustomerId = CustomerId of int 
type OrderId = OrderId of int

let data = getCustomerIdAndOrderId()
match data with
| CustomerId 29, _ -> "Customer 29!"
| CustomerId customerId, OrderId 520 -> sprintf "Order 520 was placed by customer %d" customerId
| data -> sprintf "%A" data

Now getCustomerIdAndOrderId returns CustomerId * OrderId. So the type system protects us from accidentally flipping the parameters around, which you don't get with a string * string tuple, where as far as the type system is concerned they are interchangeable.

JiriZidek commented 8 years ago

Yes, explicit type aliases - I loved them in TurboPascal & love them now in TypeScript...

type CustomerId = CustomerId of int
type OrderId = OrderId of int

Missing them in C# :-) - a bit off-topic.

isaacabraham commented 8 years ago

@JiriZidek it's not type aliases - it's a completely different type. Type Aliases don't give you any protection - you can still mix CustomerId and OrderId as aliases. Single case DUs (wrapper types from above) are proper nominal types with a constructor for the fields etc..
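For C# readers, a rough analogue of those wrapper types can be hand-written as small structs. This is a sketch only: it gives the nominal-typing protection but none of the pattern-matching support of an F# single-case union, and the `Orders.Describe` helper is hypothetical.

```csharp
// Each wrapper is a distinct nominal type, so the two cannot be swapped.
public struct CustomerId
{
    public int Value { get; }
    public CustomerId(int value) { Value = value; }
}

public struct OrderId
{
    public int Value { get; }
    public OrderId(int value) { Value = value; }
}

public static class Orders
{
    public static string Describe(CustomerId customer, OrderId order)
    {
        return "Order " + order.Value + " was placed by customer " + customer.Value;
    }
}

// Orders.Describe(new OrderId(520), new CustomerId(29)) does not compile,
// whereas with two plain int parameters the swapped call would.
```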

DavidArno commented 8 years ago

@MadsTorgersen

@DavidArno Nothing is set in stone. Our current plan of record is that pattern matching will be integrating into existing language constructs (is-expressions and switch statements) as well as a new expression form which we call a match expression.

As we experiment with things, those plans may change. We may choose to add patterns in more places (for instance, let-statements have been proposed), or we may trickle them in over multiple releases (e.g. adding match expressions later).

To my mind, you have the priorities back to front. Pattern matching is a functional feature and thus is primarily geared toward expressions, not void statements. Thus, implementing match expressions should be the priority and adding it to switch should be the lower priority activity.

I would consider records, tuples, ADTs/discriminated unions and match expressions as an "all or nothing" bundle. Releasing only some of these features in C# 7 and the rest in a later version makes no sense. All other proposals, including match statements, should then be "nice to haves" if there's time.

HaloFour commented 8 years ago

@gafter

Active patterns would be driven by a static GetValues method that has a bool return type and a non-out first parameter. "Ordinary" patterns would be an instance void GetValues method.

While I understand that using ordinary methods to perform this requires fewer changes to the language, I seriously hope you guys do consider an operator. A normal method, particularly one with a generic name like GetValues, will end up cluttering up types and Intellisense in all of the wrong places. There are already several examples of BCL classes with either static or instance GetValues methods.

TonyValenti commented 8 years ago

I was reading over all of these threads and I had a few thoughts I wanted to share:

It seems as though positional deconstruction and pattern matching go hand in hand.

I've used both in academia before. It was fun because it let me express things more concisely when I wrote the code but I absolutely hated it when I was looking at someone else's code because I had to wrap my brain around even more information: the position of the parameters.

When I look at some of the different ideas, here are examples of how I would prefer to be able to express things:

//I like this much better than the wildcard example
if (p is Person p && p.Name == "Mickey" && p.IsAuthorized)

//This gives me a new anonymous type that has a Name and an Age property
//You can do something kind of like that with Linq today:
//var item = from x in items select new { x.Name, x.Age };
var Info = new { Person.Name, Person.Age};

//Declares FirstName and LastName and assigns them to Person's FirstName and last name.
var {FirstName, LastName} = Person.{FirstName, LastName};

//Alternate syntax that I like more.
var {FirstName, LastName} = {Person.FirstName, Person.LastName};

//Since "With" was mentioned...
var NewPerson = Person with {
    FirstName = "Smith"
};

In all of those examples, I like them over the provided examples because, if I were to look at the code for the first time, I would be able to quickly pick up on what was happening and not have to wrap my head around all the positional stuff.

KathleenDollard commented 8 years ago

I chatted with @bjoeris about this last night. He's a polyglot geek and deep into Haskell. He misunderstood the positional syntax initially, then hated it.

If we want to do that, can we use curlies instead of parens on the positional syntax?

Of course, I don't want the positional syntax at all; I'd rather have the property name used to indicate any comparisons made.

bbarry commented 8 years ago

I think the most straightforward way to do this would be with some sort of Option<T> struct:

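public struct Option<T> {
  private T _value;
  public bool HasValue { get; private set; }

  // Needed by the lowered code further down, which calls new Option<string>("...")
  public Option(T value) : this()
  {
    _value = value;
    HasValue = true;
  }

  public T Value
  {
    get { return _value; }
    set
    {
      _value = value;
      HasValue = true;
    }
  }
}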

And a contextual keyword applicable to method signatures with:

public partial class Person
{
  public with Person With(
    Option<string> FirstName = default(Option<string>),
    Option<string> LastName = default(Option<string>)
    )
  {
    return new Person(
        FirstName.HasValue ?  FirstName.Value : this.FirstName,
        LastName.HasValue ?  LastName.Value : this.LastName
        );
  }
}

Along with several semantic rules:

Of course most of those constraints could be hidden by additional syntax capabilities in the compiler and this method could be an extension method:

//as extension method
public static with Person With(this Person p,
  Option<string> FirstName = default(Option<string>),
  Option<string> LastName = default(Option<string>)
  )
{
  return new Person(
      FirstName.HasValue ?  FirstName.Value : p.FirstName,
      LastName.HasValue ?  LastName.Value : p.LastName
      );
}

//constraints handled by new syntax
public with Person With(FirstName, LastName)
{ ... }

//again as extension method
public static with Person With(this Person p, FirstName, LastName)
{ ... }

The names of the parameters to the method can provide the positions for positional patterns and the method itself can be used for with cloning.

edit:

var p = new Person { FirstName = "Mickey", LastName = "Mouse" };
if (p is Person("Mickey", *)) // positional deconstruction
{
  return p with { FirstName = "Minney" }; // with-expression
}

would be lowered as:

//still not sure how to do the object initializer exactly
var p = new Person(null,null).With(FirstName: new Option<string>("Mickey"), LastName: new Option<string>("Mouse"));
var _1 = p as Person;
if (_1 != null && _1.FirstName == "Mickey")
{
  return p.With(FirstName: new Option<string>("Minney"));
}

edit2: perhaps the With method must be an extension method and must accept null/default(T) as the first parameter to support object initialization?

CyrusNajmabadi commented 8 years ago

What I don't get is why all of a sudden all types should be compatible with positional pattern matching automatically even though they were never designed for that.

This is not what has been proposed. I'm unaware of any proposal that would make it so that "all types should be compatible with positional pattern matching".

We are exploring ideas here. Ideas which may make it so that some types are automatically compatible with parts of pattern matching, without having to add any new code anywhere to support this feature. This is similar to what we did with collection initializers, where some types just 'lit up' automatically and could now be used with collection initializers "even though they were never designed for that".

CyrusNajmabadi commented 8 years ago

To me, that seems to be the far more reasonable approach than the compiler magic discussed here.

No compiler magic has actually been discussed. No proposal has been brought forward, and there still has not been any actual exploration of this space. It's very hard to talk about things like "reasonable approach[es]" when we don't have an actual proposal to discuss :)

It's actually been tasked to me to go out and see what sorts of actual algorithms could potentially be spec'ed out, and what sort of results we might get from them. If these results aren't good, then it's unlikely we would proceed in this direction. If the results were very good, we might consider taking things further, and we'd ideally be able to present a sensible algorithm and demonstrate the value (i.e. large number of true positives, low number of false positives).

I think it's a really bad idea to not even bother looking into seeing what's possible. If we had done that with collection initializers, for example, we would have likely ended up with a far more limited feature that wouldn't have worked out as nicely as it does with all the cases it supports today.

My hope is to be able to get to this early next week.

CyrusNajmabadi commented 8 years ago

it's too fragile e.g. add another field and your existing matching clauses either break or (even worse) continue to work but against the wrong field?

The same thing exists today with any method/constructor. If you change the shape of it, you can easily break things. For example:

public Point(int x, int y)

Change those around, and you've broken every person who constructs a point. Constructors are part of your public API contract. If you change them, then you run the risk of breaking code. That would remain the same if the constructor was used for construction or deconstruction purposes.
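A sketch of that breakage, using two hypothetical versions of a point type in which only the constructor's parameter order differs:

```csharp
// Hypothetical "version 1" of a library type.
public class PointV1
{
    public int X { get; }
    public int Y { get; }
    public PointV1(int x, int y) { X = x; Y = y; }
}

// Hypothetical "version 2": the author swapped the parameter order.
public class PointV2
{
    public int X { get; }
    public int Y { get; }
    public PointV2(int y, int x) { X = x; Y = y; }
}

// The call site new PointV1(1, 2) has X == 1, but the textually identical
// new PointV2(1, 2) has X == 2, and the compiler reports nothing because
// both parameters are int.
```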

alrz commented 8 years ago

@CyrusNajmabadi As @axel-habermaier said, your example could be written as a record declaration for Node and could be deconstructed positionally right away. What this proposal is suggesting causes "non-record types" to become positionally deconstructable out of the box. I'd love to see a real world example that clears up this use case where a positional deconstruction would be useful for a "non-record type" and defining extensions to make it possible is too much.

CyrusNajmabadi commented 8 years ago

I seriously hope you guys do consider an operator. A normal method, particularly one with a generic name like GetValues, will end up cluttering up types and Intellisense in all of the wrong places.

Why would an operator not end up cluttering up the types? Why would a method like this clutter up IntelliSense? We own the IntelliSense experience (literally, i'm one of the primary contributors to it :) ), so we can provide whatever sort of experience we want here in terms of how this is actually presented and made available there.

JiriZidek commented 8 years ago

@CyrusNajmabadi

The same thing exists today with any method/constructor.

Yes, we all know this. And we know it is dangerous. But the things you described above would be much more vulnerable to this problem.

CyrusNajmabadi commented 8 years ago

said, your example could be written as a record declaration for Node

But i don't want to write it as a record. I want to write it as a class :)

and could be deconstructed positionally right away

The argument i've been countering is one that stated that positional deconstruction shouldn't exist at all, in any form. MgSam, for example, does not believe that positional is ever ok. To the point that he extremely dislikes it even for normal construction or invocation.

defining extensions to make it possible is too much.

No one said it would be too much. I'd be fine with a solution that only allowed positional deconstruction through some operator, instance method or extension method based solution. I think you may not actually understand what i've been arguing for.

CyrusNajmabadi commented 8 years ago

But the things you described above would be much more vulnerable to this problem.

Why?