dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/

Proposal: Language support for Tuples #347

Closed MadsTorgersen closed 7 years ago

MadsTorgersen commented 9 years ago

There are many scenarios where you'd like to group a set of typed values temporarily, without the grouping itself warranting a "concept" or type name of its own.

Other languages use variations over the notion of tuples for this. Maybe C# should too.

This proposal follows up on #98 and addresses #102 and #307.

Background

The most common situation where values need to be temporarily grouped, a list of arguments to (e.g.) a method, has syntactic support in C#. However, what is probably the second most common, a list of results, does not.

While there are many situations where tuple support could be useful, the most prevalent by far is the ability to return multiple values from an operation.

Your options today include:

Out parameters:


public void Tally(IEnumerable<int> values, out int sum, out int count) { ... }

int s, c;
Tally(myValues, out s, out c);
Console.WriteLine($"Sum: {s}, count: {c}");  

This approach cannot be used for async methods, and it is also rather painful to consume, requiring variables to be first declared (and var is not an option), then passed as out parameters in a separate statement, then consumed.

On the bright side, because the results are out parameters, they have names, which help indicate which is which.

System.Tuple:

public Tuple<int, int> Tally(IEnumerable<int> values) { ... }

var t = Tally(myValues);
Console.WriteLine($"Sum: {t.Item1}, count: {t.Item2}");  

This works for async methods (you could return Task<Tuple<int, int>>), and you only need two statements to consume it. On the downside, the consuming code is perfectly obscure - there is nothing to indicate that you are talking about a sum and a count. Finally, there's a cost to allocating the Tuple object.

Declared transport type:

public struct TallyResult { public int Sum; public int Count; }
public TallyResult Tally(IEnumerable<int> values) { ... }

var t = Tally(myValues);
Console.WriteLine($"Sum: {t.Sum}, count: {t.Count}");  

This has by far the best consumption experience. It works for async methods, the resulting struct has meaningful field names, and being a struct, it doesn't require heap allocation - it is essentially passed on the stack in the same way as the argument list to a method.

The downside of course is the need to declare the transport type. The declaration is meaningless overhead in itself, and since it doesn't represent a clear concept, it is hard to give it a meaningful name. You can name it after the operation that returns it (like I did above), but then you cannot reuse it for other operations.

Tuple syntax

If the most common use case is multiple results, it seems reasonable to strive for symmetry with parameter lists and argument lists. If you can squint and see "things going in" and "things coming out" as two sides of the same coin, then that seems to be a good sign that the feature is well integrated into the existing language, and may in fact improve the symmetry instead of (or at least in addition to) adding conceptual weight.

Tuple types

Tuple types would be introduced with syntax very similar to a parameter list:

public (int sum, int count) Tally(IEnumerable<int> values) { ... }

var t = Tally(myValues);
Console.WriteLine($"Sum: {t.sum}, count: {t.count}");  

The syntax (int sum, int count) indicates an anonymous struct type with public fields of the given names and types.

Note that this is different from some notions of tuple, where the members are not given names but only positions. This is a common complaint, though, essentially degrading the consumption scenario to that of System.Tuple above. For full usefulness, tuple members need to have names.

This is fully compatible with async:

public async Task<(int sum, int count)> TallyAsync(IEnumerable<int> values) { ... }

var t = await TallyAsync(myValues);
Console.WriteLine($"Sum: {t.sum}, count: {t.count}");  

Tuple literals

With no further syntax additions to C#, tuple values could be created as

var t = new (int sum, int count) { sum = 0, count = 0 };

Of course that's not very convenient. We should have a syntax for tuple literals, and given the principle above it should closely mirror that of argument lists.

Creating a tuple value of a known target type should enable leaving out the member names:

public (int sum, int count) Tally(IEnumerable<int> values) 
{
    var s = 0; var c = 0;
    foreach (var value in values) { s += value; c++; }
    return (s, c); // target typed to (int sum, int count)
}

Using named arguments as a syntax analogy it may also be possible to give the names of the tuple fields directly in the literal:

public (int sum, int count) Tally(IEnumerable<int> values) 
{
    var res = (sum: 0, count: 0); // infer tuple type from names and values
    foreach (var value in values) { res.sum += value; res.count++; }
    return res;
}

Which syntax you use would depend on whether the context provides a target type.

Tuple deconstruction

Since the grouping represented by tuples is most often "accidental", the consumer of a tuple is likely not to want to even think of the tuple as a "thing". Instead they want to immediately get at the components of it. Just like you don't first bundle up the arguments to a method into an object and then send the bundle off, you wouldn't want to first receive a bundle of values back from a call and then pick out the pieces.

Languages with tuple features typically use a deconstruction syntax to receive and "split out" a tuple in one fell swoop:

(var sum, var count) = Tally(myValues); // deconstruct result
Console.WriteLine($"Sum: {sum}, count: {count}");  

This way there's no evidence in the code that a tuple ever existed.
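One way to read the deconstruction form is as shorthand for receiving the result into a temporary and copying each member into its own local. Below is a minimal sketch of that reading. It uses the hand-declared TallyResult struct from the Background section as a stand-in for the proposed tuple type. The class name DeconstructionSketch is hypothetical, and the lowering shown is an assumption for illustration, not a settled design:

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the proposed (int sum, int count) tuple type.
public struct TallyResult { public int Sum; public int Count; }

public static class DeconstructionSketch
{
    public static TallyResult Tally(IEnumerable<int> values)
    {
        var result = new TallyResult();
        foreach (var value in values) { result.Sum += value; result.Count++; }
        return result;
    }

    public static void Main()
    {
        // Assumed lowering of: (var sum, var count) = Tally(myValues);
        var temp = Tally(new[] { 1, 2, 3 });
        var sum = temp.Sum;     // first member, by position
        var count = temp.Count; // second member, by position
        Console.WriteLine($"Sum: {sum}, count: {count}");
    }
}
```

After the three assignments the temporary is never mentioned again, which is the sense in which the tuple leaves no trace in the consuming code.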

Details

That's the general gist of the proposal. Here are a ton of details to think through in the design process.

Struct or class

As mentioned, I propose to make tuple types structs rather than classes, so that no allocation penalty is associated with them. They should be as lightweight as possible.

Arguably, structs can end up being more costly, because assignment copies a bigger value. So if they are assigned a lot more than they are created, then structs would be a bad choice.

In their very motivation, though, tuples are ephemeral. You would use them when the parts are more important than the whole. So the common pattern would be to construct, return and immediately deconstruct them. In this situation structs are clearly preferable.

Structs also have a number of other benefits, which will become obvious in the following.

Mutability

Should tuples be mutable or immutable? The nice thing about them being structs is that the user can choose. If a reference to the tuple is readonly then the tuple is readonly.

Now a local variable cannot be readonly, unless we adopt #115 (which is likely), but that isn't too big of a deal, because locals are only used locally, and so it is easier to stick to an immutable discipline if you so choose.

If tuples are used as fields, then those fields can be readonly if desired.
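As a rough illustration of "the user can choose", here is a sketch with a hand-declared struct standing in for a tuple type. SumCount and Accumulator are hypothetical names, and this assumes tuples would behave like ordinary structs with public fields:

```csharp
using System;

// Stand-in for a tuple type with two mutable public fields.
public struct SumCount { public int Sum; public int Count; }

public class Accumulator
{
    private readonly SumCount _initial; // readonly field: members cannot be modified after construction
    private SumCount _running;          // ordinary field: members can be modified in place

    public Accumulator(SumCount initial)
    {
        _initial = initial;
        _running = initial;
    }

    public SumCount Initial => _initial;
    public SumCount Running => _running;

    public void Add(int value)
    {
        _running.Sum += value;
        _running.Count++;
        // _initial.Sum += value; // would not compile: _initial is readonly
    }

    public static void Main()
    {
        var acc = new Accumulator(new SumCount { Sum = 1, Count = 1 });
        acc.Add(5);
        Console.WriteLine($"{acc.Initial.Sum}/{acc.Initial.Count} -> {acc.Running.Sum}/{acc.Running.Count}");
    }
}
```

The same struct type is mutable in one field and immutable in the other, so the decision sits with the declaration of the storage location rather than with the tuple type.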

Value semantics

Structs have built-in value semantics: Equals and GetHashCode are automatically implemented in terms of the struct's fields. This isn't always very efficiently implemented, so we should make sure that the compiler-generated struct does this efficiently where the runtime doesn't.
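The built-in behavior can be seen with any plain struct today. A minimal sketch, with SumCount as a hypothetical stand-in for a tuple type; it relies only on the default System.ValueType implementations, which may fall back to reflection for some struct layouts (one reason a compiler-generated implementation could be faster):

```csharp
using System;

// No Equals or GetHashCode override: the defaults from System.ValueType apply.
public struct SumCount { public int Sum; public int Count; }

public static class ValueSemanticsSketch
{
    public static void Main()
    {
        var a = new SumCount { Sum = 6, Count = 3 };
        var b = new SumCount { Sum = 6, Count = 3 };
        var c = new SumCount { Sum = 6, Count = 4 };

        // Equality is decided by the field values, not by identity.
        Console.WriteLine(a.Equals(b)); // True
        Console.WriteLine(a.Equals(c)); // False

        // Values that compare equal must also hash equal.
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True
    }
}
```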

Tuples as fields

While multiple results may be the most common usage, you can certainly imagine tuples showing up as part of the state of objects. A particularly common case might be where generics are involved, and you want to pass a compound of values for one of the type parameters. Think dictionaries with multiple keys and/or multiple values, etc.

Care needs to be taken with mutable structs in the heap: if multiple threads can mutate, tearing can happen.

Conversions

On top of the member-wise conversions implied by target typing, we can certainly allow implicit conversions between tuple types themselves.

Specifically, covariance seems straightforward, because the tuples are value types: As long as each member of the assigned tuple is assignable to the type of the corresponding member of the receiving tuple, things should be good.

You could imagine going a step further, and allowing pointwise conversions between tuples regardless of the member names, as long as the arity and types line up. If you want to "reinterpret" a tuple, why shouldn't you be allowed to? Essentially the view would be that assignment from tuple to tuple is just memberwise assignment by position.

(double sum, long count) weaken = Tally(...); // why not?
(int s, int c) rename = Tally(...); // why not?
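Spelled out by hand, "memberwise assignment by position" for the two lines above might look like the following sketch. The three structs are hypothetical stand-ins for the tuple types (int sum, int count), (double sum, long count) and (int s, int c); the conversion helpers are an assumption about what a compiler could generate, not a description of an actual implementation:

```csharp
using System;

public struct IntSumCount { public int sum; public int count; }
public struct WideSumCount { public double sum; public long count; }
public struct ShortNames { public int s; public int c; }

public static class ConversionSketch
{
    // (double sum, long count) weaken = Tally(...);
    // Each member uses the ordinary implicit conversion: int -> double, int -> long.
    public static WideSumCount Weaken(IntSumCount t) =>
        new WideSumCount { sum = t.sum, count = t.count };

    // (int s, int c) rename = Tally(...);
    // Same member types, matched by position; only the names differ.
    public static ShortNames Rename(IntSumCount t) =>
        new ShortNames { s = t.sum, c = t.count };

    public static void Main()
    {
        var t = new IntSumCount { sum = 6, count = 3 };
        var weaken = Weaken(t);
        var rename = Rename(t);
        Console.WriteLine($"{weaken.sum}, {weaken.count}");
        Console.WriteLine($"{rename.s}, {rename.c}");
    }
}
```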

Unification across assemblies

One big question is whether tuple types should unify across assemblies. Currently, compiler generated types don't. As a matter of fact, anonymous types are deliberately kept assembly-local by limitations in the language, such as the fact that there's no type syntax for them!

It might seem obvious that there should be unification of tuple types across assemblies - i.e. that (int sum, int count) is the same type when it occurs in assembly A and assembly B. However, given that structs aren't expected to be passed around much, you can certainly imagine them still being useful without that.

Even so, it would probably come as a surprise to developers if there was no interoperability between tuples across assembly boundaries. This may range from having implicit conversions between them, supported by the compiler, to having a true unification supported by the runtime, or implemented with very clever tricks. Such tricks might lead to a less straightforward layout in metadata (such as carrying the tuple member names in separate attributes instead of as actual member names on the generated struct).

This needs further investigation. What would it take to implement tuple unification? Is it worth the price? Are tuples worth doing without it?

Deconstruction and declaration

There's a design issue around whether deconstruction syntax is only for declaring new variables for tuple components, or whether it can be used with existing variables:

(var sum, var count) = Tally(myValues); // deconstruct into fresh variables
(sum, count) = Tally(otherValues); // deconstruct into existing variables?

In other words is the form (_, _, _) = e; a declaration statement, an assignment expression, or something in between?

This discussion intersects meaningfully with #254, declaration expressions.

Relationship with anonymous types

Since tuples would be compiler generated types just like anonymous types are today, it's useful to consider rationalizing the two with each other as much as possible. With tuples being structs and anonymous types being classes, they won't completely unify, but they could be very similar. Specifically, anonymous types could pick up these properties from tuples:

Once in the language, there are additional conveniences that you can imagine adding for tuples.

Tuple members in scope in method body

One (the only?) nice aspect of out parameters is that no returning is needed from the method body - they are just assigned to. For the case where a tuple type occurs as a return type of a method you could imagine a similar shortcut:

public (int sum, int count) Tally(IEnumerable<int> values) 
{
    sum = 0; count = 0;
    foreach (var value in values) { sum += value; count++; }
}

Just like parameters, the names of the tuple are in scope in the method body, and just like out parameters, the only requirement is that they be definitely assigned at the end of the method.

This is taking the parameter-result analogy one step further. However, it would special-case the tuples-for-multiple-returns scenario over other tuple scenarios, and it would also preclude seeing in one place what gets returned.

Splatting

If a method expects n arguments, we could allow a suitable n-tuple to be passed to it. Just like with params arrays, we would first check if there's a method that takes the tuple directly, and otherwise we would try again with the tuple's members as individual arguments:

public double Avg(int sum, int count) => count == 0 ? 0 : (double)sum / count;

Console.WriteLine($"Avg: {Avg(Tally(myValues))}");

Here, Tally returns a tuple of type (int sum, int count) that gets splatted to the two arguments to Avg.
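Written out by hand, the splatted call could be read as the sketch below. SumCount is a hypothetical struct standing in for (int sum, int count), and the lowering is an assumption for illustration. The cast to double in Avg is there so the division is not performed on integers:

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the proposed (int sum, int count) tuple type.
public struct SumCount { public int sum; public int count; }

public static class SplattingSketch
{
    public static SumCount Tally(IEnumerable<int> values)
    {
        var res = new SumCount();
        foreach (var value in values) { res.sum += value; res.count++; }
        return res;
    }

    public static double Avg(int sum, int count) =>
        count == 0 ? 0 : (double)sum / count;

    public static void Main()
    {
        // Assumed lowering of: Avg(Tally(myValues))
        // No Avg overload takes the tuple itself, so its members
        // are passed as individual arguments, in declaration order.
        var t = Tally(new[] { 1, 2, 3, 4 });
        Console.WriteLine($"Avg: {Avg(t.sum, t.count)}");
    }
}
```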

Conversely, if a method expects a tuple we could allow it to be called with individual arguments, having the compiler automatically assemble them to a tuple, provided that no overload was applicable to the individual arguments.

I doubt that a method would commonly be declared directly to just take a tuple. But it may be a method on a generic type that gets instantiated with a tuple type:

var list = new List<(string name, int age)>();
list.Add("John Doe", 66); // "unsplatting" to a tuple

There are probably a lot of details to figure out with the splatting and unsplatting rules.

gafter commented 9 years ago

The runtime type (int, int) would be a synonym for (a different way of writing) a type something like System.StructTuple<int, int>.

dsaf commented 9 years ago

@gafter That's fair, although I have seen experienced people thinking that e.g. var is confusing (always / sometimes). I hope that C# team will continue using their best judgement and not listen to any of us too much :). A C# 7 CTP 1 would be a great way of checking whether names are that confusing in practice :).

PS: is there a fresh Language Design Review on the horizon? Topics like this one (at 100+ posts) are becoming hard to read and summarize (I realize that I am not helping here :).

whoisj commented 9 years ago

Doesn't Unification across assemblies section cover this?

I don't know, does it?

The runtime type (int, int) would be a synonym for (a different way of writing) a type something like System.StructTuple<int, int> .

OK, happy now - thanks :smile:

glopesdev commented 9 years ago

For me the "dream" (you always have to start with the dream) would be:

var Tally(IEnumerable<int> values)
{
    var sum = 0;
    var count = 0;
    foreach (var value in values) { sum += value; count++; }
    return sum, count;
}

var sum, count = Tally(values);

var query = from x, y in left.Zip(right)
            select x * y;

Now I realize there are a lot of implications and "why nots" in the way of this dream, but I'll try my best to clear up those that seem solvable/reasonable so far:

1) Why var?

In my opinion, this would be basically saying to the compiler "apply to my method the same type inference rules as for lambdas". In lambdas you only need to know the input types, and the output type can often be inferred directly from the lambda body.

In cases where you cannot do the inference, you would specify the output types directly like so:

int, int Tally(IEnumerable<int> values);

2) Why no parentheses?

I have to admit that my first honest reason is "because it feels right".

The second reason is that I share the feeling above that parentheses are already overloaded for too many things (priority, method calls)... it feels like reading C# will soon approximate trying to understand LISP, with the difference that LISP would have more consistent semantics...

3) What about type declarations?

It seems one of the main issues with this syntax is it seems to imply changes to type declarations. Specifically, it would allow for:

var x, y = 5, "hello";
int x, string y = 5, "hello";

I don't see this as too much of a problem, really. First, this change is fully backwards compatible, as this is a superset of allowed type declarations where in previous versions the types in the declaration list were forced to be identical. Second, is there any reason other than historical that such mixed declaration lists are not allowed?

It looks like possible ambiguous cases would be something like:

var x, y = MyFunction(); // am I assigning "y" or both "x,y"

Which seems like it could be solved by sensible resolution rules, plus a way to disambiguate manually (for example):

var x = 0, y = MyFunction(); // I am definitely assigning "y"

This last example is not great but also raises another important question: how to handle direct assignment of the function result (i.e. var o = MyFunction())? I would say in this case there are two options:

A) Default to positional semantics (i.e. generate properties Item1, Item2, etc, like tuples). These can be assigned meaningful names in subsequent deconstruction statements.

B) Specify "default" names in the return statement (different ways):

var MyFunction()
{
    // assume returned identifiers provide the default names (similar to anonymous types)
    return result, flag;

    // explicit naming (like anonymous types), will have to be consistent across all return statements
    return result = 0, flag = true;
}

I would prefer B) in this case.

I know there are probably many more implications, but I tend to prefer to settle on the "dream" very clearly from the beginning. In my experience you never see the elegant solutions to difficult problems until you've persevered enough in trying to solve them, and that perseverance only comes after you see the dream clearly.

Anyway, my 2c and sorry for the long post.

alrz commented 9 years ago

would it be possible to deconstruct the tuple into other places like this?

(HttpContext.Current.Items["key"], this.SomeOtherProperty) = tuple;

Thaina commented 9 years ago

I want to propose using {} instead of () for Tuple

dsaf commented 9 years ago

@Thaina What is the justification/motivation? Every other relevant language ever - F#, Scala, Nemerle, Rust, Swift, Python - uses round brackets for tuples. This is if we forget about potential parsing confusion.

GeirGrusom commented 9 years ago

Using () also makes sense because this syntax is already used by another part of the language for a nearly identical concept: method arguments.

glopesdev commented 9 years ago

I would prefer that the option is there to drop the parentheses altogether; it makes for much more readable code IMHO (e.g. see Python and my post above). Maybe they will be necessary for disambiguation in some cases and to build nested tuples, but it would be great to have the shorthand version to make it more readable, and it doesn't look like it introduces much parser ambiguity.

dsaf commented 9 years ago

@glopesdev it seems that you are talking about decomposition rather than tuples though?

glopesdev commented 9 years ago

@dsaf yes, dropping the parentheses is mostly useful for tuple deconstruction, but it could also be applied to type declarations:

int, int p = 4, 2; // explicit tuple type variable declaration
var p = 4, 2; // type inference version

public int, int Func(int x) // function returning tuple (explicit type)
{
    return x, x * x;
}

public var Func(int x) // function returning tuple (type inference)
{
    return x, x * x;
}

Thaina commented 9 years ago

Because in C# we already have anonymous objects, which are written new { }. Using { } looks more like structural data than like arguments. ( ) is used for function arguments and type casts, so it does not feel right for tuples.

Thaina commented 9 years ago

Dropping the parentheses is also better than ( ), but I prefer { } to make the block explicit.

But var is too vague. We can't even use var on fields yet.

glopesdev commented 9 years ago

In fact, I'm starting to think tuples should not have names at all, but rather their semantics should always be given entirely by deconstruction. You shouldn't need to worry about matching the names of arguments, only their position, the same as argument lists.

This would solve the dilemmas mentioned above about renaming and matching tuple members, and would also solve the unification problem since all tuples would be unified under the same value type. These problems simply disappear by universally using the deconstruction syntax.

In fact, if tuples are being introduced, we might as well unify them with method argument lists (the "splatten" scenario, mentioned above). This should allow for plenty of optimizations, since often these tuples will already be stack-allocated anyway.

As long as you can deconstruct flexibly, you can even simplify the use of more complex types like IEnumerable<int, int>.

Here are some more complex examples:

var array = new[] { (4, 3), (1, 5), (8, 9) }; // array of int, int tuples
foreach (var x, y in array)
{
   // Do stuff
}

// Lambda example (here we need disambiguation with parenthesis I guess?)
array.Select((x, y) => x * y);

// splatten example
void MyFunc(int value, string text);

var callList = 5, "hello";
MyFunc(*callList); // we would need some operator for "splatten", borrowing Python's for now

GeirGrusom commented 9 years ago

As I mentioned earlier, method arguments are tuples. They are optionally named, so why shouldn't C# use the same syntax for tuples as method arguments when they are the same thing anyway? Why invent something new when there is already a precedent for this in the language?

EamonNerbonne commented 9 years ago

@GeirGrusom because the usage is different, and that means it has differing consequences on code readability and especially maintainability. Also, just because method arguments work that way doesn't mean it's a good idea; and now's the time to learn from past experience.

Consider that most method argument tuples have just one place in code where the tuple is "unpacked" and that there is no way to abstract away the tuple as a whole. This means that confusion surrounding which argument means what is limited. You can look up the name of that third argument trivially by going to the method definition, or by writing tooling (e.g. IntelliSense) to do that for you. Nevertheless, standard advice is to limit the number of arguments to avoid confusion.

General tuples are much worse in this regard. They're first class objects and that's a downside here because it means it's not at all clear where that tuple came from - there may be several layers of indirection between tuple construction and deconstruction, including runtime indirection that make finding the tuple member "names" virtually impossible. Sure, you can avoid layers of indirection, but imposing that restriction undermines pretty much the most basic programming tenet - encapsulation. In essence: you don't want to impose roadblocks to extracting methods or interfaces.

As such, it's very unwise to use positional tuples anywhere that indirection might occur, which is so many places that it's probably not a concept worth having at all. Confusing a tuple (row, col) with (x,y) is easily done and leads to nasty bugs. Is it (sum, count) or (count, sum)? (min, max) or (max, min)?

Named tuple members are much more human-friendly, and provide useful compile-time checks as to whether you've sanely unpacked that tuple. If you're going to develop a codebase with more than one person over a period of more than a few weeks, such static checks are a real lifesaver.

Tuples with positional members (i.e. like method arguments) are an anti-feature C# should definitely avoid. Named tuple members are the way to go.

Thaina commented 9 years ago

To simplify: ( ) is for arguments, new { } is an anonymous object. Which is closer to tuples?

By syntax:
var i = { } // this feels like an object
var i = ( ) // this feels more like a lambda expression

Tuples are not just a bunch of arguments; a tuple IS a GROUP of arguments. Bare arguments are naked individuals. A tuple is a block struct, a container of arguments. So I would prefer tuples to look like anonymous objects, differing only by new { } versus plain { }. In other words, I think an anonymous object is a reference-type tuple while the new tuple is a value type, so they should just use the same syntax.

glopesdev commented 9 years ago

@EamonNerbonne I get your argument re. named tuples. It makes sense. However, I also agree with @GeirGrusom that we already have method argument tuples. The examples you mention of swapping x, y with row, col pretty much already exist in method calls, and I believe the problems of maintainability and readability are exactly the same...

I understand we may want to learn from the past, but if we now introduce a "better" method for output tuple resolution, why shouldn't this method be enforced for input tuples as well? Now it becomes an unfortunate dichotomy...

If we would change it, this would be a massive break with the past, as positional arguments have been the standard since the days of C.

On the other hand, if we change it just for output tuples then the asymmetry itself will be damaging. Suddenly you start getting protection when outputting multiple arguments; but then you need to unpack these named arguments into positional arguments when passing the result to another function... how should this assignment be made? Named arguments? Should we let tuples be passed into functions only if their names match? What about name clashing and swapping between different libraries? I think the issues are enormous...

I think unfortunately we still don't have a satisfactory alternative to positional input arguments, and as long as that's the case, I would use the exact same semantics for output arguments. Anything else seems like it will raise problematic questions.

EamonNerbonne commented 9 years ago

@glopesdev at heart that's a tradeoff between superficial consistency and improving the language. Personally, I don't think this kind of consistency is really worth anything. There's no concrete advantage to programmers in making positional arguments look like tuple definitions (C's function pointer syntax sort of points out the disadvantages of symmetry for symmetry's sake). The only value in looking like C is to easily attract early adopters (and C# is way past that point). Also: symmetry cuts both ways. Right now, C# objects have members with names - why introduce such a major inconsistency for a feature with so little immediate value?

You state that the problems where x, y can be confused with row, col are identical in the method call situation, but the post you replied to points out that this is not the case. Method arguments tend to be easy to find because they're not first class objects. That means finding the meaning of the n-th argument is generally as easy as go-to-definition or IntelliSense (or some similar bit of tooling help). By contrast, this proposal aims to give tuples that first class status. The usage of a tuple may well be removed from the creation of a tuple by any number of layers, virtual method calls, overrides etc. - it will not be possible to find the construction site of any given tuple variable automatically. Positional tuples are a serious maintenance problem, while positional method arguments aren't quite as problematic (and even those aren't ideal).

I think it bears emphasizing that one of C#'s strengths today is large, long-lived applications. Lots of features and tools cooperate to make C# friendly to the maintenance programmer. A code base that uses positional tuples is going to be noticeably harder to maintain. Undermining a key selling feature of C# sounds like a bad trade-off for some syntactic brevity, especially if there are perfectly reasonable alternatives (such as named-member tuples).

You suggest that positional tuples are more easily transparently unpacked into function arguments. However, this isn't as trivial as you make it out to be: since tuples are first-class, this means you won't always want to unpack them; there will be cases wherein a tuple is just one of the positional arguments. Doing this automatically is almost certainly going to expose many corner cases with things like optional arguments and params arrays. Furthermore, C# already has the concept of named parameters. If there's a possibility of matching based on position, why not match based on name? In fact, that's likely to be easier to understand because the possibilities for confusion are much smaller - even params arrays and optional arguments have names, so it's clearer which tuple member corresponds to which argument should you want automatic implicit unpacking (which I don't think is necessarily a good idea for a first version of tuples in the language).

To summarize:

The key point for me is that named members will result in more maintainable code.

glopesdev commented 9 years ago

@EamonNerbonne you make good points, and I admit it makes me wonder whether we should be tackling this feature at all...

Would it be possible to have a conservative first step where the consumer of a function is always forced to unpack the return values? This would be symmetrical (sorry, I love symmetries) to the way method argument lists work right now. In this way you would ensure that you can always figure out the meaning (and order) of tuple arguments as you remove the layers of indirection. But anyway, that would make this less useful for LINQ queries, so never mind...

I agree with you about C# and maintainability, but getting an answer to all these issues would definitely be a big step forward in having statically typed languages that will flow as easy as dynamic ones.

iam3yal commented 9 years ago

After some more reading I really like named tuples, but the syntax is a bit verbose. I really hope that there are some plans to have syntactic sugar for it.

Something like the following can be really nice.

public (sum, count) Tally(IEnumerable<int> values) 
{
    // ...

    return (s, c);
}

Or

public var (sum, count) Tally(IEnumerable<int> values) 
{
    // ...

    return (s, c);
}

Another attempt, but I don't really like it; in fact I don't think it even makes sense, but oh well...

public (var sum, count) Tally(IEnumerable<int> values) 
{
    // ...

    return (s, c);
}

aluanhaddad commented 9 years ago

Named tuples are absolutely a must. It avoids many issues present in other implementations of tuples and is an excellent idea in general.

@eyalsk I really dislike the suggested type inference in your examples. This sort of approach can be reasonable in languages like C++ (C++ 14 added something similar) but that language has a completely different model of linking and compilation. In C#, the method body may not be available to the consuming code at all.

Anyway this would be a separate and significant feature proposal: namely return type inference for methods.

iam3yal commented 9 years ago

@aluanhaddad, you're right, but I'm really concerned because returning a tuple is going to be painful to look at from a code aesthetics point of view.

C# already loads so much information onto a method declaration that some methods will look like a very long train where you can't see the forest for the trees, whether you maintain a bunch of these methods or just one; the method signature can be quite long.

I have a new idea using interfaces so I'll make a new proposal for this.

EDIT: Actually after some thought interfaces wouldn't make sense.

HaloFour commented 9 years ago

If anything I think it looking unattractive would be a good thing. Tuples are a nice little convenience but they are not a replacement for proper types and I for one plan on banning their use on my teams for anything with a public-facing contract.

I actually do think that there is a nice symmetry to named tuples in a method definition, though. Feels very much like a parameter list. And since people argue that the C/C++/C# parameter list is in and of itself a tuple, then syntactically it makes a lot of sense.

public (int x, int y, int z) Something(int a, int b, int c) { ... }

whoisj commented 9 years ago

I for one plan on banning their use on my teams for anything with a public-facing contract.

since people argue that the C/C++/C# parameter list is in and of itself a tuple, then syntactically it makes a lot of sense.

:+1:

alrz commented 9 years ago

This abstraction also makes it easier for #5058 to be applicable.

factormystic commented 9 years ago

I for one plan on banning their use on my teams for anything with a public-facing contract.

This is actually a really interesting notion to consider... is it too absurd to allow the new proposed tuple syntax for method return types only for private (and perhaps internal) methods? Obviously you could "just" return Tuple<T1,T2> or whatever same as today, but disallowing the new syntax on public methods is a fascinating concept. It'd be encoding that usage pattern right into the compiler.

I can't tell if this is stupid, clever, or stupid-clever.

iam3yal commented 9 years ago

@HaloFour yeah I agree, it's like I use var when it makes sense but I don't think I agree that it should look unattractive so people won't use it because that's not a valid reason, at least imo.

A valid reason would be to use it when it makes sense. var is quite attractive, and still most people are smart and educated enough not to abuse it, don't you think?

HaloFour commented 9 years ago

Ack; turned into more of a rant than I had intended. tl;dr IMHO + tuples + public = ew

Not all shorthand is created equal. I like var and I appreciate that its very specific inference limitations ensure that the compiler isn't doing some crazy acrobatics that would make the code difficult to follow. It also doesn't pollute contracts in any form so I don't have to worry at all that some minor change in one place will cause a domino effect that will result in a breaking change which might go unnoticed until a dependent project is compiled.

I appreciate that var and tuples exist to solve a similar problem; not having to define lots of tiny one-off types to represent projections. But var projections are not intended to escape where they are defined and used. Tuples, as a part of a method signature, are explicitly designed to escape.

In my opinion those contracts are definitely on the wrong side of the terse/succinct ratio. Seeing a return type of ValueTuple<int, int, string, foo, bar> and then having to depend on someone having written some kind of documentation to explain what that means (we're all devs, we know how laughable that is) seems beyond a bad idea to me. Even with names (which survive only in attribute metadata form) that doesn't make for a clear and intuitive public surface. I have enough trouble forcing decent naming conventions from developers who seem to think that keystrokes cost them money or that code should be a write-once-read-never proposition.

I absolutely love eliminating unnecessary keystrokes, but I also know very well that code will have to be read by a human significantly more frequently than it will be written and I try to optimize for that case.

And let me reiterate, this is my opinion. I don't care for tuples, at least not in the way these proposals plan on implementing them. The idea that sometime in the future there will be a ton of poorly documented libraries tossing around arbitrary tuples of values does not appeal to me.
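To make the "names survive only in attribute metadata" point concrete, here is a small sketch assuming the design that eventually shipped in C# 7.0, where the compiler records element names in a `TupleElementNamesAttribute` on the return parameter. The `MetadataDemo` class is hypothetical; the reflection calls are standard.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class MetadataDemo
{
    public static (int sum, int count) Tally(int[] values)
    {
        int sum = 0;
        foreach (int v in values) sum += v;
        return (sum, values.Length);
    }

    public static void Main()
    {
        MethodInfo method = typeof(MetadataDemo).GetMethod(nameof(Tally));

        // The runtime type carries no names: it is a plain ValueTuple<int, int>.
        Console.WriteLine(method.ReturnType == typeof(ValueTuple<int, int>));

        // The names live only in an attribute attached to the return parameter.
        var names = method.ReturnParameter
            .GetCustomAttribute<TupleElementNamesAttribute>();
        if (names != null)
        {
            Console.WriteLine(string.Join(", ", names.TransformNames));
        }
    }
}
```

Consumers that ignore the attribute (other languages, reflection-based tools) see only `Item1` and `Item2`, which is the readability concern raised above.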

iam3yal commented 9 years ago

@HaloFour, the way I see this is I will never expose tuples through my public APIs but internally they can make many things easier, so I guess we think alike. :)

alrz commented 9 years ago

@HaloFour

Seeing a return type of ValueTuple<int, int, string, foo, bar>

Of course it's just wrong. Maybe they should've used a record type to expose this much data. As far as I know, tuples are used mostly for "data flow" scenarios, e.g. monadic parsers. In functional languages, you can know the intended usage of functions just by looking at their signatures. You won't use tuples for just everything! This makes named items for tuples unnecessary.

Thaina commented 9 years ago

Please also think about returning a tuple of refs and a ref to a tuple.

Generics can't take ref as a generic type argument, but if we allow a syntactic tuple it becomes possible.

HaloFour commented 9 years ago

@Thaina Not sure I understand what you mean. Method parameters are ref, not generic type arguments. How does ref fit into tuples?

Thaina commented 9 years ago

So you don't know about the "return ref" feature?

HaloFour commented 9 years ago

@Thaina That's what you're referring to? Considering that tuples would be normal structs if #118 was implemented I don't see why it wouldn't work with tuples. Although if "Tuple of ref" is intended to mean ValueTuple<ref int> that wouldn't be possible. Generic arguments can't be ref, it fundamentally changes what IL you can use against that type.

Thaina commented 9 years ago

Generics can't do it; that's what I already said. That's why, if we want to make it possible for tuples, it needs to be (ref int i,float f) Function() { } or {ref int i,float f} Function() { }

Or the exploded form:

(ref int,float) Function() { }
///
var i,f = Function();

HaloFour commented 9 years ago

@Thaina As proposed the tuple (int i, float f) would only be shorthand for ValueTuple<int, float>. I don't know how you could get ref properties on a tuple without some serious changes to this proposal.

Thaina commented 9 years ago

Because I don't completely agree with your proposal, that's why. There are uses of tuples we may want that differ from just making them ValueTuple<> generics. For example, if it is a ValueTuple<> generic, we can't name its fields i and f. They must be int t1 and float t2, because a generic struct's members must be strongly named.

The only appropriate approach is to create an anonymous struct feature, not to reuse shorthand like that.

HaloFour commented 9 years ago

@Thaina That's not my proposal, that is the C# team's current proposal. They list their reason for not generating a new struct per tuple, primarily the lack of type equivalence. Even if tuples weren't ValueTuple<> and were instead their own structs I don't think ref would be possible because I'm fairly sure that you can't have a field of ref T or T& in verifiable code.
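As a sketch of the type-equivalence argument, assuming tuples lowered to `ValueTuple<>` as they were in the C# 7.0 release: two tuple types with different element names are the same runtime type, so they convert freely. The `EquivalenceDemo` class and the variable names are made up for the example.

```csharp
using System;

public static class EquivalenceDemo
{
    public static (int i, float f) Make() => (1, 2.5f);

    public static void Main()
    {
        (int i, float f) a = Make();

        // Different element names, same underlying type: no conversion code is needed.
        (int x, float y) b = a;

        // The names are compile-time sugar over Item1/Item2.
        ValueTuple<int, float> c = b;
        Console.WriteLine(c.Item1 == a.i);

        // One shared generic struct, rather than a new struct per tuple shape.
        Console.WriteLine(a.GetType() == typeof(ValueTuple<int, float>));
    }
}
```

Had the compiler generated a fresh struct for each tuple declaration, the assignment from `a` to `b` would need some form of structural type equivalence that the CLR does not provide.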

huan086 commented 9 years ago

I'm wondering if there is any need to introduce the concept of tuples into the language. The operations applicable to a tuple should be applicable to all other types (named classes, anonymous classes, structs) as well.

As a return type

public (int sum, int count) Tally(IEnumerable<int> values)
{
    // Reuse anonymous type syntax
    return new { Sum = values.Sum(), Count = values.Count() };
}
////
private class PrivateClass
{
    public int Sum { get; set; }
    public int Count { get; set; }
}
public (int sum, int count) Tally(IEnumerable<int> values)
{
    var dontExposePrivate = new PrivateClass { Sum = values.Sum(), Count = values.Count() };
    return dontExposePrivate;
}

Deconstruction

public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}
////
public User GetUser()
{
    return new User { Id = 2, Name = "abc" };
}
////
// Deconstruction on a class
(int id, string name) = GetUser();
Assert.AreEqual(id, 2);
Assert.AreEqual(name, "abc");
// Implicit conversion without loss of precision should be allowed. Works just like individual assignments.
(double sum, long count) = new { Sum = 0, Count = 0 };
// Type inference in LINQ
IEnumerable<(int id, string name)> users = from (id, name) in Enumerable.Range(0, 10).Select(index => GetUser()) // I always felt like doing List<var> for LINQ.
            select new { Id = id, Name  = name + "Clone" };

Splatting

public User CreateUser(int id, string name)
{
    return new User { Id = id, Name = name };
}
////
var splatMe = new { Id = 4, Name = "Splat" };
User user = CreateUser(params splatMe); // I'm hijacking the params keyword here for splatting
User splatAClass = CreateUser(params user);

On the programmer's side, there would thus be no need to think about whether to construct a tuple or an anonymous type. On the compiler side, it would need to choose whether to use an anonymous type, Tuple<> or ValueTuple<>, depending on how the type is used.

I'm assuming that decomposition and splatting matches by name (though the case of the first letter gets changed)

HaloFour commented 9 years ago

@huan086

The problem is that the Tally methods in your example need to have a type embedded in their signature that can be recognized and consumed by the CLR and other languages. What exactly is (int sum, int count)? Given that the CLR doesn't have type equivalence the compiler would have to emit some form of type to represent the tuple but that would prevent the compiler from automatically treating the other types as the tuple type.

The issue with deconstruction is that with normal CLR types there is no inherent order to the properties or fields, so you'd be forced to always deconstruct by name. Given that the naming convention for public members differs from that of local variables or parameters that would also largely preclude deconstruction automatically by name.
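For comparison, the design that later shipped in C# 7.0 sidesteps the ordering problem by having the type opt in with a `Deconstruct` method, so the author fixes the positional order explicitly. The sketch below reuses the `User` shape from the comment above; `DeconstructDemo` is a hypothetical wrapper.

```csharp
using System;

public class User
{
    public int Id { get; set; }
    public string Name { get; set; }

    // The type's author decides the positional order; nothing is inferred
    // from property declaration order or matched by name.
    public void Deconstruct(out int id, out string name)
    {
        id = Id;
        name = Name;
    }
}

public static class DeconstructDemo
{
    public static User GetUser() => new User { Id = 2, Name = "abc" };

    public static void Main()
    {
        // Positional deconstruction of an ordinary class.
        (int id, string name) = GetUser();
        Console.WriteLine($"{id}: {name}");
    }
}
```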

McZosch commented 9 years ago

public (int sum, int count) Tally(IEnumerable<int> values) { ... }

to me is the same as public (and hence not so) anonymous types, which have been requested since the local anonymous type feature was rolled out. As many others, I'm not sold on their use, so I guess the feature is not worth the effort.

But there is definitely a need to better handle multiple output parameters, and as such the feature could have its merits. The representation above would definitely be more readable than this hog:

public void Tally(IEnumerable<int> values, ref int sum, ref int count) { ... }

Regardless of which declaration is used, to be more usable on the consumer side, calls to such functions should be allowed as follows:

var x = Tally(...);
Console.Write(x.sum);
Console.Write(x.count);

The type of x is a type that completely vanishes at compilation; the code can be transformed into a simple variable list. The compiled code should look like this:

int x_sum = 0;
int x_count = 0;
Tally(..., ref x_sum, ref x_count);
Console.Write(x_sum);
Console.Write(x_count);

No new inaccessible complex type gets introduced; other systems can still use the hog-signature with its primitives.

Aside from this particular feature, I tend to think, new language features should make life easier. So, as I think about this from time to time, why can't we finally make C# (and VB) method result declarations typeless by default, inferring their type from the return statement?

public Tally(IEnumerable<int> values) { return new {sum = values.Sum(), count = values.Count()}; }

If there is no return statement, the method is void. This code would be easier to refactor.

iam3yal commented 9 years ago

@McZosch, check out the answer @gafter gave me here and you will probably understand why they can't/won't infer types from the return statement itself and why types are actually important to some scenarios.

Now, I have bad solutions for it like having rules around it but then the code becomes inconsistent and so I don't have any good solutions to this problem.

So you might need to think about a really compelling argument for them to consider it.

DemiMarie commented 8 years ago

This sounds great! Will this be a struct or a class? I hope the former.

aluanhaddad commented 8 years ago

@drbo this proposal states that tuples will be value types as you desire.

wxs1 commented 8 years ago

Possible concern:

I've heard some people say that if we no longer need to define specific return types/classes, in some cases value can be lost, such as the specific typing that lets you know where data came from.

Possible improvement options:

One possible way around that is to autogenerate types based on the class and method name.

Another possible option is just making the return values like normal types defined inline as an option:

public TallyResult (int sum, int count) Tally(IEnumerable<int> values) { }

So this way a type is created called TallyResult (that we optionally provided). It might translate to: class TallyResult { int sum; int count; }; or struct if struct keyword used. An example use could be:

TallyResult results=Tally(values);

Benefits:

Concerns:

iam3yal commented 8 years ago

@wxs1 auto-generating types for this proposal is a really, really bad idea in terms of management.

I don't think people want to see the compiler spit out code automatically and see it in their projects. Not to mention that people can then use these types, and when you change the return signature it will spit out a new struct or overwrite an existing file, which can break the code.

I don't think people want a complex workflow either, because this creates yet another obstacle, and for large projects it adds yet more files for the IDE to parse and deal with, which can make large projects even more challenging. I doubt people want unnecessary challenges.

Another thing: the compilation takes place and then it generates files. This can be a problem, because what if you already have a type whose name matches the name of the generated type?

If the compiler auto-generates code, it's better that it be contained in the assembly behind the scenes, not as part of the project.

Making classes/structs easier to create should be the responsibility of the IDE, not the compiler; it's out of context here.

This proposal is about expressing tuples in the language and thus providing a better way to deal with multiple return values.

wxs1 commented 8 years ago

@eyalsk so the second part I mentioned could still work by making a named tuple.

public TallyResult (int sum, int count) Tally(IEnumerable<int> values) { }

Translates TallyResult to an equivalent of Tuple<int,int>.

An example use could be:

TallyResult results=Tally(values);

As mentioned some benefits/detriments with the result. Not sure if that would turn out to be a compiler issue as well though.

I like it because it can avoid the use of var, since var can sometimes mask inappropriate changes in type. But not sure I like the idea of defining a named tuple in line, but it might make things cleaner.

iam3yal commented 8 years ago

@wxs1 yeah but var is not part of the proposal, I think it was initially on the table or just an idea but they dropped it.

If you are referring to the actual places where var is used above, this is optional, as far as I understand.

bbarry commented 8 years ago

So tuples seem to be simply anonymous record types....

edit: reminder to self: finish coffee before starting to type my thoughts out.