CyrusNajmabadi commented 3 years ago

Collection expressions

[x] Proposed
[ ] Specification: https://github.com/dotnet/csharplang/blob/main/proposals/csharp-12.0/collection-expressions.md
[x] Prototype: Complete
[x] Implementation: In progress. Shipping everything but 'natural types' and 'dictionary literals' in C# 12.

Many thanks to those who helped with this proposal. Esp. @jnm2!

Summary

Collection expressions introduce a new terse syntax, [e1, e2, e3, etc], to create common collection values. Inlining other collections into these values is possible using a spread operator .. like so: [e1, ..c2, e2, ..c2]. A [k1: v1, ..d1] form is also supported for creating dictionaries.
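For illustration, here is a minimal sketch of the list form as it shipped in C# 12, assuming .NET 8. Natural types and the [k: v] dictionary form did not ship in C# 12, so every expression below has an explicit target type. The variable names are made up.

```csharp
using System;
using System.Collections.Generic;

// Assumes C# 12 / .NET 8. Collection expressions need a target type here,
// because natural types did not ship in C# 12.
int[] first = [1, 2, 3];
List<int> second = [4, 5];

// ".." (the spread element) inlines every element of another collection in place.
int[] combined = [0, ..first, 10, ..second];

Console.WriteLine(string.Join(", ", combined)); // 0, 1, 2, 3, 10, 4, 5
```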

Several collection-like types can be created without requiring external BCL support. These types are:

Array types, such as int[].
Span<T> and ReadOnlySpan<T>.
Types that support collection initializers, such as List<T> and Dictionary<TKey, TValue>.

Further support is present for collection-like types not covered under the above, such as ImmutableArray<T>, through a new API pattern that can be adopted directly on the type itself or through extension methods.
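As we understand the shape that shipped with C# 12 and .NET 8, that API pattern is an attribute, CollectionBuilderAttribute in System.Runtime.CompilerServices, which names a static factory taking a ReadOnlySpan<T>. ImmutableArray<T> is marked with it in .NET 8, so a sketch of using it needs no extra code:

```csharp
using System;
using System.Collections.Immutable;

// Assumes .NET 8, where ImmutableArray<T> carries
// [CollectionBuilder(typeof(ImmutableArray), "Create")]; the compiler passes the
// elements to ImmutableArray.Create(ReadOnlySpan<T>) in a single call.
ImmutableArray<int> values = [1, 2, 3];

// Spreads work here too, without going through ImmutableArray.CreateBuilder.
ImmutableArray<int> extended = [..values, 4];

Console.WriteLine(extended.Length); // 4
```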

Motivation

Collection-like values are hugely present in programming, algorithms, and especially in the C#/.NET ecosystem. Nearly all programs will utilize these values to store data and send or receive data from other components. Currently, almost all C# programs must use many different and unfortunately verbose approaches to create instances of such values. Some approaches also have performance drawbacks. Here are some common examples:
- Arrays, which require either new Type[] or new[] before the { ... } values.
- Spans, which may use stackalloc and other cumbersome constructs.
- Collection initializers, which require syntax like new List<T> (lacking inference of a possibly verbose T) prior to their values, and which can cause multiple reallocations of memory because they use N .Add invocations without supplying an initial capacity.
- Immutable collections, which require syntax like ImmutableArray.Create(...) to initialize the values, and which can cause intermediary allocations and data copying. More efficient construction forms (like ImmutableArray.CreateBuilder) are unwieldy and still produce unavoidable garbage.
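A before/after sketch of the first three approaches in the list, assuming C# 12 / .NET 8 (variable names are made up). Where the storage for the Span<int> lives is left to the compiler.

```csharp
using System;
using System.Collections.Generic;

// Before: a different creation syntax for each kind of collection.
int[] arrayBefore = new int[] { 1, 2, 3 };
Span<int> spanBefore = stackalloc int[] { 1, 2, 3 };
List<int> listBefore = new List<int> { 1, 2, 3 };

// After (C# 12): one syntax, with the target type choosing the collection.
int[] arrayAfter = [1, 2, 3];
Span<int> spanAfter = [1, 2, 3];
List<int> listAfter = [1, 2, 3];

Console.WriteLine(spanBefore.SequenceEqual(spanAfter)); // True
```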
Looking at the surrounding ecosystem, we also find examples everywhere of list creation being more convenient and pleasant to use. TypeScript, Dart, Swift, Elm, Python, and more opt for a succinct syntax for this purpose, with widespread usage, and to great effect. Cursory investigations have revealed no substantive problems arising in those ecosystems with having these literals built in.
C# has also added list patterns in C# 11. This pattern allows matching and deconstruction of list-like values using a clean and intuitive syntax. However, unlike almost all other pattern constructs, this matching/deconstruction syntax lacks the corresponding construction syntax.
Getting the best performance for constructing each collection type can be tricky. Simple solutions often waste both CPU and memory. Having a literal form allows for maximum flexibility from the compiler implementation to optimize the literal to produce at least as good a result as a user could provide, but with simple code. Very often the compiler will be able to do better, and the specification aims to allow the implementation large amounts of leeway in terms of implementation strategy to ensure this.

An inclusive solution is needed for C#. It should meet the vast majority of cases for customers in terms of the collection-like types and values they already have. It should also feel natural in the language and mirror the work done in pattern matching.

This leads to a natural conclusion that the syntax should be like [e1, e2, e3, e-etc] or [e1, ..c2, e2], which correspond to the pattern equivalents of [p1, p2, p3, p-etc] and [p1, ..p2, p3].
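A small sketch of that symmetry, assuming C# 12 / .NET 8: a list pattern (C# 11) takes an array apart, and the mirror-image collection expression (C# 12) puts it back together in a different order.

```csharp
using System;

int[] input = [1, 2, 3, 4];
int[] rotated = [];

// Deconstruct with a list pattern: [p1, ..p2].
if (input is [var head, .. var tail])
{
    // Rebuild with the matching construction syntax: [..e1, e2].
    rotated = [..tail, head];
}

Console.WriteLine(string.Join(", ", rotated)); // 2, 3, 4, 1
```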

A form for dictionary-like collections is also supported where the elements of the literal are written as k: v like [k1: v1, ..d1]. A future pattern form that has a corresponding syntax (like x is [k1: var v1]) would be desirable.

Detailed design

The content of the proposal has moved to proposals/collection-expressions.md. Further updates to the proposal should be made there.

Design meetings

Working group meetings

YairHalberstadt commented 3 years ago

Overall looks good! Just a couple of points.

This facility thus prevents general use of such a marked method outside of known safe compiler scopes where the instance value being constructed cannot be observed until complete.

In the context of collection literals, the presence of these methods would allow types to trust that data passed into them cannot be mutated outside of them, and that they are being passed ownership of it. This would negate any need to copy data that would normally be assumed to be in an untrusted location.

This only works because, at the moment, only the compiler happens to be able to call init methods (as long as you don't call them yourself from one of your init properties).

However it doesn't seem like the sort of thing we'd want to rely on not changing in the future. For example we might allow calling init methods in the object initializer, at which point this would no longer be safe:

int[] ints = [1,2,3];
var immutArray = new { Init(ints) };
ints[0] = 5;

To resolve this, we could say that when evaluating a literal's spread_element expression, there is an implicit target type equivalent to the target type of the literal itself. So, in the above, that would be rewritten as:

It seems like this is not the most efficient solution - instead we would want to effectively inline the elements, never materializing them into an array in the first place, and instead storing all the sub_elements on the stack.

Should we expand on collection initializers to look for the very common AddRange method? It could be used by the underlying constructed type to perform adding of spread elements potentially more efficiently. We might also want to look for things like .CopyTo as well. There may be drawbacks here as those methods might end up causing excess allocations/dispatches versus directly enumerating in the translated code.

We could only use those methods when the type of the spread_element exactly matches the parameter type of these methods, meaning we would not be causing virtual dispatch.
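To make the trade-off concrete, here is a hand-written approximation of two ways List<int> result = [1, ..spread, 2]; could be lowered. It is only a sketch of the idea being discussed, not what the compiler actually emits; the variable names are made up, and the note about AddRange describes current .NET implementations rather than a guarantee.

```csharp
using System;
using System.Collections.Generic;

IEnumerable<int> spread = new List<int> { 10, 20, 30 };

// Strategy 1: enumerate the spread and call Add for each element.
var viaAdd = new List<int>();
viaAdd.Add(1);
foreach (int item in spread)
{
    viaAdd.Add(item);
}
viaAdd.Add(2);

// Strategy 2: hand the spread to AddRange. List<T>.AddRange checks for ICollection<T>
// and can copy in bulk via CopyTo, at the cost of interface dispatch.
var viaAddRange = new List<int>();
viaAddRange.Add(1);
viaAddRange.AddRange(spread);
viaAddRange.Add(2);

Console.WriteLine(string.Join(", ", viaAddRange)); // 1, 10, 20, 30, 2
```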

Can an unknown length literal create a collection type that needs a known length, like an array, span, or Init(array/span) collection? This would be harder to do efficiently, but it might be possible through clever use of pooled arrays and/or builders.

This is an extremely common use case - ToArray is the most commonly used Linq method. It would be unfortunate if the one syntax to rule them all required you to do: ((List<int>)[1, .. subspread, 2]).ToArray().
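For comparison, a sketch assuming C# 12 / .NET 8 (names made up): the hand-written "builder" approach, buffering into a List<T> and then copying, next to the direct form, which C# 12 accepts even when a spread's length is only known after enumeration. How the compiler buffers the direct form is its own choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A spread whose length is unknown until it is enumerated.
IEnumerable<int> subspread = Enumerable.Range(5, 3).Where(x => x % 2 == 1);

// By hand: buffer into a List<T>, then copy into an exact-size array.
var buffer = new List<int> { 1 };
buffer.AddRange(subspread);
buffer.Add(2);
int[] byHand = buffer.ToArray();

// C# 12: the direct form, with the buffering strategy left to the compiler.
int[] direct = [1, .. subspread, 2];

Console.WriteLine(string.Join(", ", direct)); // 1, 5, 7, 2
```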

CyrusNajmabadi commented 3 years ago

@YairHalberstadt All good points. Thank you :)

alrz commented 3 years ago

Linking to https://github.com/bartdesmet/csharplang/.../proposals/params-builders.md on the builder pattern.

orthoxerox commented 3 years ago

What if collection literals with an unknown length have a different natural type from those with a known length? The former should likely be a List<T>, while the latter would be an IEnumerable<T> with an unspeakable implementation. That is, [..c1, ..c2] where either collection has an unknown length would be semantically equivalent to Enumerable.Concat(c1, c2).
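A small sketch (assuming C# 12 / .NET 8; names made up) of why the lazy reading is observably different: Enumerable.Concat only records its sources, while a collection expression copies the elements when it is evaluated, which is how collection expressions ended up behaving.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var c1 = new List<int> { 1, 2 };
var c2 = new List<int> { 3 };

// Lazy: Concat stores references to its sources; nothing is copied yet.
IEnumerable<int> lazy = c1.Concat(c2);

// Eager: the elements are copied into the array right here.
int[] eager = [..c1, ..c2];

c2.Add(4); // mutate a source after both values were created

Console.WriteLine(lazy.Count());  // 4: Concat sees the new element
Console.WriteLine(eager.Length);  // 3: the array was filled before the mutation
```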

jnm2 commented 3 years ago

@orthoxerox Once you're able to target-type literals of unknown length to T[] and ImmutableArray<T>, which I very much hope is made possible and which is not solved by giving such literals a different natural type, then will it still be advantageous to give them a different natural type?

erikhermansson79 commented 3 years ago

A downside to using an array would be if a natural type is added for collection literals and that natural type is not T[]. There would be a potentially surprising difference when refactoring between var x = [1, 2, 3]; and IEnumerable x = [1, 2, 3];.

Can you explain this to me, please? Wouldn't whatever type you choose as a natural type implement IEnumerable<T>? What would the difference be?

jnm2 commented 3 years ago

@erikhermansson79 Besides is type checks and pattern matching behaving differently, as well as GetType, and overload resolution if you use dynamic, there's also this for example: casting to IList and reading IsFixedSize will be observably different. If you place this in a public IEnumerable<int> property and databind using a UI framework, the difference in behavior could amount to a user-facing regression. And so on.
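A minimal sketch of that kind of observable difference, assuming .NET: the same three elements exposed as IEnumerable<int>, backed once by an array and once by a List<int>.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

IEnumerable<int> fromArray = new[] { 1, 2, 3 };
IEnumerable<int> fromList = new List<int> { 1, 2, 3 };

// Code that only receives IEnumerable<int> can still tell the two apart.
bool arrayIsFixedSize = ((IList)fromArray).IsFixedSize; // true: arrays cannot grow
bool listIsFixedSize = ((IList)fromList).IsFixedSize;   // false: List<T> can grow
bool sameRuntimeType = fromArray.GetType() == fromList.GetType(); // false

Console.WriteLine($"{arrayIsFixedSize} {listIsFixedSize} {sameRuntimeType}"); // True False False
```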

orthoxerox commented 3 years ago

@orthoxerox Once you're able to target-type literals of unknown length to T[] and ImmutableArray<T>, which I very much hope is made possible and which is not solved by giving such literals a different natural type, then will it still be advantageous to give them a different natural type?

I agree, enumerating the collection literal just by changing the type of the variable it's assigned to sounds confusing.

MgSam commented 3 years ago

I think proposals benefit from having a section of various examples. Most people can't look at grammar rules and get a good feel for what the syntax will actually look like.

Separately, I will slightly object to this statement,

Looking at the surrounding ecosystem, we also find examples everywhere of list creation being more convenient and pleasant to use. TypeScript...

TS/JS just has a different syntax for initializing arrays. I don't know that it's really much more convenient than new[] { ... }. It certainly doesn't have a generalized collection initialization syntax whatsoever. If you use a collection type other than Array in JS you are SOL.

CyrusNajmabadi commented 3 years ago

I think proposals benefit from having a section of various examples

It will look like: [a, b, .. c, d]

CyrusNajmabadi commented 3 years ago

Separately, I will slightly object to this statement,

The statement was that the presence of this literal form has not proven to itself be problematic for these languages. Not that the literal is sufficient for all usages in those languages.

In other words, these literals are not in "the bad parts". Nor are there contingents of users recommending people not use these.

jnm2 commented 3 years ago

Here's an immediate example I can think of:


var processArguments =
[
    "pack",
    "-o", outputPath,
    ..(configuration is not null ? (["-c", configuration]) : []),
    "/bl:" + Path.Join(artifactsDir, @"logs\pack.log"),
];
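C# 12 did not ship natural types, so the example needs an explicit target type there, and to be safe the conditional inside the spread is given typed branches. A sketch under those assumptions; outputPath, configuration, and artifactsDir are hypothetical placeholder values.

```csharp
using System;
using System.IO;

// Hypothetical inputs standing in for the values in the example above.
string outputPath = "out";
string configuration = "Release"; // may be null in real code
string artifactsDir = "artifacts";

// Explicit target type: C# 12 has no natural type for collection expressions.
string[] processArguments =
[
    "pack",
    "-o", outputPath,
    // Both branches are typed arrays, so the spread operand has a type.
    ..(configuration is not null ? new[] { "-c", configuration } : Array.Empty<string>()),
    "/bl:" + Path.Join(artifactsDir, @"logs\pack.log"),
];

Console.WriteLine(processArguments.Length); // 6
```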

CyrusNajmabadi commented 3 years ago

I'm pretty firmly against approaches that either:

Involve lazy evaluation. For that, use LINQ.
Change the natural type of the expression based on the values within. I think that will just be too confusing and difficult to reason about.

bernd5 commented 3 years ago

Why not use curly braces like we have them for arrays already?

So you could write:

List<int> myInts = {1, 2, 3, 4};

CyrusNajmabadi commented 3 years ago

Why not use curly braces like we have them for arrays already?

This is referred to in the motivation section, but I'll give a little more information. Effectively, { has issues for us because in some situations it is ambiguous whether it means a list or properties. So we've moved to [ with list patterns, to have no ambiguity and to give a nice and clean syntax for list-like things. When we designed that, we were aware that we likely wanted a 'literal correspondence' as well, which is what this proposal is.

I also cover the move from { to [ in the drawbacks section, albeit with the idea that this would allow us to have uniformity everywhere moving forward.

Also note that { as an expression form is potentially highly destabilizing for future work. For example, if we ever want a block expression, then we couldn't have that use { if { is also for lists. We sketched out and very much liked [...] for list patterns and these literals (both for looks, and for sidestepping a lot of issues), so we made the decision back in the pattern design to go this direction. We simply ordered it as "list pattern in C# 10" and now hopefully "list literal in C# 11".
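A short sketch of the limitation of the existing brace form, assuming C# 12 / .NET 8: the array initializer only works directly in an array declaration, while the bracketed form is an ordinary expression. The commented-out lines are the ones that do not compile.

```csharp
using System;
using System.Collections.Generic;

// An array initializer with { } is only allowed directly in a declaration.
int[] numbers = { 1, 2, 3 };

// numbers = { 4, 5 };          // does not compile: { } is not an expression
// List<int> list = { 1, 2 };   // does not compile: only arrays get this shorthand

// The bracketed collection expression works as an ordinary expression.
numbers = [4, 5];
List<int> list = [1, 2];

Console.WriteLine(numbers.Length + list.Count); // 4
```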

MgSam commented 3 years ago

I think proposals benefit from having a section of various examples

It will look like: [a, b, .. c, d]

Not sure if you're being cheeky or what, but that is not a substitute for an example section. Some suggestions:

Before and afters of the various init syntaxes now vs how they'll look if using the proposed
Including the full context of where they might be used - declarations, method calls, etc. This is particularly relevant given C# already has the special array initialization syntax { } which only works for declarations
Since this is target-typing, presumably this doesn't work with var. Examples would help illustrate that
Does initializing collections of mixed types "just work" or does it require casting? For example, today I have to do new object[] { 1, "foo" } because the compiler cannot infer a best type.
Since spread operators currently don't work at all in existing collection initialization syntaxes, I think that needs to be segregated from other examples
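On the mixed-types question, a sketch of how it turned out in C# 12 (assuming .NET 8): the target type supplies the element type, so no best common type is needed, but without a target type there is nothing to infer from.

```csharp
using System;

// Today: the element type must be spelled out, since 1 and "foo" have no best common type.
object[] before = new object[] { 1, "foo" };

// C# 12: the target type supplies the element type, so each element converts to object.
object[] after = [1, "foo"];

// Without a target type there is nothing to infer from; in C# 12 this does not compile:
// var neither = [1, "foo"];

Console.WriteLine(after[0] is int && after[1] is string); // True
```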

CyrusNajmabadi commented 3 years ago

Since this is target-typing, presumably this doesn't work with var. Examples would help illustrate that

This is covered in the spec outline and the things to discuss. I don't want examples that imply things are not possible when no decision has been made on it.

CyrusNajmabadi commented 3 years ago

Does initializing collections of mixed types "just work"

This is covered in the spec as well. But I'll call it out explicitly.

TonyValenti commented 3 years ago

Regarding immutability, here's an interesting thought:

Introduce "let" as a way of declaring an "immutable" variable.

let x=1;

Creates a readonly integer.

let x = [1,2,3];

Makes an immutable list

var x = [1,2,3];

Makes a regular list.

CyrusNajmabadi commented 3 years ago

Introduce "let" as a way of declaring an "immutable" variable.

The language has no concept of immutability (and I doubt it is likely to get one any time soon). It does have a concept of 'readonly' (and 'let' is already something we're considering there). So it likely wouldn't be a good fit, as there would be an inconsistency there.

Interesting idea though!

jnm2 commented 3 years ago

It would have to work for let x = new List<MyMutableClass> { new MyMutableClass() }; too.

BhaaLseN commented 3 years ago

init methods would be cool for other things as well. There have been many issues, discussions, etc. in the past that asked for a way to move common initialization code from a ctor into a (specifically marked) method, which could then still initialize readonly (and now: init) members. Right now that is limited to the ctor itself (for readonly) and initialization contexts (for init).

TahirAhmadov commented 3 years ago

Regarding the implicit type ("natural type"), I would say the array has 2 pros and a con. One pro is that it's the only "built-in" type - meaning, it's a type which is part of the language. Second - it's the most efficient collection (correct me if I'm mistaken) - all other collections rely on arrays behind the scenes. The con - arrays already have a simple enough new[] { ... } syntax, so that leads me to List<T>. In my work, the most annoying boilerplate is the new List<T> { ... } - replacing that with [ ... ] would be a great improvement.

PS. Another reason to go with List<T> - I and probably many others follow the rule of using explicit types for keyword types, like int[], and var for other types, including List<int>. This means if I want an array, I can always do int[] a= [1,2,3]; - which aligns with how I manage explicit/implicit types, and for List<T>, I currently do var a = new List<int> { 1, 2, 3, }; which is greatly simplified to var a = [1, 2, 3];

PPS. On second thought, perhaps we can just say no (for now) to implicit type and be done with it.

CyrusNajmabadi commented 3 years ago

Other cons are that it heap allocates and that it is fixed length.

TahirAhmadov commented 3 years ago

Isn't it possible to make List<T> work with 1 allocation? Doesn't its new List<T>(123) ctor allocate the internal array to the specified size from the get go? Also, what's stopping us from using a special new ctor or static factory method (which can possibly be made internal) - surely that's not too much to add?

jnm2 commented 3 years ago

The List<T> instance is one heap allocation, and the internal array is a second heap allocation. If the initial capacity is sufficient, there are no additional heap allocations or memory copies beyond that.
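A minimal sketch of that behavior, assuming .NET: with the capacity supplied up front, adding exactly that many items never replaces the internal array.

```csharp
using System;
using System.Collections.Generic;

// One allocation for the List<int> object, one for its internal array of length 3.
var list = new List<int>(3);
int capacityBefore = list.Capacity;

list.Add(1);
list.Add(2);
list.Add(3);

// The capacity was sufficient, so the internal array was never replaced.
Console.WriteLine($"{capacityBefore} {list.Capacity}"); // 3 3
```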

TahirAhmadov commented 3 years ago

Oh my, that was such a brain fart - of course the List<> itself needs an allocation. Which immediately made me think of another idea - can a new type be created for this? Something called ValueList<T>. It'll be a struct, implement IList<T>, and have implicit conversion to List<T>. Internally, it can even perform the necessary analysis - if the # of items is low, keep the items on the stack, too; if it grows beyond a certain limit, say, 1024 bytes, move it to the heap (or go straight to heap if initial capacity is >=1024 bytes).

PS. Internally, this type can either 1) use if statements to determine whether it's operating in stack or heap mode, or 2) have delegates which are assigned either the stack or heap "handlers".

CyrusNajmabadi commented 3 years ago

can a new type be created for this?

You are certainly welcome to create a new type. A core part of this proposal is that it works with any type that follows certain shapes.

Now, if the BCL would add a type like this? My guess would be no. Such a type would likely be highly problematic. For example, if you passed this ValueList to someone else, and they captured it, then they would only see portions of your mutations. For example, if you added items, they would not see it (since their length would not update). However, if you mutated items prior to that point, they would see it (since they shared the same array) unless you (or they) also caused a reallocation (where you both would have distinct arrays). Also, if one added an element, and then the other added, the other would overwrite the first. etc. etc.

It would be enormously confusing.
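The problems described above can be reproduced without any new type. In this sketch (assuming C# 12 / .NET 8), a value tuple stands in for a hypothetical struct ValueList: the backing array is shared between copies, but the count is copied.

```csharp
using System;

// Stand-in for a hypothetical struct ValueList<int>: shared array + copied count.
var a = (Items: new int[4], Count: 0);
a.Items[a.Count++] = 1;

var b = a;                 // copies Count, but both copies share one Items array

a.Items[a.Count++] = 2;    // b.Count is still 1, so b does not "see" the new element
a.Items[0] = 42;           // b does see in-place mutation of an existing element
b.Items[b.Count++] = 99;   // b's "Add" overwrites a's second element

Console.WriteLine(string.Join(", ", a.Items[..a.Count])); // 42, 99
```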

jnm2 commented 3 years ago

@TahirAhmadov Something very much in the spirit of what you just described is being considered: https://github.com/dotnet/runtime/pull/60519

TahirAhmadov commented 3 years ago

Now, if the BCL would add a type like this? My guess would be no. Such a type would likely be highly problematic. For example, if you passed this ValueList to someone else, and they captured it, then they would only see portions of your mutations.

Yes, I originally was thinking only in the context of local usage of the collection. The problems you raised are very real. The only way to solve it would be to somehow mark this type as not being passable by value:

[ByValueUsageProhibited]
public struct ValueList<T>
{
  ...
}
ValueList<T> Prop { get; set; } // error: cannot pass this type by value
Action<ValueList<T>> action; // error
void Foo(ValueList<T> list) { } // error

ValueList<T> Bar() { ... } // no problem; Bar is a non-async, non-enumerable method which returns and 
// can therefore no longer touch the collection
void Bar2(in ValueList<T> list) { } // no problem; ref and out also OK
Func<ValueList<T>> func; // no problem; we know that generic type is used as the return type

Now I know what you are thinking - this is hairy and adds scope. However, think about the benefits we're getting. If ValueList<> is added to the BCL, then it's possible to have a fully mutable collection which accepts all literal forms and needs zero heap allocations, as the natural/implicit type for collection literals (but it's not async-friendly).

On second thought, implicit convertibility to List<> would be a mistake - it can hide nasty bugs similar to what you described; a special method, List<T> ToList(), would be needed to move the collection to the heap if and when necessary. Also, I just realized it cannot implement IList<T> for similar reasons - boxing will introduce confusion around the state of it; it can only implement IReadOnlyList<T> and its base interfaces.

PS. I typed the above and then realized that this is very similar to a ref struct. Can this be a ref struct? Would it solve our state-sharing problems?

@TahirAhmadov Something very much in the spirit of what you just described is being considered: dotnet/runtime#60519

That issue is for an array; I was thinking a mutable list. Now it's an open question if the technique they're using to shoehorn an array onto the stack can be made to work for a mutable list. Frankly, I doubt it; it would probably need a deeper framework change.

PPS. Having thought about this, here's a rough mock-up: (more complete version)

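using System;
using System.Runtime.CompilerServices;

// Assumes C# 12 / .NET 8: [InlineArray] stands in for the 128 hand-written
// _itemN fields and the "get by memory offset" hack.
[InlineArray(128)]
struct InlineItems<T>
{
  T _element0;
}

public ref struct ValueList<T>
{
  const int StackCapacity = 128; // we'll need to figure out how many items to allow in stack mode

  int _size;
  T[]? _array;            // null while in stack mode
  InlineItems<T> _items;  // stack-mode storage

  public int Count => this._size;

  public void Add(T item)
  {
    this.EnsureCapacity(this._size + 1);
    this.Set(this._size, item);
    ++this._size;
  }
  // Insert and Remove are implemented similarly - using Get/Set as abstractions to hide stack/heap mode
  public T this[int index]
  {
    get { if ((uint)index >= (uint)this._size) throw new ArgumentOutOfRangeException(nameof(index)); return this.Get(index); }
    set { if ((uint)index >= (uint)this._size) throw new ArgumentOutOfRangeException(nameof(index)); this.Set(index, value); }
  }

  // Low-level access methods that hide whether we are in stack or heap mode.
  T Get(int index) => this._array != null ? this._array[index] : this._items[index];
  void Set(int index, T value)
  {
    if (this._array != null) this._array[index] = value;
    else this._items[index] = value;
  }

  void EnsureCapacity(int capacity)
  {
    if (this._array == null && capacity > StackCapacity)
    {
      // Leave stack mode: copy the inline items into a heap array.
      this._array = new T[Math.Max(2 * StackCapacity, capacity)];
      ((Span<T>)this._items).CopyTo(this._array);
    }
    else if (this._array != null && capacity > this._array.Length)
    {
      // Standard array resize - new, copy, assign.
      Array.Resize(ref this._array, Math.Max(2 * this._array.Length, capacity));
    }
  }
}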

Joe4evr commented 3 years ago

The only way to solve it would be to somehow mark this type as not being passable by value:

Which leads to #2372.

sab39 commented 3 years ago

It seems to me that the "natural" type of a collection literal is something quite specific, and that almost all the ingredients for it already exist in the language/runtime, but that it doesn't correspond to a specific existing nameable or concrete type.

So what if the natural type was anonymous (and rendered in the IDE similar to how other anonymous types are, so that hovering the var in var ints = [1, 2, 3]; would show something like [int] or [int...]), and the compiler did some magic to get the desired effects.

In terms of what the behavior would be, I'm looking to string and ReadOnlySpan<T> for guidance. In today's C#, the semantics of var str = "hello"; are very similar to what we'd want from a hypothetical var str = ['h', 'e', 'l', 'l', 'o'];, while ReadOnlySpan<T> gives the performance benefits of avoiding allocations. I don't believe raw collection literals should be mutable, for the same reasons that string literals aren't.

The specific behavior I'm imagining behind the scenes is:

Uses a ReadOnlySpan<T> when possible, and provides all the same methods and APIs
If the value needs to escape the current stack frame due to async or being captured in a lambda, the compiler transparently backs it with heap-allocated storage, e.g. equivalent to ReadOnlySpan<T> values = new T[n]
The compiler inserts implicit conversions to IEnumerable<T>, IReadOnlyCollection<T> and IReadOnlyList<T> when applicable (although conversion to ReadOnlySpan<T> is preferred).
The type is considered to be covariant in T just as the interfaces would be
When cast to a reference type or implicitly converted to an interface, the exact type is explicitly an implementation detail and at the compiler's discretion, other than implementing the specified interfaces. (ImmutableArray would be a reasonable choice for the compiler to make in practice, but it could also choose a private type to explicitly avoid users depending on it, or even decide to be weird and use string if T happens to be char.)
The compiler would be free to (or required to?) optimize by reusing the same boxed object if multiple reference type conversions happen to the same value.

In general for scenarios where the target type is unspecified, I would leave the details of implementation as unspecified as possible to allow for future improvements. This is especially important when the length of the collection may be known at compile time, at runtime, or not at all until the elements are enumerated. The compiler should be free to use all the same tricks for ['h', 'e', 'l', 'l', 'o'] as it does for "hello".
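For comparison, a small sketch of what C# 12 (.NET 8) allows when the target type is ReadOnlySpan<char>: a string literal and a collection expression of chars produce spans with the same contents, and where the storage for the collection expression lives is left to the compiler.

```csharp
using System;

ReadOnlySpan<char> fromString = "hello";
ReadOnlySpan<char> fromElements = ['h', 'e', 'l', 'l', 'o'];

Console.WriteLine(fromString.SequenceEqual(fromElements)); // True
```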

Would that be feasible? Any scenarios it would cause problems for?

CyrusNajmabadi commented 3 years ago

I don't believe raw collection literals should be mutable

I don't see the justification for that. Why shouldn't they be mutable?

Imagine something like:

var result = [0];

// do work adding more elements to result

return result;

Mutable collections are the default in .NET (and have been for ages). Given that the natural type is needed in local code (not like fields, or other state), it's unclear why mutable would not be the sensible default.

CyrusNajmabadi commented 3 years ago

Would that be feasible? Any scenarios it would cause problems for?

I think the above proposal could be simplified as: The natural type is an array. It is stack-allocated if possible, otherwise it is heap-allocated.

It seems to fit all your cases above with the same restrictions and same benefits. Is there a substantive difference between the above post, and the form in this post? Thanks!

jnm2 commented 3 years ago

When considering the natural type, it might be worth considering how it affects attribute arguments:

[SomeAttribute([1, 2, 3])]
class SomeAttribute : Attribute
{
    public SomeAttribute(object value) { }
}

A slightly more realistic variant might be an object[] parameter or property with nested lists such as [[1], [2, 3]].

I don't love these options:

always use array as natural type
special case for attributes
new syntax can't become uniform, replacing new[] { 1, 2, 3 } everywhere

One option could be to extend the metadata to natively represent calls to constructors and .Add methods, enabling new List<int> { 1, 2, 3 } to work as an attribute argument, and then allowing List<T> to be the natural type of [1, 2, 3] without special-casing attributes or leaving them out. This seems like a lot of work on its own but could be highly desirable for other reasons, and it's been suggested in unrelated discussions.

TahirAhmadov commented 3 years ago

When considering the natural type, it might be worth considering how it affects attribute arguments:

I actually think whenever the target type is object, it should be an array. This is different than var where there is no target type. Arrays implement all the interfaces that an object can be cast to when determining whether it's a collection of some sort, and it solves the attribute problem without any extra work, and they also need 1 allocation only - which is needed in all scenarios, because a hypothetical value type will need to be boxed anyway, and it's better than List<T> with its 2 allocations. The code which "receives" the object is very unlikely to expect to be able to modify the collection, and similarly the "sender" code is unlikely to expect a modified collection back when passing it in as an object.

sab39 commented 3 years ago

Mutable collections are the default in .net (And have been for ages). Given that the natural type is needed in local code (not like fields, or other state), it's unclear why mutable would not be the sensible default.

That's a true statement, but so is: "Immutable literals are the default in .Net / C#, and have been for ages".

When I look at your code example, I read it more like:

var result = "hello";
// do work adding more text to result
return result;

Seems like mutability has become a bit of a fraught topic at the moment in the language design so I hesitate to wade into it too deep (I'm already nerd-sniping myself hard here by jumping in in the first place) but generally when I assign a literal to a variable I don't expect it to change unless I assign to that variable again. And sure, mutable collections have been C#'s default, but hasn't there been a whole lot of inefficiency rooted in the need to copy any array passed to your API, because you can't know the caller won't mutate it later?

Also, making it immutable addresses the problem that range variables in foreach had back in the day when lambda capturing was introduced, and avoids any potential confusion about the behavior of

var a = [1, 2, 3];
var b = a;
a[1] = 0;

I think teh above proposal could be simplified as: The natural type is an array. It is stack-alloc'ed if possible, otherwise it is heap-alloced.

It seems to fit all your cases above with the same restrictions and same benefits. Is there a substantive difference between the above post, and the form in this post? Thanks!

Obviously you know way more than me about the details of language behavior - I'm just an interested observer - but the main differences I see, other than immutability, are explicitly making the natural type opaque and hiding implementation details to allow for optimizations, so that the compiler can give the same efficiency for a literal collection of a primitive type that a string provides when the type is char:

foreach (Action<IEnumerable<char>> op in operations) {
  var collection = ['a', 'b', 'c'];
  op(collection);

  var str = "abc";
  op(str);
}

sab39 commented 3 years ago

I did just realize that the ship may already have sailed on literals being immutable, because of tuples. It had actually never occurred to me that ValueTuple might be mutable - I'd always assumed that it was immutable, since that's the pattern set by every other value type in the System namespace and also by its reference-typed predecessor.

I do understand that there are good reasons for making it mutable, but it was a very surprising revelation!

HaloFour commented 3 years ago

There's an interesting point here. Of the existing literals only one is for a reference type (string) which is an immutable data type. All of the other literals are for value types (including tuples) where the problems with mutability are mitigated by the fact that they are copied by value.

TahirAhmadov commented 3 years ago

When I look at your code example, I read it more like:
var result = "hello";
// do work adding more text to result
return result;

In this case, adding more text would involve repetitive allocations of strings. Yes, I understand that "abc" + "def" is optimized, but for(...) { result += "bla"; } isn't - StringBuilder should be used for this - which is a mutable type. Same idea with arrays; doing something like result = result.Concat(new[] { 1 }).ToArray(); or result = [..result, 1];, would be repetitive new allocations - whereas a mutable collection like List<T> improves on this significantly.
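A sketch of that cost difference, assuming C# 12 / .NET 8 (names made up): rebuilding an array with a spread copies every existing element on each iteration, while List<T>.Add appends in place.

```csharp
using System;
using System.Collections.Generic;

// Rebuilding with a spread allocates a new array and copies all existing elements each time.
int[] rebuilt = [];
int elementsCopied = 0;
for (int i = 0; i < 5; i++)
{
    elementsCopied += rebuilt.Length; // elements the spread below copies
    rebuilt = [..rebuilt, i];
}

// A mutable List<T> appends in place and only reallocates when its capacity runs out.
List<int> appended = [];
for (int i = 0; i < 5; i++)
{
    appended.Add(i);
}

Console.WriteLine(elementsCopied); // 10 (0 + 1 + 2 + 3 + 4)
```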

And sure, mutable collections have been C#'s default, but hasn't there been a whole lot of inefficiency rooted in the need to copy any array passed to your API, because you can't know the caller won't mutate it later?

Yes that's a problem, but it's more of a shifting the work from one place to another, not getting rid of it.

void Foo(int[] a) { int[] b = a.ToArray(); ... } // save a local copy to make sure nobody else modifies it
void Bar(ImmutableArray<int> a) { ... } // great, no need to make a local copy
...
int[] x = new[] { 1, 2, 3 };
Foo(x); // no need to make a copy
Bar(ImmutableArray.Create(x)); // oops, here we have to create a special copy to pass into the method

Also, making it immutable addresses the problem that range variables in foreach had back in the day when lambda capturing was introduced, and avoids any potential confusion about the behavior of
var a = [1, 2, 3];
var b = a;
a[1] = 0;

There is no confusion here. Readonly variables, such as foreach ones, do not make the object they reference immutable. That is standard behavior across the entire .NET ecosystem.
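The same behavior can be shown with arrays today. The `foreach` iteration variable is read-only and cannot be reassigned, yet the array it refers to can still be modified. Likewise, two variables that alias one array both observe a write. A minimal sketch; the class and method names are hypothetical.

```csharp
// Hypothetical demo: read-only variables do not make the referenced object immutable.
public static class ReadonlyVariableDemo
{
    // The iteration variable 'row' is read-only, but the array it refers to is not.
    public static int SumOfFirstColumnAfterZeroing()
    {
        int[][] rows = { new[] { 1, 2 }, new[] { 3, 4 } };
        foreach (var row in rows)
        {
            // row = new int[2];  // would not compile: 'row' is a foreach iteration variable
            row[0] = 0;           // allowed: mutates the referenced array
        }
        return rows[0][0] + rows[1][0];
    }

    // Aliasing: 'b' refers to the same array as 'a', so the write is visible via 'b'.
    public static int AliasedRead()
    {
        int[] a = { 1, 2, 3 };
        int[] b = a;
        a[1] = 0;
        return b[1];
    }
}
```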

sab39 commented 3 years ago

In this case, adding more text would involve repetitive allocations of strings. Yes, I understand that "abc" + "def" is optimized, but for(...) { result += "bla"; } isn't - StringBuilder should be used for this - which is a mutable type. Same idea with arrays; doing something like result = result.Concat(new[] { 1 }).ToArray(); or result = [..result, 1];, would be repetitive new allocations - whereas a mutable collection like List<T> improves on this significantly.

Sure, I wasn't proposing it as being good practice! Just that the semantics are unexpected, because literals aren't usually mutable in C# (with the exception of ValueTuple, which is already pretty surprising in my book, but has fewer issues, as @HaloFour pointed out, because of its copy-by-value nature). Besides, the idea of mutating a variable initialized as a literal doesn't only apply in append-loops:

var str = "normal";
if (somethingWeird) { str = "unusual"; }
return str;

Yes, that's a problem, but it's more a matter of shifting the work from one place to another than getting rid of it.

Not necessarily! The approach I suggested was deliberately designed to allow for the optimization laid out in issue #5295 which would potentially get to zero allocations when the elements are compile-time constant. And since the compiler can guarantee that the underlying array is never accessed directly, it's free to use other strategies to avoid the copy as well.

TahirAhmadov commented 3 years ago

Sure, I wasn't proposing it as being good practice! Just that the semantics are unexpected, because literals aren't usually mutable in C#

I think you're looking at the term "collection literal" and associating it with "primitive" literals like strings. This proposal reads much more like a syntax improvement for initializing collections; maybe "collection literal" is a misnomer, and it should be called a "universal collection initializer".

Not necessarily! The approach I suggested was deliberately designed to allow for the optimization laid out in issue #5295 which would potentially get to zero allocations when the elements are compile-time constant. And since the compiler can guarantee that the underlying array is never accessed directly, it's free to use other strategies to avoid the copy as well.

Yes, but that #5295 issue is a whole different use case. First of all, I don't see why it's so important when static readonly has been around forever. Secondly, and more importantly, this proposal is aimed at collection initialization, not at "constant" creation. In other words, if you need to create "constant collections", you can already use static readonly (which adds negligible warm-up time); and if you need "variable collections" (like fields, locals, etc.), the [1, 2, 3] syntax is proposed.

sab39 commented 3 years ago

My impression is that this proposal is designed to do both - it's certainly intended as an efficient and concise collection initializer, but I don't think calling it "collection literals" is accidental.

If it's only intended to be used as a collection initializer, then the decision is easy: it should only be allowed in target-typed contexts, and using it with var would be an error.

Here's another use case where immutability would make a big difference:

public IEnumerable<string> SupportedVersions => ["1.0", "1.1"];

(I realize that this is actually target-typed and therefore technically outside what I proposed originally, but a lot of the same considerations apply for "target-typed to IEnumerable<T>")

Without immutability, this property would have to allocate a new array/collection every time it's called. With immutability, it could return the same instance every time.

This actually raises an interesting wild idea that's separate from the question of natural type: just how clever can this syntax let us be when target-typed to collection types that ARE immutable? The compiler is smart enough that if you use the same literal string in multiple places in your code, they'll all refer to the same instance of System.String, so that object.ReferenceEquals("hello", "hello") is true. On a scale of one to world peace and rainbow unicorns, just how fantastical is it to imagine a world where...

ImmutableList<int> a = [1];
ImmutableList<int> b = [1];
if (object.ReferenceEquals(a, b)) Console.WriteLine("Hallelujah!");
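For reference, here is today's behavior that this wish is contrasting against. Identical string literals are interned, so they share one instance. Two immutable collections built from the same elements are separate objects that merely compare equal element by element. This is a minimal sketch with a hypothetical class name, using `ImmutableList.Create` in place of the proposed syntax.

```csharp
using System.Collections.Immutable;
using System.Linq;

// Hypothetical demo: interned string literals vs. separately created immutable lists.
public static class IdentityDemo
{
    // Identical string literals are interned, so this returns true.
    public static bool LiteralsShareInstance() => object.ReferenceEquals("hello", "hello");

    // Each Create call allocates a new list; the contents are equal, the identities are not.
    public static (bool SameInstance, bool SameElements) CompareLists()
    {
        ImmutableList<int> a = ImmutableList.Create(1);
        ImmutableList<int> b = ImmutableList.Create(1);
        return (object.ReferenceEquals(a, b), a.SequenceEqual(b));
    }
}
```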

TahirAhmadov commented 3 years ago

My impression is that this proposal is designed to do both - it's certainly intended as an efficient and concise collection initializer, but I don't think calling it "collection literals" is accidental.

Absolutely, this proposal allows you to target-type to an immutable collection if that's what you need, but the question is what a developer would logically expect with var. Given that var is only used for local variables, it's very unlikely that immutability is needed (or even possible) in that scenario; it's much more likely that the collection will be modified as part of the method's logic.

Here's another use case where immutability would make a big difference:
public IEnumerable<string> SupportedVersions => ["1.0", "1.1"];

Again, in these cases, create a static readonly ImmutableArray<string> _sv = ["1.0", "1.1"]; and return that.

In general, I very much agree that immutability is an important consideration when designing the "interface" of a type; I just don't think this proposal is aimed at pushing immutability in either direction. Even if we say that var means a mutable collection, it would only be because that's what the method logic is most likely to need; it would not change existing patterns by encouraging higher usage of mutable collections than before.

CyrusNajmabadi commented 3 years ago

That's a true statement, but so is: "Immutable literals are the default in .Net / C#, and have been for ages".

I don't agree with this. For example, tuples do not follow this. Neither do arrays.

CyrusNajmabadi commented 3 years ago

are explicitly making the natural type opaque and hiding implementation details to allow for optimizations

We discussed this heavily in the LDM meeting (the notes go into this), and the preliminary thoughts (which are of course subject to change) pushed back heavily on this. Effectively, there is a strong belief that we will not have the ability to change what actually happens here, as people will absolutely take explicit or implicit dependencies on whatever we do, and any change to something so core would almost certainly be destabilizing. :)

CyrusNajmabadi commented 3 years ago

Of the existing literals only one is for a reference type (string)

Well... and every single array :)

CyrusNajmabadi commented 3 years ago

Without immutability, this property would have to allocate a new array/collection every time it's called. With immutability, it could return the same instance every time.

Even with immutability, it would have to create something new every time (unless we explicitly spec that immutable values are also only instantiated once... which I'm extremely wary about stating).

If we instantiate only once, that also means you now may dangle potentially enormous lists in memory. It feels extremely unsafe and a huge potential footgun. I think if you want to actually cache and return the same value, it is incumbent on you to do the rooting yourself (in a static-readonly for example).
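A minimal sketch of the caching pattern recommended above, with the rooting done explicitly in a static readonly field. `VersionInfo` and the field name are hypothetical. One subtlety: `ImmutableArray<T>` is a struct. Exposing a field of that type through an `IEnumerable<T>` property would box it, producing a new object on each call. Typing the field as an interface boxes it once, up front.

```csharp
using System.Collections.Generic;
using System.Collections.Immutable;

// Hypothetical example of explicitly caching a collection in a static readonly field.
public static class VersionInfo
{
    // Boxed once here; the static field roots the instance for the lifetime of the app,
    // which is the explicit trade-off the caller opts into.
    private static readonly IReadOnlyList<string> s_supportedVersions =
        ImmutableArray.Create("1.0", "1.1");

    // Returns the same cached instance on every call.
    public static IEnumerable<string> SupportedVersions => s_supportedVersions;
}
```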

sab39 commented 3 years ago

(Sorry for yet another bit of spam, but I hope it is actually productive to the discussion rather than derailing!)

Something that maybe should be added to the Unresolved Questions, especially if it's decided to not give these literals a natural type, would be whether foreach (var i in [1, 2, 3]) is legal. It seems like it should be, even though foreach doesn't directly imply a specific type. A similar consideration applies to from i in [1, 2, 3] select f(i) and how, if at all, a Select extension method (or whatever) would be looked up in that case.

CyrusNajmabadi commented 3 years ago

On a scale of one to world peace and rainbow unicorns, just how fantastical is it to imagine a world where...

I genuinely think this would be a bad thing. It would require the compiler to statically cache literals in some location, which is almost certainly going to be horrific for some use cases.

dotnet / csharplang

[Proposal]: Collection expressions (VS 17.7, .NET 8) #5354
