Open cston opened 1 year ago
An additional optimization is that we should use https://learn.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.immutablecollectionsmarshal.asimmutablearray?view=net-8.0 (when present) for immutable arrays.
@cston The intent here is for all of this to be present in C# 12, correct? This is a core part of the collection-expr story, so i want to make sure it's making the v1 release :)
Also, on top of Enumerable.TryGetNonEnumeratedCount()
we just need to directly instantiate with teh right capacity when all the args are either elements, or spreads of types with .Length/.Count since we can directly compute (with certainty) the final capacity.
Enumerable.TryGetNonEnumeratedCount() is for when we can't determine that statically, but still want to try at runtime if we're handed instances that can efficiently give us the count.
Whoops. Reading is fundamental.
For Emit [] as Array.Empty<T>()
, from offline talk, we want this to apply for both arrays and IEnumerables. so int[] = []
should use Array.Empty as well.
Added the items from the comments above to the list in the issue description.
AddRange for List should also be specialcased to this Span extension method.
TCollection
has static property TCollection Empty {get;}
or static factory method TCollection<T> Empty<T>()
or static factory method
@sakno as I mentioned in the lang chat, our pattern allows the type itself to decide what to return. So types can choose to return a .Empty Singleton if they so choose.
I was thinking about this a bit and thought his was likely discussed elsewhere but I wanted to mention it:
[] should be documented to not necessarily return a new instance.
[] should ideally map to a .Empty static property or method so that it works with all types and is not special cased for arrays or enumerables.
@TonyValenti
We do doc '1'. The overall design also gives the compiler broad leeway to make assumptions and optimize, including in ways that can be observable. There is a general presumption though of good behavior (in line with the same assumptions and optimizations we do in last patterns). For example, presuming that is you enumerate an empty collection that that will not mutate state anywhere and that you will get no elements returned from the enumeration.
'2' naturally falls out as we defer to the type to determine what value to return. Types with an Empty in the BCL are doing this already.
Types where it is safe to use an empty Singleton will all work this way. Types where it would not be safe (like List<T>
) will not :-)
(Examples are decompiling code generated by SDK 8.0.0-preview.7.23375.6, with varying return types and a method body of return [.. enumerable];
.)
Could enumerable.ToArray()
be called here instead of an Add loop and an intermediate List<T>
? This would take advantage of LINQ method optimizations that inspect the enumerable in case the size is known at runtime, etc.
// Emitted by SDK 8.0.0-preview.7.23375.6 and decompiled
private object?[] Example(IEnumerable<object?> enumerable)
{
List<object> objectList = new List<object>();
foreach (object obj in enumerable)
objectList.Add(obj);
return objectList.ToArray();
}
Could enumerable.ToImmutableArray()
be called here instead of an Add loop and an intermediate List<T>
?
// Emitted by SDK 8.0.0-preview.7.23375.6 and decompiled
private ImmutableArray<object?> Example(IEnumerable<object?> enumerable)
{
List<object> objectList = new List<object>();
foreach (object obj in enumerable)
objectList.Add(obj);
return ImmutableArray.Create<object>(new ReadOnlySpan<object>(objectList.ToArray()));
}
Could there be value in calling HashSet<T>.UnionWith
(which it has instead of AddRange
) when another hash set with a matching comparer is being spread into it?
Emit
[]
asdefault(Span<T>)
when the target type isSpan<T>
.
This one may be worth doing for 17.8. I will begin working on it and we'll see if it lands in 17.8 or 17.9.
Thank you @RikkiGibson !
Would it make sense to synthesize a single-element type for IEnumerable<T>
, IReadOnlyList<T>
, etc on [e]
? I think this could be even up to a threshold, not only limited to a single-element.
Yes it would.
Leaving this weird case here that I've found, in case it useful. Not sure if this is by design 🤔
using System;
static int[] Fast(ReadOnlySpan<int> array)
{
return [0, 1, ..array, 3, 4];
}
static int[] Slow(ReadOnlySpan<int> array)
{
return [0, 1, 2, ..array, 3, 4];
}
Fast
correctly allocates an array and writes the items directly.
Slow
uses a temporary list instead (with no pre-sizing either).
Is there some hidden threshold here when you fall off this performance cliff? Should there be (as in, is this expected)? And if so, how can users know when they're going past it, to ensure they don't accidentally introduce performance regressions?
Additionally, for cases like those, or also this one:
static int[] Test(ReadOnlySpan<int> array)
{
return [0, .. array, 1, .. array];
}
Ie. when the source is either a span, or something that implements ICollection<T>
, couldn't Roslyn just call CopyTo
on the source to copy the items into the target range, rather than manually using an enumerator instead, which is much slower?
Yes. We have approved being allowed to call copyto
Is there some hidden threshold here when you fall off this performance cliff?
There currently is. But, @cston we really need to rethink it. The threshold is far too small. We need to up it to a much better sweet spot.
Hey y'all, is there a definitive list of what optimizations have landed for C# 12 and what is on to todo list, or is this issue that list? I can't tell if all of the optimizations made it in time for the GA.
@slang25 this issue is that list. The items that have made it have check marks on them
Just on another note, somehow I feel like some of the mentioned optimizations can be compared to automatic LINQ rewriting, especially those featuring "different code for different types" or "optimized path for types known at compile time".
I'm a bit curious about:
Linq is explicitly defined as just a syntactic transformation. We would need to change the specification to allow the compiler to behave differently.
Collection expressions are not done in the same fashion. They're explicitlynot defined to l as syntactic transformations. Instead, the semantics are defined and it is explicitly called out that a compliant implementation is free to emit whatever it wants as long as those semantics are preserved.
If you would like linq to be changed, is recommend a new discussion on that topic. Thanks!
Currently (.NET 8), the following code
public static char[] Concat(ReadOnlySpan<char> span1, ReadOnlySpan<char> span2)
=> [..span1, ..span2];
produces
public static char[] Concat(ReadOnlySpan<char> span1, ReadOnlySpan<char> span2)
{
ReadOnlySpan<char> readOnlySpan = span1;
ReadOnlySpan<char> readOnlySpan2 = span2;
int num = 0;
char[] array = new char[readOnlySpan.Length + readOnlySpan2.Length];
ReadOnlySpan<char>.Enumerator enumerator = readOnlySpan.GetEnumerator();
while (enumerator.MoveNext())
{
char c = (array[num] = enumerator.Current);
num++;
}
enumerator = readOnlySpan2.GetEnumerator();
while (enumerator.MoveNext())
{
char c = (array[num] = enumerator.Current);
num++;
}
return array;
}
The quality of codegen is poor. It cannot be considered as optimal. Why simple concatenation of twos spans cannot be done using Span<T>.CopyTo
instead of MoveNext
? In some cases, analyzer in VSCode suggests to replace some statements with collection literals however this replacement is incorrect from performance point of view. I guess that most of developers don't know how the collection literal translated under the hood. As a result, suggestion from IDE leads to performance loss that is hard to investigate. How is this codegen considered as production-ready?
Why simple concatenation of twos spans cannot be done using Span
.CopyTo instead of MoveNext?
Concatenation of spans (or anything for that matter) can be done with .CopyTo. And this was explicitly approved by the LDM. @RikkiGibson for visibility on this (unless you're already working on this).
however this replacement is incorrect from performance point of view.
Compiler strategy is an internal implementation detail.
How is this codegen considered as production-ready?
There are a near infinite number of things that can be optimized in the real world. Any particular optimization that is lacking is a non-starter for some customer. As such, if production-ready doesn't mean "sufficient for all customers", it means sufficient for the vast majority.
When using the combination of CollectionsMarshal.SetCount(list, count)
for a constant count
and CollectionsMarshal.AsSpan(list)
we still leave the JIT with a Span<T>
of unknown size, preventing it from eliding bounds checks when writing to the Span<T>
.
On the other hand, when writing to a Span<T>
created with MemoryMarshal.CreateSpan(ref T, length)
created with the same constant value as used for CollectionsMarshal.SetCount(list, count)
the bounds checks are now elided from the generated assembly.
As the List<T>
is not visible from other threads at this point, I believe the pitfalls discussed in https://github.com/dotnet/runtime/issues/56773 does not apply here.
This optimization only seems applicable when the number of elements is a compile-time constant.
Thanks for the info @jnyrup. Tagging @RikkiGibson and @stephentoub
emit foreach (var x in y ?? []) as a lifted null check that executes nothing if 'y' is null.
It would be nice also to support the following use case:
List<object> list = [1, "a", .. items ?? [], "z"];
// ^---^
and skip handling (adding) items
to the list
in case items is null
.
Is using Unsafe
valid to avoid allocating the synthesized type for IEnumerble IReadOnlyList IReadOnlyCollection?
No. Using Unsafe.As<T>(object)
to alias reference types that would not pass a normal cast is undefined behavior.
emit
foreach (var x in y ?? [])
as a lifted null check that executes nothing if 'y' is null.
I created a proof of concept for this, which likely needs to be redone per the TODO comment: https://github.com/dotnet/roslyn/compare/main...jnm2:optimize_foreach_coalesce_empty
@333fred raised a concern that skipping GetEnumerator
and MoveNext
calls on the transient empty collection could be too dangerous, and he suggested checking against known types like List<T>
. Things like ImmutableArray<T>?
, Dictionary<TKey, TValue>
and HashSet<T>
come to mind as well for that list.
Also, he would want to see benchmarks before replacing ?? Array.Empty<T>()
with the check and jump.
Is it worth starting a new thread to discuss this?
@jnm2 What is the defined behavior in .net 1.1 foreach
and enumerators, if it can be statically proofen that collection.Count == 0
?
ArrayList items = new ArrayList();
foreach(object item in items) //is the Enumerator required by definition?
{
}
@RikkiGibson thx, for the link.
With out extension of the definition and some kind of marker/flag seams in the current form impossible to optimize it.
First idea would be to introduce something like IBoundedEnumerable
and include it in the foreach
definition
this would then result in:
bool empty = ((C)(x)) is IBoundedEnumerable b && b.Count == 0;
if(!empty)
{
E e = ((C)(x)).GetEnumerator();
try
{
while (e.MoveNext())
{
V v = (V)(T)e.Current;
«embedded_statement»
}
}
finally
{
... // Dispose e
}
}
@Zetanova That would not be as low-overhead as surrounding with an if
null check, or a hypothetical future foreach?
keyword.
@jnm2 simply doing nothing is the best option. the upcoming ´union types´ would resolve the issue.
But I don't know how much collection literals
require optimization today.
The underlining logical issue is that the compiler can't currently statically test on an Array
for Empty | Single | Many
I would urge us to still skip creating and enumerating an empty collection in foreach
, because we have discussed a strong desire for this pattern to not actually instantiate inner collections, because this is currently the sole way to achieve conditional inclusion in the middle of a collection expression:
SomeCustomCollection x = [a, .. condition ? [b] : []];
(This may not ship in 13, but hopefully soon. In the meantime, you can write ? new[] { b } : []
.)
If we can skip creating a non-empty transient collection there as well as an empty one, I would absolutely expect the same here (and we've even touched on this in design discussions):
SomeCustomCollection x = [a, .. items ?? []];
Now we're already quite parallel to foreach (var _ in items ?? [])
.
@jnm2 Interesting is that the lowering of foreach (var _ in Span<object>.Empty)
is over a while loop over the indexer.
This is already not defined in the foreach spec
example: sharplab
@Zetanova Same for arrays. Underneath the compiler expansions which showcase the use of enumerators, it says:
An implementation is permitted to implement a given foreach_statement differently; e.g., for performance reasons, as long as the behavior is consistent with the above expansion.
@jnm2 when the foreach expression has the form T ?? new T()
where T has class, IEnumerable
then the https://github.com/dotnet/roslyn/compare/main...jnm2:optimize_foreach_coalesce_empty can be applied sharplab
For ValueTypes the this optimization should not apply.
but this example would break: sharplab I can't imaging that it is every been used this way, but who knows ...
An other option is to go the static default why like this: sharplab maybe there is already a type like this in the code base. But even this has some low risk to break a very badly designed Enumerable.
Don't really see a clean why without an extension.
I would prefer instead of foreach? (var _ in items) {}
this form foreach (var _ in items?) {}
, It could also be applied to [a, ... items?]
but it would be not aligned to var _ = items?[..]
@jnm2 I found a partial optimization to get rid of the heap-alloc.
Instead of lowering items ?? []
to (items ?? new List<object>()).GetEnumerator()
for the foreach.
A duck-typing pattern could be added. If the resulting enumerator type E
contains a static E Empty { get; }
then it should be used for the foreach block E enumerator = items?.GetEnumerator() ?? E.Empty
Demo: sharplap
Implement various optimizations for constructing collections from collection literals, including:
[x] Use inline array for the collection builder method
ReadOnlySpan<T>
argument when the span cannot escape. #69227[x] Use inline arrays for collection expressions with span target types #69412
[x] Use existing optimization for
ReadOnlySpan<T>
to blit data into the assembly section when values are constants of primitive type. #69412[x] Emit
[]
asArray.Empty<T>()
when the target type isIEnumerable<T>
orT[]
. #69355[x] Emit
[]
asdefault(Span<T>)
when the target type isSpan<T>
. #70260[x] Avoid intermediate
List<T>
if all spread elements are countable and collection satisfies certain heuristics. #69875[ ] Use
EnsureCapacity(int)
for collection initializer types with no spreads; potentially also when length is known and collection satisfies certain heuristics.[x] Use
CollectionsMarshal.SetCount<T>(List<T>, int)
andCollectionsMarshal.AsSpan<T>(List<T>)
to create aList<T>
when length is known and collection satisfies certain heuristics; fallback toList<T>..ctor(int capacity)
ifCollectionsMarshal
is not available. #70197[x] Use
ImmutableCollectionsMarshal.AsImmutableArray<T>(T[])
to create anImmutableArray<T>
. #70222[ ] Use
.ctor(int capacity)
for some BCL types whenEnsureCapacity(int)
is not available, or not available downlevel.[ ] Emit
Enumerable.TryGetNonEnumeratedCount()
and avoid intermediateList<T>
at runtime. #71296[ ] Use
List<T>.AddRange(IEnumerable<T>)
when adding a spread to aList<T>
if the spread implementsICollection<T>
, or if the spread implementsIEnumerable<T>
and does not have astruct
enumerator; useCollectionExtensions.AddRange<T>(List<T>, ReadOnlySpan<T>)
if the spread is a span; otherwise useforeach
andList<T>.Add(T)
.[ ] Use
CollectionsMarshal.AsSpan<T>(List<T>)
to create aReadOnlySpan<T>
from the temporaryList<T>
when creating a[CollectionBuilder]
collection from a collection literal of unknown length.[ ] emit
foreach (var x in y ?? [])
as a lifted null check that executes nothing if 'y' is null.[ ] When producing an
IEnumerable<T>
use the same codegen pattern thatyield
does. Specifically, where the instance created is both anIEnumerable<T>
and its ownIEnumerator<T>
for the very first iteration of the enumerable on the same thread. This avoids an extra allocation for hte common case of producing anIEnumerable<T>
just to have it be immediately iterated.[ ]
Avoid heap allocation for spread in(moved to #69277)[.. x ? [y] : []]
.[ ]
Avoid heap allocation for collection in(moved to #69277)foreach (bool x in [true, false])
.Relates to test plan https://github.com/dotnet/roslyn/issues/66418