dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
18.93k stars 4.02k forks source link

Implement optimizations for constructing collections from collection literals #68785

Open cston opened 1 year ago

cston commented 1 year ago

Implement various optimizations for constructing collections from collection literals, including:

Relates to test plan https://github.com/dotnet/roslyn/issues/66418

CyrusNajmabadi commented 1 year ago

An additional optimization is that we should use https://learn.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.immutablecollectionsmarshal.asimmutablearray?view=net-8.0 (when present) for immutable arrays.

CyrusNajmabadi commented 1 year ago

@cston The intent here is for all of this to be present in C# 12, correct? This is a core part of the collection-expr story, so i want to make sure it's making the v1 release :)

CyrusNajmabadi commented 1 year ago

Also, on top of Enumerable.TryGetNonEnumeratedCount() we just need to directly instantiate with teh right capacity when all the args are either elements, or spreads of types with .Length/.Count since we can directly compute (with certainty) the final capacity.

Enumerable.TryGetNonEnumeratedCount() is for when we can't determine that statically, but still want to try at runtime if we're handed instances that can efficiently give us the count.

Whoops. Reading is fundamental.

CyrusNajmabadi commented 1 year ago

For Emit [] as Array.Empty<T>(), from offline talk, we want this to apply for both arrays and IEnumerables. so int[] = [] should use Array.Empty as well.

cston commented 1 year ago

Added the items from the comments above to the list in the issue description.

MichalPetryka commented 1 year ago

AddRange for List should also be specialcased to this Span extension method.

sakno commented 1 year ago

69355 offers ad-hoc solution only for arrays instead of generic approach. For instance, compiler might check whether the target type TCollection has static property TCollection Empty {get;} or static factory method TCollection<T> Empty<T>()

CyrusNajmabadi commented 1 year ago

or static factory method

@sakno as I mentioned in the lang chat, our pattern allows the type itself to decide what to return. So types can choose to return a .Empty Singleton if they so choose.

TonyValenti commented 1 year ago

I was thinking about this a bit and thought his was likely discussed elsewhere but I wanted to mention it:

  1. [] should be documented to not necessarily return a new instance.

  2. [] should ideally map to a .Empty static property or method so that it works with all types and is not special cased for arrays or enumerables.

CyrusNajmabadi commented 1 year ago

@TonyValenti

We do doc '1'. The overall design also gives the compiler broad leeway to make assumptions and optimize, including in ways that can be observable. There is a general presumption though of good behavior (in line with the same assumptions and optimizations we do in last patterns). For example, presuming that is you enumerate an empty collection that that will not mutate state anywhere and that you will get no elements returned from the enumeration.

'2' naturally falls out as we defer to the type to determine what value to return. Types with an Empty in the BCL are doing this already.

Types where it is safe to use an empty Singleton will all work this way. Types where it would not be safe (like List<T>) will not :-)

jnm2 commented 1 year ago

(Examples are decompiling code generated by SDK 8.0.0-preview.7.23375.6, with varying return types and a method body of return [.. enumerable];.)

Could enumerable.ToArray() be called here instead of an Add loop and an intermediate List<T>? This would take advantage of LINQ method optimizations that inspect the enumerable in case the size is known at runtime, etc.

// Emitted by SDK 8.0.0-preview.7.23375.6 and decompiled
private object?[] Example(IEnumerable<object?> enumerable)
{
    List<object> objectList = new List<object>();
    foreach (object obj in enumerable)
        objectList.Add(obj);
    return objectList.ToArray();
}

Could enumerable.ToImmutableArray() be called here instead of an Add loop and an intermediate List<T>?

// Emitted by SDK 8.0.0-preview.7.23375.6 and decompiled
private ImmutableArray<object?> Example(IEnumerable<object?> enumerable)
{
    List<object> objectList = new List<object>();
    foreach (object obj in enumerable)
        objectList.Add(obj);
    return ImmutableArray.Create<object>(new ReadOnlySpan<object>(objectList.ToArray()));
}
jnm2 commented 12 months ago

Could there be value in calling HashSet<T>.UnionWith (which it has instead of AddRange) when another hash set with a matching comparer is being spread into it?

RikkiGibson commented 11 months ago

Emit [] as default(Span<T>) when the target type is Span<T>.

This one may be worth doing for 17.8. I will begin working on it and we'll see if it lands in 17.8 or 17.9.

CyrusNajmabadi commented 11 months ago

Thank you @RikkiGibson !

alrz commented 11 months ago

Would it make sense to synthesize a single-element type for IEnumerable<T>, IReadOnlyList<T>, etc on [e]? I think this could be even up to a threshold, not only limited to a single-element.

CyrusNajmabadi commented 11 months ago

Yes it would.

Sergio0694 commented 11 months ago

Leaving this weird case here that I've found, in case it useful. Not sure if this is by design 🤔

using System;

static int[] Fast(ReadOnlySpan<int> array)
{
    return [0, 1, ..array, 3, 4];
}

static int[] Slow(ReadOnlySpan<int> array)
{
    return [0, 1, 2, ..array, 3, 4];
}

(sharplab link)

Fast correctly allocates an array and writes the items directly. Slow uses a temporary list instead (with no pre-sizing either).

Is there some hidden threshold here when you fall off this performance cliff? Should there be (as in, is this expected)? And if so, how can users know when they're going past it, to ensure they don't accidentally introduce performance regressions?

Sergio0694 commented 11 months ago

Additionally, for cases like those, or also this one:

static int[] Test(ReadOnlySpan<int> array)
{
    return [0, .. array, 1, .. array];
}

Ie. when the source is either a span, or something that implements ICollection<T>, couldn't Roslyn just call CopyTo on the source to copy the items into the target range, rather than manually using an enumerator instead, which is much slower?

CyrusNajmabadi commented 11 months ago

Yes. We have approved being allowed to call copyto

CyrusNajmabadi commented 11 months ago

Is there some hidden threshold here when you fall off this performance cliff?

There currently is. But, @cston we really need to rethink it. The threshold is far too small. We need to up it to a much better sweet spot.

slang25 commented 10 months ago

Hey y'all, is there a definitive list of what optimizations have landed for C# 12 and what is on to todo list, or is this issue that list? I can't tell if all of the optimizations made it in time for the GA.

CyrusNajmabadi commented 10 months ago

@slang25 this issue is that list. The items that have made it have check marks on them

karakasa commented 10 months ago

Just on another note, somehow I feel like some of the mentioned optimizations can be compared to automatic LINQ rewriting, especially those featuring "different code for different types" or "optimized path for types known at compile time".

I'm a bit curious about:

CyrusNajmabadi commented 10 months ago

Linq is explicitly defined as just a syntactic transformation. We would need to change the specification to allow the compiler to behave differently.

Collection expressions are not done in the same fashion. They're explicitlynot defined to l as syntactic transformations. Instead, the semantics are defined and it is explicitly called out that a compliant implementation is free to emit whatever it wants as long as those semantics are preserved.

If you would like linq to be changed, is recommend a new discussion on that topic. Thanks!

sakno commented 10 months ago

Currently (.NET 8), the following code

public static char[] Concat(ReadOnlySpan<char> span1, ReadOnlySpan<char> span2)
        => [..span1, ..span2];

produces

public static char[] Concat(ReadOnlySpan<char> span1, ReadOnlySpan<char> span2)
    {
        ReadOnlySpan<char> readOnlySpan = span1;
        ReadOnlySpan<char> readOnlySpan2 = span2;
        int num = 0;
        char[] array = new char[readOnlySpan.Length + readOnlySpan2.Length];
        ReadOnlySpan<char>.Enumerator enumerator = readOnlySpan.GetEnumerator();
        while (enumerator.MoveNext())
        {
            char c = (array[num] = enumerator.Current);
            num++;
        }
        enumerator = readOnlySpan2.GetEnumerator();
        while (enumerator.MoveNext())
        {
            char c = (array[num] = enumerator.Current);
            num++;
        }
        return array;
    }

The quality of codegen is poor. It cannot be considered as optimal. Why simple concatenation of twos spans cannot be done using Span<T>.CopyTo instead of MoveNext? In some cases, analyzer in VSCode suggests to replace some statements with collection literals however this replacement is incorrect from performance point of view. I guess that most of developers don't know how the collection literal translated under the hood. As a result, suggestion from IDE leads to performance loss that is hard to investigate. How is this codegen considered as production-ready?

CyrusNajmabadi commented 10 months ago

Why simple concatenation of twos spans cannot be done using Span.CopyTo instead of MoveNext?

Concatenation of spans (or anything for that matter) can be done with .CopyTo. And this was explicitly approved by the LDM. @RikkiGibson for visibility on this (unless you're already working on this).

however this replacement is incorrect from performance point of view.

Compiler strategy is an internal implementation detail.

How is this codegen considered as production-ready?

There are a near infinite number of things that can be optimized in the real world. Any particular optimization that is lacking is a non-starter for some customer. As such, if production-ready doesn't mean "sufficient for all customers", it means sufficient for the vast majority.

jnyrup commented 9 months ago

When using the combination of CollectionsMarshal.SetCount(list, count) for a constant count and CollectionsMarshal.AsSpan(list) we still leave the JIT with a Span<T> of unknown size, preventing it from eliding bounds checks when writing to the Span<T>.

On the other hand, when writing to a Span<T> created with MemoryMarshal.CreateSpan(ref T, length) created with the same constant value as used for CollectionsMarshal.SetCount(list, count) the bounds checks are now elided from the generated assembly.

As the List<T> is not visible from other threads at this point, I believe the pitfalls discussed in https://github.com/dotnet/runtime/issues/56773 does not apply here.

This optimization only seems applicable when the number of elements is a compile-time constant.

SharpLab

CyrusNajmabadi commented 9 months ago

Thanks for the info @jnyrup. Tagging @RikkiGibson and @stephentoub

ViIvanov commented 8 months ago

emit foreach (var x in y ?? []) as a lifted null check that executes nothing if 'y' is null.

It would be nice also to support the following use case:

List<object> list = [1, "a", .. items ?? [], "z"];
//                                    ^---^

and skip handling (adding) items to the list in case items is null.

alrz commented 7 months ago

Is using Unsafe valid to avoid allocating the synthesized type for IEnumerble IReadOnlyList IReadOnlyCollection?

(sharplab.io)

Sergio0694 commented 7 months ago

No. Using Unsafe.As<T>(object) to alias reference types that would not pass a normal cast is undefined behavior.

jnm2 commented 2 months ago

emit foreach (var x in y ?? []) as a lifted null check that executes nothing if 'y' is null.

I created a proof of concept for this, which likely needs to be redone per the TODO comment: https://github.com/dotnet/roslyn/compare/main...jnm2:optimize_foreach_coalesce_empty

@333fred raised a concern that skipping GetEnumerator and MoveNext calls on the transient empty collection could be too dangerous, and he suggested checking against known types like List<T>. Things like ImmutableArray<T>?, Dictionary<TKey, TValue> and HashSet<T> come to mind as well for that list.

Also, he would want to see benchmarks before replacing ?? Array.Empty<T>() with the check and jump.

Is it worth starting a new thread to discuss this?

Zetanova commented 2 months ago

@jnm2 What is the defined behavior in .net 1.1 foreach and enumerators, if it can be statically proofen that collection.Count == 0 ?

ArrayList items = new ArrayList();
foreach(object item in items) //is the Enumerator required by definition?
{
}
RikkiGibson commented 2 months ago

https://github.com/dotnet/csharpstandard/blob/draft-v8/standard/statements.md#1395-the-foreach-statement

Zetanova commented 2 months ago

@RikkiGibson thx, for the link. With out extension of the definition and some kind of marker/flag seams in the current form impossible to optimize it. First idea would be to introduce something like IBoundedEnumerable and include it in the foreach definition this would then result in:

bool empty = ((C)(x)) is IBoundedEnumerable b && b.Count == 0;
if(!empty)
{
    E e = ((C)(x)).GetEnumerator();
    try
    {
        while (e.MoveNext())
        {
            V v = (V)(T)e.Current;
            «embedded_statement»
        }
    }
    finally
    {
        ... // Dispose e
    }
}
jnm2 commented 2 months ago

@Zetanova That would not be as low-overhead as surrounding with an if null check, or a hypothetical future foreach? keyword.

Zetanova commented 2 months ago

@jnm2 simply doing nothing is the best option. the upcoming ´union types´ would resolve the issue. But I don't know how much collection literals require optimization today. The underlining logical issue is that the compiler can't currently statically test on an Array for Empty | Single | Many

jnm2 commented 2 months ago

I would urge us to still skip creating and enumerating an empty collection in foreach, because we have discussed a strong desire for this pattern to not actually instantiate inner collections, because this is currently the sole way to achieve conditional inclusion in the middle of a collection expression:

SomeCustomCollection x = [a, .. condition ? [b] : []];

(This may not ship in 13, but hopefully soon. In the meantime, you can write ? new[] { b } : [].)

If we can skip creating a non-empty transient collection there as well as an empty one, I would absolutely expect the same here (and we've even touched on this in design discussions):

SomeCustomCollection x = [a, .. items ?? []];

Now we're already quite parallel to foreach (var _ in items ?? []).

Zetanova commented 2 months ago

@jnm2 Interesting is that the lowering of foreach (var _ in Span<object>.Empty) is over a while loop over the indexer. This is already not defined in the foreach spec example: sharplab

jnm2 commented 2 months ago

@Zetanova Same for arrays. Underneath the compiler expansions which showcase the use of enumerators, it says:

An implementation is permitted to implement a given foreach_statement differently; e.g., for performance reasons, as long as the behavior is consistent with the above expansion.

Zetanova commented 2 months ago

@jnm2 when the foreach expression has the form T ?? new T() where T has class, IEnumerable then the https://github.com/dotnet/roslyn/compare/main...jnm2:optimize_foreach_coalesce_empty can be applied sharplab

For ValueTypes the this optimization should not apply.

but this example would break: sharplab I can't imaging that it is every been used this way, but who knows ...

An other option is to go the static default why like this: sharplab maybe there is already a type like this in the code base. But even this has some low risk to break a very badly designed Enumerable.

Don't really see a clean why without an extension. I would prefer instead of foreach? (var _ in items) {} this form foreach (var _ in items?) {}, It could also be applied to [a, ... items?] but it would be not aligned to var _ = items?[..]

Zetanova commented 2 months ago

@jnm2 I found a partial optimization to get rid of the heap-alloc. Instead of lowering items ?? [] to (items ?? new List<object>()).GetEnumerator() for the foreach. A duck-typing pattern could be added. If the resulting enumerator type E contains a static E Empty { get; } then it should be used for the foreach block E enumerator = items?.GetEnumerator() ?? E.Empty

Demo: sharplap