Open jcouv opened 6 years ago
I'll have a prototype shortly.
Nifty. Note: i just wrote the "'for' to 'foreach'" which looks for people using hte for (int i = 0; i < Length/Conut; i++)
pattern and offers to convert to a 'foreach' if you implement the foreach-pattern. If you do htis work, we should also look for the EnumerableAttribute and if so consider your type as foreach-able.
If using an attribute, is it possible to allow other patterns?
The async stream seems to use a pattern like:
public interface IAsyncEnumerator<out T> : IAsyncDisposable
{
Task<bool> WaitForNextAsync();
T TryGetNext(out bool success);
}
How about sync version of this atomic TryGetNext
as the following:
public interface ISingleMethodEnumerator<out T> : IDisposable
{
T TryGetNext(out bool success);
}
@ufcpp That would be a separate proposal. We can discuss there. If I understand your idea, it could be triggered by the presence of ISingleMethodEnumerator
, so no need for the attribute in that case. That said, I'm not sure I see a big enough value to warrant that feature.
The purpose of the attribute is to tell the compiler to look for public Length
method and a public indexer, and no need for an interface. Maybe we can call the attribute Indexable
or IndexEnumerable
instead to emphasize that.
I'm not convinced this change is worth making. Consider the claimed benefits:
lightweight iteration
By my measurements, iterating using enumerator can be up to 3 times slower than using the indexer. That is a lot, but:
List<int>
) where the loop body is a single instruction. In most real code, the difference won't be as big.List<T>.Enumerator
and the JIT.EnumerableAttribute
to List<T>
would change its behavior: modifying the list while iterating it would no longer throw an exception. If that's not an acceptable change, then this proposal would have much less effect in practice, since List<T>
is used a lot. If it is an acceptable change, then it could be made in the existing code, likely decreasing the difference, which would also make this proposal less useful.for
is a fairly simple change that is relatively well know.lightweight implementation (no need to define enumerator type to make your type enumerable)
In cases, where performance is not critical, you don't need to define enumerator, the implementation can be just:
public IEnumerator<T> GetEnumerator()
{
for (int i = 0; i < Length; i++)
{
yield return this[i];
}
}
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
If performance is critical, defining a struct
enumerator is less than 100 lines.
In any case, I don't think people write such types often enough that they need a language feature to help with this.
I don't know why we need an attribute for this.
In any case, I don't think people write such types often enough that they need a language feature to help with this.
I'd agree with this only if we could somehow make this possible with arrays and ImmurableArray, ArrayBuilder, etc.
Arrays could be a special case but for other types we got to have a convention to make it possible.
Currently, foreach-ref works for duck typed ref Current { get; }
however, since it works only by that convention, there is no way to make that change without breaking things. I see two possible alternative:
GetRefEnumerator
(instead of GetEnumerator
) (and possibly a companion interface)@svick
I'm not convinced this change is worth making. [...] I don't think people write such types often enough [...].
That's fair.
For context, this suggestion came up a few times in LDM since the compiler has been adding special support for indexer-enumerable types. We recently added this special support to Span
and ReadOnlySpan
. We'll probably add it for Range
too. At some point, it's just easier to flag which types get the optimization ;-)
Adding EnumerableAttribute to List
would change its behavior
Yes. List<T>
is the poster boy for types that shouldn't be marked with the attribute.
This also answers @alrz's question: we need an attribute because in some cases enumerating via indexer is worse. The mere presence of Length
and an indexer isn't a reliable signal.
List has Count, though, not Length. :/
I don't think we need to generalize that
Current
and MoveNext
is seriously most generalized form of iteration. If there is object that should iterate by indexer then it's iterator job to optmize that by kept only index and compare with length for MoveNext
If returning iterator object is costly then I have seen a proposal on valuetype iterator #982 If returning object from Current is costly then I have a proposal IRefEnumerable #2503 If checking for MoveNext is costly then I have proposal on undefinable #521 that should also work on foreach
An example of a type you would not want to do an opt in of this optimization is System.Collections.Immutable.ImmutableList<T>
. Enumeration of the full list with foreach
is O(n)
but with a for
and using the indexer would be O(n*log(n))
. This example suggests you don't want to opt in on the IList<T>
or IReadonlyList<T>
interfaces either.
I think this is worth doing with some attribute mechanism IF it stands to simplify https://github.com/dotnet/roslyn/blob/master/src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_ForEachStatement.cs
Another thing to note is that foreach
makes a dispose call that is skipped in the current for
indexable optimization.
Should/Would a type with this attribute require a GetEnumerator()
method still?
I think this is looking to optimize the wrong spot. The entire purpose of enumerators should be to provide the most optimal implementation of enumerating -- we shouldn't need to special-case something at the foreach
keyword.
Instead, we should look at how to ensure enumerators can be "free" in terms of abstraction overhead.
@scalablecory the optimization of using a for
instead of a foreach
is a special case micro optimization applicable to some simple types (namely T[]
, Span<T>
, string
and a few others and likely a Range type). It would likely be worth doing for these types even with a better enumeration pattern as free as possible from the abstraction overhead[1]. The issue is that currently a compiler implementation needs to be aware of this (growing) list of types in order to be aware of when the optimization applies. This means that without a well known type to annotate when the optimization should apply, the compiler version is tightly coupled to a BCL version if you wish to produce optimum code (when Range comes, a compiler that optimizes Range will be needed).
Whether or not coupling the language implementation to the library this way is an independent issue to reducing enumeration abstraction overhead given that it is not possible to eliminate the overhead in the set of cases where the optimization exists.
[1]: a for
loop with static bounds for example can be unrolled and JIT can omit any loop at all in some circumstances and avoid all branches to keep the cpu pipeline full, but most foreach
loops even with a set of static bounds when they inline still become some sort of state machine loop construct dealing with memory offsets of both the enumerable and the enumerator.
I think the tension is here:
Given that these special types can provide special struct-enumerators that fit the enumeration pattern, it's unfortunate that the runtime does not have enough smarts itself to see into that struct-enumerator impl and understand the relationship between things such that it can literally JIT that code into being just as fast as the index+length version.
That said, i can see why that level of optimization might be very hard. And, as such, while work can be made to lower the cost of the enumerator pattern to be almost free, it might not ever be possible to make it as-free-as the index+length version. In that world (which i'm guessing is the one we live in), it beneficial to the ecosystem to have a better way of still adding these optimizations in without the compiler having to be updated each and every time.
In that sense, this proposal is compromise that accepts that we may never get "good enough jitting", and in the meantime, gives a lower cost way to deal with that fact. The downside, of course, is that once you do this, there is almost zero motivation to improve jitting here, which then means other domains will not benefit whereas they might have if the JIT were to do this.
The proposal is interesting. However I wonder what are exactly the types this is expected to be used with?
Considering that it probably will not be used with List<T>
(compatibility), Range
(does not start from 0), Span<T>
, String
or arrays (already handled). - I am wondering whether it is applicable to enough cases to be justified.
Also I hear that enumerating List<T>
is a hard case for the JIT because of the version checking there and the try/finally
around the whole deal. Eliding trivial try/finally as solvable, if not already solved though.
I would not be surprised that in many cases involving struct enumerators (like the one for ImmutableArray
, which does not involve IDisposable or a version check) the JIT may already do nearly ideal translation to the index/length version.
It might be worth checking if optimizing helps enough there.
We made it a special case for [ReadOnly]Span<T>
because those types are expected to be iterated very often and on very hot paths.
Indexing and slicing facade is essentially the only purpose for these types, so they better do it well.
Otherwise I am not sure if types that justify similar optimization will come up very often. I could be wrong though. It may be worth collecting a list of candidates.
@VSadov
Also I hear that enumerating List
is a hard case for the JIT because of the version checking there
I'm not sure if it's been mentioned in this thread before, but don't forget about the reason that version checking is there in the first place. What if the list is modified in place during iteration, e.g. an item is deleted or inserted? The new behavior of List<T>
will no longer match up.
Wouldn't we want to do anything about iterator invalidation in this scenario? Of course if the type is immutable, no pre-cautions would be needed, otherwise we might get invalidated without notice.
Ok now that this'll never happen, is there something that we could do about arrays? extensions?
@alrz Arrays are already treated specially by the compiler (ie. they are optimized).
The LDM notes will contain more details, including ideas that might help in the future. At the moment we're recommending struct enumerators or for
loops (note that we have a foreach-to-for refactoring that makes it painless).
Arrays are already treated specially by the compiler
I mean with foreach ref
. I guess foreach is going to recognize an extension GetEnumerator
. so if the following works I think we don't have to do anything special about arrays.
public static StructRefEnumerator<T> GetEnumerator<T>(this T[] @this)
@alrz That's a good scenario to test. Added to test plan for foreach
with extensions (https://github.com/dotnet/roslyn/issues/30053).
The LDM looked at this early last month and decided that it doesn't rise to the level of deserving a language change.
A few types (such as arrays, strings and Span/ReadOnlySpan) are recognized as an optimized implementation for enumeration. We should generalize that. The proposal is to allow types marked with a certain attribute (
EnumerableAttribute
) to opt-in to such enumeration as well.The benefits are lightweight iteration and lightweight implementation (no need to define enumerator type to make your type enumerable).
LDM history: