[API proposal]: Add `GetSpeculativeForEachInfo`

DoctorKrolic commented 1 week ago

Background and Motivation

Conditions for a type to be foreachable are quite loose. This gives alot of opportunities like optimizing enumeration with struct enumerator etc. But at the same time it also means that in order to programmatically determine, whether the type can be used in foreach, what the iteration type would be and what methods are gonna be used under the hood, it isn't enough to just check for IEnumerable interface implementation, but rather a full foreach pattern match must be done, which is quite challenging, especially for await foreach, since matching valid GetAsyncEnumerator and MoveNextAsync methods formally require running an overload resolution algorythm. At the same time IDE and SGs sometimes need such "collection info" for their features to work. So I think, outsorcing this logic to the compiler, which alredy has it internally, is a reasonable request.

Proposed API

namespace Microsoft.CodeAnalysis
{
+    public readonly struct SpeculativeForEachInfo : IEquatable<SpeculativeForEachInfo>
+    {
+        public bool IsValid { get; }
+        public IMethodSymbol? GetEnumeratorMethod { get; }
+        public ITypeSymbol? EnumeratorType => GetEnumeratorMethod?.ReturnType; // This one just for convenience of use
+        public IMethodSymbol? MoveNextMethod { get; }
+        public IPropertySymbol? CurrentProperty { get; }
+        public IMethodSymbol? DisposeMethod { get; }
+        public ITypeSymbol? ElementType => CurrentProperty?.Type; // This one just for convenience of use
+        // The rest of standard methods for an equatable struct
+    }

     public class SemanticModel
     {
+        public SpeculativeForEachInfo GetSpeculativeForEachInfo(int position, ITypeSymbol type, bool isAsync = false);
     }
}

The shape of SpeculativeForEachInfo is inspired by existing ForEachStatementInfo structs. A few unrelated things (like conversions) are dropped, IsValid property is added (since type in question can be not foreachable at all). Left ElementType property and added EnumeratorType property just for convenience of use (I expect these 2 to be used commonly, e.g. to match an expected element type or to check whether a collection has a struct enumerator). isAsync determines whether we are looking for a normal foreach or an await foreach. Since enumerator patterns for both are nearly identical, I don't see a reason to create separate overloads returning separate data structs for them. This also matches existing ForEachStatementInfo, which just has IsAsynchronous property and used for both foreach and await foreach cases.

Usage Examples

Instead of code snippets I'll list here specific features/scenarios, which can benifit from these new APIs:

foreach semantic snippet: currently uses simple interface check to determine, whether a type is valid for foreaching, thus isn't shown for common Span and ReadOnlySpan collections. Moreover, by using ElementType during snippet construction we can make it obey var preferences and use explicit iteration type syntax if desired by the user (currently always uses var)
Analyzers, related to collection types
CodeFixes, which may decide to appear if some type is a valid collection or use that information in the fix itself. An example of such fix is this feature request

Alternative Designs

Don't add these APIs and let users recreate internal compiler logic with various amount of precision
Instead of IsValid property of SpeculativeForEachInfo make new APIs return nullable SpeculativeForEachInfo?. In this case all symbol properties of SpeculativeForEachInfo except DisposeMethod become non-nullable

Risks

333fred commented 5 days ago

This isn't something that can be done in a context-free manner. Extension-based foreach binding exists, and that needs to know what context it is being used in. At the very least, this API would need to have a position, and be defined on SemanticModel. It also seems to me that you could just create the syntax and analyze it with a speculative semantic model or forked compilation.

333fred commented 5 days ago

Also cc @AlekseyTs as co area-owner.

DoctorKrolic commented 5 days ago

This isn't something that can be done in a context-free manner

Removed the context-free API from the proposal

It also seems to me that you could just create the syntax and analyze it with a speculative semantic model or forked compilation

As I said in Discord, this indeed seems to be possible, but seems more like a workaround rather than an actual solution. Because in order to do this I need:

Create some dummy foreach statement like

foreach (var __v in default(TypeInQuestion)) { }

Get a speculative semantic model for that statement
GetForEachStatementInfo on that dummy foreach node
And only now I can extract required info for me

The process is convoluted and wasteful (at least we create a dummy node that we throw away immideatelly after we are done with our analysis). At the same time we already have things like GetSpeculativeTypeInfo, which just takes a syntax + position and does the analysis without requiring getting a separate speculative model. Moreover, in our case the syntax is not even needed, the proposed method basically asks a compiler: "if I am about to enumerate a variable of this type in a foreach loop at the given position, what properties would that loop have?"

333fred commented 5 days ago

GetSpeculativeTypeInfo is an API that heavily used by IDE. We could add many different GetSpeculativeX methods for various parts of binding; we don't do that, because we don't want to clutter the API. There's a perfectly reasonable approach to getting this information, and very low ask for this type of API, so the more general approach is what I'd default to.

AlekseyTs commented 5 days ago

Is there a specific reason why using ForEachStatementInfo as the result type wouldn't work?

CyrusNajmabadi commented 4 days ago

The process is convoluted and wasteful

We speculate on code all the time, without needing to have specialized entry points for each case.

It honestly doesn't feel convoluted to me. You're basically asking: I want to change the code to something akin to "... This ...", and I want to figure out if that's ok.

It's also not really wasteful. The idea here is reusing the vast majority of binding/symbol information (the entire top level compiler info). You then query for the item of interest in the new sub body. This then only pulls on what is needed to answer that question. That should be very minimal work. Akin to what the compiler would need to do with this newly proposed API.

dotnet / roslyn