Proposal: IncrementalValuesProvider<TSource>.Any() extension

Sergio0694 commented 2 years ago

Opening this following a conversation with @sharwell on the C# Discord server.

Background and Motivation

In some scenarios it might be useful to be able to determine whether a given IncrementalValuesProvider<TSource> sequence will produce any results, in order to conditionally generate some code or go down some other incremental pipeline branch. For instance, we're using this in several places in the MVVM Toolkit to decide whether we should generate a header file with a partial declaration of a class and some annotations on it (as those attributes should only be used once, and other partial declarations of that class can't know whether they're "the first one"). To do so, we're doing this:

IncrementalValueProvider<bool> isHeaderFileNeeded =
    validationInfo
    .Collect()
    .Select(static (item, _) => item.Length > 0);

Here's the code in the original context in the MVVM Toolkit:

And then register a source output on that, that generates code if the input is true. This seems pretty inefficient though, and I'd expect a proper Any() API to instead give the compiler much more flexibility to optimize the internal execution. For instance, all previous Select projections for that branch can be ignored (as they don't affect the number of items that will be produced), and similarly there's no need to evaluate all items in the previous node, allocate the array to pass to the next node, etc. All that is needed is a quick way to just check whether the node will produce at least one result, that's it. Having a proper API would also make this easier to use.

Proposed API

 namespace Microsoft.CodeAnalysis
 {
     public static class IncrementalValueProviderExtensions
     {
+        public static IncrementalValueProvider<bool> Any<TSource>(this IncrementalValuesProvider<TSource> source);
     }
 }

Usage Examples

IncrementalValueProvider<bool> isHeaderFileNeeded = validationInfo.Any();

// Register source with this node...

Alternative Designs

An alternative of Collect().Select(static (item, _) => item.Length > 0) can be used, but it's not intuitive and also not efficient (and it also scales with the size of the collection, whereas an Any() extension would just be O(1) regardless of size).

Risks

None that I can see.

CyrusNajmabadi commented 2 years ago

For instance, all previous Select projections for that branch can be ignored (as they don't affect the number of items that will be produced)

This concept is vaguely unsettling to me. For example, what if the Select projection throws? I don't think ignoring is a good idea here as that might be observable.

I believe Linq ran into this with things like x.Select(...).Count(), though i could be mistaken.

This seems pretty inefficient though

Do you have data on this though? I would be curious if it truly matters in a fully incremental scenario. Would be very interesting to see!

Sergio0694 commented 2 years ago

"This concept is vaguely unsettling to me. For example, what if the Select projection throws? I don't think ignoring is a good idea here as that might be observable."

Even without ignoring that, this API could still only ever invoke a single one for the first item, if it exists, instead of executing it for all available items in the previous node and also allocating the array containing all of them to pass to the next node. Seems like a good optimization even including that to me? 🤔

"Do you have data on this though? I would be curious if it truly matters in a fully incremental scenario. Would be very interesting to see!"

How would I go about gathering data for this? As in, how do I profile how long it takes to execute a given sequence of steps in an incremental pipeline? I'm a bit out of my depths here that I can't just roll up the usual benchmark with BenchmarkDotNet 😅

dotnet / roslyn