dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.05k stars 4.69k forks source link

[API Proposal]: Integrate Reactive's System.Linq.Async APIs into the Base Class Libraries #79782

Open wsugarman opened 1 year ago

wsugarman commented 1 year ago

Background and motivation

As IAsyncEnumerable<T> increases its presence in the BCLs and user libraries, and as new language constructs are added to support asynchronous streams, I think libraries like System.Linq.Async will become more crucial to their adoption by developers. Personally, as I work with our own Azure SDKs, I find myself quickly importing the package to easily aggregate paginated results using ToListAsync(CancellationToken) without a loop while Azure table queries can be easily aggregated locally using GroupBy.

The LINQ APIs have become incredibly popular in .NET, and I think the lack of LINQ support for asynchronous streams is becoming more apparent. I am especially worried that until recently the entire Reactive project itself was at risk of shutting down without active maintainers! The asynchronous LINQ APIs are far too important to leave outside of the BCLs where their support may disappear.

Why should the System.Linq namespace only include support for IEnumerable<T> when System.Collections.Generic includes IAsyncEnumerable<T>?

API Proposal

Functionally equivalent LINQ APIs (as detailed in this older MSDN documentation) used by the asynchronous equivalents of IEnumerable<T>, IOrderedEnumerable<TElement>, and IGrouping<TKey, TElement>. The current library exists here.

This may also be a good opportunity to re-evaluate these APIs if desired.

E.g.

namespace System.Linq;

public static partial class AsyncEnumerable
{
    public static IAsyncEnumerable<TResult> Select<TSource, TResult>(this IAsyncEnumerable<TSource> source, Func<TSource, TResult> selector);
}

API Usage

using System.Collections.Generic;
using System.Linq;

IAsyncEnumerable<Element> elements = /* ... */;
List<string> ids = await elements
    .Where(x => x.IsValid)
    .Select(x => x.Id)
    .Take(10)
    .OrderBy(x => x.Id)
    .ToListAsync(cancellationToken);

Alternative Designs

Alternatively, these could be kept out-of-band.

Risks

While this may include a large number of APIs, they will be quite familiar to those who use LINQ today on their synchronous counterparts.

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-linq See info in area-owners.md if you want to be subscribed.

Issue Details
### Background and motivation As `IAsyncEnumerable` increases its presence in the BCLs and user libraries, and as new language constructs are added to support asynchronous streams, I think libraries like [System.Linq.Async](https://www.nuget.org/packages/System.Linq.Async) will become more crucial to their adoption by developers. Personally, as I work with our own [Azure SDKs](https://github.com/Azure/azure-sdk-for-net), I find myself quickly importing the package to easily aggregate paginated results using `ToListAsync(CancellationToken)` without a loop while Azure table queries can be easily aggregated locally using `GroupBy`. The LINQ APIs have become incredibly popular in .NET, and I think the lack of LINQ support for asynchronous streams is becoming more apparent. I am especially worried that until recently the entire Reactive project itself was [at risk of shutting down without active maintainers](https://github.com/dotnet/reactive/issues/1822)! The asynchronous LINQ APIs are far too important to leave outside of the BCLs where their support may disappear. Why should the `System.Linq` namespace only include support for `IEnumerable` when `System.Collections.Generic` includes `IAsyncEnumerable`? ### API Proposal Functionally equivalent LINQ APIs (as detailed in [this older MSDN documentation](https://learn.microsoft.com/en-us/previous-versions/dotnet/reactive-extensions/hh242983(v=vs.103))) used by the asynchronous equivalents of `IEnumerable`, `IOrderedEnumerable`, and `IGrouping`. The current library exists [here](https://github.com/dotnet/reactive/tree/main/Ix.NET/Source/System.Linq.Async). This may also be a good opportunity to re-evaluate these APIs if desired. E.g. ```csharp namespace System.Linq; public static partial class AsyncEnumerable { public static IAsyncEnumerable Select(this IAsyncEnumerable source, Func selector); } ``` ### API Usage ```csharp using System.Collections.Generic; using System.Linq; IAsyncEnumerable elements = /* ... */; List ids = await elements .Where(x => x.IsValid) .Select(x => x.Id) .Take(10) .OrderBy(x => x.Id) .ToListAsync(cancellationToken); ``` ### Alternative Designs Alternatively, these could be kept out-of-band. ### Risks While this may include a large number of APIs, they will be quite familiar to those who use LINQ today on their synchronous counterparts.
Author: wsugarman
Assignees: -
Labels: `api-suggestion`, `area-System.Linq`
Milestone: -
jkotas commented 1 year ago

Duplicate of https://github.com/dotnet/runtime/issues/31580

wsugarman commented 1 year ago

While this was indeed discussed previously, in the time since #31580 it has been discovered that there were no active maintainers for Reactive until volunteers came forward this October. And there haven't been any commits since April!

I think it's worth another discussion of whether this API (or a subset) should be deemed "core" enough to be included in the BCLs.

eiriktsarpalis commented 1 year ago

While this was indeed discussed previously, in the time since https://github.com/dotnet/runtime/issues/31580 it has been discovered that there were no active maintainers for Reactive until volunteers came forward this October. And there haven't been any commits since April!

I'm curious, why would moving the library to the shared framework solve the problem of maintenance? If anything, it would make the components more difficult to evolve, since it would prevent introducing breaking changes or versioning it independently of dotnet. There are clear benefits to keeping some components out of the shared framework.

wsugarman commented 1 year ago

I'm curious, why would moving the library to the shared framework solve the problem of maintenance? If anything, it would make the components more difficult to evolve, since it would prevent introducing breaking changes or versioning it independently of dotnet. There are clear benefits to keeping some components out of the shared framework.

I am not (yet) concerned about the speed of changes in the shared framework. Instead, I am more concerned with making sure someone is responsible for maintaining these (what I think are) critical APIs. While it seems that some dotnet foundation members have charitably helped the project in the past, it now appears that the project is entirely dependent on the kindness of unpaid volunteers.

It's wonderful that developers have come forward to help, and for many OSS projects this model is sufficient, but I wonder if these APIs are too important to leave outside the purview of Microsoft. What happens when the current developers move onto other projects? Will the community continue to be so lucky that new volunteers will step forward? It seems that the recent questions of ownership have only arisen due to enough GitHub issues.

Instead, I think the core System.Linq.Async APIs would benefit greatly from being included in the shared library for the following reasons:

  1. A team of developers at Microsoft would be responsible for their ongoing development, maintenance, and quality, ensuring that they continue to grow as both .NET and async streams evolve
  2. Developers can easily discover and use the async stream LINQ APIs without the use of an external package
    • Today, even performing a simple projection from an IAsyncEnumerable<T> requires a NuGet package, but why should that be the case if it can be done for IEnumerable<T>? I think we'd be hard pressed to explain why there are language constructs for IEnumerable<T> and IAsyncEnumerable<T> like foreach and await foreach, but the LINQ APIs are only included for IEnumerable<T>.
  3. A whole new set of features could be possible, such as adding support for asynchronous LINQ queries in a future dotnet SDK!
eiriktsarpalis commented 1 year ago

I am not (yet) concerned about the speed of changes in the shared framework.

It's not an issue of speed, it is an issue of possibility. Most types of source and binary breaking changes are not allowed for shared framework components, ever.

A team of developers at Microsoft would be responsible for their ongoing development, maintenance, and quality, ensuring that they continue to grow as both .NET and async streams evolve

Bare minimum, it would guarantee security fixes but it's not necessarily the case that we'd be able to fund continued evolution of the library.

There's also the issue of code size, we've rejected API proposals in regular Linq in the past out of size concerns.

wsugarman commented 1 year ago

It's not an issue of speed, it is an issue of possibility. Most types of source and binary breaking changes are not allowed for shared framework components, ever.

@eiriktsarpalis I'll defer to you as one of the maintainers of the System.Linq namespace. Do you feel ever hindered by being included in the shared framework? Based on your other comments, has your team considered an out-of-band package (or perhaps an extension package) for new LINQ APIs to help manage the size?

My biggest concern here is that there is (what I believe to be) a critical library in System.Linq.Async. I believe async streams' usability depends on the presence of this 3rd party library, and without some sort of formal funding from the dotnet org/Microsoft its future remains too uncertain.

As argued in this issue, I think its most logical home -- ignoring the history of how the library came to be and perhaps the burdens of being in the shared library -- is within the BCLs; why should the treatment of the LINQ APIs differ between IEnumerable<T> and IAsyncEnumerable<T>? If it's just a matter of their development history, should that be reconciled? However, if it is believed that the cost of integrating this library into the BCL is too high, then I'm certainly open to alternatives for a more formal investment of engineering resources into this library. Although, it sounds like there should also be a separate discussion around how future LINQ APIs should be added given their burdensome size.

Perhaps most importantly, as the new owner of the library @HowardvanRooijen, would you prefer more direct investment by the dotnet team in the System.Linq.Async library? If so, would you have a preference as to how?

HowardvanRooijen commented 1 year ago

FYI first alpha release of System.Reactive.Async is available on NuGet: https://www.nuget.org/packages/System.Reactive.Async/6.0.0-alpha.3

WeihanLi commented 3 months ago

As there are more and more IAsyncEnumerable APIs in BCL, it makes me feel the IAsyncEnumerable in BCL is not completed without System.Linq.Async

stephentoub commented 3 months ago

As there are more and more IAsyncEnumerable APIs in BCL, it makes me feel the IAsyncEnumerable in BCL is not completed without System.Linq.Async

What does it mean to be "in BCL"? We ship many core libraries as nuget packages... if we shipped a System.Linq.Async as a nuget package, how is that different from the status quo?

Arithmomaniac commented 3 months ago

What does it mean to be "in BCL"? We ship many core libraries as nuget packages... if we shipped a System.Linq.Async as a nuget package, how is that different from the status quo?

The library as-is is functional, but:

Compare Linq Chunk to Interactive.Async Buffer (putting aside that it's not even in System.Linq.Async, and progress on changing that is stuck).

julealgon commented 3 months ago

As there are more and more IAsyncEnumerable APIs in BCL, it makes me feel the IAsyncEnumerable in BCL is not completed without System.Linq.Async

What does it mean to be "in BCL"? We ship many core libraries as nuget packages... if we shipped a System.Linq.Async as a nuget package, how is that different from the status quo?

@stephentoub maybe he is talking about discoverability? Many devs are not aware of the System.Linq.Async package. Perhaps it should be made a default package in worker and web SDKs at least?

Arithmomaniac commented 3 months ago

I just noticed this exact conversation is also being had at https://github.com/dotnet/reactive/pull/2102; maybe we should redirect further discussion of this to there.

WeihanLi commented 3 months ago

As mentioned by @julealgon, some of us even don't know the System.Linq.Async library, we could not get IntelliSense without adding a package reference for the library.

As @Arithmomaniac mentions, the library may be missing some performance improvements, love to see each performance improvement by @stephentoub and the annual performance improvements blog(book), when it's outside of BCL, we may miss the performance improvements and refactorings that may be also beneficial for System.Linq.Async, and there's been no new release for the last two years and more.

The LINQ in BCL provides synchronous methods like ToArray, Sum, and Any(and more...), but it lacks asynchronous methods like ToArrayAsync, SumAsync, and AnyAsync(and more...). This could be a limitation when working with IAsyncEnumerable, and we're adding more linq methods likes Chuck/CountBy/... for IEnumerable in BCL, but not investing for IAsyncEnumerable async methods, it may be not fair.

We're using the System.Linq.Async library actually, and each time I had to add the reference, it made me feel incomplete more, just like IEnumerable gives us bullets and guns, while IAsyncEnumerable gives us bullets only

viceroypenguin commented 3 months ago

@WeihanLi @julealgon fyi, in the interim, some of these updates are added and maintained in the SuperLinq.Async library.

wsugarman commented 3 months ago

As there are more and more IAsyncEnumerable APIs in BCL, it makes me feel the IAsyncEnumerable in BCL is not completed without System.Linq.Async

What does it mean to be "in BCL"? We ship many core libraries as nuget packages... if we shipped a System.Linq.Async as a nuget package, how is that different from the status quo?

@stephentoub - I'm catching up on this thread after some time lol. At least from my original post, I agree that inclusion into the BCLs isn't necessarily the goal. From my perspective, and to quote an earlier post, I think the .NET ecosystem would benefit from DevDiv/a team of developers at Microsoft overseeing the ongoing development, maintenance, and quality of the library to help ensure it continues to flourish as both .NET and async streams evolve (in addition to the other reasons mentioned in this thread). For example, perhaps ensuring new versions of the library are published with each new version of .NET (like the extension libraries).

Furthermore, I think there are some interesting features/parity that could still be done with IAsyncEnumerable<T> like the inclusion of LINQ syntax. If that requires inclusion into the BCLs for such a feature, that may be evidence. But I also understand that these libraries were first written some time ago, and the considerations for inclusion into the BCLs have changed.

Arithmomaniac commented 3 months ago

I agree that inclusion into the BCLs isn't necessarily the goal [...] I think the .NET ecosystem would benefit from DevDiv/a team of developers at Microsoft overseeing the ongoing development, maintenance, and quality of the library to help ensure it continues to flourish as both .NET and async streams evolve [...]. For example, perhaps ensuring new versions of the library are published with each new version of .NET (like the extension libraries).

Agreed. I think "BCL" is being used loosely by those requesting this feature; what is really wanted is first-class development and support.

eiriktsarpalis commented 3 months ago

I think the .NET ecosystem would benefit from DevDiv/a team of developers at Microsoft overseeing the ongoing development, maintenance, and quality of the library to help ensure it continues to flourish as both .NET and async streams evolve

I don't think it's particularly beneficial to consider Microsoft or DevDiv as being the sole arbiters of good libraries. Even if we wanted to, It wouldn't be a realistic goal given that our resources are finite. I probably speak for the rest of the team in saying that we have strong belief in the wider community's ability to deliver libraries that match or even exceed our own bar for quality.

HowardvanRooijen commented 3 months ago

So the gist behind https://github.com/dotnet/reactive/pull/2102 is that while we're happy to support Reactive Extensions for .NET (and by we, I mean @endjin) - but we're a small < 15 people organisation, and we have to generate our own revenue (none of which is related to Rx BTW). In the last year we've dedicated 868 hours (108 days) of effort into maintaining Rx .NET - we don't have the capacity to support Interactive Extensions (incl System.Linq.Async) too. Consumers of the library need to step up and own their software supply chain.

(As an aside - bundling the two projects into the single repo only makes sense because the original PG developed both libraries - it's added friction for Rx because potential contributors look at the repo and get put off by how complicated it is. )

While I agree that the .NET PG may not be the sole arbiters of good quality libraries... these did originate from them, and as per https://github.com/dotnet/reactive/pull/2102 @stephentoub has some strong opinions that the design guidelines have evolved in the 14 years since these libraries were originally open sourced (on CodePlex!) - but I'm not sure we, as external folk, have all that knowledge, or enough information to align to those guidelines or design ambitions.

idg10 commented 3 months ago

To follow up on my colleague @HowardvanRooijen 's comment, I also work at endjin and am doing most of the maintenance work on Rx.NET. We've had some sporadic conversations with @davidfowl and @stephentoub about the future of System.Async.Linq.

One upshot of this that @stephentoub pointed out that the library in its current form wouldn't pass muster as an official .NET runtime library, because it does not align with current guidelines:

there were a bunch of choices made in the APIs exposed here that we would not want to bring in, e.g. an AggregateAsync method is fine, but not AggregateAwaitAsync or AggregateAwaitWithCancellationAsync... that's simply not how we'd choose to expose such functionality. We would also be able to design the new methods taking into account features that didn't previously exist, e.g. generic math

Something that I suspect would also be an issue is the efficiency of the code. It was all written quite a long time ago. Today, .NET runtime library code is expected to be much more frugal with memory allocation than was common back then. And although we've not done any profiling to see how well today's System.Linq.Async stacks up against the non-async equivalents in the .NET runtime libraries, I know that some parts of System.Reactive cause allocations at a rate that looks alarmingly high by modern standards, so I would expect the same to be true of System.Linq.Async.

So there are at least two substantial jobs to be done for System.Linq.Async:

  1. modify the API design to align with current .NET runtime library guidelines
  2. bring the performance into line with modern expectations

Since, as Howard points out, a) nobody pays us to do any of this work and b) we only ever really wanted to maintain System.Reactive and we effectively acquired System.Interactive and System.Linq.Async as a side effect of this, we don't really have the capacity to do this.

If somebody wanted it enough to pay for it that might be a different matter, but open source projects rarely seem to work that way.

We would much prefer not to be in charge of System.Linq.Async, but for that to happen, someone would need to take ownership. The obvious candidate would be the same bit of Microsoft that is already in charge of System.Linq, but that's not something we get to decide.

TomGathercole commented 3 months ago

As there are more and more IAsyncEnumerable APIs in BCL, it makes me feel the IAsyncEnumerable in BCL is not completed without System.Linq.Async

What does it mean to be "in BCL"? We ship many core libraries as nuget packages... if we shipped a System.Linq.Async as a nuget package, how is that different from the status quo?

I'm personally having some grief at the moment due to needing to support queryables coming from both entity framework core and from a custom async-compatible IQueryable provider (Since entity framework did not adopt the IAsyncQueryable interface from System.Linq.Async.Queryable).

I realize that async queryables are a somewhat separate concerns but, in my view, I think this is a symptom of the lack of full built-in async enumerable/queryable support. I.e. if entity framework is unwilling to take a dependency on System.Linq.Async, then other library authors will feel the same and the ecosystem will end up being a little fractured.

Again, I know that queryables aren't explicitly in scope for this discussion, but I think it illustrates a problem caused by important functionality not existing in the framework (and, from a purely selfish perspective, I think this is probably a nice step towards easily supporting async queryables from multiple sources)