dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

[API Proposal]: PLINQ: add `ParallelEnumerable` extensions from `Enumerable` #98689

Status: Open. timcassell opened this issue 8 months ago.

timcassell commented 8 months ago

Background and motivation

The documentation states "PLINQ implements the full set of LINQ standard query operators as extension methods for the System.Linq namespace and has additional operators for parallel operations.", but some new extensions were added to System.Linq.Enumerable that are missing in System.Linq.ParallelEnumerable. I propose they be added so that further queries will continue to be parallelizable (even if some extensions can't be parallelized, they should still return ParallelQuery<T>).

API Proposal

namespace System.Linq;

public static partial class ParallelEnumerable
{
    // Append
    // Chunk
    // DistinctBy
    // ExceptBy
    // IntersectBy
    // MaxBy
    // MinBy
    // Order
    // OrderDescending
    // Prepend
    // SkipLast
    // TakeLast
    // ToHashSet
    // TryGetNonEnumeratedCount
    // UnionBy
}

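That is, all the extensions from System.Linq.Enumerable that are missing from ParallelEnumerable, with the same shape but taking ParallelQuery<TSource> instead of IEnumerable<TSource> (I didn't bother to write out the full declarations for brevity).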
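To make "the same shape" concrete, here is a hedged sketch of what a parallel MinBy could look like: the signature of Enumerable.MinBy with ParallelQuery<TSource> in place of IEnumerable<TSource>. It is built on the existing ParallelEnumerable.Aggregate overload that takes a seed factory plus update, combine, and result functions, so each partition finds its own minimum and the partial results are merged afterwards. ParallelMinBySketch and the Best helper struct are hypothetical names, not part of .NET, and the sketch does not try to reproduce Enumerable.MinBy's handling of empty sequences, null keys, or tie-breaking.

```cs
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class: user code cannot add members to the real ParallelEnumerable.
public static class ParallelMinBySketch
{
    // Same shape as Enumerable.MinBy, with ParallelQuery<TSource> in place of IEnumerable<TSource>.
    public static TSource MinBy<TSource, TKey>(this ParallelQuery<TSource> source, Func<TSource, TKey> keySelector)
    {
        ArgumentNullException.ThrowIfNull(source);
        ArgumentNullException.ThrowIfNull(keySelector);
        IComparer<TKey> comparer = Comparer<TKey>.Default;

        // Each partition starts from an empty accumulator (HasValue == false).
        Func<Best<TSource, TKey>> seedFactory = () => default;
        // Within a partition, keep whichever element has the smaller key.
        Func<Best<TSource, TKey>, TSource, Best<TSource, TKey>> update =
            (best, item) => Pick(best, new Best<TSource, TKey>(item, keySelector(item)), comparer);
        // Partition results are merged pairwise with the same rule.
        Func<Best<TSource, TKey>, Best<TSource, TKey>, Best<TSource, TKey>> combine =
            (left, right) => Pick(left, right, comparer);
        // This sketch simply throws when no element was seen.
        Func<Best<TSource, TKey>, TSource> result =
            best => best.HasValue ? best.Item : throw new InvalidOperationException("Sequence contains no elements.");

        return source.Aggregate(seedFactory, update, combine, result);
    }

    private static Best<TSource, TKey> Pick<TSource, TKey>(
        Best<TSource, TKey> a, Best<TSource, TKey> b, IComparer<TKey> comparer)
    {
        if (!a.HasValue) return b;
        if (!b.HasValue) return a;
        // On equal keys keep 'a'; across partitions, which tied element wins is not specified.
        return comparer.Compare(b.Key, a.Key) < 0 ? b : a;
    }

    // Accumulator: the best element seen so far and its key.
    private readonly struct Best<TSource, TKey>
    {
        public Best(TSource item, TKey key) { HasValue = true; Item = item; Key = key; }
        public bool HasValue { get; }
        public TSource Item { get; }
        public TKey Key { get; }
    }
}
```

For example, ParallelEnumerable.Range(0, 10_000).MinBy(x => -x) would return 9999 with this sketch in scope.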

And the new .NET 9 extensions:

namespace System.Linq;

public static partial class ParallelEnumerable
{
    // AggregateBy
    // CountBy
    // Index
}
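CountBy, at least, looks parallelizable with operators PLINQ already has. A minimal sketch, assuming a CountBy with the same shape as Enumerable.CountBy but over ParallelQuery<TSource> (ParallelCountBySketch is a hypothetical name): group in parallel with the existing ParallelEnumerable.GroupBy, then count each group.

```cs
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class; not part of .NET.
public static class ParallelCountBySketch
{
    // Mirrors Enumerable.CountBy, but takes and returns a ParallelQuery.
    public static ParallelQuery<KeyValuePair<TKey, int>> CountBy<TSource, TKey>(
        this ParallelQuery<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey>? keyComparer = null)
        where TKey : notnull
    {
        ArgumentNullException.ThrowIfNull(source);
        ArgumentNullException.ThrowIfNull(keySelector);

        // ParallelEnumerable.GroupBy does the partitioned work; each group is then counted.
        // Grouping keeps every element of every group, which a dedicated implementation could avoid.
        return source
            .GroupBy(keySelector, keyComparer ?? EqualityComparer<TKey>.Default)
            .Select(group => new KeyValuePair<TKey, int>(group.Key, group.Count()));
    }
}
```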

API Usage

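using System.Linq;

// With current .NET this compiles, but ParallelEnumerable has no Append, so Append binds to
// Enumerable.Append and TakeLast/MinBy then run through Enumerable, not ParallelEnumerable.
var smallest = Enumerable.Range(0, 100)
    .AsParallel()
    .Append(200)
    .TakeLast(20)
    .MinBy(x => x);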

Alternative Designs

Use .AsSequential() before using the new extensions.
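A minimal sketch of that workaround applied to the usage example above. The AsSequentialWorkaround class and MinOfLastTwenty method are hypothetical names for illustration, and the AsOrdered() and Select calls are assumptions added here (not in the original example) so that TakeLast has a well-defined order and there is some parallel work before the switch.

```cs
using System.Linq;

// Hypothetical wrapper so the query can be called and checked.
public static class AsSequentialWorkaround
{
    public static int MinOfLastTwenty()
    {
        return Enumerable.Range(0, 100)
            .AsParallel()
            .AsOrdered()          // preserve source order so TakeLast is well-defined
            .Select(x => x * 2)   // ParallelEnumerable.Select: runs in parallel
            .AsSequential()       // leave PLINQ; later calls bind to Enumerable
            .Append(200)
            .TakeLast(20)
            .MinBy(x => x);       // 162 for this input
    }
}
```

The cost this proposal targets is visible here: once AsSequential() is called, everything after it stays sequential unless the query is explicitly wrapped again with AsParallel().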

Risks

New extensions may not run in parallel (but that's already called out in the PLINQ documentation).

ghost commented 8 months ago

Tagging subscribers to this area: @dotnet/area-system-linq See info in area-owners.md if you want to be subscribed.

Author: timcassell
Assignees: -
Labels: `api-suggestion`, `area-System.Linq`, `untriaged`
Milestone: -
ghost commented 8 months ago

Tagging subscribers to this area: @dotnet/area-system-linq-parallel See info in area-owners.md if you want to be subscribed.

Author: timcassell
Assignees: -
Labels: `api-suggestion`, `area-System.Linq`, `area-System.Linq.Parallel`, `untriaged`
Milestone: -
eiriktsarpalis commented 8 months ago

Some of the methods that you listed such as Append, Prepend, Chunk, ToHashSet and TryGetNonEnumeratedCount are inherently non-parallelizable. I'm guessing they wouldn't be parallel as such, only accelerator methods mapping ParallelEnumerable values to the sequential implementations?

timcassell commented 8 months ago

Quoting eiriktsarpalis: "Some of the methods that you listed such as Append, Prepend, Chunk, ToHashSet and TryGetNonEnumeratedCount are inherently non-parallelizable. I'm guessing they wouldn't be parallel as such, only accelerator methods mapping ParallelEnumerable values to the sequential implementations?"

Exactly.
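A minimal sketch of such accelerator methods, assuming they simply run the existing sequential Enumerable operator and wrap the result back into a ParallelQuery<T>. ParallelEnumerableShims is a hypothetical name. One caveat the sketch does not solve: wrapping again with AsParallel() starts a new query, so settings such as WithDegreeOfParallelism or WithCancellation on the original query are not carried over, and the result is always treated as ordered.

```cs
using System;
using System.Linq;

// Hypothetical shims, not part of .NET: each one runs the sequential Enumerable
// operator, then re-enters PLINQ so the rest of the chain keeps returning ParallelQuery<T>.
public static class ParallelEnumerableShims
{
    public static ParallelQuery<TSource> Append<TSource>(this ParallelQuery<TSource> source, TSource element)
    {
        ArgumentNullException.ThrowIfNull(source);
        // Explicit Enumerable.* call on the IEnumerable<TSource> returned by AsSequential().
        return Enumerable.Append(source.AsSequential(), element).AsParallel().AsOrdered();
    }

    public static ParallelQuery<TSource> Prepend<TSource>(this ParallelQuery<TSource> source, TSource element)
    {
        ArgumentNullException.ThrowIfNull(source);
        return Enumerable.Prepend(source.AsSequential(), element).AsParallel().AsOrdered();
    }

    public static ParallelQuery<TSource> TakeLast<TSource>(this ParallelQuery<TSource> source, int count)
    {
        ArgumentNullException.ThrowIfNull(source);
        return Enumerable.TakeLast(source.AsSequential(), count).AsParallel().AsOrdered();
    }
}
```

With these shims in scope, .Append(200).TakeLast(20) on a ParallelQuery<int> typically returns a ParallelQuery<int> instead of an IEnumerable<int>, because an extension taking ParallelQuery<T> is a better match than Enumerable's IEnumerable<T> version.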

Also, I'm sure I missed some new overloads of existing extensions (like ElementAt with a System.Index parameter). Basically, I think the API surface should match Enumerable's, as the intro-to-PLINQ documentation states, even though some operators can't actually be parallelized.
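To illustrate the ElementAt case: ParallelEnumerable only has ElementAt(int), so today query.ElementAt(^1) on a ParallelQuery<T> binds to Enumerable.ElementAt(IEnumerable<T>, Index). Below is a hedged sketch of a matching overload (ParallelIndexShims is a hypothetical name) that forwards from-start indexes to the existing PLINQ ElementAt(int) and falls back to the sequential operator only for from-end indexes.

```cs
using System;
using System.Linq;

// Hypothetical shim, not part of .NET.
public static class ParallelIndexShims
{
    public static TSource ElementAt<TSource>(this ParallelQuery<TSource> source, Index index)
    {
        ArgumentNullException.ThrowIfNull(source);

        if (!index.IsFromEnd)
        {
            // Call ParallelEnumerable explicitly: source.ElementAt(index.Value) would bind
            // back to this method, because int converts implicitly to Index.
            return ParallelEnumerable.ElementAt(source, index.Value);
        }

        // ^n needs to know where the sequence ends, so fall back to the sequential operator.
        return Enumerable.ElementAt(source.AsSequential(), index);
    }
}
```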