dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.33k stars 4.74k forks source link

[API Proposal]: Add Then method to Task-like types #76839

Closed jmarolf closed 2 years ago

jmarolf commented 2 years ago

API Proposal

namespace System.Threading.Tasks
{
    public class Task<TResult> : Task
    {
+       public Task<TNewResult> Then<TNewResult>(Func<TResult, TNewResult> selector);
+       public Task<TNewResult> Then<TNewResult, TArg>(Func<TResult, TNewResult> selector, TArg);
    }

    public readonly struct ValueTask<TResult> : IEquatable<ValueTask<TResult>>
    {
+       public ValueTask<TNewResult> Then<TNewResult>(Func<TResult, TNewResult> selector);
+       public ValueTask<TNewResult> Then<TNewResult, TArg>(Func<TResult, TNewResult> selector, TArg);
    }
}

The functions given to these Then methods are only called if Task.IsCompletedSuccessfully would be true. Otherwise, a Task with the same cancellation state and exception information is returned.

API Usage

async Task M(){
    string text = await File.ReadAllTextAsync(path).Then(static text => text.ReplaceLineEndings());
}

Task<string> M(){
    return File.ReadAllTextAsync(path).Then(static text => text.ReplaceLineEndings());
}

Background and motivation

Method chaining is a very common way to manipulate data.

void M(){
    string text = File.ReadAllText(path).ReplaceLineEndings();
}

and today expressions are one of the main ways in which data gets manipulated in .NET with C# continuing to make it easier to "expression-ify" things where it makes sense to do so.

Unfortunately, once async comes into the picture you need to wrap things in parenthesis in order for the associativity to keep working the way the developer wants and we need to be in an async method.

async Task M(){ // need to be async method
    string text = (await File.ReadAllTextAsync(path)).ReplaceLineEndings(); // must wrap all awaits in parenthesis or have on separate line
}

You can try and solve some of this with the ContinueWith methods but they have some usability issues.

Lets say we wanted to write our own extension method with a signature like this

public static Task<TNewResult> Then<TResult, TNewResult>(this Task<TResult> task, Func<TResult, TNewResult> selector)

a naïve implementation may look like this

public static Task<TNewResult> Then<TResult, TNewResult>(this Task<TResult> task, Func<TResult, TNewResult> selector)
{
    return task.ContinueWith(ContinuationFunction, state: (object?)selector);
    static TNewResult ContinuationFunction(Task<TResult> task, object? state)
    {
        if (task.IsCompletedSuccessfully && state is not null)
        {
            var selector = (Func<TResult, TNewResult>)state;
            return selector(task.Result);
        }

        // this helper needs to account for canceled tasks and exceptions
        throw new InvalidOperationException();
    }
}

A few problems with this:

1. It needs to handle cancellation and exceptions

This is not impossible to do just tricky. It is unfortunate that this is necessary. In a world where folks are awaiting Tasks, cancellation and exceptions are going to be thrown at the call site of the await. Handling these cases shouldn't be required and it would be nice to have a method that is only called if the task completes successfully instead of forcing this concern on developers using the ContinueWith api.

2. You cannot pass in args without boxing

Since ContinueWith only excepts object? for its state type you cannot do something like this without boxing:

namespace MyNamsspace
{
    public enum LineEndingKind
    {
        CrLf,
        Lf
    }

    public class Program
    {
        public static Task<string> M(string path, LineEndingKind lineEndingKind)
        {
            // there exists some overload of ReplaceLineEndings that accepts a LineEndingKind 
            return File.ReadAllTextAsync(path).Then(static (text, kind) => text.ReplaceLineEndings(kind), lineEndingKind);
        }
    }
}

namespace System.Threading.Tasks
{
    public static class TaskExtensions
    {
        public static Task<TNewResult> Then<TResult, TArg, TNewResult>(this Task<TResult> task, Func<TResult, TArg, TNewResult> selector, TArg arg)
        {
            return task.ContinueWith(ContinuationFunction, state: (object?)(selector, arg)); // if arg is a struct we are boxing it here
            static TNewResult ContinuationFunction(Task<TResult> task, object? state)
            {
                if (task.IsCompletedSuccessfully && state is not null)
                {
                    var (selector, arg) = ((Func<TResult, TArg, TNewResult>, TArg))state;
                    return selector(task.Result, arg);
                }

                // this helper needs to account for cancelled tasks and exceptions
                throw new InvalidOperationException();
            }
        }
    }
}

3. This approach cannot be taken on ValueTask

If you wanted to write a similar helper for ValueTask you could do

public static async ValueTask<TNewResult> Then<TResult, TNewResult>(this ValueTask<TResult> valueTask, Func<TResult, TNewResult> selector)
{
    return await valueTask.AsTask().Then(selector);
}

but this:

  1. always converts the ValueTask to a Task, negating the point of ValueTask in many cases
  2. awaits the Task, which may not be a safe thing to do (should it be configured? if so how?)

Alternative Designs

Name

Then is the name take by many other constructs of this type in other languages, but I could see someone arguing for Select.

namespace System.Threading.Tasks
{
    public class Task<TResult> : Task
    {
+       public Task<TNewResult> Select<TNewResult>(Func<TResult, TNewResult> selector);
+       public Task<TNewResult> Select<TNewResult, TArg>(Func<TResult, TNewResult> selector, TArg);
    }

    public readonly struct ValueTask<TResult> : IEquatable<ValueTask<TResult>>
    {
+       public ValueTask<TNewResult> Select<TNewResult>(Func<TResult, TNewResult> selector);
+       public ValueTask<TNewResult> Select<TNewResult, TArg>(Func<TResult, TNewResult> selector, TArg);
    }
}

As this has a similar "transform a deferred value" feel that linq has.

Interaction with enumeration

You could also imagine an extension method like this being considered:

public static IEnumerable<Task<TNewResult>> Then<TResult, TArg, TNewResult>(this IEnumerable<Task<TResult>> source, Func<TResult, TArg, TNewResult> selector, TArg arg)
{
    foreach (var item in source)
    {
        yield return item.Then(selector, arg);
    }
}

But once the core Then method exists writing this extension is trivial enough to get right that I do not think it is worth including.

One thing that would be hard to get right is something like this:

public static async IAsyncEnumerable<TNewResult> Then<TResult, TArg, TNewResult>(this IAsyncEnumerable<TResult> source, Func<TResult, TArg, TNewResult> selector, TArg arg)
{
    await foreach (var item in source) // awaiting may not be safe here
    {
        yield return selector(item, arg);
    }
}

since it would require us to await the move next which may not be a safe thing to do without configuring the task. However, at this point we are just talking about implementing the Select linq method for IAsyncEnumerable types. If we ever do that, we can decide on the correct way such and enumerable would be written taking ConfigureAwait into account. Seems safe to punt on this concern for now.

Risks

My assumption is that Task and ValueTask are some of the most used types in .NET so modifying them in any way carries a great risk. I think that the implementation here could just be extension methods on these types with internal access so they can be implemented efficiently but if they do actually need to exist on the types themselves there is the concern of increasing the size of these types at all as thay are so common it could decrease .NET performance by allocating more resources.

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-threading-tasks See info in area-owners.md if you want to be subscribed.

Issue Details
### Background and motivation Method chaining is a very common way to manipulate data. ```csharp void M(){ string text = File.ReadAllText(path).ReplaceLineEndings(); } ``` and today expressions are one of the main ways in which data gets manipulated in .NET with C# continuing to make it easier to "expression-ify" things where it makes sense to do so. Unfortunately, once async comes into the picture you need to wrap things in parenthesis in order for the associativity to keep working the way the developer wants and we need to be in an async method. ```csharp async Task M(){ // need to be async method string text = (await File.ReadAllTextAsync(path)).ReplaceLineEndings(); // must wrap all awaits in parenthesis or have on separate line } ``` You can try and solve some of this with the `ContinueWith` methods but they have some usability issues. Lets say we wanted to write our own extension method with a signature like this ```csharp public static Task Then(this Task task, Func selector) ``` a naïve implementation may look like this ```csharp public static Task Then(this Task task, Func selector) { return task.ContinueWith(ContinuationFunction, state: (object?)selector); static TNewResult ContinuationFunction(Task task, object? state) { if (task.IsCompletedSuccessfully && state is not null) { var selector = (Func)state; return selector(task.Result); } // this helper needs to account for canceled tasks and exceptions throw new InvalidOperationException(); } } ``` A few problems with this: ### 1. It needs to handle cancellation and exceptions This is not impossible to do just tricky. It is unfortunate that this is necessary. In a world where folks are `await`ing Tasks, cancellation and exceptions are going to be thrown at the call site of the `await`. Handling these cases shouldn't be required and it would be nice to have a method that is only called if the task completes successfully instead of forcing this concern on developers using the `ContinueWith` api. ### 2. You cannot pass in args without boxing Since `ContinueWith` only excepts `object?` for its state type you cannot do something like this without boxing: ```csharp namespace MyNamsspace { public enum LineEndingKind { CrLf, Lf } public class Program { public static Task M(string path, LineEndingKind lineEndingKind) { // there exists some overload of ReplaceLineEndings that accepts a LineEndingKind return File.ReadAllTextAsync(path).Then(static (text, kind) => text.ReplaceLineEndings(kind), lineEndingKind); } } } namespace System.Threading.Tasks { public static class TaskExtensions { public static Task Then(this Task task, Func selector, TArg arg) { return task.ContinueWith(ContinuationFunction, state: (object?)(selector, arg)); // if arg is a struct we are boxing it here static TNewResult ContinuationFunction(Task task, object? state) { if (task.IsCompletedSuccessfully && state is not null) { var (selector, arg) = ((Func, TArg))state; return selector(task.Result, arg); } // this helper needs to account for cancelled tasks and exceptions throw new InvalidOperationException(); } } } } ``` ### 3. This approach cannot be taken on `ValueTask` If you wanted to write a similar helper for ValueTask you could do ```csharp public static async ValueTask Then(this ValueTask valueTask, Func selector) { return await valueTask.AsTask().Then(selector); } ``` but this: 1. always converts the `ValueTask` to a `Task`, negating the point of `ValueTask` in many cases 1. awaits the `Task`, which may not be a safe thing to do (should it be configured? if so how?) ### API Proposal ```diff namespace System.Threading.Tasks { public class Task : Task { + public Task Then(Func selector); + public Task Then(Func selector, TArg); } public readonly struct ValueTask : IEquatable> { + public ValueTask Then(Func selector); + public ValueTask Then(Func selector, TArg); } } ``` The functions given to these `Then` methods are only called if `Task.IsCompletedSuccessfully` would be true. Otherwise, a `Task` with the same cancellation state and exception information is returned. ### API Usage ```csharp async Task M(){ string text = await File.ReadAllTextAsync(path).Then(static text => text.ReplaceLineEndings()); } Task M(){ return File.ReadAllTextAsync(path).Then(static text => text.ReplaceLineEndings()); } ``` ### Alternative Designs ### Name `Then` is the name take by many other constructs of this type in other languages, but I could see someone arguing for `Select`. ```diff namespace System.Threading.Tasks { public class Task : Task { + public Task Select(Func selector); + public Task Select(Func selector, TArg); } public readonly struct ValueTask : IEquatable> { + public ValueTask Select(Func selector); + public ValueTask Select(Func selector, TArg); } } ``` As this has a similar "transform a deferred value" feel that linq has. ### Interaction with enumeration You could also imagine an extension method like this being considered: ```csharp public static IEnumerable> Then(this IEnumerable> source, Func selector, TArg arg) { foreach (var item in source) { yield return item.Then(selector, arg); } } ``` But once the core `Then` method exists writing this extension is trivial enough to get right that I do not think it is worth including. One thing that _would_ be hard to get right is something like this: ```csharp public static async IAsyncEnumerable Then(this IAsyncEnumerable source, Func selector, TArg arg) { await foreach (var item in source) // awaiting may not be safe here { yield return selector(item, arg); } } ``` since it would require us to `await` the move next which may not be a safe thing to do without configuring the task. However, at this point we are just talking about implementing the `Select` linq method for `IAsyncEnumerable` types. If we ever do that, we can decide on the correct way such and enumerable would be written taking `ConfigureAwait` into account. Seems safe to punt on this concern for now. ### Risks My assumption is that `Task` and `ValueTask` are some of the most used types in .NET so modifying them in any way carries a great risk. I think that the implementation here could just be extension methods on these types with `internal` access so they can be implemented efficiently but if they do actually need to exist on the types themselves there is the concern of increasing the size of these types _at all_ as thay are so common it could decrease .NET performance by allocating more resources.
Author: jmarolf
Assignees: -
Labels: `api-suggestion`, `area-System.Threading.Tasks`
Milestone: -
joperezr commented 2 years ago

cc: @stephentoub

stephentoub commented 2 years ago

This is a duplicate of https://github.com/dotnet/runtime/issues/58692.

Aren't these trivially implemented with await, e.g.

public static async Task<TNewResult> Then<TResult, TNewResult>(this Task<TResult> task, Func<TResult, TNewResult> selector) =>
    selector(await task);

?

jmarolf commented 2 years ago

If we assume we have these implementations:

public static async Task<TNewResult> Then<TResult, TNewResult>(this Task<TResult> task, Func<TResult, TNewResult> selector) =>
    selector(await task);

public static async Task<TNewResult> Then<TResult, TNewResult>(this ConfiguredTaskAwaitable<TResult> task, Func<TResult, TNewResult> selector) =>
    selector(await task);

Then these are the behaviors we would observer, correct?

// unlikely to be true for this api but lets assume I deadlock if I do not ConfigureAwait(false) in this method
string text1 = (await File.ReadAllTextAsync(path).ConfigureAwait(false)).ReplaceLineEndings();
// not equivalent the expression that got us text1 if Then is "nakedly" awaiting the task
string text2 = await File.ReadAllTextAsync(path).Then(static text => text.ReplaceLineEndings()).ConfigureAwait(false);
// is equivalent to the expression for text1 but ConfigureAwait most be chained on all task return types.
string text3 = await File.ReadAllTextAsync(path).ConfigureAwait(false).Then(static text => text.ReplaceLineEndings()).ConfigureAwait(false);

awaiting a task "nakedly" is not a safe thing to do in many instances. We could say that is fine just append ConfigureAwait in all our your expressions though, in which case this proposal has no value.

stephentoub commented 2 years ago

We could say that is fine just append ConfigureAwait in all our your expressions though, in which case this proposal has no value.

A built-in API would have to choose whether to use ConfigureAwait or not; it's going to get it wrong for some subset of consumers. Someone writing the one-liners themselves can choose for themselves in what context they want to run the delegate.

jmarolf commented 2 years ago

If the implementation of Then just continues after the Task is run then this is not the case right? The selector does not actually need to await to run, it just needs the underlying task to have completed. This can be implemented with the existing ContinueWith apis today but with some drawbacks and inefficiencies that are not possible to work around.

stephentoub commented 2 years ago

The selector does not actually need to await to run, it just needs the underlying task to have completed.

await is just a mechanism for waiting for the task to have completed, as is ContinueWith, as is doing a blocking Wait on the task, etc.

If the implementation of Then just continues after the Task is run then this is not the case right?

It's the case regardless of implementation detail. Des the delegate need to run in the original SynchronizationContext / TaskScheduler or is it ok for it to be invoked wherever? ConfigureAwait is a mechanism to implement the answer to that question specifically for await. If the delegate needs to be run in the original context, then don't use ConfigureAwait (or use ConfigureAwait(true). If it doesn't matter where the delegate is run, then use ConfigureAwait(false).

Imagine this was in a WinForms app, and someone wrote Task.Delay(1000).Then(() => textBox.Text = "Timer fired")... then you want the delegate running back in the original sync context, and using ConfigureAwait(false) would be wrong. If it was your "lets assume I deadlock if I do not ConfigureAwait(false) in this method I deadlock" scenario, then not using ConfigureAwait(false) would be wrong.

If you're not using await and ConfigureAwait to implement these semantics, then you're using something else to implement the desired semantics... the desired semantics still need to be implemented :)

jmarolf commented 2 years ago

right, but (hopefully) you would only need to configure the task once and those semantics can be "passed along" to subsequent Then calls. re-configuring them each time seems unnecessarily verbose.

One of the core thesis that I assume here is that manipulating tasks in expressions is not a negative thing. Today is it very hard to do correctly so most people just have separate statements for each await. Then implies that we think code like this:

var service = await serviceLocator.GetRequiredServiceAsync<MyService>();
service = service.WithCustomSetting(setting);
var value = await service.TryGetValueAsync(request);

or this (of course no one is actually writing this code):

var value = (await (await serviceLocator.GetRequiredServiceAsync<MyService>())
                .WithCustomSetting(setting)
                .TryGetValueAsync(request));

could be written like this:

var value = await serviceLocator.GetRequiredServiceAsync<MyService>()
    .Then(static (service, setting) => service.WithCustomSetting(setting), setting)
    .Then(static (service, request) => service.TryGetValueAsync(request), request)
    .UnWrap();

With configuration for the schedule and context happening at the "main" task with Then calls just doing transformations on the final result without needing to also be configured. Certainly, if someone does want different synchronization contexts or schedulers at different points the evaluation of an expression then this doesn't buy them anything. In my opinion that is not the typical use case.

stephentoub commented 2 years ago

but (hopefully) you would only need to configure the task once and those semantics can be "passed along" to subsequent Then calls

No, there is no concept of "how should this task be awaited" baked into a Task or somehow propagated from task to task... there might not even be a task, you might be awaiting something that's not a task, the async state machine might itself not be a task, etc. The configuration is about how a given await is performed, not anything to do with the Task itself; it simply impacts how a continuation is hooked up. We would not make Task more expensive for everyone to carry along that additional information for Task; it would also be a significant breaking change if awaits / ContinueWith / etc. were to change their default behavior based on how something previously in the await chain was awaited. (Also, note that the proposed APIs would result in significantly more expensive execution than just using await multiple times in the same async method.)

Today is it very hard to do correctly so most people just have separate statements for each await.

If that's really an issue, then we should fix that in the language.

davidfowl commented 2 years ago

Or we could copy rust and make await a postfix like rust:

var service = serviceLocator.GetRequiredServiceAsync<MyService>().await.
service.WithCustomSetting(setting).await.service.TryGetValueAsync(request);

Burns my eyes but, solves the problem 😄

jmarolf commented 2 years ago

If that's really an issue, then we should fix that in the language.

I have a proposal out for that, and perhaps that will get implemented in the next 10+ years :) but it's always less expensive to implement as a library feature (if possible) so I thought I would do my due diligence and start a discussion here.

No, there is no concept of "how should this task be awaited" baked into a Task or somehow propagated from task to task

Excellent point. If I must be in a specific context there is no way an extension method could be implemented to do this. If passing this information along is simply impossible, then this isn't something that can be done in the BCL and I can give the feedback to the LDM.

jmarolf commented 2 years ago

@davidfowl because of how complex task configurations can be post fix awaits were already rejected by the LDM: https://github.com/dotnet/csharplang/issues/4076. It sounds like that same complexity makes doing anything at the library level equally difficult.

jmarolf commented 2 years ago

Imagine this was in a WinForms app, and someone wrote Task.Delay(1000).Then(() => textBox.Text = "Timer fired")... then you want the delegate running back in the original sync context, and using ConfigureAwait(false) would be wrong. If it was your "lets assume I deadlock if I do not ConfigureAwait(false) in this method I deadlock" scenario, then not using ConfigureAwait(false) would be wrong.

This is a good argument for not adding this api on Task and only Task<TResult> as I agree, I have no idea what a general Action is supposed to do for something like this.

For an example where task does return a value:

public partial class Form1 : Form
{
    public Form1()
    {
        InitializeComponent();
        _ = StartTask(TaskScheduler.FromCurrentSynchronizationContext());
    }

    private async Task StartTask(TaskScheduler scheduler)
    {
        var factory = new TaskFactory(scheduler);
        var text = await factory.StartNew(async () =>
        {
            await Task.Delay(1000).ConfigureAwait(false);
            return 42;
        }).Then(x =>
        {
            var newText = textBox1.Text = $"Timer fired";
            return newText;
        }).ConfigureAwait(true);
    }
}

You would need something like this:

public static Task<TNewResult> Then<TResult, TNewResult>(this Task<TResult> task, Func<TResult, TNewResult> selector)
{
    var scheduler = TaskScheduler.FromCurrentSynchronizationContext();
    return task.ContinueWith(ContinuationFunction, state: (object?)selector, scheduler);
    static TNewResult ContinuationFunction(Task<TResult> task, object? state)
    {
        if (task.IsCompletedSuccessfully && state is not null)
        {
            var selector = (Func<TResult, TNewResult>)state;
            return selector(task.Result);
        }

        // this helper needs to account for canceled tasks and exceptions
        throw new InvalidOperationException();
    }
}

For things to work correctly. I would still argue that it is technically possible for us to have the Then method run in the same manner as its "parent" Task, though the performance may be so bad that there is no point.

I assume that is the central argument here? While this could be done the expense buys you little or implementation would be so complex it would not be worth the cost of maintaining it?

stephentoub commented 2 years ago

I would still argue that it is technically possible

Anything is "technically possible", but at what cost and at what mental complexity. Besides, this puts things back into a callback-based world. The moment you want to do anything more complex, like loop, the complexity goes through the roof. That's the whole reason await was introduced in the first place. await is our story for composing tasks. I do not want us to go backwards by adding more callback-based APIs for such composition.

jmarolf commented 2 years ago

yep, you would need to either add all the same overloads that ContinueWith has (which makes this an exact duplicate of https://github.com/dotnet/runtime/issues/58692) or you have a complex internal implementation that might be easier to use (for some definition of "easier") but won't be worth the large cost to implement and maintain. and its unclear if we could even achieve an acceptable level of performance for all this complexity.

await is our story for composing tasks. I do not want us to go backwards by adding more callback-based APIs for such composition.

This is also a great POV for me to keep in mind. From my point of view many .NET developers are used to "callbacks" as that is the pattern that Linq and most "fluent apis" use, which are extremely common. The fact that await plays so poorly here is unfortunate but it is, at its core, a language design issue. I will try and push things as best I can on the language side first.

jmarolf commented 2 years ago

For future generations that may encounter this issue :) it appears that the pipe operator or some form of general monadic transformation is the most likely solution to be accepted by the csharp language designers.