Add better support for cancellation tokens passing

fsprojects / FSharp.Control.TaskSeq

A computation expression and module for seamless working with IAsyncEnumerable<'T> as if it is just another sequence

MIT License

91 stars 7 forks source link

Add better support for cancellation tokens passing #133

Open abelbraaksma opened 1 year ago

abelbraaksma commented 1 year ago

Currently, it is not very clear how users can pass a CancellationToken through. While the CE has support for cancellation tokens, and the token is passed on to GetAsyncEnumerator(ct), unless users access the interface directly, there is currently no way to pass a cancellation token.

There are four, not necessarily exclusive ways, to implement this.

Add TaskSeq.setCancellationToken, which will return a taskSeq with the given cancellation token
Add a do! overload that allows writing taskSeq { do! myCancelToken } in your CE, otherwise behaving the same as (1)
Add a custom operation, such that you can write taskSeq { cancellationToken myCancelToken } in your CE
Add a parameter to the taskSeq CE constructor. However, if possible, this may not be easily discoverable

There are upsides and downsides to each of these approaches. I think the first option, together with a getCancellationToken should at least be supported.

However, there are other challenges as well. The helper functions in the TaskSeq module all use the CancellationToken() constructor (that is: no cancellation support). The taskSeq builder in that respect is a bit of a mixed beast. Yes, you can pass in a cancellation token, but you have to do it manually, and then, using any of the helpers will basically ignore this token.

That's not a good position to be in. Probably, code like source.GetAsyncEnumerator(CancellationToken()) should become something like source.GetAsyncEnumerator(TaskSeq.getCancellationToken source). Which runs into another issue: while the state machine has a property for the cancellation token, any other implementation of IAsyncEnumerable<'T> does not, which requires a type check to detect.

Adding all this magic comes at a cost. In AsyncSeq and Async this was resolved by adding overloads that take an optional cancellation token. Not ideal either.

abelbraaksma commented 1 year ago

Implementation progress: #132. Earlier discussion (see @bartelink and @gusty's comments): #114.

abelbraaksma commented 1 year ago

Let me answer your questions from the other thread here, @bartelink:

People could cope with this, and it's obviously terse. It's also arguably a dual to the do! fun ct -> ... construct in IcedTasks.

Yes, that's similar, but not quite the same. This proposal is about setting the token. In IcedTasks, it is about getting the token. You need then provide your actual token when you run the (delayed) task.

Allowing people to access the token by writing such code could be much easier then having them passing the token manually through their own functions.

would you ever want to setCT more than once?

Task sequences are "read many". If a task sequence is passed on to several consumers, each could set their own cancellation token. Which in turn brings up another question. This should not be part of the taskSeq state, but of the IAsyncEnumerator state (that is: not the enumerable!). People may conceive it that way with the do! syntax.

Now, this is actually done correctly with the do! approach. It will only assign the cancellation token once enumeration actually starts, after calling GetAsyncEnumerator(ct). It effectively overrides whatever is in ct, so this is another concern (for manual iteration).

Is there any way the CT can become an argument to the builder?

Yes, this is possible, but it doesn't solve the other issues I raised above, unfortunately. Also, you may not always want to pass the cancellation token at the moment of creation. I.e., it's not uncommon to consume a task sequence and wanting to control cancellation from the consumer side.

bartelink commented 1 year ago

Thanks for the clarifications regarding my (naive) questions, and for thoroughly mapping out the overall requirements.|

I agree that setCancellationToken/getCancellationToken as proposed are good baseline APIs given the constraints being resolved and unfortunately can't see any cleaner solution.

dsyme commented 1 year ago

I haven't followed the repo in a few weeks, so am answering at a high level

If a TaskSeq is eliminated to a primitive value (e.g. `int) an optional cancellation token parameter can be provided
If a TaskSeq is converted to an Task an optional cancellation token parameter can be provided
If a TaskSeq is converted to an Async (e.g. some TaskSeq.lengthAsync) or CancellableTask/ColdTask then the TaskSeq should get its cancellation from that computation

For each function in the API accepting TaskSeq as input this is an automatic process

Look at the outputs, and see if it implies a cancellation token (TaskSeq, Async, CancellableTask...). If so, use that
If no output implies a cancellation token, then add an optional cancellation token as an input parameter.

For example, following these principles rigorously:

TaskSeq.length: TaskSeq<'T> * ?ct:CancellationToken -> int
TaskSeq.lengthAsync: TaskSeq<'T> -> Async<int> should get its cancellation token from the Async
TaskSeq.lengthAsTask: TaskSeq<'T> * ?ct:CancellationToken -> Task<int>
TaskSeq.lengthAsCancellableTask: TaskSeq<'T> -> CancellableTask<int>
TaskSeq.toList: TaskSeq<'T> * ?ct:CancellationToken -> 'T list
TaskSeq.toListAsync: TaskSeq<'T> -> Async<'T list>
TaskSeq.toListAsTask: TaskSeq<'T> * ?ct:CancellationToken -> Task<'T list>

Note that you'd have to a TaskSeq class containing static members to use the optional methods.

One case that's a little unclear to me is whether a result type of Seq should use the Dispose, e.g.

TaskSeq.toSeqBlocking: TaskSeq<'T> -> 'T seq // blocks the current thread on each MoveNext, each uncancellable, Dispose of seq cancels.
TaskSeq.toSeqOfAsync: TaskSeq<'T> -> Async<'T> seq // Dispose of seq enumerator cancels? Each individual MoveNext results in an Async using the cancellation token for that async
TaskSeq.toSeqOfTasks: TaskSeq<'T> -> Task<'T> seq // Dispose of seq enumerator is the overall cancellation signal?

dsyme commented 1 year ago

In AsyncSeq and Async this was resolved by adding overloads that take an optional cancellation token. Not ideal either.

In short, I believe this is the only correct/viable solution. Anything else is making cancellation impossible to reason about accurately and reliably, and you may as well not build it in to the CE.

abelbraaksma commented 1 year ago

@dsyme thanks for the high level insights (if you have the time, I’d like to discuss TaskEx as a new FsProjects lib, to harmonise the proliferation of task helpers in every task-related lib out there; I’ll ping you on Slack).

I kinda feel icky about leaving the module and using static members on the type, it feels non-idiomatic, but to allow normal composition and piping with optional params, that’s the only way afaik (and Async does the same).

This should not be part of the taskSeq state, but of the IAsyncEnumerator state (that is: not the enumerable!).

All your comments above seem to apply to this (from OP). I think we’re in agreement. Once the taskSeq is enumerated, a token should be passed (blocking and Task results) or passed through (Async results).

The only thing left out here is how to pass or get a token to a for loop in the Async or Task CE. But I’ve some idea how to do just that.

In short, I believe this is the only correct/viable solution.

I’ve now come to the same conclusion, thanks!

dsyme commented 1 year ago

The only thing left out here is how to pass or get a token to a for loop in the Async or Task CE. But I’ve some idea how to do just that.

For Async or CancellableTask you use Async.CancellationToken/CancellableTask.cancellationToken

For Task it isn't possible unfortunately - but you could have an overload of Task.For taking an IAsyncEnumerator I suppose?

task { 
    for x in taskSeq.GetEnumerator(ct) do ...
}

abelbraaksma commented 1 year ago

I didn’t consider that, but yeah, that may be possible. A bit tricky, perhaps, as our IAsyncEnumerable itself implements IAsyncEnumerator to avoid an unnecessary shadow copy. I’ll have to check, it’s probably no big deal, as I think I cast before returning, so unless it is recast, it shouldn’t matter.

Just thinking out loud ;).

abelbraaksma commented 10 months ago

I never considered this before, but this approach appears to work quite well, albeit with a few caveats (/cc @dsyme):

type TaskSeq =
    static member isEmpty source : Task<bool> = Internal.isEmpty CancellationToken.None source
    static member isEmpty(token: CancellationToken) : (taskSeq<'T> -> Task<bool>) = Internal.isEmpty token

// example of using it
module Test =
    let f() = TaskSeq.empty |> TaskSeq.isEmpty CancellationToken.None
    let g() = TaskSeq.empty |> TaskSeq.isEmpty

Note that this is deliberately not the same as the tupled approach above. There are a few advantages for this approach:

The cancellation token is in its natural position (left-most vs right-most in the tupled approach)
The functions can be used in functional piping and composition styles, even with cancellation token (not possible with the tuple approach)
It has a more natural FP feel

And some disadvantages:

These are essentially single-argument overloaded functions, with different result types (a function vs a value)
There's less tooltip-support for the function arguments, as the second and further arguments will be part of the returned function and can therefore not be named or xml-documented
Potential extra overhead if the compiler doesn't auto-inline these returned functions

gusty commented 10 months ago

I will add the main disadvantage of this ad-hoc overloading is that it hinders reasoning about the code, which makes me disagree about

It has a more natural FP feel

actually it goes against currying, to me it has a more C#ish feel.

Then, for the same reasons, there is the potential problem that the compiler takes an undesired overload when type information is not present.

abelbraaksma commented 10 months ago

is that it hinders reasoning about the code,

Yes, this is certainly a downside. But the same could be said for the overload with the tupled arguments, as used in Async. I only found out after years of using it that it even existed.

actually it goes against currying, to me it has a more C#ish feel.

How? It is a function that takes an argument that returns a new one-argument function, like 'a -> 'b -> 'c), essentially. Or, in implementations, like let f x = fun y -> z. How is that non-functional?

there is the potential problem that the compiler takes an undesired overload when type information is not present.

Again, how? These overloads are deterministic. Wrong overload resolution only happens if the types overlap. Here that's never the case, as the overloaded function is the one with the CancellationToken.

I'm not saying that there aren't any downsides, but the Async approach, where there is a mysteriously defaulted second half of a tuple, is rather surprising as well, and what is more, it leads to much more verbose code:

type TaskSeq =
    static member isEmpty(source: Task<bool>, ?token: CancellationToken) -> Task<bool> = Internal.isEmpty(source, token)

// example of using it
module Test =
    // this is what I don't like, esp in pipe-chains, or scenarios where simple currying would otherwise suffice
    // i.e., you cannot use the cancellation token with currying at all, you have to create a new function
    let f() = TaskSeq.empty |> fun ts -> TaskSeq.isEmpty(ts, CancellationToken.None) 
    let g() = TaskSeq.empty |> TaskSeq.isEmpty

Anyway, I definitely agree it is a novel approach and as such it may raise some eyebrows... 🤨

gusty commented 10 months ago

Currying states that given a function 't1 -> 't2 -> 't3 when 't2 is not supplied the result becomes 't2 -> 't3. Here this reasoning is broken as it becomes t3.

When the F# compiler is solving types, it tried to do overload resolution, even if type information is not fully nominalized it will attempt to choose a solution, so if you're using this in a generic function it's likely that it will pick an undesired overload or force you to type annotate the function.

The Async approach uses optional arguments to a method, which doesn't interfere with currying.