dotnet / fsharp

The F# compiler, F# core library, F# language service, and F# tooling integration for Visual Studio
https://dotnet.microsoft.com/languages/fsharp
MIT License
3.92k stars 785 forks source link

`async` CE implementation for `and!` #10301

Closed et1975 closed 3 years ago

et1975 commented 4 years ago

Is your feature request related to a problem? Please describe. With applicatives support landing in F# 5 I assumed we'll see FSharp core implementing the support in existing CEs, specifically async.

Describe the solution you'd like Since it hasn't been implemented yet I'd like to propose that the implementation executes child asyncs in parallel when and! is used.

Describe alternatives you've considered Implement sequentially. Pros: the same semantics as let! - no surprises. Cons: Seems like it would be a missed opportunity to hide the boilerplate currently required to exec 2 child asyncs (of different results type) in parallel.

cartermp commented 4 years ago

Note: keeping this here rather than language-suggestions for now. I threw together a naiive version here:

module Async =
    let map f computation =
        async.Bind(computation, f >> async.Return)

    let zip c1 c2 =
        async {
            let! x = Async.StartChild c1
            let! y = Async.StartChild c2
            let! x' = x
            let! y' = y
            return x', y'
        }

type AsyncBuilder with
    member _.BindReturn(x: Async<'T>, f) = Async.map f x
    member _.MergeSources(x: Async<'T>, y: Async<'U>) = Async.zip x y

I'm sure there is a far better implementation possible though.

Smaug123 commented 4 years ago

I would naively expect something more along the following lines:

let zip c1 c2 =
    [| c1 ; c2 |]
    |> Async.Parallel
    |> Async.map (fun a -> a.[0], a.[1])

except of course this would incur an unbox to do it in a type-safe way; there are a number of awful ways to fix this, e.g. the following monstrosity:

    let zip (x : 'a Async) (y : 'b Async) : Async<'a * 'b> =
        let mutable r1 = Unchecked.defaultof<'a>
        let mutable r2 = Unchecked.defaultof<'b>
        [|
            async {
                let! x = x
                r1 <- x
                return ()
            }
            async {
                let! y = y
                r2 <- y
                return ()
            }
        |]
        |> Async.Parallel
        |> Async.Ignore
        |> Async.map (fun () -> (r1, r2))
cartermp commented 4 years ago

there are a number of awful ways to fix this, e.g. the following monstrosity:

Luckily, FSharp.Core internals are the right place for that sort of thing :)

gusty commented 3 years ago

Adding this would lead to implicit parallelism. I don't think it's a good idea. Sequential implementation would be a no-surprise alternative. Async means asynchronous, not parallel.

cartermp commented 3 years ago

Not sure I understand why it's not a good idea. Since you can't depend on the values returned from the computations there's no real drawbacks to having and! that wouldn't already exist (e.g., effectful computations you expect to be executed in order).

gusty commented 3 years ago

Yes, but you are talking from a technical viewpoint. I'm not saying is not feasible, of course it is, what am saying is that from a conceptual viewpoint having an abstraction called async that does some operations sequentially by default and others in parallel by default is not a good way to abstract, is it rather a good way to confuse the users.

I mean, you could also apply this reasoning to operation with arrays, why not doing some of them in parallel ? Probably for the reasons I already exposed, instead of that we have a dedicated submodule of Array called Parallel, which clearly express the intention.

Just because we can do something, doesn't mean it's a good idea.

Maybe an alternative is to create a different builder for this, call it asyncParallel.

In some Haskell-like languages where they tend to group operations in types to take advantage of type-classes, they had the same issue, so I think they created a different type in such a way that the parallelism is not implicit. So, although the solution is different, the problem is the same: implicit parallelism is evil :)

cartermp commented 3 years ago

To use the etymology of the word, "async" just means "not at the same time", which is orthogonal to parallelism, concurrency, sequential operations, etc. We now have a language-level construct that disallows depending on the result of a previous computation (thereby implying sequential execution), so it's worth thinking through what a another model with and! looks like and what the positive/negative outcomes are. I don't really buy an argument of "but it's evil" when talking about an assumed implementation without anything backing that up.

gusty commented 3 years ago

I don't really buy an argument of "but it's evil" when talking about an assumed implementation without anything backing that up.

It seems that you read only my last sentence. Maybe the text in your browser was clipped ;)

cartermp commented 3 years ago

Nope. You haven't given a reason why not to do this aside from saying it's not done for other things. What is your reasoning based on async programming with F# today? Are there scenarios you have in mind?

Smaug123 commented 3 years ago

I can understand why one might not want it to be parallel by default. However, one of the main objections might be "what about side effects", and in my opinion it's also definitely wrong for users to be able to rely on a known evaluation order of Async.zip; so in my mind that would not be a valid objection.

gusty commented 3 years ago

I can think of many scenarios where this would surprise me, but I was trying to give a more generic reason why.

This is just another instance of the typical problem I have a monad, cool then I have an applicative, hey but it turns out there is another applicative possible which is not consistent with my monad but as applicative is more convenient, great let's use that instead ==> Now I have a monad now that breaks the reasoning that a monad is also an applicative. It is one, but a different one. Now good luck reasoning about the code, without looking at the builder source.

So, this happens also with the:

Now, the async one is a bit special in the sense that, as opposed to the the above mentioned it won't change the result in a pure computation, but it will create the parallel side effect, so if our function is not pure, surprise, results are not the same.

The interesting fact is that even in pure languages they decided to avoid that implicit parallelism, so here we would be being more pure than the purists.

The equivalence between monad and applicatives is:

f <*> x ===  f >>= (fun x1 -> x >>= (fun x2 -> return (x1 x2)))

yeah, I know I'm sounding too cryptic here, and you might wonder, who cares about that math, also it was recommended not to use operators for applicatives.

But it turns out that using CEs (instead of operators) this equivalence becomes even more evident:

    let! x = X
    let! y = Y
    return f x y

it's equivalent to

    let! x = X
    and! y = Y
    return f x y

We use and! to take advantage of a more efficient dedicated construct. But if we mix things, we get different results both in the List and the Result case.

Now in applicative, as I said before the different is not appreciable in pure code. We can actually simplify this discussion and turn it into the discussion "Is implicit parallel execution considered just an optimization?".

Do I really need to give you specific scenarios for this?

Smaug123 commented 3 years ago

Honestly I think you've just persuaded me that it should be parallel. The fact that we use different syntax (and!) to me is enough justification: we're not using the inherited applicative instance (which would, after all, just be the same as let!), because we're using a different syntax. It would be actively confusing to me if explicitly using the applicative syntax nevertheless resulted in monadic behaviour.

I think this holds for the applicative Result as well, by the way.

gusty commented 3 years ago

I think this holds for the applicative Result as well, by the way.

Yes, and let's make it also pointwise for seq ;)

So and! means parallel, and what would be the way to take advantage of an optimized sequential operation?

resulted in monadic behaviour.

See? You seem to start confusing monadic with a specific applicative. Monadic behavior is not describing sequential behavior.

I can propose instad to create a different CE, let's say asyncParallel which, as its name suggest it has (guess what) parallel behavior as default.

Why looking for shortcuts which leads to inconsistency, when you could do it properly? Just to save a few keystrokes? Does it worth it?

Smaug123 commented 3 years ago

The seq example is different because the following fails to compile ("the use of let! is not permitted in sequence expressions"). The seq computation expression already forbids any syntax that could suggest zipping.

seq {
    let! foo = Seq.empty
    return foo
}

You seem to start confusing monadic with a specific applicative. Monadic behavior is not describing sequential behavior.

I certainly don't feel confused. Constructing the async object using the Async monad must be done by sequencing the async objects, and the semantics of the resulting Async would be extremely odd if the async computations were not also sequenced in the same way. The syntax and! is sufficiently different from the monadic syntax, and sufficiently restrictive, that it seems perfectly reasonable to use the Async standalone applicative for it.

The only possible places where the syntax of the computation expression is ambiguous between the applicative and the monad are:

let a =
    async {
        return 5
    }
let b =
    async {
        let! thing = Async.lift 5
        return 5
    }

And fortunately the semantics of the monad and the applicative coincide in those two cases anyway.

I don't see an argument against let!... and! always referring to the "obvious" applicative instance even when it does not coincide with the monad you'd get from let!... let!. After all, if you wanted to use the monadic instance, you could use let!... let! with absolutely no additional effort.

Since we have a different and incompatible syntax for applicative CEs, I just think it would be a waste if we didn't use it to express the different and incompatible applicative!

gusty commented 3 years ago

@Smaug123 let! in seq is for, so you can still add let! and and! to the mix.

I don't see an argument against let!... and! always referring to the "obvious" applicative instance

What is obvious for you, might not be obvious for others, or even for you at a different time ;)

After all, if you wanted to use the monadic instance, you could use let!... let! with absolutely no additional effort.

Yes, but it wouldn't allow me to use a future optimization of the (sequential) applicative.

the Async standalone applicative for it.

Which one? There are 2? Ok, you say you're not confused, I believe you but believe me that starting to call monadic a specific implementation and applicative another one is the beginning of the end in the longstanding confusion most users arrive. I don't blame them, people write blogs (and online books) with these tricky words used in the wrong place.

Since we have a different and incompatible syntax for applicative CEs, I just think it would be a waste if we didn't use it to express the different and incompatible applicative!

This part I really don't get it. How is it incompatible?

Let's not forget that all this started as mechanism for optimizations: https://github.com/dotnet/fsharp/pull/7756#issue-330984121 "for more efficient computation expressions" but now we're discussing about changing the semantics.

Of course Computation Expressions are kind of free, in the sense you can roll your own, and it could be not even a monad, but if we're talking about the async CE from FSharp.Core, there are some expectations for those who like to re-use the reasoning, instead of reading docs or source code of "convenient code that breaks abstractions to save some keystrokes".

Namely, in the Computation Expression Zoo paper, there is a law that states the expectation of equivalence between functors and monads, in monadic CEs.

I'm not convinced of saving a few keystrokes with implicit behavior, that was not the original spirit of F#, to me that fits better in languages like C# where you have lot of unrelated overloads and implicitness, which makes impossible to reason and promotes either learning everything by heart, or work with the reference open all the time.

Having said that, nothing stops you from developing your own "convenience-key strokes-saving-abstraction breaking-surprising" library. In this case it's just a few lines, you can just add the extensions methods you need for async or (even better) create your own, by inheriting the Builder, but let's don't do this in Core.

Smaug123 commented 3 years ago

By "incompatible" I mean "is not semantically the same as the corresponding applicative induced by the monad".

I think the fact that we're talking about async is clouding the issue here. How about we consider result instead?

In my mind, it's "obvious" that the following will evaluate all the results and zip them together, so that c.Value = 1 and the result is Error "hi":

let c = ref 0
result {
    let! a = Ok ()
    and! b = Error "hi"
    and! foo = Ok (incr c; ())
    return ()
}

By contrast, let! throughout would result in c.Value = 0.

I take it you disagree?

Smaug123 commented 3 years ago

(Flat text is a very poor format for this! The discussion is naturally a tree, and we're being forced to evaluate it breadth-first.)

gusty commented 3 years ago

I want to comment this separately as it's more technical, so it's a bit unrelated to the reasons above explained.

Actually, there might be a technical limitation of doing this.

In https://github.com/fsharp/fslang-design/blob/master/preview/FS-1063-support-letbang-andbang-for-applicative-functors.md

It states, that if the BindN method is not present, it will call a chain of methods starting from Bind.

Now, here's where things start to go wrong, if we implement some methods with parallel while having Bind sequential, and we'll never be able to cover all BindN methods, at some point we'll give up coding them, so there would be always a case not handled in parallel.

This is just a consequence of mixing different semantics, again, if you go back to the RFC, it clearly states the motivation was to create more efficient CEs, not different semantics. The feature is designed in such a way that you can incrementally add optimizations.

But one more time, if we consider Parallel an optimization, I would say we should focus in that discussion instead (to me it's clearly not).

Smaug123 commented 3 years ago

If we implement some methods with parallel while having Bind sequential, and we'll never be able to cover all BindN methods, at some point we'll give up coding them, so there would be always a case not handled in parallel.

But that's not what will happen. If there is no corresponding BindN, we fall back to nesting MergeSources (i.e. zip), which in my interpretation is parallel.

Smaug123 commented 3 years ago

Could you give an example of an optimisation you could add to async in the sequential case, even in principle? I am currently failing to imagine one. I would go so far as to argue that the only obvious optimisation to an aggregation of asyncs is to run them in parallel before aggregating the results.

Edit: I guess actually there's an advantage in constructing the asyncs all up front before you evaluate them, if you're applicative and have been able to explore the whole object at async-construction time.

wallymathieu commented 3 years ago

Even if it's an a common thing to do Async.Parallell, will hidden usage of parallell break existing code? How easy is it to use the feature in an unintended way? Isn't this a thing that you could write a lib for (so that when you want to have async behave in that way you can use that lib instead)?

Smaug123 commented 3 years ago

My assertion is that and! leads the user to expect that they cannot rely on the order of execution, since normally bind is what allows you to sequence inputs and they have explicitly not used bind.

gusty commented 3 years ago

Could you give an example of an optimisation you could add to async in the sequential case, even in principle?

Not at the moment, would need to dig into async internals, but I'm sure it's possible.

Anyways, I think we exchanged already many ideas about the pro/cons of this proposal, from the theoretical point of view.

I'm still convinced it's not a good idea, but leaving that feeling aside, in order to be able to implement an async-parallel applicative we would need to solve the problem of the zipMany function, which current implementation doesn't allow to express.

With @baronfel we did an experiment to see how can we upgrade the compiler to support these kind of constructs. The result is here: https://github.com/dotnet/fsharp/pull/9605 it allows 2 ways of doing it, run-time and compile-time.

For the compile-time flavor, it relies on another PR related to tuple type equivalence which is not approved at the moment, but we don't need it for the run-time one.

et1975 commented 3 years ago

Adding this would lead to implicit parallelism.

Objection! Your honour, the parallelism would be explicit due to the different keyword!

cartermp commented 3 years ago

My assertion is that and! leads the user to expect that they cannot rely on the order of execution, since normally bind is what allows you to sequence inputs and they have explicitly not used bind.

I agree with this the most. I can appreciate that some people might expect and! to execute similarly to let! (but without depending on the values), but it also means that and! wouldn't be particularly useful for async.

wallymathieu commented 3 years ago

Since and! is explicitly intended for applicatives, then you'd end up with as @gusty points out a half thing. People that reason about the async-monad in terms of the definition would then need another implementation besides the F# Core one (since that one in F# Core then becomes broken).

dsyme commented 3 years ago

@gusty is correct - no implementation of let! ... and! ... should introduce calls to Async.StartChild, Async.Start or Async.Parallel - all of which start queue work in the thread pool. These calls must always be explicit. To be honest I feel it would be better if all of these had names like Async.StartChildInThreadPool, Async.StartInThreadPool and Async.ParallelInThreadPool. ANy introduction of the thread pool should be explicit - partly because the basic async { ... } builder should always have an interpretation in a computational setting like Javascript where the thread pool is not available, and the only mechanism available is to queue a work item on the UI thread.

As one litmus test, introducing a do-nothing and! should not change the semantics of an existing let!, e.g.

let! x = p1 and! y = async { return () } in ...

should be equivalent to

let! x = p1 in ...

That is, p1 should be started immediately in the current thread and, if it finished synchronously, the computation should continue on the current thread.

I believe the accurate implementation that meets litmus tests such as these is something like this:

module Async =
    let map f computation =
        async.Bind(computation, f >> async.Return)

    let zip c1 c2 =
        async {
            let! ct = Async.CancellationToken
            let! x = Async.StartImmediateAsTask (c1, cancellationToken=ct)
            let! y = Async.StartImmediateAsTask (c2, cancellationToken=ct)
            let! x' = Async.AwaitTask x
            let! y' = Async.AwaitTask y
            return x', y'
        }

type AsyncBuilder with
    member _.BindReturn(x: Async<'T>, f) = Async.map f x
    member _.MergeSources(x: Async<'T>, y: Async<'U>) = Async.zip x y

With this, given the code

    let! x = async1
    and! y = async2
    E

The effect of this is to start async1 and run it immediately on the current thread up to its first async release, then do the same for async2, and then wait for the two results.

Note that it's my understanding that

    let f c1 =
        async {
            let! x = c1
            return x'
        }

and

    let f c1 =
        async {
            let! ct = Async.CancellationToken
            let! x = Async.StartImmediateAsTask (c1, cancellationToken=ct)
            let! x' = Async.AwaitTask x
            return x'
        }

are essentilly equivalent (up to trampolining, and perhaps exceptions?) - if they differ then we must adjust the above implementation.

et1975 commented 3 years ago

Parallel await on the completion is all I was after, pardon for stating it poorly. Would be happy with Don's implementation.

NinoFloris commented 3 years ago

@dsyme This would still mean implicit parallelism, just moved past initial yield, which as a redeeming quality for this implementation has the necessary consistency with the monadic semantics; a consistent initial order of execution which is very important.

As your given implementation is completely in line with Task's promise-style semantics this is convenient for me as TPL library author, but is it the right one for Async as well?

I do know, and have experienced first hand, a very common mistake beginner Scala programmers make is one where they construct a list of Futures (essentially Scala's Task) and suddenly have 100s of work items in flight.

The given implementation could end up encouraging very similar behavior, albeit somewhat more opaquely due to the necessary involvement of an applicative operation. It could make this a new failure mode for Async - very similar to many Tasks starting all at once - and generalizing over any applicative functor as @gusty would like to do with FSharpPlus makes this a 'dangerous' implementation.

For Task there is no choice than to align to this particular applicative semantics but it's worth it to think carefully about the least surprising default for Async.

The essential question is: would you under any circumstance want to allow a large number of work items in flight for a single Async applicative operation.

For reference, Scala's default for ZIO/Cats/Scalaz/Monix IO types is sequential applicatives and Haskell IO is similar, all ecosystems have exceptions like parMap or types that have a parallel applicative instance but for none is it the default. Some more background on all of this can be found here:

NinoFloris commented 3 years ago

To add, I personally don't have a strong preference, considering .NET's uniform and very well behaved threadpool I could certainly go both ways.

Exploring our options and some prior art I merely hope helps us pick one we won't regret :)