fsharp / fslang-suggestions

The place to make suggestions, discuss and vote on F# language and core library features

Avoid constructing intermediate results when chaining operations #1383

Open vanaur opened 2 months ago

vanaur commented 2 months ago

I propose that we add to F#, and to its core libraries, a new kind of sequence that would be pure (i.e. it would not permit side effects) but would expose an interface similar to what Seq already offers. A collection that is free of side effects lends itself to a well-known form of optimization in Haskell (see here for example and details): stream/list fusion (a stream in Haskell is a kind of list without the overhead normally associated with linked lists). This is not the only optimization it would enable; for example, some results could be computed in parallel, strictly, or lazily, depending on the optimizer or the user's choice. F# already offers the Array.Parallel module in this spirit.

Fusion (or deforestation) is an optimization technique that eliminates most, if not all, intermediate collections, leaving just one loop. It's a bit similar to loop fusion in imperative languages.

This style of optimization, known as short-cut fusion in Haskell, cannot be applied directly in F# or any other strict language (at least not with the classic methods; see below!). It is fairly common in F#, and in functional programming in general, to reason with intermediate collections in a chained way, but these intermediate collections have a cost. What's more, most of the time these collections are pure! Yet there is currently no mechanism in F# to deduce purity. If such a mechanism existed, my suggestion would no longer need to introduce a new type of "pure sequence" and would be limited to the proposed fusion optimization.

Consider the following code examples:

Example 1

/// Calculate the factorial of a given number.
let factorial (n: bigint) =
    List.reduce ( * ) [ 1I .. n ]

Example 2

/// Performs a series of operations on the list before adding up all its elements.
let example2 (lst: int list) =
    lst |> List.map (( + ) 1)
        |> List.map (( * ) 2)
        |> List.filter (( <> ) 0)
        |> List.filter (( >= ) 0)
        |> List.fold ( + ) 0

Example 3

/// Check whether the given integer is a prime number.
let isPrime (n: bigint) =
    n > 0I && [ for x in 1I .. n do if n % x = 0I then x ] = [ 1I; n ]

Example 4

/// Returns the list divided into two parts.
let split (lst: 'a list) =
    let len = List.length lst
    lst |> List.mapi (fun i x -> (i < len / 2, x))
        |> List.partition fst
        |> fun (x, y) -> (List.map snd x), (List.map snd y)

Example 5

/// Generates points via `f` between `min` and `max` in steps of `delta`, keeping only those whose x-coordinate is positive.
let generatePositivePoints (min: double) (max: double) (delta: double) (f: double -> double) =
    [ min .. delta .. max ]
        |> List.map (fun x -> x, f x)
        |> List.filter (fun (x, _) -> x > 0.0)

Example 6

/// Applies the function `f` to `a`, `n` times (i.e. composes `f` with itself `n` times).
let composite (f: 'a -> 'a) (a: 'a) (n: int) =
    [ 1 .. n ]
        |> List.map (fun _ -> f)
        |> List.fold (fun acc g -> g acc) a

Example 7

/// Remove the last item of the given list.
let removeLastItem (lst: 'a list) =
    lst |> List.rev
        |> List.tail
        |> List.rev

All these examples of F# code seem to me fairly idiomatic and common. They all present optimization opportunities that fusion and/or pure sequences could exploit. Note that I have deliberately used lists rather than sequences or arrays to stay consistent, although real code would probably use arrays or lazy sequences in some situations rather than linked lists.

The current way of doing things in F# is to apply the rewriting rules, or the optimizations, ourselves (note that my examples are sometimes exaggerated for illustrative purposes). Examples 2, 4, 5 and 6 show the best cases of potential list fusion as found in Haskell. I refer you to the links mentioned earlier and below for how Haskell optimizes them; here is the idea applied to the first of them, Example 2. You can try to apply the rewrite rules yourself to the other examples; they would all optimize in a similar way. The example is:

let example2 (lst: int list) =
    lst |> List.map (( + ) 1)       // Create a temporary list
        |> List.map (( * ) 2)       // Create a temporary list
        |> List.filter (( <> ) 0)   // Create a temporary list
        |> List.filter (( >= ) 0)   // Create a temporary list
        |> List.fold ( + ) 0        // Iterate over a temporary list

We know that for any functions f: b -> c and g: a -> b, and for any list xs: [a], the following relation holds:

map f (map g xs) = map (f . g) xs

In other words, in the F# code,

lst |> List.map (( + ) 1)
    |> List.map (( * ) 2)

is the same as

List.map (( * ) 2 << ( + ) 1) lst

Next, we know that for any predicates f: a -> Bool and g: a -> Bool, and for any list xs: [a], the following relation holds:

filter f (filter g xs) = filter (\x -> g x && f x) xs

In other words, in the F# code,

lst |> List.filter (( <> ) 0)
    |> List.filter (( >= ) 0)

is the same as

List.filter (fun x -> 0 <> x && 0 >= x) lst

Combining the two previous results, we obtain

List.filter (fun x -> 0 <> x && 0 >= x) (List.map (( * ) 2 << ( + ) 1) lst)

So,

let example2 (lst: int list) =
    List.filter (fun x -> 0 <> x && 0 >= x) (List.map (( * ) 2 << ( + ) 1) lst)
    |> List.fold ( + ) 0

Then, we know that for any f: a -> Bool and g: b -> a, and for any list xs: [b], the following relation holds:

filter f (map g xs) = fold_back (fun x acc -> if f (g x) then g x :: acc else acc) xs []

So the F# code becomes

let example2 (lst: int list) =
    List.foldBack
        (fun x acc ->
            if (fun x -> 0 <> x && 0 >= x) ((( * ) 2 << ( + ) 1) x)
            then (( * ) 2 << ( + ) 1) x :: acc else acc) lst []
    |> List.fold ( + ) 0

I know that this code is becoming increasingly unreadable, but note that I'm only illustrating, step by step, how the compiler might optimize the initial code. Now, the last rewriting step is "simple": just fuse the final fold into the foldBack, and you get the following optimized code:

let example2_optimized (lst: int list) =
    List.foldBack
        (fun x acc ->
            if (fun x -> 0 <> x && 0 >= x) ((( * ) 2 << ( + ) 1) x)
            then (( * ) 2 << ( + ) 1) x + acc else acc) lst 0

You can check that example2_optimized and example2 produce identical results; yet one of the two allocates four intermediate lists, while the other allocates none at all. According to my small benchmarks, for random lists of 10,000 items between -10 and 10, the reduced code is around 36% faster on my machine, which of course comes as no surprise. I carried out the same procedure for the other examples cited, and the code is markedly faster every time. I think the optimization potential is not negligible for real programs either.
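
For reference, here is a minimal sketch of the kind of check and micro-benchmark I mean. The Stopwatch helper is crude and only indicative (a proper measurement would use something like BenchmarkDotNet), and it assumes the definitions of example2 and example2_optimized above are in scope:

// Assumes `example2` and `example2_optimized` as defined above.
let rng = System.Random 42
let data = List.init 10_000 (fun _ -> rng.Next(-10, 11))

// The two versions agree...
printfn "equal: %b" (example2 data = example2_optimized data)

// ...but only one of them allocates intermediate lists.
let time label (f: int list -> int) =
    let sw = System.Diagnostics.Stopwatch.StartNew()
    for _ in 1 .. 1_000 do f data |> ignore
    printfn "%s: %d ms" label sw.ElapsedMilliseconds

time "original " example2
time "optimized" example2_optimized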

I was able to do the reduction in a fairly mechanical way because I simply applied the rewriting rules shown above at each step, but this isn't always easy to do automatically in a language with side effects; hence my proposal to add a collection type that is pure and lends itself to this style of optimization, called "PureSeq" or something like that.

Ideally, we wouldn't need to introduce this new type, but as mentioned above, F# doesn't allow purity to be checked, and a previous suggestion from 2016 on the subject was rejected.

Now, Examples 1 and 3 are not specifically about fusion, but about generating a temporary list/collection that is immediately and fully consumed during the computation. Example 3 is easily optimized using lazy evaluation:

let isPrime (n: bigint) =
    n > 0I && [ for x in 1I .. n do if n % x = 0I then x ] = [ 1I; n ]

The problem with this function is that the list may generate many elements, whereas the comparison can be decided after only a few of them whenever n is not prime. The version of this code where the list is replaced by a seq is therefore faster when n is not a prime number. I'm still including it in my examples to suggest that PureSeq might also be lazy.
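
To illustrate, here is a sketch of such a lazy variant (the name isPrimeLazy is mine); for composite n, enumeration stops as soon as a third divisor is found, while primes still require the full scan:

/// Lazy sketch: composite numbers are rejected as soon as a third
/// divisor appears; primes still scan the whole range.
let isPrimeLazy (n: bigint) =
    n > 1I
    && (seq { for x in 1I .. n do if n % x = 0I then x }
        |> Seq.truncate 3
        |> List.ofSeq) = [ 1I; n ]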

Example 1 is more interesting because it uses a list and a fold/reduce as substitutes for a loop:

let factorial (n: bigint) =
    List.reduce ( * ) [ 1I .. n ]

This code is elegant but builds a temporary list and is therefore slower. It's conceivable that an F# compiler implementing pure sequences might recognize such a pattern and eliminate the need to create the list:

let factorial (n: bigint) =
    let mutable result = 1I

    for i in 1I .. n do
        result <- result * i

    result
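
As an aside, even without compiler support, switching the list to a lazy sequence already avoids materializing it, though a seq still pays enumerator overhead compared to the bare loop above. A minimal sketch:

/// Same computation, but the range is enumerated lazily:
/// no intermediate list is ever built.
let factorialSeq (n: bigint) =
    Seq.reduce ( * ) (seq { 1I .. n })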

The optimization of removing the creation of the list can also be applied to Example 5, where the list is only generated to be mapped and filtered; it is not provided by the user. How Haskell does this is illustrated here.
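
For Example 5, the hand-fused version the compiler could aim for might look like the sketch below; filtering on x before applying f is safe here because the predicate only inspects x:

/// Hand-fused sketch of Example 5: one traversal, no intermediate lists.
let generatePositivePointsFused (min: double) (max: double) (delta: double) (f: double -> double) =
    [ for x in min .. delta .. max do
        if x > 0.0 then (x, f x) ]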

I've included Example 7 for discussion, as it's very specific to linked lists. As a reminder,

let removeLastItem (lst: 'a list) =
    lst |> List.rev
        |> List.tail
        |> List.rev

this function traverses the list twice to remove the last element. I know this is typical of linked lists, but I don't know whether or not PureSeq should itself be linked-list-like. The advantage would be that pattern matching remains possible (streams in Haskell aren't really linked lists, but allow a similar kind of pattern matching). So I don't know exactly how PureSeq should behave.
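
For comparison, a single-traversal version is easy to write by hand (a sketch; it is not tail-recursive, so a production version would need an accumulator):

/// Sketch: removes the last element in a single traversal.
/// Not tail-recursive; illustrative only.
let rec removeLastItemRec (lst: 'a list) =
    match lst with
    | [] | [ _ ] -> []
    | x :: rest -> x :: removeLastItemRec rest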

I think that, initially, only the fusion optimizations presented in detail above would be relevant, rather than those sketched in the last few examples, but the discussion could be taken further.


Pros and Cons

The advantage of this adaptation to F# is that it would encourage people to write idiomatic/pretty/normal code with no impact on performance; on the contrary, performance would improve.

The disadvantages of this adaptation to F# are that it would be an optimization specific to what I call PureSeq and would not be compatible with non-pure code. I don't know how interop with C# would look either, but it could be an F#-specific feature or a compiler feature. Also, this might make the compiler slower, but it is the kind of compilation pass that would run in release mode only, not in debug mode (debugging would be horrible otherwise).

Extra information

Estimated cost (XS, S, M, L, XL, XXL): it depends on how far the concept would be developed, but it would probably be XL or XXL because this would require changes to the behavior of the type system, code generation, optimization and maybe the introduction of a new native type with its associated module and functions.

For Readers

If you would like to see this issue implemented, please click the :+1: emoji on this issue. These counts are used to generally order the suggestions by engagement.

vzarytovskii commented 2 months ago

That is an interesting suggestion, I'd be curious to see some details of how that could work.

That could probably be an advanced optimization for immutable structures: given that we already have some inline information, we can theoretically do more analysis (and store more data). The problem is that in order to do it (let's say with lists), both the type and the functions working with it would need to become compiler intrinsics (which is subjectively not a great thing), or we would need to provide a generic mechanism for supplying more data to the compiler.

I guess a custom computation expression could be used in this case, which would describe all the needed custom operations for map, reduce, folds, filters, etc.
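
Something along these lines, perhaps; a purely hypothetical sketch (Step, smap, sfilter and sfold are invented names, nothing like them exists in FSharp.Core) where each stage is a function 'a -> 'b option, so a whole map/filter chain fuses into a single traversal:

// Hypothetical sketch of user-level fusion.
type Step<'a, 'b> = 'a -> 'b option

let smap (f: 'b -> 'c) (step: Step<'a, 'b>) : Step<'a, 'c> =
    fun x -> step x |> Option.map f

let sfilter (pred: 'b -> bool) (step: Step<'a, 'b>) : Step<'a, 'b> =
    fun x -> step x |> Option.filter pred

let sfold (folder: 'acc -> 'b -> 'acc) (init: 'acc) (step: Step<'a, 'b>) (lst: 'a list) =
    (init, lst)
    ||> List.fold (fun acc x ->
        match step x with
        | Some y -> folder acc y
        | None -> acc)

// Example 2 expressed with the sketch: one traversal, no intermediate lists.
let example2Fused (lst: int list) =
    let pipeline =
        Some                      // identity step
        |> smap (( + ) 1)
        |> smap (( * ) 2)
        |> sfilter (( <> ) 0)
        |> sfilter (( >= ) 0)
    sfold ( + ) 0 pipeline lst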

vanaur commented 2 months ago

I think that once the compiler has validated the purity of a (chained?) expression, applying simple rewriting rules would suffice. Obviously, the earlier stage of this process remains unclear, and since I don't really know how the F# compiler works internally, I can't suggest a way of doing it. I do suppose the compiler would have to be given more information, which in theory can be done as you say, but I personally don't know how, as I don't work on the compiler code base.

In any case, in Haskell this can be done in two ways: either it is hardcoded in the Haskell compiler, GHC, or the user provides rewriting rules themselves (although the basic rules, like the one I originally presented, are also coded into GHC's built-in libraries as far as I know). You can read more about it here. For example:

{-# RULES
"map/map"  forall f g xs.  map f (map g xs) = map (f . g) xs
  #-}

It's a powerful feature, but I don't know how it would fit into F# at the moment. One danger it raises is the inconsistency of the rules a user could provide (and checking semantic equality is, I believe, an undecidable problem). Haskell, being a "particular" language used by and for research, can probably afford this risk, but I have my doubts about F#.

T-Gro commented 2 months ago

A big downside I see is the limited applicability of the optimization. Since it changes evaluation order even in the most basic case (data |> List.map f |> List.map g rewritten into data |> List.map (f >> g)), it cannot be applied automatically to anything which potentially has a side effect. Which practically means anything which makes an IL API call is also ruled out, e.g. a call to a method from the BCL.
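
A concrete illustration of the reordering with effectful functions:

// Unfused, all applications of f happen before any application of g;
// fused, the effects interleave.
let f x = printfn "f %d" x; x + 1
let g x = printfn "g %d" x; x * 2

[ 1; 2 ] |> List.map f |> List.map g |> ignore   // prints: f 1, f 2, g 2, g 3
[ 1; 2 ] |> List.map (f >> g) |> ignore          // prints: f 1, g 2, f 2, g 3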

I don't see how a custom collection type would make this different; is it just to rule out backwards-compatibility concerns? Because the inclusion of a brand-new collection type decreases the reach and raises the learning bar for beginners.

If the code is detected to be pure, e.g. numeric code applied to collections, this could be done automatically even on existing collections. On the other hand, high-perf numeric code is unlikely to be using lists in the first place and would probably lean more towards tensors or vectorized APIs where applicable.

vanaur commented 2 months ago

A big downside I see is the limited applicability of the optimization. Since it changes evaluation order even in the most basic case [...] it cannot be applied automatically to anything which potentially has a side effect. Which practically means anything which makes an IL API call is also ruled out, e.g. a call to a method from the BCL.

You're right; that's why I invoked the concept of purity in the discussion. It's true that this becomes difficult to integrate into the .NET environment beyond F#; I hadn't really thought about that.

I think my proposal fits in more broadly with the proposals to make the purity of methods explicit in .NET and C#, see here, for example.

I don't see how a custom collection type would make this different - is it just to rule out backwards compat concerns?

The proposal to add a new collection type was precisely an idea for trying to make the presented optimization possible (even within .NET and without modifying the entire type checker), but as you pointed out, it doesn't solve the problem of, for example, BCL methods, which can have side effects and whose purity F# has no way of checking.

Because inclusion of a brand new collection type decreases the reach, and raises the learning bar for beginners.

Proposing a new collection type would indeed add a new learning element for everyone. Ideally we wouldn't have to do it (referring to the previous discussion points), but as mentioned just before, purity adds constraints that aren't necessarily within the remit of F#, which my initial proposal sadly hadn't anticipated.

If the code is detected to be pure, e.g. numeric code applied on collections, this could be done automatically even on existing collections.

I'd love it if this could be done automatically, without the need to add a new collection type! However, from what I've been able to understand (I may be wrong), there doesn't seem to have been any desire to infer purity in F# (see the issue mentioned in the original message).

On the other hand, high-perf numeric code is unlikely to be using Lists in the first place and would probably be inclining more towards tensors or vectorized APIs where applicable.

You're right, and I use such libraries myself. However, the original idea was to make F# faster without the user having to do much more than write the usual code.

T-Gro commented 2 months ago

I think the existence of side effects is not a showstopper as long as the different execution order is documented and cannot break existing code without people opting into it. A project-wide setting turning on this specific optimization on demand would work here, as long as it is not on by default.

Once this is accepted, the optimization step could follow the design recently done for collection builders by @brianrourkeboll, where coded sequences of expressions are rewritten. If a similar style is followed, it would be the "hardcoded in compiler" option from this suggestion. (https://github.com/dotnet/fsharp/pull/17419/files#diff-4e1868a2541583a5bdb311ac5e4a45e687425bb1218dcc3706a9fbbb88fc0ae4L584)

vanaur commented 2 months ago

It could indeed be interesting; I wasn't aware of this new design. I suppose it could be part of that.