composewell / streamly

High performance, concurrent functional programming abstractions
https://streamly.composewell.com
Other
866 stars 66 forks source link

Sounds like dropping `Yield` may now simplify and speedup code #1316

Open twhitehead opened 2 years ago

twhitehead commented 2 years ago

Happened to come across this in Compiling without Continuations (the paper 2016 about the addition of join points to GHC intermediate language) (see Section 5 on pg 489-490 for the quote)

Stream Fusion It turns out that this new ability to move a consumer all the way to the return points of a tail-recursive loop has direct implications for a very widely used transformation: stream fusion. The key idea of stream fusion is to represent a list (or array, or other sequence) by a pair of a state and a stepper function, thus:

data Stream a where
  MkStream :: s -> (s -> Step s a) -> Stream a

There are two competing approaches to the Step type. In unfold/destroy fusion (Svenningsson [26]), we have:

data Step s a = Done | Yield s a

Hence a stepper function takes an incoming state and either yields an element and a new state or signals the end. Now a pipeline of list processors can be rewritten as a pipeline of stepper functions, each of which produces and consumes elements one by one. A typical stepper function for a stream transformer looks like:

next s = case <incoming step> of
           Yield s’ a -> <process element>
           Done -> <process end of stream>

When composed together and inlined, the stepper functions become a nest of cases, each scrutinizing the output ofthe previous stepper. It is crucial for performance that each Yield or Done expression be matched to a case, much as we did with Just and Nothing in the example that began Sec. 2. Fortunately, case-of-case and the other commuting conversions that GHC performs are usually up to the task.

Alas, this approach requires a recursive stepper function when implementing filter, which must loop over incoming elements until it finds a match. This breaks up the chain of cases by putting a loop in the way, much as our any above becomes a case on a loop. Hence until now, recursive stepper functions have been un-fusible. Coutts et al. [6] suggested adding a Skip construtor to Step, thus:

data Step s a = Done | Yield s a | Skip s

Now the stepper function can say to update the state and call again, obviating the need for a loop of its own. This makes filter fusible, but it complicates everything else! Everything gets three cases instead of two, leading to more code and more runtime tests; and functions like zip that consume two lists become more complicated and less efficient.

But with join points, just as with any, Svenningsson’s original Skip-less approach fuses just fine! Result: simpler code, less of it, and faster to execute. It’s a straight win

harendra-kumar commented 2 years ago

Unfortunately, the approach seems to work only for pure streams. The paper gives an example of a pure stream, However, streamly's stream type is monadic, something similar to:

data Stream m a where
  MkStream :: s -> (s -> m (Step s a)) -> Stream m a

We did try the skipless approach for streamly but it did not seem to work for monadic streams. I would be happy to use it if someone can demonstrate that it works for monadic streams, for all cases in streamly.

twhitehead commented 2 years ago

Ah. That's too bad. I wonder if @lukemaurer, who was author of both the Compiling without Continuations and the Sequent Calculus as a Compiler Intermediate Language papers and whose name comes up a lot in the on the haskell associated tickets page, has any thoughts on why the Yield is still requeired in monadic streams and if anything can be dane about this.

lukemaurer commented 2 years ago

Hmm ... having things parameterized over the monad does make it hard. Conceivably you could exploit the MonadFix laws (the Sliding Rule is tempting) to get something similar to what happens in the paper.