primeLayout: rewrite to avoid iterated transformations

JohnLato commented 11 years ago

iterated calls to transformations tend to be slow. This is especially notable in the Poster.hs benchmark, which sees about a 30% speedup (on my system) with this change. Also see the note and squares' in diagrams-bench:/Rotations.hs.

Ideally calling iterateN on transformations should be much faster, but I think that may require significant changes to the current Transform type.

byorgey commented 11 years ago

Looks good. Out of curiosity, why are iterated transformations slow, and what sort of changes might be necessary to make them faster?

JohnLato commented 11 years ago

I believe the problem with iterated transformations is due to two issues working against us. First, the Transform is implemented via linear maps, which in turn are MemoTries, which are functions. The functions are dynamically nested (e.g. the functions returned by iterateN don't appear directly in the core, so they can't fuse), which means GHC doesn't really have an opportunity to optimize the chain. So to get the final transform from a deep nesting of transforms requires a lot of function evaluations. Secondly, this generates much larger QDiagram trees, because they have a separate annotation for each transformation. I haven't looked into the relative impacts of these two issues.

There are a few ways to improve it. I was considering trying a re-implementation of linear maps that represents a linear map as an associated type, e.g. for R2 it could be newtype V2 = V2 R2, which would be simpler and presumably more efficient (in both space and time). I also think that GHC would be more amenable to combining nested calls into a single packed structure, although that might not help for iterateN. This should be relatively simple to implement though.

Also, it might make sense to optimize the QDiagram structure a bit. As a simple step, if the root node is an annotation, and we're applying another annotation, we could remove the annotation, combine it with our new annotation, and add the combined annotation at the root. I haven't thought through the full ramifications of this, but my instinct is that it wouldn't cause any problems. The separate annotations would no longer be separable from within the tree structure, but I don't know that we would ever do that. It hasn't really been possible with the dualtree API anyway.

An alternative (which I only just thought of so didn't test, but is probably about as good as this commit) would be to use 'rotation' instead of 'rotate/rotateBy'. It might make sense to have RULES like

{-# RULES "iterate/rotate" forall n theta. iterateN n (rotate theta) = \ls -> map (flip transform ls) $ scanl (<>) mempty (replicate n (rotation theta)) #-}

I'm not particularly fond of this sort of rule, as it's limited in applicability, but it might help with this common case. I haven't given too much thought to the RHS, but composing T2/Transformations should be better than composing 'transform xxx' functions.

It would be nice to have GHC do a better job of optimizing chained function calls too. The benefits would extend far beyond diagrams I think. I'm not really sure how to go about implementing it unfortunately. Maybe some of the supercompilation work will trickle into GHC. Alternatively maybe a compiler plugin would work?

byorgey commented 11 years ago

Thanks for the detailed analysis! A few thoughts:

Re: having a separate annotation per transformation in the QDiagram trees, I thought that's what should have been happening already! Indeed, if there is a d annotation at the root and you apply another one, they ought to simply fuse into one instead of adding another node. There never were and never will be any guarantees about being able to access separate annotations within the tree structure. We are always allowed to make semantics-preserving modifications to the tree, and we simply make a best-effort to preserve the sort of structure that might make things easier for backends.
A re-implementation/redesign of linear maps is an interesting idea. I know Conal originally used the linear maps in vector-space with e.g. things like reactive and fieldtrip and got pretty good performance out of it. But perhaps he was using it in a fundamentally different pattern than us, I am not sure.
Re RULES and so on, I think these sorts of limited-application, domain-specific optimizations is precisely what RULES are for. I have also recently been impressed with the HERMIT project at KU (http://www.ittc.ku.edu/csdl/fpg/software/hermit.html) --it makes it easy to explore and codify domain-specific core-to-core optimization passes beyond what RULES can handle. I wonder if we might be able to make use of it somehow.

JohnLato commented 11 years ago

Oh, bother. I think you're right that we only make one annotation per transformation. I stopped tracing back after applyD = act.DAct and assumed DAct was a node. Egg on my face.

I'm pretty sure that the vector-space linear maps do result in pretty performant code. I just wonder if a different representation might be even better, in particular for this use case. But at any rate it's not a huge bottleneck. Profiling indicates the major bottleneck in Poster.hs is now colour-related. I note there's already something on Trello about this. Fixing that, and maybe Envelopes, should be a higher priority for me I think.

I agree this is exactly the sort of situation for which RULES exist. My objection is mostly that it's often easy for the end-user to get into cases where a RULE should exist but doesn't, or the current RULE doesn't fire. But if adding a RULE fixes this use-case I definitely think we should do so (my previous effort was incorrect BTW because it doesn't reproduce the empty transform). Fortunately one rule for iterate (transform x) is probably sufficient to get many common occurrences.

I'm not really familiar with HERMIT, but I've heard good reports. I'll have a look (that shouldn't stop anyone else either!).

diagrams / diagrams-contrib

primeLayout: rewrite to avoid iterated transformations #22