Open WolframKahl opened 8 years ago
I thought that GHC automatically inlines small, simple functions. Perhaps it would be worthwhile investigating why inlining is not happening automatically in this case.
Some time ago, I did some experiments: After some exploration, I arrived at the following patch:
diff --git a/src/full/Agda/Compiler/ToTreeless.hs b/src/full/Agda/Compiler/ToTreeless.hs
index c68e227..2fcfbe2 100644
--- a/src/full/Agda/Compiler/ToTreeless.hs
+++ b/src/full/Agda/Compiler/ToTreeless.hs
@@ -328,8 +328,11 @@ maybeInlineDef q vs =
ifM (lift $ alwaysInline q) doinline $ do
def <- lift $ getConstInfo q
case theDef def of
- Function{ funInline = inline }
+ def'@Function{ funInline = inline }
| inline -> doinline
+ | isProperProjection def' -> do
+ lift $ reportSDoc "treeless.inline" 20 $ text "-- inlining projection" $$ prettyPure (defName def)
+ doinline
| otherwise -> do
_ <- lift $ toTreeless' q
used <- lift $ getCompiledArgUse q
The result is that the generated Haskell code explodes, with code of the following pattern (simplified for readability):
... = case v3 of
Constr1 v10 v11 v12 v13 v14 v15 -> case vX of
... -> ...
... -> case v3 of
Constr1 v50 v51 v52 v53 v54 v55 -> result
(With default cases and coe
calls liberally sprinkled in-between.)
I thought about adding a “case folding” pass to ToTreeless
, but that would still generate those huge intermediate TTerm
s first.
I currently think that the proper way would be something along the following lines:
TTerm
s only in an additional ReaderT
that keeps an environment of case
arguments with matched patterns.TTerm
constructor, check whether it is a case
; if yes, look the argument up in the environment, and if it is found, prune all patterns that don't match the pattern associated with the argument in the environment, don't generate a new case, but return the local body replacing the local pattern variables with those from the environment.With informal substitution notation, the example above would then become:
... = case v3 of
Constr1 v10 v11 v12 v13 v14 v15 -> case vX of
... -> ...
... -> result[ v50 := v10 , v51 := v11 , v52 := v12 , v53 := v13 , v54 := v14 , v55 := v15 ]
@UlfNorell , @phile314: Does this make sense?
I think this would introduce valuable sharing that is currently lost. Even if GHC might be able to reconstruct it with extremely aggressive options (I tried quite a few things in the past, with no significant success), it would still be very expensive on the GHC-compilation side.
Unfortunately this proposal involves a lot more Agda internals than I am currently able to handle; I am not even confident that I could arrive at a way to set up that ReaderT
that stands a chance of getting accepted into the Agda code base...
I thought there was a pass in the treeless compiler that would get rid of repeated cases, but of course that still would have the problem with the huge intermediate terms. This might make a nice code sprint for next week.
Another option would be to not inline projection functions in ToTreless itself, but to add an additional pass inlining projection functions and making sure that we generate only one case expression per scrutinee (like wolfram outlined above).
@UlfNorell I also thought there is such a pass. Maybe it only works on consecutive cases?
For illustration, here is an example of what came out with that patch (I switched Haskell pretty-printing to PPInLine to avoid megabytes of spaces, and then manually indented):
name446 = "Relation.Binary.Poset.HasMeetProps.HasMeetProps.≤-from-∧₂";;
d446 v0 v1 v2 v3 v4 = case coe v0 of
{ MAlonzo.Code.Relation.Binary.C235 v5 v6 v7 v8 -> case coe v8 of
{ MAlonzo.Code.Relation.Binary.C217 v9 v10 -> case coe v9 of
{ MAlonzo.Code.Relation.Binary.C17 v11 v12 v13 -> coe v13 v3
(let { v14 = coe v1 v2 v3;} in case coe v14 of
{ MAlonzo.Code.Relation.Binary.Poset.Meet.C13 v15 v16 -> coe v15
; _ -> MAlonzo.RTE.mazUnreachableError
}
) v2 (case coe v0 of
{ MAlonzo.Code.Relation.Binary.C235 v14 v15 v16 v17 -> case coe v17 of
{ MAlonzo.Code.Relation.Binary.C217 v18 v19 -> case coe v18 of
{ MAlonzo.Code.Relation.Binary.C17 v20 v21 v22 -> coe v21 v3
(let { v23 = coe v1 v2 v3;} in case coe v23 of
{ MAlonzo.Code.Relation.Binary.Poset.Meet.C13 v24 v25 -> coe v24
; _ -> MAlonzo.RTE.mazUnreachableError
}
) (case coe v0 of
{ MAlonzo.Code.Relation.Binary.C235 v23 v24 v25 v26 -> case coe v26 of
{ MAlonzo.Code.Relation.Binary.C217 v27 v28 -> case coe v27 of
{ MAlonzo.Code.Relation.Binary.C17 v29 v30 v31 -> case coe v29 of
{ MAlonzo.Code.Relation.Binary.Core.C605 v32 v33 v34 -> coe v33
(let { v35 = coe v1 v2 v3;} in case coe v35 of
{ MAlonzo.Code.Relation.Binary.Poset.Meet.C13 v36 v37 -> coe v36
; _ -> MAlonzo.RTE.mazUnreachableError
}
) v3 v4
; _ -> coe MAlonzo.RTE.mazUnreachableError
}
; _ -> coe MAlonzo.RTE.mazUnreachableError
}
; _ -> coe MAlonzo.RTE.mazUnreachableError
}
; _ -> MAlonzo.RTE.mazUnreachableError
}
)
; _ -> coe MAlonzo.RTE.mazUnreachableError
}
; _ -> coe MAlonzo.RTE.mazUnreachableError
}
; _ -> MAlonzo.RTE.mazUnreachableError
}) (let { v14 = coe v1 v2 v3;} in case coe v14 of
{ MAlonzo.Code.Relation.Binary.Poset.Meet.C13 v15 v16 -> case coe v16 of
{ MAlonzo.Code.Relation.Binary.Poset.Meet.C1 v17 v18 v19 -> coe v17
; _ -> coe MAlonzo.RTE.mazUnreachableError
}
; _ -> MAlonzo.RTE.mazUnreachableError
})
; _ -> coe MAlonzo.RTE.mazUnreachableError
}
; _ -> coe MAlonzo.RTE.mazUnreachableError
}
; _ -> coe MAlonzo.RTE.mazUnreachableError
};;
This contains 15 case
expressions. Applying the transformation outlined above (collapsing each of v0
, v8
, and v9
twice) gets this down to 9:
name446 = "Relation.Binary.Poset.HasMeetProps.HasMeetProps.≤-from-∧₂";;
d446 v0 v1 v2 v3 v4 = case coe v0 of
{ MAlonzo.Code.Relation.Binary.C235 v5 v6 v7 v8 -> case coe v8 of
{ MAlonzo.Code.Relation.Binary.C217 v9 v10 -> case coe v9 of
{ MAlonzo.Code.Relation.Binary.C17 v11 v12 v13 -> coe v13 v3
(let { v14 = coe v1 v2 v3;} in case coe v14 of
{ MAlonzo.Code.Relation.Binary.Poset.Meet.C13 v15 v16 -> coe v15
; _ -> MAlonzo.RTE.mazUnreachableError
}
) v2 (coe v12 v3
(let { v23 = coe v1 v2 v3;} in case coe v23 of
{ MAlonzo.Code.Relation.Binary.Poset.Meet.C13 v24 v25 -> coe v24
; _ -> MAlonzo.RTE.mazUnreachableError
}
) (case coe v11 of
{ MAlonzo.Code.Relation.Binary.Core.C605 v32 v33 v34 -> coe v33
(let { v35 = coe v1 v2 v3;} in case coe v35 of
{ MAlonzo.Code.Relation.Binary.Poset.Meet.C13 v36 v37 -> coe v36
; _ -> MAlonzo.RTE.mazUnreachableError
}
) v3 v4
; _ -> coe MAlonzo.RTE.mazUnreachableError
}
)
) (let { v14 = coe v1 v2 v3;} in case coe v14 of
{ MAlonzo.Code.Relation.Binary.Poset.Meet.C13 v15 v16 -> case coe v16 of
{ MAlonzo.Code.Relation.Binary.Poset.Meet.C1 v17 v18 v19 -> coe v17
; _ -> coe MAlonzo.RTE.mazUnreachableError
}
; _ -> MAlonzo.RTE.mazUnreachableError
})
; _ -> coe MAlonzo.RTE.mazUnreachableError
}
; _ -> coe MAlonzo.RTE.mazUnreachableError
}
; _ -> coe MAlonzo.RTE.mazUnreachableError
};;
Notice that coe v1 v2 v3
occured already originally four times; if we lift this let
just out one level, we can share it:
name446 = "Relation.Binary.Poset.HasMeetProps.HasMeetProps.≤-from-∧₂";;
d446 v0 v1 v2 v3 v4 = case coe v0 of
{ MAlonzo.Code.Relation.Binary.C235 v5 v6 v7 v8 -> case coe v8 of
{ MAlonzo.Code.Relation.Binary.C217 v9 v10 -> case coe v9 of
{ MAlonzo.Code.Relation.Binary.C17 v11 v12 v13 -> let { v14 = coe v1 v2 v3;} in coe v13 v3
(case coe v14 of
{ MAlonzo.Code.Relation.Binary.Poset.Meet.C13 v15 v16 -> coe v15
; _ -> MAlonzo.RTE.mazUnreachableError
}
) v2 (coe v12 v3
(case coe v14 of
{ MAlonzo.Code.Relation.Binary.Poset.Meet.C13 v24 v25 -> coe v24
; _ -> MAlonzo.RTE.mazUnreachableError
}
) (case coe v11 of
{ MAlonzo.Code.Relation.Binary.Core.C605 v32 v33 v34 -> coe v33
(case coe v14 of
{ MAlonzo.Code.Relation.Binary.Poset.Meet.C13 v36 v37 -> coe v36
; _ -> MAlonzo.RTE.mazUnreachableError
}
) v3 v4
; _ -> coe MAlonzo.RTE.mazUnreachableError
}
)
) (case coe v14 of
{ MAlonzo.Code.Relation.Binary.Poset.Meet.C13 v15 v16 -> case coe v16 of
{ MAlonzo.Code.Relation.Binary.Poset.Meet.C1 v17 v18 v19 -> coe v17
; _ -> coe MAlonzo.RTE.mazUnreachableError
}
; _ -> MAlonzo.RTE.mazUnreachableError
})
; _ -> coe MAlonzo.RTE.mazUnreachableError
}
; _ -> coe MAlonzo.RTE.mazUnreachableError
}
; _ -> coe MAlonzo.RTE.mazUnreachableError
};;
Sharing also the associated case coe v14
changes the evaluation order, but produces something that looks like acceptable compiler output :wink: :
name446 = "Relation.Binary.Poset.HasMeetProps.HasMeetProps.≤-from-∧₂";;
d446 v0 v1 v2 v3 v4 = case coe v0 of
{ MAlonzo.Code.Relation.Binary.C235 v5 v6 v7 v8 -> case coe v8 of
{ MAlonzo.Code.Relation.Binary.C217 v9 v10 -> case coe v9 of
{ MAlonzo.Code.Relation.Binary.C17 v11 v12 v13 -> let { v14 = coe v1 v2 v3;} in case coe v14 of
{ MAlonzo.Code.Relation.Binary.Poset.Meet.C13 v15 v16 -> coe v13 v3
(coe v15) v2 (coe v12 v3
(coe v15) (case coe v11 of
{ MAlonzo.Code.Relation.Binary.Core.C605 v32 v33 v34 -> coe v33 (coe v15) v3 v4
; _ -> coe MAlonzo.RTE.mazUnreachableError
}
)
) (case coe v16 of
{ MAlonzo.Code.Relation.Binary.Poset.Meet.C1 v17 v18 v19 -> coe v17
; _ -> coe MAlonzo.RTE.mazUnreachableError
})
; _ -> MAlonzo.RTE.mazUnreachableError
}
; _ -> coe MAlonzo.RTE.mazUnreachableError
}
; _ -> coe MAlonzo.RTE.mazUnreachableError
}
; _ -> coe MAlonzo.RTE.mazUnreachableError
};;
To avoid changing the evaluation order, one could generate a pattern let
, eliminating v14
:
name446 = "Relation.Binary.Poset.HasMeetProps.HasMeetProps.≤-from-∧₂";;
d446 v0 v1 v2 v3 v4 = case coe v0 of
{ MAlonzo.Code.Relation.Binary.C235 v5 v6 v7 v8 -> case coe v8 of
{ MAlonzo.Code.Relation.Binary.C217 v9 v10 -> case coe v9 of
{ MAlonzo.Code.Relation.Binary.C17 v11 v12 v13 -> let
{ MAlonzo.Code.Relation.Binary.Poset.Meet.C13 v15 v16 = coe v1 v2 v3;
} in coe v13 v3
(coe v15) v2 (coe v12 v3
(coe v15) (case coe v11 of
{ MAlonzo.Code.Relation.Binary.Core.C605 v32 v33 v34 -> coe v33 (coe v15) v3 v4
; _ -> coe MAlonzo.RTE.mazUnreachableError
}
)
) (case coe v16 of
{ MAlonzo.Code.Relation.Binary.Poset.Meet.C1 v17 v18 v19 -> coe v17
; _ -> coe MAlonzo.RTE.mazUnreachableError
})
; _ -> coe MAlonzo.RTE.mazUnreachableError
}
; _ -> coe MAlonzo.RTE.mazUnreachableError
}
; _ -> coe MAlonzo.RTE.mazUnreachableError
};;
Could this have any negative consequences, like for example causing space leaks? I haven't yet looked at this in detail, but it might be worth checking this before implementing. GHC's literature discusses this to some extent:
One factor coming in here to make a difference is that Agda has a module system. :wink:
Another important difference is that Haskell datatypes are by default coinductive. (All I am arguing about here is about constructors for (at most) inductive datatypes. One might even make a difference between non-recursive records and inductive recursive records. Constructors for coinductive records in Agda will definitely need to be treated differently.)
The pattern let
of the last version above will evaluate coe v1 v2 v3
only once, and keep the fields v15
and v16
alive until their last use. If v13
ends up not using some of its arguments, this could mean that v15
and/or v16
are kept alive until the application of v13
is fully evaluated.
Without the let-floating, coe v1 v2 v3
will be evaluated (if needed by v13
) four times.
The trimmed-down source for the example above is:
module Relation.Binary.Poset.LeqFromMeet2 where
open import Level
open import Relation.Binary using (Poset)
open import Relation.Binary.Poset.Meet using (module PosetMeet)
module HasMeetProps {j k₁ k₂ : Level} (𝒫 : Poset j k₁ k₂)
(meet : (R S : Poset.Carrier 𝒫) → PosetMeet.Meet 𝒫 R S) where
open Poset 𝒫
open PosetMeet 𝒫
infixr 7 _∧_
_∧_ : Carrier → Carrier → Carrier
R ∧ S = Meet.value (meet R S)
≤-from-∧₂ : {R S : Carrier} → R ∧ S ≈ S → S ≤ R
≤-from-∧₂ {R} {S} = ≤-from-Meet₂ (meet R S)
The situation is quite typical: v0
is a nested record value (here, 𝒫 : Poset
), of which v8
and v9
are constituent records of which some fields are used (most likely via open public
in the enclosing record
).
v1
is the meet function; it is here applied to v2
(R
) and v3
(S
), producing the meet value v15
and the IsMeet
proof for it in v16
. Just the calculation of the value might already be quite expensive...
(The fact that this example is about “proofs” does actually not make a difference: These proofs here are, for me, in general not irrelevant, since I sometimes need them for totality proofs from which I then extract total functions.)
In many practical examples, the role of the v0
here is going to be taken by some module instantiation vMod vParam1 vParam2 ... vParam17
which might perform non-trivial computations at evaluation time, and definitely needs to be shared...
Natalie Perna and I are currently working on this in https://github.com/natalieperna/agda/tree/stable-2.5-inline-projections .
I only took a quick glance at your branch, but if I understood correctly you inline projections during the Internal -> Treeless
translation. In a second step, you get rid of all the duplicated case
expressions again and float them out. Wouldn't it be simpler to just not generate that many case
expressions in the first place? E.g. following your suggestion in https://github.com/agda/agda/issues/1895#issuecomment-209996981 ?
After learning about
{-# INLINE #-}
from the ChangeLog and seeing at the top of my profile output:, I added
INLINE
pragmas in the standard library forData.Product.Σ.proj₂
andData.Product.Σ.proj₁
, and obtained a 16% speed-up.Close to the top of the list in my profiling output, there are quite a few more record fields ─ shouldn't they all be inlined automatically?
Summary of the attached file ─ the important lines are the
runtime
of the compiled program (without profiling):2016-03-05_profiling_log.txt