Open myreen opened 8 years ago
A note about normalise
: it must not split nested App
, i.e.
App exp1 (App exp2 exp3)
should be transformed into something like:
let t3 = exp3 in
let t2 = exp2 in
let t1 = exp1 in
App exp1 (App exp2 exp3)
This is important so that cf
can treat it as a multi-app which makes reasoning in CF must neater.
I'd like to volunteer for one of the remaining tasks, if it's not done by the time I get a chance to look into it (some time in the middle of this week probably).
The remaining tasks depend on each other. I'll assign you to this to "define and verify normalise
".
cf
already handles nested let
-expressions, even though it's probably not super convenient to manipulate them at the moment.
I've been thinking about the definition of normalise
. For most part, it'll be pretty straightforward taking as input expressions such as e1 + e2
and turning them into:
let t1 = normalise e1 in
let t2 = normalise e2 in
t1 + t2
In case one of normalise e1
or normalise e2
contain a closure creation, then the one containing the closure creation goes first. In other words, sometimes the above will become:
let t1 = normalise e2 in
let t2 = normalise e1 in
t2 + t1
The interesting case if when both contain closure creation, and there can of course be more than two expressions. We might have to deal with n
expressions with closure creation inside of them.
Example: what would be a-normal form version of the following look like?
f (fn x => x+1) (fn x => x+2) ... (fn x => x+n)
This is a problem because we can't pick one to go before the others, since the evaluation environment stored in the closure value changes:
let t1 = Fun ... in
let t2 = /// nothing fits here /// in ...
f t0 t1 ... tn
One could solve this problem with a tuple and extend cf
to accept this one type of input that isn't in a-normal form. The tupled version for the example above could be:
case (..., ..., ..., ...) of (t1,t2,...,tn) => f t0 t1 ... tn
@Armael: Could cf
be made to support such a form of parallel assignment? I'm hoping that something of the following form could be supported in the cf
function.
cf (Man (Op Cons xs) [(pat,body)]) =
case dest_var_pat pat of
| SOME vars => if LENGTH vars <> LENGTH xs then cf_bottom else
/// something here in terms of cf applied as if they are single lets ///
| NONE => cf_bottom
I suspect this is tricky. However, I believe some form of parallel assignment is necessary for normalise
to be possible to write so that it always produces something cf
can use.
Note 1: It's important to remember that normalise
is only provided as a convenience for the user. One could argue that it doesn't need to successfully turn all expressions into a-normal form; in fact, there are no formal constraints forcing it to do so, instead when it fails the user gets an unprovable cf
-goal. In documentation we can say that the normalisation used by cf
does not normalise expressions with more than one Fun
or Letrec
not in a-normal position simultaneously (and give examples).
Note 2: This problem would go away immediately if the semantics for closure creation (both Fun
and Letrec
) filtered the env
for the variables that appear in the body of the closure. I'd like to point out that such a change to the source semantics would probably cause significant proof breakage, in particular if it would need to be propagated into the compiler's intermediate languages. The translator would immediately become slower because it would have to either EVAL a semantic-free-vars function or keep track of the free variables as it goes. Both would cause it to go noticeably slower for large translations, such as the bootstrap.
My opinion: I'm happy to have normalise
fail to produce a-normal forms for easily avoided expressions such as foo (fn ...) .... (fn ...)
, and I think normalise
would still be very useful in practice. If we do, in the future, change envs to get filtered, then we can easily update the not-always-successful version of normalise
to one that always succeeds.
How should normalise
pick names for the intermediate bindings? I suppose some function to calculate free variables will be necessary to avoid picking a name clash. I also suppose this function does not already exist. Am I wrong?
What should be the result of normalise
on the following?
case f (x + y) of
INL a => (fn x => a + x + y)
| INR a => a + x
We require f (x + y)
to be bound as a temporary variable, but that binding will mess up the INL
case's closure.
I think this shows the problem @myreen mentioned above is worse than described: it can happen even if there's only one closure-forming expression around. Should pattern matches resulting in closure-forming expressions also cause normalise
to fail?
Regarding names: the names you pick for let
bindings won't currently appear in the cf
proofs, instead the tactics force the user to pick names during the proof. The names chosen by normalise
must of course not clash with the names used in the subexpressions. I don't know of a function that computes free variables for source expressions. @SOwens / @tanyongkiam, is there such a function?
@xrchz I would have thought that your example should produce the following:
case (let t1 = x + y in f t1) of
INL a => (fn x => a + x + y)
| INR a => a + x
However, inspecting cf_def
and cf_match_def
, I fear that cf
returns cf_bottom
for this code. @Armael, could cf
be made to allow let
-expression as above? Could
cf_match e (MAP (\b. (FST b, cf p (SND b))) branches
be replaced by something like
cf_match (cf p e) (MAP (\b. (FST b, cf p (SND b))) branches
and cf_match
be updated accordingly?
The same change would be required for Handle
, of course.
I think cf_def always produces cf_bottom
for all Handle
expressions. My impression is that cf
is not able to deal with exceptions currently.
Looking back at the cf for functions (cf_fun
), I think issue (2) might just go away?
A program of the form:
let f = fn x => f_body x in
foo
gets translated to cf_fun p f [x] (cf f_body) (cf foo)
.
Where cf_fun
(simplified for the one parameter case) is:
val cf_fun_def = Define `
cf_fun (p:'ffi ffi_proj) f n F1 F2 = \env. local (\H Q.
!fv.
(!xv H' Q'.
F1 (extend_env [n] [xv] env) H' Q' ==>
app (p:'ffi ffi_proj) fv [xv] H' Q')
==>
F2 (env with v := (f, fv)::env.v) H Q)`
cf_fun
abstracts over the closure; in the rest of the program, f
will be bound to fv
, a fresh name introduced by !fv. …
. The only thing we get about fv
is the assumption !xv H' Q'. cf f_body (extend_env [x] [xv] env) H' Q' ==> app fv [xv] H' Q'
. Then I guess we'll need to prove that this assumption still holds when the normalisation function adds bindings to env
, which is true as the newly bound variables won't appear in f_body
.
I hope this makes sense?
I don't understand what you wrote. Could you elaborate using an example? Here's an example:
foo (fn x => x+1) (fn x => x+2)
What should normalise
do to make cf
happy with the input?
I was thinking of having normalise
turn
foo (fn x => x+1) (fn x => x+2)
into
let f1 = fn x => x+1 in
let f2 = fn x => x+2 in
foo f1 f2
I confused myself a bit, so the whole idea I had probably doesn't work; however I think the following remarks are still valid:
normalise
doing that, ∀env s exp res. evaluate F env s (normalise exp) res = evaluate F env s exp res
does not hold because of closures environments.cf
itself can be proved to be agnostic wrt adding extra bindings before defining a function, if these bindings do not appear in the body of the function. (this is what I was trying to convince myself in the previous message)∀p e. sound p e (cf p (normalise e))
should still be true?Maybe we can have normalise
produce "parallel assignments" like you were mentionning (so that evaluate (normalise exp) = evaluate exp
actually holds), and then have a second normalise function that turns this parallel assignments into successive ones (that should be easier to manipulate), and prove that cf (normalise2 (normalise exp)) env H Q <=> cf (normalise exp) env H Q
.
I'm pretty sure the following won't work:
let f1 = fn x => x+1 in
let f2 = fn x => x+2 in
foo f1 f2
as a normalisation of foo (fn x => x+1) (fn x => x+2)
, since the value returned by foo
could be its second argument. We could have defined foo
as:
fun foo x1 x2 = x2
and the postcondition we want to prove for foo (fn x => x+1) (fn x => x+2)
might be that it returns exactly \v. & (v = Closure auto_env45 "x" (x+2))
. We are unable to prevent the user from instantiating the postcondition with something of this style.
Backing up a bit: In order to prove:
∀p e. sound p e (cf p (normalise e))
We start by showing:
∀p e. sound p e (cf p e)
Then instantiate e
with normalise e
∀p e. sound p (normalise e) (cf p (normalise e))
Using sound p (normalise e) h = sound p e h
, we get the desired theorem. Note that this last rewrite is only true if we can prove:
∀env s exp res. evaluate F env s (normalise exp) res = evaluate F env s exp res
Having some form of parallel assignment accepted by cf
would allow us to make normalise
always return something that can be used.
I just want to reiterate that I don't see the planned restriction on normalise
as a major issue. I'm happy with normalise
being useful out-of-the-box in only 99 % of real use cases, where the user doesn't have more than one Fun
/Letrec
expressions without a Let
in between.
and the postcondition we want to prove for foo (fn x => x+1) (fn x => x+2) might be that it returns exactly \v. & (v = Closure auto_env45 "x" (x+2)). We are unable to prevent the user from instantiating the postcondition with something of this style.
What I think I realised is that you cannot ever prove such a specification with CF, as cf_fun
abstracts over the actual implementation of function values.
cf_fun
does not provide the user the actual closure value, but a fresh name (fv
), plus a specification for fv
(app fv ...
) if they can prove the cf for the function body.
I'm starting to wonder whether it would simply be easier to just make cf
call itself recursively more frequently. Yes, the inductive hypothesis would be used a lot more, but the input would no longer need to be in a-normal form and we could completely avoid messing with normalise
and its restrictions.
Is there something fundamental that requires cf
to need input in a-normal form?
@Armael and I chatted over a Hangout and came to the conclusion that it's very difficult to address examples such as foo (fn x => x+1) (fn x => x+2)
. We decided to stick with the current approach to normalise
which already worked for Let _ (Fun _ _) _
and is likely to work for Letrec
. The current approach will produce cf_bottom
for foo (fn x => x+1) (fn x => x+2)
, but the user can easily avoid such goals by manually editing the input to Let
-bind one of the closures.
finish proof of Letrec case for new recursion through
normalise
Done! (as of e28e6af6f0fb1a9d8f3b93c981ddefa62211aa48 + b89dbfd)
An updated What is left to do list:
Some minor notes on normalise
. I will not be looking at it for a while, so other volunteers are still welcome.
What should be the result of normalise
on the following?
(Suppose Node
is a constructor taking two arguments.)
Node (f x) (r := 5; (fn x => !r + x))
Answer:
let val t0 = (let val t0 = fn x => !r + x
val t1 = r := 5
in t0 end)
val t1 = f x
in Node t1 t0 end
Idea:
Chatting with @myreen today, we agreed that the right way to address this issue is for the CF framework to be extended to support compound expressions directly (i.e., not require normalisation). This could be done by making a cf that works as if the program had a let in it, but without changing the program: it instead operates on the first sub-expression and binds the result to a new fresh dummy variable (visible only in the CF environment).
I actually tried this a couple weeks ago on a toy CF framework in Coq, and essentially couldn't manage to prove the soundness theorem. So I don't know how workable this approach would be.
What goes wrong? Maybe it could work in the CakeML version anyway?
I don't know if the difficulties were specific to the toy framework in Coq. It's a bit hard to explain (and remember) without digging up the proof script of that time, but as far as I remember, the issue was the following: when encountering a sub-expression, the CF for let is generated -- as if a let binding was introduced in the program, but without actually doing that. The CF for let (without exception handling and local) is as follows:
cf_let n F1 F2 H Q =
?Q'. (F1 env H Q') /\
(!xv. F2 (env with <| v := (n, xv) :: env.v |>) (Q' xv) Q)
where F1 and F2 are respectively characteristic formulae for the sub-expression that is let-bound (in the CF) and the rest. This CF for let introduces a binding at the logical level (xv
) which corresponds to the name of the deep embedding n
added to the environment. F2 now appears under this binder, even though it does not depend on it.
Now, in there are two sub-expressions to successively let-bind in the CF, one will need to show in the soundness proof that the CF for the second let-binding does not depend on the binding introduced by the first one. This is true about names in the deep embedding if we chose fresh names; but one also needs to show that at the logical level because of the corresponding HOL binders (!xv
) that have been introduced...
I did not know how to do that, but maybe I missed something. I will try to dig up this example when I'm back from vacation.
Commits 0232954 and afaf2d4 started integrating a
normalise
function incf
in order to relax the requirement that input tocf
must be in a-normal form.The approach is to mostly leave
cf
as is, but require that the user applies:when proving something about
exp
. In order to prove soundness ofcf p (normalise ...)
i.e.we need to have the following property for
normalise
:For this to be true,
normalise
must not alter the closure values thatexp
produces. More specifically:(1) the code inside the closures, and (2) the environments stored in the closure values must be the same after
normalise
.Addressing (1)
Armaël and I worked out that we can (probably) address (1) by writing
normalise
such that it leavesFun
unchanged:but also make
cf
add a newnormalise
when recursing into the body of a lambda, i.e.With this approach, users can write code
exp
in any form and expectcf p (normalise exp)
to give good formulas even for code insideFun
.Addressing (2)
In order for
normalise
not to alter closure environments, it must not introduce new bindings in the environments, but new bindings is exactly what a transformation into a-normal form does...Example:
turns into the following with a standard a-normal form transformation:
The problem is that the closure environment produced by the new code introduces two new bindings into the closure environment.
The solution to this could be to transform programs into something that is almost a-normal form:
This format very quickly becomes unreadable, but readability is not important since the user will only want to look at what comes out of
cf p (normalise exp)
for anyexp
.What is left to do:
Letrec
case for new recursion throughnormalise
-- @Armaelcf
support nestedlet
-expressions as above -- @Armaelnormalise
(currently a dummy definition is used) -- volunteers?cf
work with recursion throughnormalise
-- volunteers?