Make rewriting less difficult

msaaltink commented 3 years ago

I have a fairly complex proof that will involve lots of rewriting, and am encountering some issues. Here's a stripped-down example illustrating one of them.

In the context of this Cryptol

type F = [128] // flat
type S = [4][32] // structured

to_f: S -> F
to_f x = join x

to_s: F -> S
to_s x = split x

f1: F -> F
f1 x = undefined
f2: F -> F
f2 x = undefined

f x = f1 (f2 x)

f1_s x = f1 (to_f x)
f2_s x = to_s (f2 x)

f_s x = f1_s (f2_s x)

we have the obvious fact to_f (to_s x) == x, but in my real example that fact is less obvious. So I want to prove it once, then apply it in proofs where needed. Here's what should be a simple proof: show f_s x == f x. That's easy: unfold f_s and f to get f1 (to_f (to_s (f2 x)) == f1 (f2 x), then apply the rewrite, and it is trivial. (Again, you need to imagine a more complex formula where the rewrite might apply several times over, and we really do not want to repeat the proof of the fact inline.)

This plan fares poorly in SAW:

prove_print
do {
  let unints = ["f1", "f2", "to_f", "to_s"];
  goal_eval_unint unints;
  simplify (addsimp to_f_to_s_thm (cryptol_ss ()));
  w4_unint_z3 unints;
  }
{{ \x -> f_s x == f x }};

fails, and if we put in a print_goal command after the simplify step we see

let { x@1 = Prelude.Vec 32 Prelude.Bool
      x@2 = f2 var_x0_60
      x@3 = to_s x@2
    }
 in Prelude.EqTrue
      (Prelude.bvEq 128
         (f1
            (to_f
               [Prelude.at 4 x@1 x@3 0
               ,Prelude.at 4 x@1 x@3 1
               ,Prelude.at 4 x@1 x@3 2
               ,Prelude.at 4 x@1 x@3 3]))
         (f1 x@2))

where the result of to_s has been split into 4 parts and then reassembled. Why?

We can try to undo the splitting and reassembly with a rewrite. The naive attempt with

rejoin_thm <- prove_print z3 {{ \ (x:[4][32]) -> [x@0, x@1, x@2, x@3] == x }};

fails, and printing out the theorem shows that the pattern looks like

         [Prelude.ite x@1 (Prelude.intLe x@2 x@2)
            (Prelude.at 4 x@1 x (Prelude.intToNat x@2))
            x@7
         ,Prelude.ite x@1 (Prelude.intLe x@2 x@3)
            (Prelude.at 4 x@1 x (Prelude.intToNat x@3))
            x@7
         ,Prelude.ite x@1 (Prelude.intLe x@2 x@5)
            (Prelude.at 4 x@1 x (Prelude.intToNat x@5))
            x@7
         ,Prelude.ite x@1 (Prelude.intLe x@2 x@6)
            (Prelude.at 4 x@1 x (Prelude.intToNat x@6))
            x@7]

which will not match the formula we are trying to prove. Success can be obtained with this recipe (which Andrei was good enough to show me):

let at0 = parse_core "\\ (x:(Vec 4 (Vec 32 Bool))) -> at 4 (Vec 32 Bool) x 0";
let at1 = parse_core "\\ (x:(Vec 4 (Vec 32 Bool))) -> at 4 (Vec 32 Bool) x 1";
let at2 = parse_core "\\ (x:(Vec 4 (Vec 32 Bool))) -> at 4 (Vec 32 Bool) x 2";
let at3 = parse_core "\\ (x:(Vec 4 (Vec 32 Bool))) -> at 4 (Vec 32 Bool) x 3";

rejoin_thm' <- prove_print z3 (rewrite (cryptol_ss ()){{ \(x:S) -> [at0 x, at1 x, at2 x, at3 x] == x }});

so that the proof script

let unints = ["f1", "f2", "to_f", "to_s"];
goal_eval_unint unints;
simplify (addsimps [to_f_to_s_thm, unrip_thm'] (cryptol_ss ()));
print_goal;
w4_unint_z3 unints;

works.

A few thoughts on this:

We are lucky that the structure only has 4 elements; if it had, say, 64, it would take a ridiculous amount of SAWscript just to write the rejoin rule
The use of parse_core is brittle; saw crashes if there are errors in the SAWCore string or if any of the many unsupported features appear.
The documentation is not much help in dealing with this problem. What do SAWcore terms look like? What are the typing rules? What is the translation from Cryptol to SAWcore?
I do not have a good mental model of how I can use the names of SAWcore terms in Cryptol term. Do these have Cryptol types? Can I see the types? Can all SAWcore terms be used like this?
It is not at all clear why the chopping and reassembly happened in the first place. Is there some way to suppress it?
Finally, although I could not make up a small example, I have this chopping and reassembly interacting with conditionals in a way that is very difficult to handle; the ite has been pushed inside the sequence, so that the rejoin rule does not match, and it is difficult to find a work-around. SAW has also put the type bitvector 32 into the formula, which I cannot get from a Cryptol term, so I need to write very large SAWcore formulas to get rewrite rules that will match the terms that SAW creates.

In summary, this seems much too complicated for what should be a very simple proof. Yes, in this case I can just unfold to_f and to_s to get a proof, but in the real problem I am working on this is not a good option.

brianhuffman commented 3 years ago

As you can probably tell, the goal_eval/goal_eval_unint commands are a bit unpolished. I originally implemented them as a quick hack to help in a particular proof I was working on, so it's useful to get some feedback from the experience of other people trying to use them for other things.

They work by round-tripping from saw-core to the what4 library and back: First, we evaluate the SAW term in the what4 backend, just like we'd do when sending a goal to a solver, but then we use the SAW backend of what4 to convert it back to a SAW term again. This works, but one of the limitations is that while Cryptol and SAW both have fixed-length sequence types, what4 does not. This means that an uninterpreted SAW/Cryptol function with a type like [4][32] -> [128] needs to be encoded in what4 as a function of type [32] -> [32] -> [32] -> [32] -> [128]; when it gets translated back to SAW its arguments are collected in a sequence. Similarly a variable x of type [4][32] needs to be expanded to four what4 variables each of type [32], which map back to the saw-core terms at 4 (Vec 32 Bool) x n for n = 0 to 3.

So how can we fix this? One easy thing to do would be to give goal_eval a post-processing pass to turn terms like [at x 0, at x 1, at x 2, at x 3] back into x. However, conditionals would interfere, as you point out.

Another approach we could try is to use what4's primitive tuple/struct types. Currently we're not using them at all. But we could actually translate a variable with a type like [4][32] into a single what4 variable of type ([32],[32],[32],[32]); then we wouldn't have to do the same exploding/reassembling process we currently do that causes all the trouble. However, this might not be trivial to implement, because the saw-core evaluator is not really designed to work with backends that support vector types like this. I'll look into it, though.

brianhuffman commented 3 years ago

About the presence of bitvector 32: The saw-core function bitvector is defined as bitvector n = Vec n Bool, so for saw-core typechecking purposes, bitvector 32 and Vec 32 Bool are interchangeable. However, for rewriting purposes they are distinct. Since it can cause problems for rewriting, I think maybe it's a good idea for us to completely remove the bitvector type synonym from saw-core, and just use Vec n Bool everywhere.

GaloisInc / saw-script

Make rewriting less difficult #863