Open nikswamy opened 5 years ago
I expect @Chris-Hawblitzel @protz @karthikbhargavan @s-zanella @aseemr to be interested. This has stemmed from several conversations with them.
I like the gist of this proposal. My main concern is that because using the module system to enforce confidentiality is not as strict as using abstraction, there might remain loopholes that use the now transparent logical reasoning on secret integers to break confidentiality. You probably thought this through, but I'd like to hear your arguments why this is fine, and we'd need to validate it in experiments.
A minor observation about Lib.uintv_extensionality
:
I removed the pattern in uintv_extensionality
in hacl-star@_dev
(see https://github.com/project-everest/hacl-star/blob/dev/lib/Lib.IntTypes.fsti#L169) after realizing it didn't instantiate as expected and discussing this with @nikswamy and @aseemr. See the discussion in https://github.com/FStarLang/FStar/issues/1660. I replaced uintv_extensionality
withuintv_injective
, which is analogous to uv_inv
for machine integers and has a usable pattern.
This commit restored the pattern in uintv_extensionality
, probably by mistake.
Santiago's point about abstraction is interesting. It is possible to coerce the private
types to their representation:
A.fst:
module A
private type t' = int
type t : eqtype = t'
B.fst:
module B
open A
let coerce (x:t) : y:int{y == x} = x
But I don't know about the requirements/usage of hacl secret int types, and so can't say if this is ok.
Some features it would be nice to have (from the F* meeting call):
FStar.Seq.Base/FStar.Seq.Properties
.This definitely seems like a positive direction for integer encodings!
One question about Tot
vs. Pure
: Is your recommendation that we never use refinements on Tot
function arguments (or returns)? E.g., we should never write something like:
let foo (x:int {P x}) : int
Instead we should use:
let foo (x:int) : Pure int (requires P x) (ensures fun x -> True)
? Or is that recommendation somehow specific to Integers?
Hi @parno, sorry ... I didn't reply to your comment. My comment about style is not specific to integers. In general, when applying a dependent function to an effectful argument, the style of specification with requires/ensures, if applicable, will be smoother for type inference.
Also, we discussed this at an F* meeting, but it didn't get transcribed here.
In response to @aseemr and @s-zanella's questions about abstraction.
Indeed, this example does break abstraction:
A.fst:
module A
private type t' = int
type t : eqtype = t'
B.fst:
module B
open A
let coerce (x:t) : y:int{y == x} = x
However, my proposal involves adding a new inductive type for t
, rather than just a type alias. i.e., something like
A.fst:
module A
[@unfold_for_smt]
private type t' = | Mk: v:int -> t'
type t : eqtype = t'
So, t
and int
are distinct types that can't be coerced to one another.
I've spend a lot of time working on this over the summer. My current state is in nik_eager_unfolding
. Fragments of my work have landed in F* master, but the bulk of it has not. I'm writing to provide an update of where things stand and to explain some of the difficulties I've faced. In summary, I think I have to backtrack on some of my proposals.
Recall that one of my main proposals was to make F* machine int types (FStar.[U]Int[N]
) and other abstractions on top of it (notably, HACL's Lib.IntTypes
) less abstract, so as to make them more amenable to normalization and efficient SMT encoding.
The mechanism to achieve this is a new unfold_for_smt
qualifier, which allows unfolding definitions for SMT encoding by default, but leaves the use of a defined symbol in place otherwise (unless unfolding is explicitly requested, e.g., by a call to the normalize_term
, or norm
with specific delta steps, etc.)
I started by applying this qualifier to the machine int types. For example, I have:
module FStar.UInt16
unfold let n = 16
...
[@unfold_for_smt]
private
type t' =
| Mk: v:uint_t n -> t'
let t : eqtype = t'
While this allowed me to remove special-treatment for machine ints in the normalizer, and also to get more compact SMT encodings for machine integer terms (I got an F* green on CI with it too), I realize now that my proposal overlooked some (familiar) subtleties with extraction.
The machine int types have special status in extraction, both in the OCaml and KreMLin backends. In particular, say, FStar.UInt32.t is extracted to a different type that Prims.int; its operations, e.g., +^
is extracted to a different operator than the +
on Prims.int used in its definition; etc. The trouble with revealing the model of machine integers on top of Prims.int is that during extraction, the two types can get confused. For example, here's a bit of code distilled from an example in HACL (Spec.AES.fst)
module U16 = FStar.UInt16
let test (x:U16.t) = normalize_term U16.(x -^ x)
When extracted, it shows up as
let (test : FStar_UInt16.t -> FStar_UInt16.t) =
fun x -> FStar_UInt16.Mk ((FStar_UInt16.v x) - (FStar_UInt16.v x))
This is clearly not the intended behavior. Even if it is, it is incompatible with the choice of representation of machine int types in our compiler. Rather than implementing U16.t natively, this generated code requires a constructed type on top of Prims.int.
One could imagine implementing various compiler optimizations that transforms code like this back to something like U16.sub x x
, but such optimizations will be fragile. Indeed, I tried to add various things like this. For example, the strict_on_arguments
attribute (that was added to F* master recently), was motivated in part by this problem, where I was using it to prevent unfoldings in scenarios like the one above. I also had special treatment to eliminate the constructors of the machine int types during extraction. But, none of these really solve the problem and instead compound the complexity.
My current thinking basically reinforces a design guideline that we've adopted elsewhere in the compiler, including for the existing version of machine integers in master. Here's the guideline:
Every type that has native extraction support in the compiler (e.g., Prims.int, UInt16, Int32, Buffer, ConstBuffer, ...), must be abstract
Merge my fix to #1691: it is an orthogonal improvement and can land independently.
Abandon revising the representation of the machine integer types. These all have native extraction support and must remain abstract as they are now.
x +^ y +^ z
get encoding as uint_to_t (v x + v y + v z)
, i.e., we could extend the elim_box optimization in the SMT encoding to also include machine int types, eliminating the indirection in the SMT encoding which is one of the main things that this proposal was trying to address.[Optional] Improving Lib.IntTypes: In contrast to the machine integer types, Lib.IntTypes do not have any native support in the compiler. As such, they are a better candidate for the revised representation proposed here than the machine int types. (In hindsight, perhaps I should have started my work there, instead of starting with machine ints.)
That said, applying this proposal to Lib.IntTypes is also non-trivial:
To a first approximation, it would involve removing Lib.IntTypes.fsti and exposing its implementation, at least to the normalizer if not the SMT encoding.
To account for the extraction issues I neglected earlier, we may need a feature similar to Haskell's newtype (https://www.haskell.org/tutorial/moretypes.html). For instance, defining sec_int_t
for Lib.IntTypes would involve defining something like:
[@unfold_for_smt]
new //<--- similar to newtype in Haskell
private
type sec_int_t' t =
| Mk of pub_int_t t
In addition to what I've proposed so far, notice that I'm proposing marking sec_int_t'
as a new type
. Similar to Haskell's newtype
, this annotation would indicate to optimize the extraction of a single-constructor type by omitting its constructor (since, of course, sec_int_t must be represented at runtime as a public int).
It's embarrassing to only request this now, but, before attempting any of the optional steps above, we really ought to collect a suite of integer verification benchmarks (both micro and macro; using various styles e.g., proofs by normalization, SMT, canonicalization, calc, etc.; using various operations, linear, non-linear, bitwise; ...)
Pinging a few people who I hope may be interested in discussing this further: @Chris-Hawblitzel , @karthikbhargavan , @protz , @parno, @R1kM , @s-zanella, @ad-l, @polubelova , @kevinmkane, ...
Thanks for the summary Nik. Some thoughts:
type t = | Mk of u
as typedef t u;
)All in all, I'm relatively optimistic for Lib.IntTypes, provided we can get away without the new new type
construct (if OCaml performance is a problem, there are OCaml attributes we can add to this type to make sure it's unboxed).
Layers of integers in F*
SMT primitive integers (Z3's Int sort)
Mathematical integers in F* (Prims.int)
Machine integers are modeled in FStar.UInt[N].fst and FStar.Int[N].fst.
3 (a). We have an integer overloading module, FStar.Integers.fst that overloads integer operations over mathematical integers (Prims.int) and the machine integers.
3 (b). Further upstream in hacl*, we have Lib.IntTypes, which also overloads operations, but this time over machine integers only (not mathematical integers) and secret integers (all of whose operations are meant to be constant time).
Some problems with these libraries
There are several shortcomings of these libraries. I describe them here and my work in progress to address some of them.
Each layer adds a level of indirection in SMT reasoning
Mathematical integers in F* (Prims.int) are boxed when encoded to Z3.
0
in F* becomesBoxInt 0
a + b
in F* becomesBoxInt (+ (UnboxInt a) (UnboxInt b))
and so on.Machine integers are boxed mathematical integers
Secret integers are boxed machine integers
So, we're often forcing the solver to do arithmetic proofs through several layers of wrapping, perhaps needlessly so.
SMT Encoding options --smtencoding.elim_box --smtencoding.l_arith_repr and --smtencoding.nl_arith_reprt
These are getting increasingly widely used and control how integer terms are encoded to Z3. The latter two are integer specific, elim_box also impacts how booleans, reals, bit vectors etc. and other primitive types are encoded.
The trouble is that different modules use different settings and some of them are mutually incompatible (e.g., when using FStar.BV, elim_box true is essential, or else even simple equalities are not provable).
Further, different choices of these options in different parts of a proof lead to unpredictable SMT behavior.
Machine and secret integers are modeled abstractly, so reasoning about them requires the use of SMT lemmas rather than just computation in F* or theory reasoning in Z3
E.g., here's an excerpt from FStar.UInt32.fst
We force the SMT solver to use the
uv_inv
lemma for injectivity of machine integer construction.In HACL's Lib.IntTypes, the problem is worse, we have a similar lemma to
uv_inv
:But, this one is triggered on an equality
==
, which is really unreliable. F emits a warning. The only reason it works at all is because F sometimes wraps Prims.eq2 (==) when encoding to Z3, rather than emitting a primitive Z3 equality.No normalization support for some layers
Machine integer terms like
0ul +^ 1ul
only normalize because we added special support for them to the normalizer.Meanwhile, analogous terms in Lib.IntTypes do not reduce. I have seen people writing things like
normalize_term (size 0 +! size 1)
, but this does not reduce as one might expect.The integer overloading layer FStar.Integers fails to infer types properly in some cases
See issue #1691 and #1612
The type inference bug in #1612 was fixed.
A fix of #1691 is pending, but the fix strongly suggests a style of specification that favors using pre- and post-conditions rather than refined Tot functions. Perhaps this style should be applied more uniformly.
Towards fixing these problems
I am describing here my work in progress in the
nik_eager_unfolding
branch. (the name of the branch is incidental and poorly chosen)As a general goal, I would like to aim towards the following:
Each additional layer of arithmetic does not introduce an indirection in the SMT solver
Following the example of FStar.Integers, we should normalize arithmetic terms before they are sent to the SMT solver, removing needless abstraction. This is possible in FStar.Integers, since it transparently multiplexes several integer types rather than introducing a layer of abstract types.
To gain the same kind of normalization for machine integers, we need a few things:
Note, the representation of
t
is visible ast'
, butt'
itself isprivate
, meaning that no user-written client code can refer toMk
(the constructors oft'
), its discriminatorMk?
, nor its projectorMk?.v
. However, within F*, in both the normalizer and in the SMT encoding, the representation oft'
asMk v
is visible.Perhaps more prominently, type
t'
is markedunfold_for_smt
... bear with me for a moment. That's explained next.Now, I can define:
Compared to the existing definition of FStar.UInt32.add in master (https://github.com/FStarLang/FStar/blob/722c2e9972ecafd3b1a5993a1f58d83653b8cee1/ulib/FStar.UInt32.fst#L59), my definition above has a few salient features:
It is not
abstract
, so the definition is visible to the normalizer and SMT solver. We no longer need special support for machine integers in F*'s normalizerBecause its definition is visible, it has a trivial postcondition leading to smaller VCs. Note, I retain the
Pure
style of spec, rather than a refinedTot
function to not fall in the type inference traps of FStar.Integers mentioned above.And most importantly It is tagged with a new F* attribute
unfold_for_smt
. Intuitively, this new attribute is the analog ofinline_for_extraction
but for the SMT encoding. More precisely, a definition marked with this attribute will always be unfolded when a term is encoded to the SMT solver. But, the definition remains as is in other parts of the compiler, e.g., FStar.UInt32.add is not inlined at extraction.That is, every occurrence of
FStar.UInt32.add x y
is unfolded toMk (FStar.UInt.add (v x) (v y))
. Additionally, in my branch, I markFStar.UInt.add
asunfold
and so this is reduced further toMk (Prims.op_Addition (v x) (v y))
.Finally, I also define
Note, with this definition, the lemma
uv_inv
is no longer necessary, since the built-in encoding of datatypes in F*/SMT ensures thata.v
is injective.So, in the end,
FStar.UInt32.add x y
is encoded toMk (Prims.op_Addition (Mk?.v x) (Mk?.v y))
.This is almost what we want, but this has the additional encoding wrapper of
Mk
and the wrapping ofPrims.op_Addition
on top of Z3's primitive arithmetic operators.However, a term like
FStar.UInt32.add (FStar.UInt32.add x y) z
is encoded initially asMk (Prims.opAddition (Mk?.v (Mk (Prims.op_Addition (Mk?.v x) (Mk?.v y)))) (Mk?.v z)
but is reduced further toMk (Prims.opAddition (Prims.op_Addition (Mk?.v x) (Mk?.v y)) (Mk?.v z)
, i.e., the redundant unbox/box is removed.Note, we're still left with the encoding overhead of wrapping
Prims.int
by boxing Z3 integers ...Making
--smtencoding.elim_box true
and--smtencoding.l_arith_repr native
the default (and only setting for these options, i.e., removing the options)With these options, our term
Mk (Prims.opAddition (Prims.op_Addition (Mk?.v x) (Mk?.v y)) (Mk?.v z)
is reduced further to
Mk (BoxInt (+ (+ (UnboxInt (Mk?.v x)) (UnboxInt (Mk?.v y))) (UnboxInt (Mk?.v z))
and Z3 now sees a raw arithmetic expression over three Int-sorted terms, rather than several layers of wrapping on top of the raw terms.
Current status
I have this implemented for machine integers and passed F*'s basic regression suite with it. But, there's still quite a lot to do:
Pass regressions tests in Vale, Hacl, EverParse and miTLS
I started an everest run, but hit a normalization problem when verifying Vale code. Need to investigate further.
Revise FStar.Integers and close issue #1691 (easy)
Make elim_box true and l_arith_repr native the default
Revise Lib.IntTypes to use the same style as machine integers
Some benefits:
uintv_extensionality
can be removedA subtlety:
Lib.IntTypes has, so far, relied on abstraction to enforce confidentiality of its secret int types. Under my proposal, we will still rely on F*'s module system to enforce confidentiality, but instead of using
abstract
types, we'd useprivate
types. In a sense, my proposal is to make secret types abstract for implementations, but transparent for logical reasoning.Could we then simply move the revised Lib.IntTypes to ulib and use them throughout our code, rather than only in HACL? Perhaps that could (finally) be FStar.ConstantTime.Integers.
Figure out what to do about smtencoding.nl_arith_repr
This is the setting for the representation of non-linear arithmetic terms. I don't yet know what setting to prefer. F*'s default strategy is to use a kind of double wrapping, which seems to be overkill. Vale's choice for this setting is
"wrapped"
. That seems like a sensible default to settle on.