Revising machine integers, integer overloading, and secret integers

nikswamy commented 5 years ago

Layers of integers in F*

SMT primitive integers (Z3's Int sort)
Mathematical integers in F* (Prims.int)
Machine integers are modeled in FStar.UInt[N].fst and FStar.Int[N].fst.

3 (a). We have an integer overloading module, FStar.Integers.fst that overloads integer operations over mathematical integers (Prims.int) and the machine integers.

3 (b). Further upstream in hacl*, we have Lib.IntTypes, which also overloads operations, but this time over machine integers only (not mathematical integers) and secret integers (all of whose operations are meant to be constant time).

Some problems with these libraries

There are several shortcomings of these libraries. I describe them here and my work in progress to address some of them.

Each layer adds a level of indirection in SMT reasoning

Mathematical integers in F* (Prims.int) are boxed when encoded to Z3.

0 in F* becomes BoxInt 0
a + b in F* becomes BoxInt (+ (UnboxInt a) (UnboxInt b)) and so on.

Machine integers are boxed mathematical integers

Secret integers are boxed machine integers

So, we're often forcing the solver to do arithmetic proofs through several layers of wrapping, perhaps needlessly so.

SMT Encoding options --smtencoding.elim_box --smtencoding.l_arith_repr and --smtencoding.nl_arith_reprt

These are getting increasingly widely used and control how integer terms are encoded to Z3. The latter two are integer specific, elim_box also impacts how booleans, reals, bit vectors etc. and other primitive types are encoded.

The trouble is that different modules use different settings and some of them are mutually incompatible (e.g., when using FStar.BV, elim_box true is essential, or else even simple equalities are not provable).

Further, different choices of these options in different parts of a proof lead to unpredictable SMT behavior.

Machine and secret integers are modeled abstractly, so reasoning about them requires the use of SMT lemmas rather than just computation in F* or theory reasoning in Z3

E.g., here's an excerpt from FStar.UInt32.fst


abstract type t :Type0 =
  | Mk: v:uint_t n -> t

abstract
let v (x:t) : Tot (uint_t n) = x.v

abstract
let uint_to_t (x:uint_t n) : Pure t
  (requires True)
  (ensures (fun y -> v y = x)) = Mk x

let uv_inv (x : t) : Lemma
  (ensures (uint_to_t (v x) == x))
  [SMTPat (v x)] = ()

We force the SMT solver to use the uv_inv lemma for injectivity of machine integer construction.

In HACL's Lib.IntTypes, the problem is worse, we have a similar lemma to uv_inv:

val uintv_extensionality:
   #t:inttype
 -> #l:secrecy_level
 -> a:uint_t t l
 -> b:uint_t t l
 -> Lemma
  (requires uint_v #t #l a == uint_v #t #l b)
  (ensures  a == b)
 [SMTPat (uint_v #t #l a == uint_v #t #l b)]

But, this one is triggered on an equality ==, which is really unreliable. F emits a warning. The only reason it works at all is because F sometimes wraps Prims.eq2 (==) when encoding to Z3, rather than emitting a primitive Z3 equality.

No normalization support for some layers

Machine integer terms like 0ul +^ 1ul only normalize because we added special support for them to the normalizer.

Meanwhile, analogous terms in Lib.IntTypes do not reduce. I have seen people writing things like normalize_term (size 0 +! size 1), but this does not reduce as one might expect.

The integer overloading layer FStar.Integers fails to infer types properly in some cases

See issue #1691 and #1612

The type inference bug in #1612 was fixed.

A fix of #1691 is pending, but the fix strongly suggests a style of specification that favors using pre- and post-conditions rather than refined Tot functions. Perhaps this style should be applied more uniformly.

Towards fixing these problems

I am describing here my work in progress in the nik_eager_unfolding branch. (the name of the branch is incidental and poorly chosen)

As a general goal, I would like to aim towards the following:

Each additional layer of arithmetic does not introduce an indirection in the SMT solver

Following the example of FStar.Integers, we should normalize arithmetic terms before they are sent to the SMT solver, removing needless abstraction. This is possible in FStar.Integers, since it transparently multiplexes several integer types rather than introducing a layer of abstract types.

To gain the same kind of normalization for machine integers, we need a few things:

Rather than defining, say, FStar.UInt32.t as an abstract type, I define it as:

module FStar.UInt32

[@unfold_for_smt]
private
type t' =
  | Mk: v:uint_t n -> t'

let t : eqtype = t'

Note, the representation of t is visible as t', but t' itself is private, meaning that no user-written client code can refer to Mk (the constructors of t'), its discriminator Mk?, nor its projector Mk?.v. However, within F*, in both the normalizer and in the SMT encoding, the representation of t' as Mk v is visible.

Perhaps more prominently, type t' is marked unfold_for_smt ... bear with me for a moment. That's explained next.

Now, I can define:

[@unfold_for_smt]
let add (a b:t)
  : Pure t
    (requires size (v a + v b) n)
    (ensures fun _ -> True)
  = Mk (add (v a) (v b))

Compared to the existing definition of FStar.UInt32.add in master (https://github.com/FStarLang/FStar/blob/722c2e9972ecafd3b1a5993a1f58d83653b8cee1/ulib/FStar.UInt32.fst#L59), my definition above has a few salient features:

It is not abstract, so the definition is visible to the normalizer and SMT solver. We no longer need special support for machine integers in F*'s normalizer
Because its definition is visible, it has a trivial postcondition leading to smaller VCs. Note, I retain the Pure style of spec, rather than a refined Tot function to not fall in the type inference traps of FStar.Integers mentioned above.
And most importantly It is tagged with a new F* attribute unfold_for_smt. Intuitively, this new attribute is the analog of inline_for_extraction but for the SMT encoding. More precisely, a definition marked with this attribute will always be unfolded when a term is encoded to the SMT solver. But, the definition remains as is in other parts of the compiler, e.g., FStar.UInt32.add is not inlined at extraction.

That is, every occurrence of FStar.UInt32.add x y is unfolded to Mk (FStar.UInt.add (v x) (v y)). Additionally, in my branch, I mark FStar.UInt.add as unfold and so this is reduced further to Mk (Prims.op_Addition (v x) (v y)).

Finally, I also define

[@unfold_for_smt]
let v (a:t)
  : Tot (uint_t n)
  = a.v

Note, with this definition, the lemma uv_inv is no longer necessary, since the built-in encoding of datatypes in F*/SMT ensures that a.v is injective.

So, in the end, FStar.UInt32.add x y is encoded to Mk (Prims.op_Addition (Mk?.v x) (Mk?.v y)).

This is almost what we want, but this has the additional encoding wrapper of Mk and the wrapping of Prims.op_Addition on top of Z3's primitive arithmetic operators.

However, a term like FStar.UInt32.add (FStar.UInt32.add x y) z is encoded initially as Mk (Prims.opAddition (Mk?.v (Mk (Prims.op_Addition (Mk?.v x) (Mk?.v y)))) (Mk?.v z) but is reduced further to Mk (Prims.opAddition (Prims.op_Addition (Mk?.v x) (Mk?.v y)) (Mk?.v z), i.e., the redundant unbox/box is removed.

Note, we're still left with the encoding overhead of wrapping Prims.int by boxing Z3 integers ...

Making `--smtencoding.elim_box true` and `--smtencoding.l_arith_repr native` the default (and only setting for these options, i.e., removing the options)

With these options, our term

Mk (Prims.opAddition (Prims.op_Addition (Mk?.v x) (Mk?.v y)) (Mk?.v z)

is reduced further to

Mk (BoxInt (+ (+ (UnboxInt (Mk?.v x)) (UnboxInt (Mk?.v y))) (UnboxInt (Mk?.v z))

and Z3 now sees a raw arithmetic expression over three Int-sorted terms, rather than several layers of wrapping on top of the raw terms.

Current status

I have this implemented for machine integers and passed F*'s basic regression suite with it. But, there's still quite a lot to do:

Pass regressions tests in Vale, Hacl, EverParse and miTLS

I started an everest run, but hit a normalization problem when verifying Vale code. Need to investigate further.

Revise FStar.Integers and close issue #1691 (easy)

Make elim_box true and l_arith_repr native the default

Revise Lib.IntTypes to use the same style as machine integers

Some benefits:

Lib.IntTypes terms should normalize
Problematic lemmas like uintv_extensionality can be removed
Reasoning about Lib.IntTypes terms should impose no overhead in SMT relative to F* primitive integers

A subtlety:

Lib.IntTypes has, so far, relied on abstraction to enforce confidentiality of its secret int types. Under my proposal, we will still rely on F*'s module system to enforce confidentiality, but instead of using abstract types, we'd use private types. In a sense, my proposal is to make secret types abstract for implementations, but transparent for logical reasoning.

Could we then simply move the revised Lib.IntTypes to ulib and use them throughout our code, rather than only in HACL? Perhaps that could (finally) be FStar.ConstantTime.Integers.

Figure out what to do about smtencoding.nl_arith_repr

This is the setting for the representation of non-linear arithmetic terms. I don't yet know what setting to prefer. F*'s default strategy is to use a kind of double wrapping, which seems to be overkill. Vale's choice for this setting is "wrapped". That seems like a sensible default to settle on.

nikswamy commented 5 years ago

I expect @Chris-Hawblitzel @protz @karthikbhargavan @s-zanella @aseemr to be interested. This has stemmed from several conversations with them.

s-zanella commented 5 years ago

I like the gist of this proposal. My main concern is that because using the module system to enforce confidentiality is not as strict as using abstraction, there might remain loopholes that use the now transparent logical reasoning on secret integers to break confidentiality. You probably thought this through, but I'd like to hear your arguments why this is fine, and we'd need to validate it in experiments.

A minor observation about Lib.uintv_extensionality:

I removed the pattern in uintv_extensionality in hacl-star@_dev (see https://github.com/project-everest/hacl-star/blob/dev/lib/Lib.IntTypes.fsti#L169) after realizing it didn't instantiate as expected and discussing this with @nikswamy and @aseemr. See the discussion in https://github.com/FStarLang/FStar/issues/1660. I replaced uintv_extensionality withuintv_injective, which is analogous to uv_inv for machine integers and has a usable pattern.

This commit restored the pattern in uintv_extensionality, probably by mistake.

aseemr commented 5 years ago

Santiago's point about abstraction is interesting. It is possible to coerce the private types to their representation:

A.fst:

module A

private type t' = int

type t : eqtype = t'

B.fst:

module B

open A

let coerce (x:t) : y:int{y == x} = x

But I don't know about the requirements/usage of hacl secret int types, and so can't say if this is ok.

s-zanella commented 5 years ago

Some features it would be nice to have (from the F* meeting call):

Interfaces for machine integer modules. (We currently don't have interface files for machine integers modules for legacy reasons.)
Refactor definitions and properties into different modules, the way it's done for FStar.Seq.Base/FStar.Seq.Properties.

parno commented 5 years ago

This definitely seems like a positive direction for integer encodings!

One question about Tot vs. Pure: Is your recommendation that we never use refinements on Tot function arguments (or returns)? E.g., we should never write something like:

let foo (x:int {P x}) : int

Instead we should use:

let foo (x:int) : Pure int (requires P x) (ensures fun x -> True)

? Or is that recommendation somehow specific to Integers?

nikswamy commented 5 years ago

Hi @parno, sorry ... I didn't reply to your comment. My comment about style is not specific to integers. In general, when applying a dependent function to an effectful argument, the style of specification with requires/ensures, if applicable, will be smoother for type inference.

nikswamy commented 5 years ago

Also, we discussed this at an F* meeting, but it didn't get transcribed here.

In response to @aseemr and @s-zanella's questions about abstraction.

Indeed, this example does break abstraction:

A.fst:

module A

private type t' = int

type t : eqtype = t'

B.fst:

module B

open A

let coerce (x:t) : y:int{y == x} = x

However, my proposal involves adding a new inductive type for t, rather than just a type alias. i.e., something like

A.fst:

module A

[@unfold_for_smt]
private type t' = | Mk: v:int -> t'

type t : eqtype = t'

So, t and int are distinct types that can't be coerced to one another.

nikswamy commented 5 years ago

I've spend a lot of time working on this over the summer. My current state is in nik_eager_unfolding. Fragments of my work have landed in F* master, but the bulk of it has not. I'm writing to provide an update of where things stand and to explain some of the difficulties I've faced. In summary, I think I have to backtrack on some of my proposals.

Recall that one of my main proposals was to make F* machine int types (FStar.[U]Int[N]) and other abstractions on top of it (notably, HACL's Lib.IntTypes) less abstract, so as to make them more amenable to normalization and efficient SMT encoding.

The mechanism to achieve this is a new unfold_for_smt qualifier, which allows unfolding definitions for SMT encoding by default, but leaves the use of a defined symbol in place otherwise (unless unfolding is explicitly requested, e.g., by a call to the normalize_term, or norm with specific delta steps, etc.)

I started by applying this qualifier to the machine int types. For example, I have:

module FStar.UInt16
unfold let n = 16
...
[@unfold_for_smt]
private
type t' =
  | Mk: v:uint_t n -> t'

let t : eqtype = t'

While this allowed me to remove special-treatment for machine ints in the normalizer, and also to get more compact SMT encodings for machine integer terms (I got an F* green on CI with it too), I realize now that my proposal overlooked some (familiar) subtleties with extraction.

The machine int types have special status in extraction, both in the OCaml and KreMLin backends. In particular, say, FStar.UInt32.t is extracted to a different type that Prims.int; its operations, e.g., +^ is extracted to a different operator than the + on Prims.int used in its definition; etc. The trouble with revealing the model of machine integers on top of Prims.int is that during extraction, the two types can get confused. For example, here's a bit of code distilled from an example in HACL (Spec.AES.fst)

module U16 = FStar.UInt16
let test (x:U16.t) = normalize_term U16.(x -^ x)

When extracted, it shows up as

let (test : FStar_UInt16.t -> FStar_UInt16.t) =
  fun x  -> FStar_UInt16.Mk ((FStar_UInt16.v x) - (FStar_UInt16.v x))

This is clearly not the intended behavior. Even if it is, it is incompatible with the choice of representation of machine int types in our compiler. Rather than implementing U16.t natively, this generated code requires a constructed type on top of Prims.int.

One could imagine implementing various compiler optimizations that transforms code like this back to something like U16.sub x x, but such optimizations will be fragile. Indeed, I tried to add various things like this. For example, the strict_on_arguments attribute (that was added to F* master recently), was motivated in part by this problem, where I was using it to prevent unfoldings in scenarios like the one above. I also had special treatment to eliminate the constructors of the machine int types during extraction. But, none of these really solve the problem and instead compound the complexity.

My current thinking basically reinforces a design guideline that we've adopted elsewhere in the compiler, including for the existing version of machine integers in master. Here's the guideline:

Every type that has native extraction support in the compiler (e.g., Prims.int, UInt16, Int32, Buffer, ConstBuffer, ...), must be abstract

Next steps

Merge my fix to #1691: it is an orthogonal improvement and can land independently.
Abandon revising the representation of the machine integer types. These all have native extraction support and must remain abstract as they are now.
- They will continue to have special support in the normalizer.
- We can improve the specification style of some of the types, e.g., using requires/ensures instead of refinements, since these operators are used a lot with effectful arguments.
- [Optional] We could add special support for them in the SMT encoding, so that expressions like x +^ y +^ z get encoding as uint_to_t (v x + v y + v z), i.e., we could extend the elim_box optimization in the SMT encoding to also include machine int types, eliminating the indirection in the SMT encoding which is one of the main things that this proposal was trying to address.
[Optional] Improving Lib.IntTypes: In contrast to the machine integer types, Lib.IntTypes do not have any native support in the compiler. As such, they are a better candidate for the revised representation proposed here than the machine int types. (In hindsight, perhaps I should have started my work there, instead of starting with machine ints.)

That said, applying this proposal to Lib.IntTypes is also non-trivial:

To a first approximation, it would involve removing Lib.IntTypes.fsti and exposing its implementation, at least to the normalizer if not the SMT encoding.
To account for the extraction issues I neglected earlier, we may need a feature similar to Haskell's newtype (https://www.haskell.org/tutorial/moretypes.html). For instance, defining sec_int_t for Lib.IntTypes would involve defining something like:

[@unfold_for_smt]
new //<--- similar to newtype in Haskell
private
type sec_int_t' t =
| Mk of pub_int_t t

In addition to what I've proposed so far, notice that I'm proposing marking sec_int_t' as a new type. Similar to Haskell's newtype, this annotation would indicate to optimize the extraction of a single-constructor type by omitting its constructor (since, of course, sec_int_t must be represented at runtime as a public int).

Benchmarks

It's embarrassing to only request this now, but, before attempting any of the optional steps above, we really ought to collect a suite of integer verification benchmarks (both micro and macro; using various styles e.g., proofs by normalization, SMT, canonicalization, calc, etc.; using various operations, linear, non-linear, bitwise; ...)

Pinging a few people who I hope may be interested in discussing this further: @Chris-Hawblitzel , @karthikbhargavan , @protz , @parno, @R1kM , @s-zanella, @ad-l, @polubelova , @kevinmkane, ...

msprotz commented 5 years ago

Thanks for the summary Nik. Some thoughts:

do we really need newtype? if it's only an extraction issue, you should be able to rely on kremlin to remove single-branch, single-argument constructors (i.e. compile type t = | Mk of u as typedef t u;)
benchmarks: the good news is that @R1kM has ported all of fstar-master to rely on Lib.IntTypes, meaning that we now have substantial coverage: a full build of HACL* should help in catching issues, and for many of the scenarios you describe we have good files covering them: for calc, EverCrypt.Hash.Incremental; for non-linear, curve and poly1305; etc.

All in all, I'm relatively optimistic for Lib.IntTypes, provided we can get away without the new new type construct (if OCaml performance is a problem, there are OCaml attributes we can add to this type to make sure it's unboxed).

FStarLang / FStar