Unnecessary casts inserted in KreMLin AST when using --cmi

s-zanella commented 5 years ago

During KreMLin extraction when using --cmi, sometimes F* inserts unnecessary casts in the generated AST. When processed with KreMLin, these casts translate to casts to incompatible types (e.g. (void **) when the expected type is uint8_t *, or (void *) when the expected type is uint8_t) and a C compiler rightfully complains about them.

Starting from code/sha3 in hacl-star/_dev_frodo that exhibits this behaviour, I minimized the issue to these 4 files:

L.fsti
```
type inttype = | U8
```

inline_for_extraction val uint_t: inttype -> Type0

inline_for_extraction type uint8 = uint_t U8

inline_for_extraction val u8: (n:nat{n < 256}) -> uint8

- `L.fst`
```FStar
module L

let uint_t = function | U8 -> UInt8.t

let u8 n = UInt8.uint_to_t n

M.fst
```
module M
```

open FStar.HyperStack.ST open L

let f (b:uint8) : St unit = ()

let inside () : St unit = f (u8 0)

- `Test.fst`
```FStar
module Test

open FStar.HyperStack.ST
open L

let outside () : St unit = M.f (u8 0)

let main () : St Int32.t =
  M.inside();
  outside();
  0l

The expected result is that M.inside and Test.outside extract the same to C, but they don't:

static void M_inside()
{
  M_f((uint8_t)0U);
}

void outside()
{
  M_f((void *)(uint8_t)0U);
}

The symptom is that when trying to compile the extracted C, the compiler errors:

c/Test.c: In function ‘outside’:
c/Test.c:21:7: error: passing argument 1 of ‘M_f’ makes integer from pointer without a cast [-Werror=int-conversion]
   M_f((void *)(uint8_t)0U);
       ^
c/Test.c:9:13: note: expected ‘uint8_t {aka unsigned char}’ but argument is of type ‘void *’
 static void M_f(uint8_t b)
             ^~~

Passing -dast to KreMLin shows an extra cast in Test.outside that is absent in the call to M.f in M.inside: M.f (0uint8<: L_uint_t ()).

It's crucial that L.uint_t is a match. Defining it instead as let uint_t _ = UInt8.t works around the issue, but of course can't be done when there's more than one case. Another workaround is to mark M.f as inline_for_extraction, but this is not always desirable.

An unsatisfactory workaround is to compile with -Wno-incompatible-pointer-types -Wno-int-conversion -Wno-int-to-pointer-cast, but who knows what the C compiler is allowed to do when these warnings are triggered.

I'm attaching the 4 files as well as an incremental Makefile to reproduce the issue: casts.tar.gz

I'm setting the priority to fix this to high because using Lib.IntTypes in hacl-star triggers the issue and so it will show everywhere in hacl-star when using --cmi.

nikswamy commented 5 years ago

Thanks for the minimal repro, @s-zanella ... much appreciated.

Note, the problem can be seen independently of Kremlin. If you extract Test.fst to Test.ml you can see

let (outside : unit -> Prims.bool) =
  fun uu____15  ->
    M.f (Obj.magic (FStar_UInt8.uint_to_t (Prims.parse_int "0")))

The additional magic there is the problem.

The problem

The root cause of this bug has to do with memoizing extraction: See https://github.com/FStarLang/FStar/issues/1444 for more context.

Since about a2d881f3aa6ba11b6241716fcef5e669d211b510, the main control flow of the compiler is to typecheck a module, then to extract an ML interface for it (conceptually, an .mli), and to write both to a .checked file. The .mli component of a .checked file is then used to extract other .checked files either to .ml or to .krml.

This memoization of .mli helps is avoiding the quadractic behavior of re-extracting all the dependences of a module.

However, this memoization interacts poorly with --cmi. The .mli is extracted with respect to the "normal" dependences of a module: in the example above, this means that M.mli contains

val f : unit L.uint_t  ->  Prims.bool

Notice that L.uint_t was not inlined, since at .mli extraction time, the implementation module L.fst is not in scope.

However, later, when extracting Test.ml, L.fst is in scope (because of --cmi) and now, we expect M.f to have type UInt8.t -> ..., and from the perspective of the ML extraction, UInt8.t and unit L.uint_t are not compatible. And it's not possible for the ML extractor to reduce unit L.uint_t to UInt8.t, since L.uint_t is extracted simply to

type 'Auu___54_25 uint_t = Obj.t

The proposed solution

We should not extract .mli's without first inlining across module boundaries.

This means that we cannot extract .mli's immediately after typechecking a module, since the needed module implementations are not in scope.

The proposed solution is in a few steps:

Remove the memoization of .mli from .checked files.
Re-extract .mli's for the context on each run of extraction. That should fix this bug, but will slow down extraction: we should measure the slow down.
If there is a significant slowdown, then we can add a new kind of binary file, .A.fst.mli.checked (or something like that) and memoize the .mli extraction from step 1 in that file.

I'm starting on points 0 and 1 now. Comments welcome.

Thanks to @protz for some discussion about this.

Ping @aseemr

nikswamy commented 5 years ago

0 & 1 are now implemented in the nik_dep branch.

As for the performance impact of not memoizing .mli: A full sequential bootstrapping run with memoization took 9min 3sec; without memoization it took 11min 6sec.

So, there's a non-trivial cost to this suggesting that step 2 is worthwhile doing.

nikswamy commented 5 years ago

Then again, with -j 20 on my machine with has plenty of cores, the difference in bootstrapping time is:

With memoization: 1min 14sec Without memoization: 1min 24sec

I'm not sure the additional complexity of 3rd kind of auto-generated file (in addition to .hints and .checked), plus the complexity of new build rules and dependence analysis for those files is worth it.

WDYT? @protz?

aseemr commented 5 years ago

I would prefer not adding a new cache file. As you point out, it is additional complexity in the dependence analysis (which is already quite complex), mechanism and logic to detect if the cache files are stale, keep them in sync, etc.

But can we retain incrementality of extraction without them?

Suppose we write the extracted .mli part of the .checked files at the time of extraction, rather than just after typechecking. When extracting B.fst which depends on A.fst, we load A.fst.checked (as we do today), if the .mli part is present, we use it, else we extract and write to it. We will also have to store the flags such as --cmi so that this cache is used later in the same context.

One issue with this, in the current setting, is that it would invalidate C.fst.checked, when C.fst also depends on A.fst, because C.fst.checked contains the hash of A.fst.checked and not of A.fst. For that, can we make it so that we don't hash the checked files themselves, but only hash the source files?

But this also doesn't seem worth it to me. Can we also benchmark an Everest run to see if there is any noticeable performance hit there?

s-zanella commented 5 years ago

I'd also prefer not to add a new kind of binary cache file, or to repurpose .checked files to incrementally memoize extraction information. Dependency analysis and build rules are already complex as they are. A worst case 25% slowdown sounds reasonable.

Here's another data point that doesn't show any ~significant~ unbearable slowdown in parallel builds:

Building F* from scratch in nik_dep (time OTHERFLAGS="--admit_smt_queries true" ./everest -j 10 FStar make): 10m7.611s Building Frodo with nik_dep (time OTHERFLAGS="--admit_smt_queries true" make -j 10 -C frodo/code): ~2m45.263s~ 3m28.783s

Building F* from scratch in master (time OTHERFLAGS="--admit_smt_queries true" ./everest -j 10 FStar make): 10m10.751s Building Frodo using master (time OTHERFLAGS="--admit_smt_queries true" make -j 10 -C frodo/code): 2m34.794s

Edit: did another measurement and it showed a 35% slowdown in Frodo. I'm now unsure my first measurement was using the correct FStar branch.

FStarLang / FStar

Unnecessary casts inserted in KreMLin AST when using --cmi #1596

The problem

The proposed solution