FStarLang / FStar

A Proof-oriented Programming Language
https://fstar-lang.org
Apache License 2.0
2.67k stars 230 forks source link

Should `--dep` really emit `.krml` targets for files explicitly excluded by `--extract` ? #2385

Open tahina-pro opened 2 years ago

tahina-pro commented 2 years ago

Currently, when generating a dependency file with --dep, F produces rules for .krml files for all modules in the dependency cone of the files provided on the command line argument, even for modules explicitly excluded by --extract. In project-everest/everparse#62, @nikswamy expresses the need for F --dep to not produce rules for .krml files corresponding to such excluded modules, because some functions in such modules are using exceptions, which F* chokes on at extraction to .krml.

But then (in https://github.com/project-everest/everparse/pull/62#issuecomment-956317740), @msprotz warns against skipping such modules:

This will break a large number of applications (e.g. Noise, HACL, and soon MLS*) that rely on this behavior.

In those applications, we do --extract Spec. -Impl., meaning "extract specs to OCaml and ditch low-level code" -- this allows us to then test the specs via OCaml compilation. Low-level compilation is controlled by kremlin's bundle flags.

Note that changing this behavior might be dangerous, too, since for Low* code you may inadvertently remove a definition that would be needed by another file, resulting in Warning 2 errors and potentially invalid C code if, for instance, B.foo needs A.bar but you forgot that and passed --extract -A.

Similarly to Nik, I also need to skip producing .krml for some modules, namely Steel.Effect.*, FStar.NMST and the like containing things like effect definitions, which F* chokes on at extraction, in the context of the C model examples of #2349. More precisely, I simply exclude some files from the call to KReMLin in the Makefile: https://github.com/FStarLang/FStar/blob/49b166a96bdf2f0d4a6242c8f590ee876054acd5/examples/steel/arraystructs/Makefile#L63-L67 This is enough because, in the dependency file generated by --dep, no rule depends on building a .krml file, so omitting some of them will not break the dependency tree per se.

In practice for those Steel examples, this works because I manage to avoid the last point mentioned by Jonathan above, by maintaining a clear discipline that any Steel.* function used in those examples shall either have an extraction rule in src/extraction/FStar.Extraction.Kremlin.fs or be marked inline_for_extraction.

What should we do here?

nikswamy commented 2 years ago

Thanks for warning about this possible breakage, Jonathan @msprotz

Just to make sure we're all on the same page, the current behavior of --dep includes this:

    let all_ml_files =
        let ml_file_map = BU.smap_create 41 in
        all_fst_files
        |> List.iter (fun fst_file ->
                       let mname = lowercase_module_name fst_file in
                       if Options.should_extract mname // <--- SEE THIS GUARD
                       then BU.smap_add ml_file_map mname (output_ml_file fst_file));
        sort_output_files ml_file_map
    in
    let all_krml_files =
        let krml_file_map = BU.smap_create 41 in
        keys
        |> List.iter (fun fst_file ->
                       let mname = lowercase_module_name fst_file in
                       BU.smap_add krml_file_map mname (output_krml_file fst_file));
        sort_output_files krml_file_map
    in

So, the ALL_ML_FILES variable emitted in .depend DOES respect the module selector setting given by the --extract option. The ALL_KRML_FILES does not.

It seems to me to be a quirk of the current implementation that one of these respects --extract while the other does not.

For HACL* purposes, it seems that you do want to extract everything feeding some things to Kremlin (which will drop some of the code) and some things to OCaml for spec testing. Why not filter what gets sent to OCaml?

If we really want fstar --dep to do backend-specific filtering, then I could imagine enhancing the --extract option to take an list of pairs of codegen target * module selector.

E.g., --extract "OCaml:A,B; Kremlin:A,B,C; FSharp:A" etc. with our current version of extract interpreted as a uniform module selector for all codegen targets.

msprotz commented 2 years ago

So, the ALL_ML_FILES variable emitted in .depend DOES respect the module selector setting given by the --extract option. The ALL_KRML_FILES does not.

Correct.

For HACL* purposes, it seems that you do want to extract everything feeding some things to Kremlin (which will drop some of the code) and some things to OCaml for spec testing. Why not filter what gets sent to OCaml?

There is no productive way of doing this filtering, since the GNU make language does not offer efficient, readable ways of doing this. (I can do the filtering with $(filter-out but this is going to be unmaintainable and I'd have to do serious trickery with helper functions to make wildcards work.) There is no intermediary step between F extraction and compilation of the extracted OCaml code where "what to keep vs. what to drop" can be expressed easily. So, F's --extract option is the best way to express what needs to be built in OCaml vs. what doesn't.

Also note that extracting everything to OCaml would be an unreasonable burden on the compilation time of HACL, since some of the .krml files can take 20 minutes to extract, so replicating that work for OCaml would be a big increase. So it's important to specify from the get-go that F shouldn't even attempt to extract some of those modules.

Conversely, KReMLin acts as an intermediary between F's extraction and compilation of the final produced C code, and as such, provides a dedicated mechanism (bundling) to concisely express what needs to be kept/dropped, along with a convenient reachability analysis that eliminates the risk of over-aggressive pruning. And there is rarely "whole-module" pruning to be done at extraction-time for C, since oftentimes small spec helpers end up being used in Low code (see git grep Spec_ in dist/gcc-compatible). So, my claim for .krml is that we always want to extract the entire universe and let KReMLin's reachability analysis decide what matters, rather than making blunt cuts via F*'s --extract argument that risk cutting too much.

Tahina seems to have a different use-case, where F* has issues extracting files to .krml, but that seems to be like a bug that ought to be fixed, and is suitably worked around with a temporary fix using a filter-out.

Hope this helps!

msprotz commented 2 years ago

E.g., --extract "OCaml:A,B; Kremlin:A,B,C; FSharp:A" etc. with our current version of extract interpreted as a uniform module selector for all codegen targets.

So, in short, this is a fine proposal, but my hunch is that we would hardly ever use the Kremlin: attribute except as a way to work-around existing extraction bugs.

Maybe a shorter fix is just to catch whichever exception is thrown when extracting those problematic files and just not emit the corresponding definitions (possibly with a warning... 232, maybe? can't remember the number) at extraction-time and then there will be no need for Makefile tricks with filter-out.

nikswamy commented 2 years ago

So, my claim for .krml is that we always want to extract the entire universe and let KReMLin's reachability analysis decide what matters, rather than making blunt cuts via F*'s --extract argument that risk cutting too much.

I'm not convinced by this claim. Low was designed specifically as a subset of F which can be compiled to C. So, F* is not a capable of compiling every program to .krml.

Conversely, OCaml is a universal compilation target for F*. No matter how crazily dependently typed your program is, so long as you provide definitions for everything, we should be able to compile and execute your program in OCaml.

If your program is using an F* feature that doesn't fit in Kremlin (and there are many, e.g., exceptions, tactics, ...) then rather than sprinkling your program with noextract + [@@ noextract_to "Kremlin"], it seems less invasive to just use --extract for that.

nikswamy commented 2 years ago

From an offline discussion, @tahina-pro @msprotz and I agreed to the following:

We will change the behavior of --extract to match this:

--extract "OCaml:A,B; Kremlin:A,B,C; FSharp:A" etc. with our current version of extract interpreted as a uniform module selector for all codegen targets.

This is a breaking change and will be documented as such in CHANGES.md

For projects in the Everest repos that are known to rely on the existing behavior (HACL*, for example) we will issue PRs into those projects to switch to the new behavior, explicitly specifying different extract settings for OCaml and Kremlin.