FStarLang / karamel

KaRaMeL is a tool for extracting low-level F* programs to readable C code
Apache License 2.0
395 stars 59 forks source link

Mangled type names for pointer to tuple with only one non-erased component #268

Closed tahina-pro closed 1 year ago

tahina-pro commented 2 years ago

What works

In general, Karamel is pretty good at erasing non-informative types away from tuple types. Consider for instance the following Low* code:

module TH
module B = LowStar.Buffer
module HST = FStar.HyperStack.ST
module U8 = FStar.UInt8

let test (t: Type) = (t & unit)
let btest (t: Type) = B.buffer (test t)
Click to expand ```fstar let f (x: test U8.t) : Tot U8.t = match x with | (v, _) -> v let g (x: btest U8.t) : HST.Stack U8.t (fun h -> B.live h x /\ B.length x >= 1) (fun _ _ _ -> True) = let r = B.index x 0ul in let nres = f r in nres let k (x: test (B.buffer U8.t)) : HST.Stack U8.t (fun h -> B.live h (fst x) /\ B.length (fst x) >= 1) (fun _ _ _ -> True) = match x with | (v, _) -> let res = B.index v 0ul in res ```

Then, Karamel erases the unit tuple components and extracts to C code with the original type name as expected (albeit producing an unused type definition in between):

Click to expand ```C uint8_t TH_f(uint8_t x) { return x; } uint8_t TH_g(uint8_t *x) { uint8_t r = x[0U]; uint8_t nres = TH_f(r); return nres; } typedef uint8_t *___uint8_t____; uint8_t TH_k(uint8_t *x) { uint8_t *v = x; return v[0U]; } ```

What does not work

But using tuple types with non-informative components both inside and outside of a buffer type at the same time will produce mangled type names, obstructing code readability. From the following Low* code:

let www (x: test (btest U8.t)) : HST.Stack U8.t
  (fun h -> B.live h (fst x) /\ B.length (fst x) >= 1) (fun _ _ _ -> True)
= match x with
  | (v, _) -> g v

Karamel produces the following C code (I merged the .h and .c files here for readability) :

typedef uint8_t K___uint8_t___;
uint8_t TH_www(K___uint8_t___ *x)
{
  K___uint8_t___ *v = x;
  return TH_g(v);
}

Similar mangling also happens in this last example when replacing the tuple type with dtuple2 or a custom record type, and replacing unit with any non-informative type (squash, Ghost.erased, etc.). Also, adding inline_for_extraction on test, btest, f and/or g does not help.

So my question is: why is Karamel producing and using this K___uint8_t___ mangled type name instead of uint8_t as in the working cases above?

msprotz commented 2 years ago

I need to investigate, but in the meanwhile, have you tried provided a custom naming hint? Try adding:

let my_favorite_name = btest U8.t
let my_preferred_name = test (btest U8.t)

and see if that helps...?

tahina-pro commented 2 years ago

Thank you for your suggestion. Alas, it does not seem to help:

let my_preferred_name1 = test (btest U8.t)

let w1 (x: my_preferred_name1) : HST.Stack U8.t
  (fun h -> B.live h (fst x) /\ B.length (fst x) >= 1) (fun _ _ _ -> True)
= match x with
  | (v, _) -> g v

let my_favorite_name = btest U8.t
let my_preferred_name2 = test my_favorite_name

let w2 (x: my_preferred_name2) : HST.Stack U8.t
  (fun h -> B.live h (fst x) /\ B.length (fst x) >= 1) (fun _ _ _ -> True)
= match x with
  | (v, _) -> g v

let g3 (x: my_favorite_name) : HST.Stack U8.t
  (fun h -> B.live h x /\ B.length x >= 1) (fun _ _ _ -> True)
= g x

let w3 (x: my_preferred_name2) : HST.Stack U8.t
  (fun h -> B.live h (fst x) /\ B.length (fst x) >= 1) (fun _ _ _ -> True)
= match x with
  | (v, _) -> g3 v

Karamel output:

typedef K___uint8_t___ *___K___uint8_t_______;

uint8_t TH_w1(K___uint8_t___ *x)
{
  K___uint8_t___ *v = x;
  return TH_g(v);
}

uint8_t TH_w2(K___uint8_t___ *x)
{
  K___uint8_t___ *v = x;
  return TH_g(v);
}

uint8_t TH_g3(uint8_t *x)
{
  return TH_g(x);
}

uint8_t TH_w3(K___uint8_t___ *x)
{
  K___uint8_t___ *v = x;
  return TH_g3(v);
}

Thank you in advance for your investigations!

tahina-pro commented 2 years ago

cc @R1kM

tahina-pro commented 2 years ago

While this issue still remains at large, I managed to work around it for most of my use cases involving Steel.ST arrays by changing the representation of universe lifting for array elements and avoiding pairs there (see FStarLang/FStar@71164331c5981644777bded77a282d3b05fe1966, just merged into master). So, I think we can downgrade this issue to lower priority for now.

msprotz commented 1 year ago

@R1kM experienced this issue today... this is because originally, a struct type along with a name is introduced for the monomorphization of the pair type for arguments uint8 and unit. We need a name there because type-checking of structs is nominal in C.

Then, the unit field elimination kicks in, and we end up with a struct with a single branch and a single field, which is in turn removed and becomes a type abbreviation (via the Eliminate case in DataTypes.ml). But we don't remember that it's an auto-generated name and could probably be eliminated in favor of the right-hand-side of the type abbreviation.

This is also complicated by the fact that other occurrences of the nominal type still lurk around, so the abbreviation is actually needed for successful type-checking. (As evidenced by your example, Tahina.)

A potential solution might be to remember globally which names are auto-generated (or use a flag on the type definition that says autogeneratedname), so that by the time it turns into a type abbreviation, we can do a cleanup that eliminates those type abbreviations that we know for sure stem from the auto-naming scheme of monomorphizations).