diprism / perpl

The PERPL Compiler
MIT License
10 stars 5 forks source link

Type error produced by defunctionalization #59

Closed ccshan closed 2 years ago

ccshan commented 2 years ago

This bug is triggered by the following program (and not if Cons B Nil is replaced by Cons B inp):

data Unit = unit;
data String = Nil | Cons Terminal String;
data Terminal = A | B;

define gen: String = Cons (sample uniform) gen;

define inp: String = Cons A (Cons A (Cons B Nil));

define prefix: String -> String -> Unit = \xs: String. \ys: String.
  case xs of
    | Nil -> unit
    | Cons x xs -> (case ys of
      | Nil -> unit
      | Cons y ys -> if x == y then prefix xs ys else sample fail);

prefix gen inp

Running compiler.exe -c produces a program with

define _applyString_inst1_ : _FoldString_inst1_ -> String_inst1 =
  \ _this_0 : _FoldString_inst1_. case _this_0 of
    _foldString_inst1_0_ -> Cons_inst1 B Nil_inst0
  | _foldString_inst1_1_ -> Cons_inst1 A _foldString_inst1_0_
  | _foldString_inst1_2_ -> Cons_inst1 A _foldString_inst1_1_;

which has the type error Failed to unify _FoldString_inst1_ and String_inst0, in the definition _applyString_inst1_, in the expression Cons_inst1 B. It seems that the definition

data _FoldString_inst1_ =
    _foldString_inst1_0_
  | _foldString_inst1_1_
  | _foldString_inst1_2_;

is missing a case for the Nil in the original definition of inp.

davidweichiang commented 2 years ago

I think it's because String_inst0 never has its Nil get constructed. So the numbering of Nil and Cons gets out of sync.

$ ./compiler.exe -c -e ken1.ppl
data _Unit_ = _unit_;

data Unit = unit;

data Terminal = A | B;

data String_inst0 = Nil_inst0 | Cons_inst0 Terminal String_inst0;

data String_inst1 = Nil_inst1 | Cons_inst1 Terminal String_inst1;

data Bool = False | True;

define prefix_inst0 : String_inst0 -> String_inst1 -> Unit = \ xs : String_inst0. \ ys : String_inst1. case xs of Nil_inst0 -> unit | Cons_inst0 x xs0 -> (case ys of Nil_inst1 -> unit | Cons_inst1 y ys0 -> (if x == y then prefix_inst0 xs0 ys0 else sample fail : Unit));

define inp_inst0 : String_inst1 = Cons_inst1 A (Cons_inst1 A (Cons_inst1 B Nil_inst0));

define gen_inst0 : String_inst0 = Cons_inst0 (sample uniform : Terminal) gen_inst0;

prefix_inst0 gen_inst0 inp_inst0
davidweichiang commented 2 years ago

A more minimal example.

$ cat bug.ppl
data Nat = Zero | Succ Nat;
define f = \a. \b. False;

f Zero (Succ Zero)

$ ./compiler.exe -c -e bug.ppl
data _Unit_ = _unit_;

data Nat_inst0 = Zero_inst0 | Succ_inst0 Nat_inst0;

data Nat_inst1 = Zero_inst1 | Succ_inst1 Nat_inst1;

data Bool = False | True;

define f_inst0 : Nat_inst0 -> Nat_inst1 -> Bool = \ a : Nat_inst0. \ b : Nat_inst1. False;

f_inst0 Zero_inst0 (Succ_inst0 Zero_inst1)
davidweichiang commented 2 years ago

The bug is in this line at Monomorphize.hs:196:

      xis'' = fmap (\ tiss -> Map.fromList (zip (Set.toList tiss) [0..])) (semimap xis')

If a constructor never gets called, then it gets a different list of numbers from the datatype itself.

However, I'm not sure yet what the best fix is.

colin-mcd commented 2 years ago

Okay, I think I've narrowed down the issue to this: xis'' is a dict like this:

{Nat:  [tag1, tag3],
 Succ: [tag3],
 Zero: [tag1, tag3]}

Now when instantiating, Monomorphize renames all occurrences of Nat¹ to Nat_inst0 (because tag1 is at index 0 in the list) and Nat³ to Nat_inst1. Similarly for Zero. However, as Prof. Chiang noted, Succ only ever gets called as Succ³ so that becomes Succ_inst0. So with this process, it should create the datatypes:

data Nat_inst0 = Zero_inst0 | Succ_inst???;
data Nat_inst1 = Zero_inst1 | Succ_inst0;

Which is admittedly very confusing. Currently, the code assumes that the constructors and their datatypes always have the same instantiations, which is not necessarily true (i.e. when creating Nat_instX, it naively renames the constructors Zero_instX and Succ_instX). So I'm going to override the constructors' instantiations with the same ones as their datatypes, so that the constructor _instXs always match up with the datatype name _instXs. I think this should fix the bug and generate the expected datatype and constructor names.