diprism / perpl

The PERPL Compiler
MIT License
10 stars 5 forks source link

Datatype recurses with different parameter #70

Closed davidweichiang closed 2 years ago

davidweichiang commented 2 years ago

From Solve.hs:

Make sure type params are same in recursive instances. So disallow

data List a = Nil | Cons a (List Bool);
davidweichiang commented 2 years ago

But these cases seem tricky to tease apart:

data Thing1 a b = C1 Thing2 b a
data Thing2 a b = C2 Thing2 b a
data Thing1 a b = C1 Thing2 a b
data Thing2 a b = C2 Thing2 b a
ccshan commented 2 years ago

Do you mean these cases seem tricky to tease apart:

data Thing1 a b = C1 (Thing2 b a)
data Thing2 a b = C2 (Thing1 b a)
data Thing1 a b = C1 (Thing2 a b)
data Thing2 a b = C2 (Thing1 b a)

As you wrote it, data Thing2 a b = C2 (Thing2 b a) [I added the parens] alone violates "type params are same".

What's the purpose of this restriction? Is it just to make sure that monomorphization does not need to yield an infinite number of types? Or that monomorphization only needs to yield types that in some sense literally appear in the program? I understand one type definition we really want to rule out is data Thing a = C1 | C2 (Thing (a,a)). The introduction of https://doi.org/10.1007/BFb0054285 seems relevant.

davidweichiang commented 2 years ago

Yes, that's what I meant, thanks. But I see now that that even when fixed, both ultimately lead to recursion.

I am not totally sure what the reason for the restriction is. It was in @HerbertMcSnout's comments and I thought it would be good to work out.

How is the (a,a) restriction you mentioned formulated? A type variable can only appear once in a branch?

In any case, when this is settled, the algorithms for checking for type recursion should be double-checked. (There are still two of them, one in Free.hs and one in DR.hs.)

colin-mcd commented 2 years ago

Yeah the purpose is to avoid monomorphization introducing an infinite number of types. Theoretically it's possible to allow

data Thing1 a b = C1 (Thing2 a b)
data Thing2 a b = C2 (Thing1 b a)

But I think I just required that in any of Thing1's constructors, any references to Thing1 (even nested within Thing2) must have the same type parameters because figuring out when exactly the types stay finite seemed pretty tricky.

davidweichiang commented 2 years ago

Not sure if this should be a separate issue, but this program hangs the compiler:

data Thing a = C1 | C2 (Thing (a*a));
C1
davidweichiang commented 2 years ago

I thought it might be blocked by the same logic that blocks polymorphic recursion, but it seems not.

davidweichiang commented 2 years ago

@ccshan wrote in the discussion of #69:

As for monomorphization, I should read up on it but I think the basic idea is to keep creating new instances of types and global definitions until the types required by those instances in turn are all already dealt with? And the main concern with recursion there is that it might cause monomorphization to not terminate? And a simple restriction that ensures that would be that whenever creating an instance (of a type or global definition) necessitates another instance (of the same type or global definition), the two instances must be the same?

That all sounds correct to me. The "two instances must be the same" constraint seems rather tight to me, but I also can't think of a really good use for the cases that break this constraint.

Amazingly, Haskell allows this:

data WeirdList a = Nil | Cons a (WeirdList (a, a))
x = Cons 1 (Cons (1,2) (Cons ((1,2),(3,4)) Nil))
ccshan commented 2 years ago

Amazingly, Haskell allows this:

data WeirdList a = Nil | Cons a (WeirdList (a, a))
x = Cons 1 (Cons (1,2) (Cons ((1,2),(3,4)) Nil))

Side note: perhaps more amazingly, it's "useful": If you change Cons a to Cons, it represents binary trees that are perfect. And then From fast exponentiation to square matrices: an adventure in types

davidweichiang commented 2 years ago

OK, but we're still not going to allow it, right?

Is there any reason not to implement the "two instances must be the same" constraint?

ccshan commented 2 years ago

Is there any reason not to implement the "two instances must be the same" constraint?

Given that we need to monomorphize anyway, I think it's ok to be this restrictive about recursive types: If the programmer wants to use recursive types in some disallowed way, then they can always monomorphize manually.

davidweichiang commented 2 years ago

Copying these examples here for eventual tests:

-- triggers nontermination in `robust`
data T a = MkT (T (T a));
define f : T Bool = fail;
f
-- triggers nontermination in `isInfiniteType`
data T a = MkT (T (T a));
extern dist : T Bool;
()