Recursive types - Githubissues

eholk commented 12 years ago

So we can do trees and linked lists.

calvis commented 12 years ago

Is there a well-established "here's how you implement a recursive type system" document? I'm not sure how difficult this would be, but if I had a reference with type rules I could probably translate it into miniKanren (if miniKanren will allow it).

Would an occurs-check break it?

eholk commented 12 years ago

I'm sure there is, although I don't know off hand. This might be a good start: http://en.wikipedia.org/wiki/Recursive_data_type

As far as I know, we can't do this without being able to name types. It seems like we'd need something like...

(define-datatype (list T)
  (nil)
  (cons T (list T)))

This seems like we might have to give up on our principle that the user never has to write type annotations in order to make this work.

calvis commented 12 years ago

Wait perhaps I don't understand -- it is possible to have linked lists in a type system without a recursive data type (this is c311-easy). I'm more concerned what you mean by "tree" -- do you want the user to be able to define a new data type? If so, this really deserves to be a "User defined data types" issue.

I'm perfectly happy making the user use/define a Tree API that is secretly a vector under the covers.. or we could implement records and make them structs or something.

eholk commented 12 years ago

I think you can do this in C311 because you're in a dynamically typed language.

Consider trying to make lists using vectors. We could try something like this:

(module
  (define (cons x ls)
    (vector x ls))

  (define (main)
    (cons 1 (cons 2 (cons 3 4)))))

This one doesn't typecheck because we're trying to make vectors of both ints and vectors. Even if we had tuples though, the type of cons would be something like ((int (tuple int ???)) -> (tuple int ???)). Figuring out what goes in the ??? is the hard part. We could just say ??? is (tuple int ???), but then all our lists would be infinitely long (which might be okay if we were lazy), so we need sum types to be able to tell when we get to the end.

calvis commented 12 years ago

I thought we'd just implement cons as a primitive (in the case of lists), since there are a lot of wasted cells keeping track of lengths.

But for trees, I think what you're really talking about is a polymorphic type during typechecking...

(define (leaf x)
   (vector x))

would need type (forall a (-> a (tree a))) and that is a fun (maybe impossible) thing to do in plain old miniKanren.

Incidentally, we have this problem already. We just keep quiet about it..

(define (myprint x)
   (print x))
(define (make-a-triple x)
   (vector x x x))

will fail all the typechecking.

eholk commented 12 years ago

I always figured cons would be a special case of tuple, which shouldn't need us to keep a length around. Yeah, I think we'll need polymorphism though, which seems non-trivial.

I guess this is where we start to get into questions of whether Harlan is a Scheme or an ML-like language... We could just add dynamic types and all these problems would go away. That's also non-trivial though.

calvis commented 12 years ago

This is a real design decision. What crowd do we want to pander to, the people who haven't gotten to do cool GPU programming because they use Scheme, or the people who have been writing GPU code in OpenCL who use C++ types.

Just testing my knowledge of C++ here, the functions myprint and make-a-triple would not be a problem to represent in C++, because it has templates, correct? The only problem is actually getting them to typecheck in Harlan.

eholk commented 12 years ago

Oh hmm, it hadn't occurred to me to use C++ templates to do our polymorphism for us. I was thinking we'd do monomorphizing, which is basically just "Figure out all the different types this function is called with and generate type-specific versions for each one." That's how C++ does template expansion, so it might work just to use C++ for it.

calvis commented 12 years ago

Okay, so the real problem here is rewriting our typechecker to identify where template types (do we want to call them templates internally?) must be inserted. I am fairly certain the only place this is necessary is a function type.

Unfortunately I think miniKanren cannot handle this. We could assume all functions are templated, but that seems like overkill and I'm not sure it would work anyway. Not sure how that would affect our run times.

eholk commented 12 years ago

One thing to be careful about if we use C++ to do our polymorphism for us is that I don't think OpenCL supports templates in kernels. We might be able to do some DSL magic to generate kernels from templates, but it seems like doing our own monomorphizing makes more sense.

rrnewton commented 12 years ago

MLton is an example of an ML compiler that does monomorphization (which it can as a consequence of the closed-world assumption, it being a whole-program compiler). It's even easier for staged languages in which the target-language isn't higher order.

I don't see any problem with ML/Haskell style algebraic data types for Harlan, except that if you want to avoid cycles in runtime data structures you need to make sure that recursions at the type level don't encounter mutable references. (I.e. you can make a tree with mutable leaves, but you can't make a tree with mutable internal edges.)

In Wavescript I banned such recursion to retain reference counting garbage collection. I'm not sure how it would work with your region analysis...

eholk commented 12 years ago

As far as regions go, I'm guessing we'll end up with a restriction that recursive data structures must all be in the same region. As long as we have this invariant, I don't think we have to worry about cycles because we can just free the whole region. Of course, at some point we'll probably need to collect within regions as well, which means we'll have to worry about cycles again.

webyrd commented 12 years ago

Claire, why do you think miniKanren may not be up to the inference task? Because you need universal quantification?

If so, we might be able to fake it using eigen-variables (essentially, lexically-scoped gensyms) or we could could bite the bullet and try to use alphaKanren (which admittedly would be a serious change).

calvis commented 12 years ago

Let-polymorphism can only be done using cKanren, and I think having function-polymorphism is basically the same problem. You can't pattern match a type without actually doing unification (or at least, I haven't figured out how).. which breaks the polymorphism.

I think cKanren could even give us the range of all instantiations we would need.

eholk commented 12 years ago

Getting the range of all the instantiations we need sounds like a critical step for monomorphization. This approach might work.

eholk / harlan

Recursive types #43