Open tomsmeding opened 5 years ago
That's correct. Actually lists are heap allocated so they could allow for members referring to the data type itself (and you can trick the compiler by redefining the type). But it really isn't supported yet. I'm not totally convinced it's a feature the language need but if/when we try to support it, it will have to be through something like Rust's Box
type.
Rust's Box
is indeed also what I was thinking of. Also not sure whether you want it in the language proper, but I was wondering whether it could be implemented in the library without touching the compiler. Perhaps I'll have a try later this week.
I think for sumtypes it definitely makes sense. I thought of it before for data types like JSON (or really any kind of AST) that you define as:
(deftype JSON
(Null [])
(Num [Double])
(Str [String])
(Arr [(Array JSON)])
(Obj [(Map String JSON)])
)
This is how most languages I know of define it (see Rust or Haskell). Without self-reference this becomes unnecessarily tedious.
I’ve needed it before for Mustache; without it the List
constructor of the type, which contains its own type definition, is not expressable, and we can’t have feature parity with other Mustache libraries.
+1! I looked into recursive data types a bit in the past. I did play with some approaches to implementing them in Carp instead of the compiler, but if I recall correctly, the first few attempts I took turned out to be unfeasible.
I just pushed a repo containing some private experiments on implementing recursion schemes in Carp. Presumably, if it's possible to implement recursion schemes using the current compiler + vanilla carp we can fake recursive datatypes. If you want to check it out to see some dead ends or get some inspo, here it is: https://github.com/scolsen/morphisms
I think this lead to nothing. Here's a good reference on other recursion schemes: http://okmij.org/ftp/Computation/fixed-point-combinators.html -- it might be possible to use one of the approaches here. I believe the straight up Y combinator implementation is impossible w/ the type system, but I think the site even covers alternatives for this situation.
Some kind of macro-level trickery might be possible too.
Some musing about this problem, as it relates to the borrow checker, ownership, and hidden allocations:
A type won’t be able to unconditionally contain itself unless we wrap it in a pointer and do a hidden allocation, which is, as far as I am concerned, a non-option. The obvious solution for a type to contain and own itself would be using something like Rust’s Box
, which also allocates, but is clear about it.
Apart from that, there are situations in which a type should be able to refer to itself. They are:
Ref
.Ptr
.Array
(because that is a hidden indirection).Map
, which will have an Array
indirection).The first three are fairly mechanical, but the fourth one is tricky, because it requires inspection of the child type, possibly recursively.
I think adding support in C should not be an issue, but getting the semantics right to make sure that all the cases that should be allowed are allowed will be somewhat hard.
@hellerve If you also want to support mutual recursion in types, then you need the fourth point anyway. The C compiler already does this (it allows pointers to self but not values of self in a struct), but it has the advantage of the language enforcing that types must be defined before they are used.
However, given that the compiler needs to know all relevant types to be able to compile a piece of code anyway, it doesn't sound very hard to me, at least in theory, to work this out recursively. There just needs to be a (small) set of built-in types that contain their argument indirectly: it sounds like that is Ref
, Ptr
and Array
(is Array
built-in?). These are then the types that "break" the recursive search for self in the check that implements this semantic validity.
We don’t currently support mutual recursion because Carp is currently a one-pass compiler, meaning that it actually goes through your code top-to-bottom trying to figure things out. So in order to make mutual recursion work we’d have to change that, which would be a medium-sized to big change; it would give us the opportunity to make other things work as well and is definitely worth it, but it is a big chunk of work.
The check itself shouldn’t really be hard to do per se, it might be a little intricate to get right, though, since with type variables a later specialization might introduce something that is not quite right. I’m not quite sure off the top of my head whether that could even ever happen, but type variables and concretization definitely make this a little more difficult to reason about (especially now that we have higher kinds, which abstract the constructors).
So, we've discussed adding something like Box
, which could get us toward the right direction.
I don't have the full context yet, and it sounds like the one-pass restriction might prevent this, but perhaps we can support this by introducing a new type system rule that more or less says the following?
T
, and a constructor for T
C
, C
may refer to T
if and only if T
is wrapped in a Ref
Such that:
(deftype (Foo a) (Bar [(Ref (Foo a))]))
is a legal type but
(deftype (Foo a) (Bar [(Foo a)]))
is not.
Which is more or less what @hellerve suggested above. But I think restricting the functionality to a single type simplifies things (we just need to make sure it's flexible enough to support all cases).
I think 🤔 ? that might work. But I'm not sure. Nods to @hellerve again, I suppose the issue becomes disambiguating between generic types...
Just thought about this, maybe you've already discussed this but I wanted to note it nonetheless: what if I write the following?
(deftype (Foo a) (Bar [(Ref (Foo (Maybe a)))]))
You then quickly land in the terrain of polymorphic recursion, which is a whole topic in itself.
@tomsmeding this is actually an infinite type, because (Foo a)
contains a (Foo (Maybe a))
which would contain a Foo (Maybe (Maybe a)))
and so on.
@scolsen this might work, but the ergonomics of having to hold onto the contained types becomes fairly questionable. Most importantly, you won’t be able to return that type from the function that owns the wrapped and wrapper type; that is more or less the problem that Box
solves, as I understand it.
I know that's kind of an infinite type, but for example, Haskell supports that just fine, and you use polymorphic recursion in functions to traverse the type.
If you're worried about the data structure itself being infinite: sorry, assume there's a second data constructor, say Bar2, that doesn't recurse. :)
Oh, interesting! I didn’t know Haskell supported that! Yeah, I feel like that’s a fairly complicated beastie then. I’m not entirely sure how we would represent that, but I think it’s one of those problems where thinking about it a lot will help :)
Thanks for that example, definitely changes the perspective a little bit for me.
I know that's kind of an infinite type, but for example, Haskell supports that just fine, and you use polymorphic recursion in functions to traverse the type.
If you're worried about the data structure itself being infinite: sorry, assume there's a second data constructor, say Bar2, that doesn't recurse. :)
Haskell only supports polymorphic recursion when types are explicitly annotated though, no? We could just piggy back off of haskell and have the same restriction—polymorphic recursion is ok, but you lose inference in such cases.
It seems you're indeed right about that; consider the following Haskell example:
import Control.Monad (join)
data Foo a = Bar (Foo (Maybe a)) | The a
deriving (Show)
joinFoo :: Foo a -> Maybe a
joinFoo (Bar foo) = join (joinFoo foo)
joinFoo (The x) = Just x
main :: IO ()
main = do
print (joinFoo (Bar (Bar (Bar (Bar (The (Just (Just (Just (Just 42))))))))))
This compiles and runs fine, but if one removes the joinFoo :: Foo a -> Maybe a
type signature, compilation fails because GHC infers two conflicting types for joinFoo: Foo a -> Maybe a
and Foo (Maybe a) -> Maybe (Maybe a)
.
EDIT: For the non-Haskellers reading this: join
has type Monad m => m (m a) -> m a
, and in the case of Maybe
it has the definition join (Just (Just x)) = x
, join (Just Nothing) = join Nothing = Nothing
.
Cool! I think for a start I'll experiment with:
T
to refer to itself when its wrapped in a Ref
.and see how that goes. I suspect I'll quickly run into the ergonomic issues @hellerve called out; I also suspect cases that are polymorphic recursive like @tomsmeding mentioned won't be unifiable.
From there we can puzzle out a better way to encode this other that Refs
(possibly just by introducing a Box
type) and determine some appropriate warning/error messaging for the polymorphic recursion cases.
This exercise will at least confirm whether or not the simple change of allowing types to refer to themselves is in fact simple--from there we can figure out the more complex backend changes needed to make it work.
One thing to note is that currently types created with deftype
can't contain Ref:s. So maybe that's a restriction that needs to be lifted first, for this to work? (my intuition is that it should be possible anyways, though)
Currently,
deftype
doesn't seem to support recursive data types:This makes sense, because in general recursive data types could make the C struct "infinitely large"; at least, without putting the recursive values on the heap explicitly.
However, since I don't think one can currently put a value on the heap explicitly in Carp (barring manual use of
malloc
/free
), and since anyway it seems this fails in the parser already, I don't think it's possible to define an s-expression type like the above in Carp.Is such a feature planned? Am I misunderstanding something?