crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.45k stars 1.62k forks source link

On Crystal Generics #268

Closed papamarkou closed 9 years ago

papamarkou commented 9 years ago

While working with struct/class generics, a few language questions on generics arose and I would like to hear what are your thoughts on them. To follow through, I will use a naive example.

1 ) The first question is this; assume you want to define an elementary Dimensions "parameter", which will be used as a generic parameter or as part of class hierarchy. I have 2 ways in mind to get this done:

Way 1.A (using subclasses):

abstract struct Dimensions
end

abstract struct TwoDimensional < Dimensions
end

abstract struct ThreeDimensional < Dimensions
end

Way 1.B (using type unions):

abstract struct TwoDimensional
end

abstract struct ThreeDimensional
end

alias Dimensions = TwoDimensional | ThreeDimensional

Is one of 1.A, 1.B considered better than the other in some respect?

2) Let's say now that we want to define a Shape struct parametrized by the shape's dimension. Is one of the following 2 approaches preferred over the other?

Way 2.A (using subclasses, Dimensions not used):

abstract struct Shape
end

abstract struct TwoDimShape < Shape
end

abstract struct ThreeDimShape < Shape
end

struct Circle < TwoDimShape
end

Way 2.B (using generics, Dimensions is the parameter-generic):

abstract struct Shape(D)
end

struct Circle < Shape(TwoDimensional)
end

3) In the case of 2.B, the following happens with generics:

shape = Circle.new
shape.is_a?(Shape) # returns true
shape.is_a?(Shape(TwoDimensional)) # returns false

Why shape.is_a?(Shape(TwoDimensional)) returns false instead of true?

4) Would it be useful to make it possible to restrict the type of a generic? I have in mind sth along the lines

abstract struct Shape(D : Dimensions)
end

Sorry for the long discussion-issue, I hope that the above splitting into numbered sections allows the questions to be clearer.

vendethiel commented 9 years ago

1/2/3) I like ADTs, so I'd vote for ADT. (Scala, for example, uses "sealed classes" for that (meaning you can't extend it in another class, so the match exhaustiveness can be checked))

4) yes!

insidewhy commented 9 years ago

It'd be lovely if crystal supported ADTs. I'm a fan of scala but that compiler is so slow.

asterite commented 9 years ago

@scidom it gave you false in your example because it's a bug. Good catch :-)

I don't fully understand... what would those Dimension type provide? I almost always relate generic types to containers. And, in general, I try to limit the use of generics because it makes code less readable (because of those T annotations). Why would a Circle be a generic type?

In short, I think that without a more concrete example I can't help you decide which alternative is better.

Just so that it is said somewhere (it's not documented), the representation of a union of types varies depending on the union. For example a union of Reference types is represented as a pointer because each instance carries a type_id field as the first member. A union of Reference types and Nil is still represented as a pointer, the null pointer is used to say "this is Nil". But a union of something that's not Reference (a number, bool, a struct) and something else is represented kind of like a C union: a tag for the type id and then a series of bytes as long as the maximum space needed according to those types. Knowing this you might be able to choose better.

@vendethiel @nuisanceofcats how would you implement ADTs in Crystal? (possible with a new syntax or features).

papamarkou commented 9 years ago

Thanks a lot @asterite, both for fixing the bug and for the feedback.

In the example above, I was not thinking of Circle as a generic type. What I had in mind was to classify Shape objects to 2-dimensional objects (such as Circle or Square) and 3-dimensional objects (such as Sphere). The thought was to introduce the abstract Dimensions struct and the TwoDimensional, ThreeDimensional abstract sub-structs as some sort of "enumeration". Dimensions would be the T parameter in your notation. Then we would have two-dimensional shapes (Shape(TwoDimensional)), of which Circle would be a sub-struct. So all the generics do here is simply to organize substructures, they are not used as parameters in specific shapes, such as Circle. Sorry for the confusion of the initial post, I hope this clarification sheds some light on what I was thinking :)

asterite commented 9 years ago

So these Dimensional, TwoDimensional and ThreeDimensional would just help categorize these types? They won't define any method or functionality?

I think the best way to do that is with modules:

module Dimensional
end

module TwoDimensional
  include Dimensional
end

class Circle
  include TwoDimensional
end

puts Circle.new.is_a?(TwoDimensional)

That way you don't force a superclass for a shape, plus you avoid having generic types.

papamarkou commented 9 years ago

Good point, I haven't considered this approach. Not sure if there will be any super classes yet, there may be fallback methods (haven't fully formulated the setting, thinking slowly about statistical distributions, univariate, multivariate etc). Is there any performance penalty when using generics or superstructs?

asterite commented 9 years ago

No performance penalties in either case. We don't use the classic "virtual table" approach, so if the compiler knows that a variable is of type X, even if the method is in a superclass, there's no "virtual table lookup", the method is known at compile time. This might lead to duplicated methods begin eventually generated, or very similar methods. But:

  1. This only affects the final executable code size, which is not as important as the program's speed.
  2. A method might be inlined, so all of those duplicated definitions might not even exist when compiled in release mode.
  3. There's an LLVM pass that merges similar functions.

Small performance penalties appear when you have union types, where you have to make a multi-dispatch. But in the case of a dispatch over a class hierarchy (say, Shape is an abstract class and a variable is one of its subtypes, but you don't know which at compile time), because of the way we give an id to types in some cases LLVM optimizes the multi-dispatch into a jump table :-)

papamarkou commented 9 years ago

Great, this is all very useful info for developing code!

asterite commented 9 years ago

I'll close this issue. I think all the questions have been answered. If you want to add ADTs to crystal please send a feature request (with a detailed proposal), but I think module inclusion / subclasses work well right now.