crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.5k stars 1.62k forks source link

The type system of AST node macro types #11981

Open HertzDevil opened 2 years ago

HertzDevil commented 2 years ago

#is_a? in the macro language is implemented as a string check:

https://github.com/crystal-lang/crystal/blob/ef05e26d6ecb0c7d1f5f0568af76ed001896a601/src/compiler/crystal/macros/interpreter.cr#L520-L525

Crystal::ASTNode#class_desc_is_a? roughly does the following: it uses some macro magic to obtain the ancestors of the built-in AST node types, extracts their #class_desc (i.e. #class_name in the macro language), then checks for membership of the const_name argument. A Splat receiver, for example, translates this to const_name.in?("Splat", "UnaryExpression", "ASTNode"). This means #is_a? is not actually backed by a type system within the macro interpreter. It also means the following fails, because no AST nodes ever have a #class_name containing spaces:

{% 1.is_a?(NumberLiteral | BoolLiteral) %} # => false

A type system is needed to assign meaning to those type expressions, so that we can reliably use the full set of macro types in #is_a? and other places that (may eventually) accept macro type expressions, such as the restrictions in built-in macro method docs, macros, a different kind of macros, and annotations. The following is an attempt to establish this type system:

The description above is mostly complete; it leaves out a definition of the intersection operator, which would be needed by macro type restrictions. #is_a? does not need it because the macro language is interpreted and does not rely on type filters. Once we construct this type system, we could refactor #is_a? into:

def visit(node : IsA)
  node.obj.accept self

  # `NumberLiteral`, `ArrayLiteral(Def)` etc.
  macro_type = lookup_macro_type(node.const)

  # `ASTNode#macro_is_a?` is the membership check for macro types
  # we avoid doing something like `@last.macro_type.implements?(macro_type)`
  # so that in case of e.g. an `ArrayLiteral` we don't have to build
  # the full union of element types
  @last = BoolLiteral.new(@last.macro_is_a?(macro_type))

  false
end

and lookup_macro_type would be readily usable when we need to support macro types outside #is_a?.

HertzDevil commented 2 years ago

Originally, these are the planned generic macro types:

class ArrayLiteral(T); end      # T = union of element types
class HashLiteral(K, V); end    # K = union of key types, V = union of value types
class TupleLiteral(T); end      # T = union of element types
class NamedTupleLiteral(V); end # V = union of value types

I have been thinking if the last two should become variadic, mirroring their normal language counterparts. This makes it possible to declare:

class HashLiteral(K, V)
  def to_a : ArrayLiteral(TupleLiteral(K, V)); end
  # without variadic type vars: `ArrayLiteral(TupleLiteral(K | V))`
  # current restriction: `ArrayLiteral(TupleLiteral)`
end

Instead of making TupleLiteral variadic, an alternative is to make Tuple a variadic subtype of TupleLiteral. Then Tuple(*T) is a subtype of TupleLiteral(Union(*T)). This approach supports tuple literals too, because they always expand to ::Tuple(...):

x = {1, 'a'}
x.class_name          # => "TupleLiteral"
x.is_a?(TupleLiteral) # => true
x.is_a?(Tuple)        # => true

x.is_a?(TupleLiteral(NumberLiteral | CharLiteral)) # => true
x.is_a?({NumberLiteral, CharLiteral})              # => true

x << 2
x.is_a?(TupleLiteral(NumberLiteral | CharLiteral)) # => true
x.is_a?({NumberLiteral, CharLiteral})              # => false

This is similar to how tuples are implemented in TypeScript:

let x: [number, boolean] = [1, true];
let y: (number | boolean)[] = x; // okay
x = y; // Type '(number | boolean)[]' is not assignable to type '[number, boolean]'.
       //   Target requires 2 element(s) but source may have fewer.

and Ruby RBS:

def foo
  [1, 'a']
end

def bar
  x = []
  x << 1
  x << 'a'
  x
end

# TypeProf generates the following:
class Object
  private
  def foo: -> [Integer, String]
  def bar: -> (Array[Integer | String])
end

The benefits are not immediately obvious, because right now macro types only have an effect within #is_a?.

HertzDevil commented 2 years ago

As for short forms other than {}, personally I see little issues expanding T? to T | NilLiteral | Nop, and _ to ASTNode (this makes sense because everything is covariant). The -> form might be related to ProcNotation but I haven't figured how they could be potentially used.

asterite commented 2 years ago

these are the planned generic macro types

I have my doubts that these types should be generic at all.

The thing is, these types in macros work in a dynamic way. I'd dare to say, macros in Crystal is the mini-Ruby inside Crystal.

So for example, if we have this:

{% array = [1, "hello"] %}

then one could say that that's either ArrayLiteral, or ArrayLiteral(IntegerLiteral, StringLiteral). But look at this:

{% begin %}
  {% array = [1, "hello"] %}
  {% puts array[0] + 1 %}
  {% puts array[1].size %}
{% end %}

It "compiles" just fine. Because actually there's no compilation or type-checking involved: array[0] will be an IntegerLiteral, and array[1] will be a StringLiteral. It's not like accessing an element will have the type IntegerLiteral | StringLiteral and then we can't resolve "size".

So... I think an array literal is not generic.

As a side note... what's the benefit of introducing all of this? I'm sure you have something in mind, but at least it's not explained here.