crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0

[RFC] Macro Defs - Methods in Macro land #8835

Open Blacksmoke16 opened 4 years ago

Blacksmoke16 commented 4 years ago

As someone who does a good amount with macros, one thing I find lacking is the ability to be DRY. There currently isn't a way (AFAIK) that allows encapsulating logic to be reused within different macros. I.e. if there is some piece of logic you need to do multiple times in a macro, you would have to duplicate it.

It would be a great addition to allow defining methods that can be used within macro code; I.e. that accept one or more ASTNodes, and return an ASTNode.

For example:

macro def foo(x : StringLiteral) : StringLiteral
  "foo#{x}"
end

macro bar
  {{ foo "x" }}
end

bar # => "foox"

This would allow common/complex logic to be defined once and reused throughout an application.
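For context, here is a minimal sketch of what current Crystal allows, written only for illustration. As far as I can tell, a macro can reuse another macro only by emitting a call to it as code; that call is expanded afterwards, so its result is never available as an ASTNode inside `{{ ... }}`.

```crystal
# Works today: `bar` emits the *code* `foo("x")`, which the compiler
# then expands in a second step.
macro foo(x)
  "foo" + {{ x }}
end

macro bar
  foo("x")
end

puts bar # prints foox
```

By contrast, writing `{{ foo("x") }}` inside `bar` does not compile today, because `foo` is a macro and not a macro method.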

asterite commented 4 years ago

This is basically extending the macro language to allow user-defined macro methods. Right now they are hardcoded in the compiler and not extensible. Right?

My idea is that they are also macros. For example, you could reopen Crystal::StringLiteral to add methods to it:

class Crystal::StringLiteral
  macro foo(x)
    "foo#{x}"
  end
end

Then call it:

macro bar
  {{ "x".foo }}
end

It only makes sense to call the macro method foo inside macro code because that returns ASTNode, not code, though maybe we can allow it outside macro code and the code is then transformed to code by calling to_s on it.

So the way I'd like to see this implemented is by just allowing to find macros when you call them inside macro code.

That said, I'd like to avoid having so much macro code in Crystal, and not having this is a great way to accomplish that.

Blacksmoke16 commented 4 years ago

This is basically extending the macro language to allow user-defined macro methods.

Not necessarily. I would rather just have methods within macro land. I.e. you pass it one or more ASTNode, and it returns an ASTNode.

Like your example would be

macro def foo(x : StringLiteral) : StringLiteral
  "foo#{x}"
end

macro bar
  {{ foo "x" }}
end

bar # => "foox"

I don't really think it's necessary to be able to reopen the ASTNodes to add your own method like you can with the stdlib types. Using what is currently defined is usually good enough. All I'm suggesting is being able to have some sort of way to keep things DRY. As in this latest example, if you had to format x three times you would have to do "foo#{x}" three times as well.
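To make the duplication concrete, here is a small sketch in today's Crystal. The macro names `prefixed` and `prefixed_twice` are made up for illustration. Both macros need the same "foo" prefixing, and each has to spell it out because neither can call a shared helper from inside `{{ ... }}`.

```crystal
# Both macros repeat the same formatting expression.
macro prefixed(x)
  {{ "foo#{x.id}" }}
end

macro prefixed_twice(x)
  {{ "foo#{x.id}" + "foo#{x.id}" }}
end

puts prefixed("x")       # prints foox
puts prefixed_twice("x") # prints fooxfoox
```

Within a single macro a local variable can hold the formatted value, but that does not help across separate macros.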

EDIT: I updated the issue desc to reflect this.

watzon commented 4 years ago

As someone that's also been doing a lot with macros, especially lately, I love this idea. I actually needed something exactly like this last night and was out of luck. I also love @asterite's idea of being able to open up macro definitions and add methods to them like you could with any other class or struct. Would this all be insanely hard to do?

Additionally having initializers for ASTNode types would be nice too. Currently the only way to define a HashLiteral, ArrayLiteral, or anything else afaik is to actually use the literal notation. That means that using a macro to generate a Hash or Array can get kinda hacky.

asterite commented 4 years ago

Well, not crazy hard, I managed to implement a quick prototype in an hour.

After thinking a bit more about this, I think it would be really nice to go forward with this.

One thing I dislike about macros is how long and hard to understand they can get. That's a good way to try to avoid them. But if we can DRY them up and make them more concise and easier to understand, and more powerful, maybe it's the best solution.

watzon commented 4 years ago

100% agree. Crystal macros are powerful, but there are definitely some instances where you have to do some pretty hacky/unreadable things to do what you need to do. Documentation is also sparse, but once macros are somewhat finalized for v1 @Blacksmoke16 and I can probably work on that.

I feel like having initializers for ASTNode types would go a long way towards cleaning things up, as well as adding some methods that are missing on their standard library counterparts. Also, annotations need some work, but that's another issue.

asterite commented 4 years ago

I feel like having initializers for ASTNode types

Do you have an example of that? I don't quite understand what this means.

Blacksmoke16 commented 4 years ago

{% hash = HashLiteral.new %}

versus

{% hash = {} of Nil => Nil %}

watzon commented 4 years ago

Exactly. Especially if we had stuff like ArrayLiteral.build and HashLiteral.new which takes a block and allows you to build it. As of right now, building a Hash using a macro is kinda hacky.

{% begin %}
{
  {% for c in SomeEnum.constants %}
    {{ c.stringify }} => {{ c.stringify }}.camelcase,
  {% end %}
}
{% end %}

With the ability to actually initialize types, especially using builder type methods, we could do something like this:

{% myhash = HashLiteral.build(SomeEnum.constants.size) do |hash, i| %}
  {% c = SomeEnum.constants[i] %}
  {% hash[c.id.stringify] = c.id.stringify %}
{% end %}
{{ myhash }}

Or something. Hopefully you get the idea. I did just realize that Hash doesn't actually have a method like this though. Maybe it should?

asterite commented 4 years ago

You can already build hashes with "regular" crystal code:

enum SomeEnum
  Red
  Green
  Blue
end

{% begin %}
  {% h = {} of Nil => Nil %}
  {% for c in SomeEnum.constants %}
    {% h[c.stringify] = c.stringify.downcase %}
  {% end %}
  p({{ h }})
{% end %}

So you see I create a hash and add elements to it at compile-time, instead of outputting a hash literal.

And with macro methods this would be simpler because you could define a method to do that, though you'd need to have each.

I don't think we'll add build and such methods, it's just too much in my opinion. We can't keep adding built-in macro methods.

Blacksmoke16 commented 3 years ago

Now that 1.0.0 is released, maybe we can revisit this? :slightly_smiling_face:.

HertzDevil commented 3 years ago

Some random thoughts: if we can obtain a MacroDef AST node, e.g. from #6517, and additionally define:

class Crystal::Macros::MacroDef
  # returns the stripped result of expanding this macro, passing the given arguments
  # as AST nodes verbatim, and respecting the usual matching of arguments to parameters
  # (this macro is never an AST node method, because they have no `MacroDef` representations)
  def expand(*args, **named_args) : MacroId; end
end

Then while we cannot call macros directly, we can nest them:

macro foo(x)
  {{ "foo#{x}" }}
end

macro bar(x)
  {{ @top_level.macros.select(&.name.== "foo").first.expand(x) }}
end

bar "x" # => "foox"

macro fib(n)
  {% fib = @top_level.macros.select(&.name.== "fib").first %}
  {{ n <= 1 ? n : fib.expand(n - 1).stringify.to_i + fib.expand(n - 2).stringify.to_i }}
end

fib(10) # => 55

If additionally we have the global parse method, we can recover the type from such a MacroId:

macro bar(x)
  {% y = @top_level.macros.select(&.name.== "foo").first.expand(x) %}
  {% parse(y).class_desc %} # => StringLiteral
end

We can even do very wild things like anonymous macros:

{{ parse(parse("macro add(x, y); {{ x + y }}; end").expand(2, 3)) ** 2 }} # => 25

Going in another direction, we could combine macro overload lookup and expansion in one macro method:

class Crystal::Macros::TypeNode
  # looks up the macro defined in this type with the given macro name, respecting
  # normal overload rules, then expands that macro, passing the arguments verbatim
  # (this macro is never an AST node method, because the receiver being something
  # like `ArrayLiteral` is still considered a regular type name)
  def expand_macro(macro_name, *args, **named_args) : MacroId; end
end

Then write:

macro bar(x)
  {{ @top_level.expand_macro(:foo, x) }}
end

The idea is that there isn't really a need to differentiate "macro defs" from regular macros. (The term "macro def" actually means any def that mentions the macro variable @type.)
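As an illustration of that existing meaning, assuming current Crystal semantics: a regular `def` whose body mentions `@type` is instantiated separately for each concrete type it is called on. The class names below are hypothetical.

```crystal
class Animal
  # The body mentions `@type`, so it is re-expanded for each subclass.
  def type_name
    {{ @type.name.stringify }}
  end
end

class Dog < Animal
end

puts Animal.new.type_name # prints Animal
puts Dog.new.type_name    # prints Dog
```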

#10829 also provides an alternate mechanism to pass arbitrary arguments to escaped macro expressions, but they have no formal return values.

straight-shoota commented 3 years ago

I think there's a benefit in writing macro def like the macro methods defined in the language.

I suppose one of the reasons would be that you can write them in a similar way to regular defs: Not having to wrap everything in macro delimiters, using return values, parameter type restrictions...

But a seamless integration with the language's macro library is also useful. For example, new macros can be experimented with in a shard and later promoted to the language.

Blacksmoke16 commented 3 years ago

The majority of this feature is already implemented as part of https://github.com/crystal-lang/crystal/pull/9091. Which IIRC was waiting on some additional design discussions.

Currently it uses the macro keyword, so support for macro def syntax would need to be added. I think this would also solve this concern about how the two "types" of macros could conflict with each other name wise. I'm not super familiar with how to do that, so if someone is interested in getting a POC going and wants to take it over, that's fine by me.

Also going to reiterate what I said in https://github.com/crystal-lang/crystal/issues/8835#issuecomment-589977796, I'd be totally fine with not being able to monkey patch the built in AST node types and only defining them on the top level or within a normal user defined namespace if this were to make the implementation easier/better.

Blacksmoke16 commented 2 years ago

Wanting to bring this RFC up again, as it seems to be something I constantly want/need. Would be open to sponsoring someone to implement this as well :pray:.

To summarize I think the primary goal of this feature is to:

Reduce the amount of duplication required in macro code by allowing reusable macro methods

For example:

macro def foo(x)
  "foo#{x}"
end

macro bar
  {{ foo "x" }}
end

bar # => "foox"

This plus being able to call the macro def recursively would be a HUGE win to how macro code could be written, ultimately making it easier to read/maintain/share.

Going further and allowing users to monkey patch methods into the stdlib's macro types could be useful, but I don't think it's a requirement, at least for a first pass, given you could just pass the node as an argument. Similarly, supporting parameter/return type restrictions could be a nice win from a documentation/readability POV, but also don't think it should be a blocker to a first pass given normal macros don't support them on parameters either.

Based on what I read in this issue and the related PRs, I think what we'd need to do is like:

Of course not all of this has to be done at once, getting a solid MVP implementation going first would be fine as it would allow people to start using it, then figure out what the pain points are to improve on.

HertzDevil commented 2 years ago

Minor: the term "macro def" already refers to defs that use @type.

straight-shoota commented 2 years ago

Yeah, we'll have to take care of using clear terms. We have def with macro code, macro, top-level macro expressions, and with this feature there's another kind of macro. What should it be called? I'm not sure. macro def as keyword would probably be confusing. From a technical point of view, the macro keyword could actually work for this. The difference to normal macros is determined at call site (is it called from normal code or from a macro expression?). But of course it would be helpful to have a clear distinction at the definition.

Similarly, we need to worry about ambiguity at call site, as well. We already have macro calls. How do we call calls in macro expressions?

HertzDevil commented 2 years ago

One way to address the ambiguity is:

macro def foo(x)
  "foo#{x}"
end

macro bar
  {{ {{ foo "x" }} }}
end

bar # => "foox"

foo("x")             # Error: undefined def or macro "foo"
{{ foo("x") }}       # Error: undefined macro "foo"
{{ {{ foo("x") }} }} # => "foox"

That is, a nested macro interpolation expects a single Call within the {{ ... }}; its argument expressions are evaluated first, and then the call would evaluate the given macro def. This applies to macro defs themselves too because they are also {% ... %} contexts:

macro def fact(x)
  if x <= 1
    1
  else
    x * {{ fact(x - 1) }}         # calls `fact` recursively
    # {{ x * fact(x - 1) }}       # Error: receiver of macro def call must be a type (see below)
    # x * {{ {{ fact(x - 1) }} }} # Error: expected Call, got MacroExpression
  end
end

{{ {{ fact(4) }} }} # => 24

Argument evaluation means x - 1 is truly computed, not passed as a Call with a Var receiver. Technically one could nest further {{ ... }} calls in these arguments.

macro def lookup is performed in the same way as regular macro lookup, using the top-level namespace instead of the AST node namespace. Same goes for the receiver:

class Foo
end

class Bar < Foo
  # `Bar.foo` -> `Foo.foo` -> `::foo`
  {{ {{ foo }} }}

  # `Bar::Bar.foo` -> `Bar.foo` -> `::Bar.foo`
  {{ {{ Bar.foo }} }}

  # `::foo`
  {{ {{ ::foo }} }}
end

With this I don't think we need to support @type and @top_level as receivers in these macro def calls.

This creates a clean separation between macros and macro defs; TypeNode#foo and @type.foo cannot be defined or monkey-patched through macro defs under all circumstances. The opposite is also true; macro defs are never "promoted" from existing macro methods, so there will be zero macro defs in the standard library until we write one. I see this separation as a benefit more than a disadvantage.

j8r commented 2 years ago

Or call them macro functions - macro fun? Technically they are more like functions, since they are not methods called on an object instance. fun already exists for bindings. In a way the macro-land is already a bit special, as there are for loops.

Blacksmoke16 commented 2 years ago

What should it be called?

Kind of just spitballing here so TBD on exact names of things, but some ideas I thought of:

Use an annotation to denote a macro as "callable". Would require being able to read annotations on macros, and allow displaying annotations applied to them in some way. Would probably require less? compiler work since you could reuse existing macro parsing logic, and just set a flag on it, like visibility?

@[Callable]
macro foo(x)
  "foo#{x}"
end

Make the keyword more akin to a modifier of the macro, e.g. private or protected. Probably more readable and could also just be a flag.

callable macro foo(x)
  "foo#{x}"
end

Use some symbol as part of the macro name to denote it being special. Similarly, this could be used to just set a flag, tho I'd say it's less readable and easier to miss.

macro $foo(x)
  "foo#{x}"
end

Require type restrictions on them and use that as a means to differentiate:

macro foo(x : StringLiteral) : StringLiteral
  "foo#{x}"
end

Or lastly, use some entirely new non-ambiguous name, like ast (fun|def) foo(x) or node (fun|def) foo(x) or something like that. Coming up with a clearer word than macro might be challenging.

@HertzDevil I'm assuming that logic would be specific to Call? I.e. if you tried to do {{ {{ 2 + 2 }} }}, you'd still get Error: can't nest macro expressions?

asterite commented 2 years ago

What's wrong with macro def?

Blacksmoke16 commented 2 years ago

@asterite The concern is mainly around https://github.com/crystal-lang/crystal/issues/8835#issuecomment-1080024082. I.e. that we already have macro def meaning something, and making it mean something else would be confusing.

asterite commented 2 years ago

Oh, I see. We call those macro def because macro def was a thing in the past. Eventually we made this @type rule implicit, and now you always use def. So my 2 cents:

straight-shoota commented 2 years ago

Let's settle the terminology in #11945

HertzDevil commented 2 years ago

@HertzDevil I'm assuming that logic would be specific to Call? I.e. if you tried to do {{ {{ 2 + 2 }} }}, you'd still get Error: can't nest macro expressions?

It is as shown above:

Blacksmoke16 commented 2 years ago

So sounds like https://github.com/crystal-lang/crystal/issues/8835#issuecomment-1080417733 offers a pretty robust implementation that takes care of the ambiguity problem. Do we think this is the way forward? Would be nice if we could avoid the extra {{ }}, but not a huge deal I'd say given its benefits.

HertzDevil commented 2 years ago

Also minor point: the following is valid syntax:

module A
  macro def
    1
  end
end

A.def # => 1

Attempting to parse a macro def would then fail. If we really want to preserve this then one way is to use a single-word keyword like macrodef or macro_def.

asterite commented 2 years ago

I don't understand what's the ambiguity.

HertzDevil commented 2 years ago

Here are some additional examples involving non-printing or escaped macros:

macro def foo
  {% x = 1 %} # Error: macro def call must use {{ ... }}, not {% ... %}
end

macro foo
  {{ {% foo %} }}     # Error: macro def call must use {{ ... }}, not {% ... %}
  {% x = {{ foo }} %} # okay
end

{% ... %} cannot be nested within {{ ... }}. The opposite is allowed.

Also notice how the macro and the macro def are both called foo. This is technically fine, because the nested macro treatment implies they are always in separate namespaces, but we may want to forbid this to avoid confusion. This is notably not forbidden between defs and macros already; macros are "ordered before" defs in terms of "overload" ordering if they share the same name.
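A minimal sketch of that existing def/macro ordering, assuming current Crystal behavior (the name `greet` is hypothetical): when a def and a macro share a name, the call is resolved against macros first.

```crystal
def greet
  "from def"
end

macro greet
  "from macro"
end

# Macro lookup happens before def lookup, so this call expands the macro.
puts greet # prints from macro
```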

macro foo
  \{{ {{ 1 + 2 }} }}
  {% debug %}
end

foo # => {{ 3 }}

If an outer {{ ... }} is escaped, everything within is interpolated normally. This is existing behavior and will continue to be allowed. The body of a macro def is never considered to be escaped in this manner.

macro def foo
  \{{ 1 + 2 }} # Error: unknown token: '{'
end

macro bar
  {% x = \{{ 1 + 2 }} %} # Error: unknown token: '{'
end

bar

Escaped macros cannot be nested in macros, so it follows bare escaped ones are also disallowed within macro defs. At the moment the supported AST expressions within a macro def should be exactly what are currently permitted within {% ... %} (no case for example).


I don't understand what's the ambiguity.

This one, i.e. whether macro defs and AST node methods should both be callable with the same syntax.

asterite commented 2 years ago

I just tried the code snippet mentioned in my comment. It doesn't work.

I really don't think there's any ambiguity. It seems that thread mentions that you can do @type.foo and that would call a macro foo in that type, but that's not the case. Maybe it was just something that was desired but it wasn't working like that.

In the PR I sent implementing this there was no ambiguity. Here's how I think it should work:

For example:

class Foo
  macro foo
    1
  end

  macro def foo
    1
  end
end

Foo.foo # Calls the first macro, expands to 1

{% Foo.foo %} # Calls the second macro, **returns** a NumberLiteral with the value 1

What's the ambiguity with this?

HertzDevil commented 2 years ago

It still clashes with the names of AST node methods:

struct Int32
  macro def ancestors
  end
end

{% Int32.ancestors %} # does this return a `Nop`?

The opposite argument is we can then redefine all AST node methods as "primitive" macro defs, but that's a much larger undertaking and I'm not sure if a breaking change is avoidable there. The nested macro approach I presented ignores even AST node methods.

lbguilherme commented 2 years ago

We can either accept that these methods can be overridden (just like regular methods from stdlib can), or perhaps introduce some less aggressive way to avoid the ambiguity: adding a call macro:

struct Int32
  macro def foo(a, b)
    a + b
  end
end

{% Int32.call(:foo, 1, 2) %} # returns a `3` literal

This is easier to read and integrates better with existing syntax, IMHO. It would be defined for TypeNodes and at the top level.

asterite commented 2 years ago

It still clashes with the names of AST node methods

Right, and in that case you are overriding the method. There's still no ambiguity.

The opposite argument is we can then redefine all AST node methods as "primitive" macro defs, but that's a much larger undertaking and I'm not sure if a breaking change is avoidable there

I don't think there's a breaking change at all.

All the current macro methods are defined with macro def and they are primitives. Of course that syntax doesn't exist yet, so you can't do it, but we can imagine that's how it's done.

Now we are enabling macro def to be actually defined by users, and the existing ones can also be defined, it's just that they are @[Primitive] somehow (and eventually we could implement them like that)

So with this:

struct Int32
  macro def ancestors
  end
end

you are overriding the behavior of Int32.ancestors inside macros, but String.ancestors would still return the "correct" value.

Then we can redefine ancestors for every type like this:

class Crystal::Macros::TypeNode
  macro def ancestors
  end
end

It's a bit similar to how we can define a method for all classes by defining an instance method in Class:

class Class
  def foo
    1
  end
end

p Int32.foo # It works!

So there's a bit of "conflict" in that you can define a macro def on a type, or on TypeNode, and both kind of work, and it's exactly the same as in regular Crystal code, where you can define a class method on a type, or a method on Class, and they both kind of work.

Then you can define macro methods on Crystal::Macros::*, so for example adding a method to a StringLiteral. And all existing methods are also macro def, they are just primitives.

Is there any conflict or ambiguity with that approach?

HertzDevil commented 2 years ago

It seems the idea here is to establish a "metaclass" hierarchy on top of TypeNode that mirrors the real hierarchy of metaclasses. I am still not sure if that's the best approach, but it does admit an alternative syntax:

macro class Crystal::Macros::TypeNode
  @[Primitive]
  def ancestors
  end
end

macro class Int32
  # the entire body deals with the `Int32` macro "metaclass", so
  # every def is implicitly a `macro def`

  def ancestors
  end
end

# not allowed for macros, but allowed for the non-macro class analog (#11764)
macro def String.ancestors
end

# is this doable? (if not then the `macro def` syntax is still
# necessary for top-level definitions)
macro class <Program>
end

And we can simply call them "defs" of "macro metaclasses", avoiding the terminology conflict altogether.

asterite commented 2 years ago

Oh, I like that! I like that these things are "scoped" inside a "macro" context.

But how would you define a top-level macro method with that syntax? Is it just macro def? I'm thinking I'd still like methods inside macro class to be called macro def just because if you see def ... without the surrounding context you can't immediately know if that's a method to be used at compile-time or not.

So maybe we only need macro def in that case, and macro class isn't really needed, because the type Int32 is the same inside or outside macros.

Blacksmoke16 commented 2 years ago

Would macro def also be used to define them within a non stdlib namespace as well? Or is the idea you'd also be able to do like macro class MyModule? But yea, if we're okay with just allowing the end user to override macro methods like they could with stdlib methods, then that syntax would definitely be more readable/flexible I'd say. I'm all for having an implementation that is a bit future facing to make the macro language implementation simpler longer term, which sounds like this approach would.

HertzDevil commented 2 years ago

A slightly different idea: we reserve macro class to represent the AST node types in the macro language, without the Crystal::Macros part. After all, the types we use in the macro #is_a? are in an entirely separate namespace, do not physically reside under Crystal::Macros, and never conflict with non-macro types sharing the same names. Then normal types use macro def within the usual class. Example:

macro class ArrayLiteral
  @[Primitive(...)]
  macro def each_with_index(&); end

  @[Primitive(...)]
  macro def size; end

  macro def empty?
    # note: implicit `self`
    size == 0
  end

  macro def splat(trailing_string = nil)
    str = ""
    each_with_index do |v, i|
      str = "#{str.id}, " if i > 0
      str = "#{str.id}#{v}"
    end
    if trailing_string && !empty?
      str = "#{str.id}#{trailing_string.id}"
    end
    str.id
  end
end

# okay, not the AST node macro type
class ArrayLiteral
  macro def empty?
    # same as `ArrayLiteral.ancestors.empty?` in a macro context
    ancestors.empty? # => false
  end
end

# okay, not the AST node macro type either
class Crystal::Macros::ArrayLiteral
end

asterite commented 2 years ago

I like that! In the API docs I can imagine they will also live inside a different section.

lbguilherme commented 2 years ago

Inside a macro class you define macro defs that happen to be like instance methods on the AST node. At the same time, macro defs in a regular class/module operate like metaclass methods. This is odd because the same syntax creates methods that are called in different ways. I propose we use regular method syntax inside a macro class to define methods of AST nodes (as primitives), and macro def syntax in regular classes/modules to define methods that are callable from the TypeNode.

macro class ArrayLiteral
  def foo
    "foo"
  end
end

class String
  macro def bar
    ancestors.foo
  end
end

{% p String.bar %} # prints "foo"

Blacksmoke16 commented 2 years ago

Anyone interested in getting a PoC together? Would love to see this gain some momentum.

EDIT: Reminder I'd be possibly willing to sponsor someone for their efforts.

HertzDevil commented 2 years ago

The other day I realized the current macro defs are named like that because methods referring to {{ @type }} used to have the macro def syntax: #2565

Blacksmoke16 commented 2 years ago

@HertzDevil Yea, that's why I was thinking we could reuse that syntax since nothing is using it anymore. Tho, as you and others pointed out, there are other options depending on the exact path we want to take this. I.e. the more focused approach of "just allow macro code to be reused by defining reusable macro methods", or treating this feature as a way to implement the existing macro methods/modify the methods available on the macro AST nodes.