JuliaLang / JuliaSyntax.jl

The Julia compiler frontend

AST node for iteration #432

Open c42f opened 1 month ago

c42f commented 1 month ago

There's been something niggling at me about the way iteration is represented in the AST.

Currently we have the following parsing:

julia> parsestmt(SyntaxNode, """
       for x = xs
          body
       end""")
line:col│ tree                                   │ file_name
   1:1  │[for]                                   │
   1:4  │  [=]
   1:5  │    x
   1:9  │    xs
   1:11 │  [block]
   2:4  │    body

julia> parsestmt(SyntaxNode, """
       for x = xs, y = ys
          body
       end""")
line:col│ tree                                   │ file_name
   1:1  │[for]                                   │
   1:4  │  [cartesian_iterator]
   1:4  │    [=]
   1:5  │      x
   1:9  │      xs
   1:12 │    [=]
   1:13 │      y
   1:17 │      ys
   1:19 │  [block]
   2:4  │    body

But the = node here doesn't have normal assignment semantics. It does create a binding for x, but x is bound to each element of xs in turn, not to the expression on the right-hand side of the =. Also, the user may write in rather than = in the source; the parser normalizes this to = for consistency, but that only emphasizes that something a bit odd is going on: this is not assignment, merely assignment-like.
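To make the "assignment-like" point concrete, here is a small sketch using only Base's Meta.parse and its Expr output rather than SyntaxNode. It assumes the Expr form shows the same in-to-= normalization described above, and the loop at the end simply demonstrates that the variable receives elements, not the right-hand side as a whole.

```julia
# Both spellings of the loop header parse to the same `=` node in `Expr`.
ex_eq = Meta.parse("for x = xs\n    body\nend")
ex_in = Meta.parse("for x in xs\n    body\nend")

header_eq = ex_eq.args[1]   # iteration spec written with `=`
header_in = ex_in.args[1]   # iteration spec written with `in`

println(header_in.head)            # head of the normalized spec
println(header_eq == header_in)    # the two specs compare equal

# Despite the `=` head, the loop variable is bound to each element in turn,
# not to the collection on the right-hand side.
seen = Int[]
for x = [10, 20, 30]
    push!(seen, x)
end
println(seen)
```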

The use of cartesian_iterator is semantically nice because we can reuse it for array comprehensions, where it means the same thing. The representation in Expr doesn't have this level of uniformity, so this is already, hopefully, a bit of an improvement.
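For comparison, the non-uniformity on the Expr side can be inspected directly. The sketch below uses only Base's Meta.parse; the variable names are illustrative. In Expr, the iteration specs of a multi-variable for loop are wrapped in a block, while in a comprehension they are spliced directly into the generator.

```julia
loop = Meta.parse("for x = xs, y = ys\n    body\nend")
comp = Meta.parse("[a for i = xs, j = ys]")

loop_specs = loop.args[1]   # container holding the loop's iteration specs
gen = comp.args[1]          # the generator inside the comprehension

is_eq(a) = a isa Expr && a.head == :(=)

println(loop_specs.head)                 # wrapper head used by `for`
println(gen.head)                        # wrapper head used by the comprehension
println(count(is_eq, loop_specs.args))   # number of `=` specs in the loop header
println(count(is_eq, gen.args))          # number of `=` specs in the generator
```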

Complex comprehensions only make this worse: there, = nodes can appear at several different levels of the AST. For example,

julia> parsestmt(SyntaxNode, """
       [a for i = xs, j = ys if z]""")
line:col│ tree                                   │ file_name
   1:1  │[comprehension]                         │
   1:2  │  [generator]
   1:2  │    a
   1:7  │    [filter]
   1:7  │      [cartesian_iterator]
   1:7  │        [=]
   1:8  │          i
   1:12 │          xs
   1:15 │        [=]
   1:16 │          j
   1:20 │          ys
   1:26 │      z

julia> parsestmt(SyntaxNode, """
       [a for i = xs for j = ys if z]""")
line:col│ tree                                   │ file_name
   1:1  │[comprehension]                         │
   1:2  │  [generator]
   1:2  │    a
   1:7  │    [=]
   1:8  │      i
   1:12 │      xs
   1:18 │    [filter]
   1:18 │      [=]
   1:19 │        j
   1:23 │        ys
   1:29 │      z
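One practical consequence can be sketched with a naive tree walk. The helper below is hypothetical (it is not part of JuliaSyntax.jl) and works on Base Expr rather than SyntaxNode. It collects every = node the way a simple "find all assignments" pass might, and it picks up the iteration specs even though neither is a real assignment.

```julia
# Collect every `=` node in an expression tree (hypothetical helper).
function find_eq_nodes!(found::Vector{Expr}, ex)
    if ex isa Expr
        ex.head == :(=) && push!(found, ex)
        for arg in ex.args
            find_eq_nodes!(found, arg)
        end
    end
    return found
end

ex = Meta.parse("[a for i = xs for j = ys if z]")
eqs = find_eq_nodes!(Expr[], ex)

# Each hit is an iteration spec, not an assignment.
for eq in eqs
    println(eq.args[1], " iterates over ", eq.args[2])
end
```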

Possible solution

I'd like to propose a syntax kind K"iteration" to replace the use of both K"=" and K"cartesian_iterator".

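A possible rule could be: an iteration node has an even number of children, alternating between an iteration variable and the iterable it ranges over, so that cartesian iteration over several variables becomes a single flat node.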

The comprehension cases above would then look like:

julia> parsestmt(SyntaxNode, """
       [a for i = xs, j = ys if z]""")
line:col│ tree                                   │ file_name
   1:1  │[comprehension]                         │
   1:2  │  [generator]
   1:2  │    a
   1:7  │    [filter]
   1:7  │      [iteration]
   1:8  │        i
   1:12 │        xs
   1:16 │        j
   1:20 │        ys
   1:26 │      z

julia> parsestmt(SyntaxNode, """
       [a for i = xs for j = ys if z]""")
line:col│ tree                                   │ file_name
   1:1  │[comprehension]                         │
   1:2  │  [generator]
   1:2  │    a
   1:7  │    [iteration]
   1:8  │      i
   1:12 │      xs
   1:18 │    [filter]
   1:18 │      [iteration]
   1:19 │        j
   1:23 │        ys
   1:29 │      z
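The mapping from the current shape to the proposed one can be sketched as a small rewrite. This is only an illustration: it uses Base Expr as a stand-in for syntax nodes, the heads :cartesian_iterator and :iteration are hypothetical as Expr heads, and to_iteration is an invented helper, not a JuliaSyntax.jl function.

```julia
# Rewrite a current-style iteration spec into the proposed flat form.
function to_iteration(spec::Expr)
    if spec.head == :(=)
        # Single iteration: (= x xs) becomes (iteration x xs)
        return Expr(:iteration, spec.args[1], spec.args[2])
    elseif spec.head == :cartesian_iterator
        # Cartesian iteration: flatten each (= v it) child into the parent
        children = Any[]
        for eq in spec.args
            (eq isa Expr && eq.head == :(=)) ||
                throw(ArgumentError("expected an `=` child"))
            push!(children, eq.args[1], eq.args[2])
        end
        return Expr(:iteration, children...)
    else
        throw(ArgumentError("not an iteration spec: $(spec.head)"))
    end
end

single = to_iteration(:(x = xs))
cart = to_iteration(Expr(:cartesian_iterator, :(i = xs), :(j = ys)))
println(single)
println(cart)
```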

Some possible advantages

Alternatives?

Is allowing iteration to have any even number of children to represent cartesian iteration a good call? Would writing macro code against a nested AST be easier?
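To get a feel for the trade-off, here is a rough sketch of what consumer code might look like for each shape. Both trees are built by hand from Base Expr with hypothetical heads (:iteration, and :in for the nested variant); neither is existing parser output, and the nested layout is just one possible alternative.

```julia
# Hypothetical shapes, built by hand for comparison.
flat   = Expr(:iteration, :i, :xs, :j, :ys)
nested = Expr(:iteration, Expr(:in, :i, :xs), Expr(:in, :j, :ys))

# Flat form: the consumer steps through the children two at a time
# and has to trust that the count is even.
flat_pairs = [(flat.args[k], flat.args[k + 1]) for k in 1:2:length(flat.args)]

# Nested form: each child is already one (variable, iterable) unit.
nested_pairs = [(spec.args[1], spec.args[2]) for spec in nested.args]

println(flat_pairs == nested_pairs)
```

The flat form has fewer nodes, while the nested form lets a macro iterate over the children directly without index arithmetic.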