JuliaLang / JuliaSyntax.jl

The Julia compiler frontend
Other
272 stars 35 forks source link

Change AST for iterations to use `iteration` kind #433

Closed c42f closed 2 months ago

c42f commented 4 months ago

I got inspired to implement #432 as suggested there... This should be a complete implementation of that proposed solution, but may still need experimentation/thought to decide whether Cartesian iteration should be nested or flat.


The = node which has traditionally been used for iteration specifications in for loops and generators doesn't have normal assignment semantics. Let's consider

for x in xs
    body
end

which has been parsed as (for (= x xs) (block body)).

Problems:

So this use of = is not assignment; merely assignment-like.

In this change, we use a new iteration kind instead of = so the for loop parses as (for (iteration x xs) (block body)) instead.

We also reuse iteration to replace the cartesian_iteration head - cartesian iteration is just an iteration with an even number of children greater than two. Being less specific here with the naming (omitting the "cartesian") seems appropriate in trying to represent the surface syntax; cartesian semantics come later in lowering and a macro may decide to do something else with the iteration spec.

These changes are also used for generators.

After the changes we have tree structures such as

julia> parsestmt(SyntaxNode, "for i in is body end")
line:col│ tree                                   │ file_name
   1:1  │[for]
   1:4  │  [iteration]
   1:5  │    i
   1:10 │    is
   1:12 │  [block]
   1:13 │    body

julia> parsestmt(SyntaxNode, "for i in is, j in js body end")
line:col│ tree                                   │ file_name
   1:1  │[for]
   1:4  │  [iteration]
   1:5  │    i
   1:10 │    is
   1:14 │    j
   1:19 │    js
   1:21 │  [block]
   1:22 │    body

julia> parsestmt(SyntaxNode, "[a for i = is, j = js if z]")
line:col│ tree                                   │ file_name
   1:1  │[comprehension]
   1:2  │  [generator]
   1:2  │    a
   1:7  │    [filter]
   1:7  │      [iteration]
   1:8  │        i
   1:12 │        is
   1:16 │        j
   1:20 │        js
   1:26 │      z

julia> parsestmt(SyntaxNode, "[a for i = is for j = js if z]")
line:col│ tree                                   │ file_name
   1:1  │[comprehension]
   1:2  │  [generator]
   1:2  │    a
   1:7  │    [iteration]
   1:8  │      i
   1:12 │      is
   1:18 │    [filter]
   1:18 │      [iteration]
   1:19 │        j
   1:23 │        js
   1:29 │      z
c42f commented 4 months ago

I managed to chat to @JeffBezanson about this and he thinks it makes sense, thanks Jeff :)

Will merge shortly I guess. Need to assess where we are at with breaking changes, the release cycle has gotten a bit blocked eek

LilithHafner commented 4 months ago

Would this be breaking to Julia or just JuliaSyntax?

JeffBezanson commented 4 months ago

I think just JuliaSyntax since there is a compatibility mapping to Expr.

c42f commented 4 months ago

Oh indeed.

As long as this package supports Julia 1.x we can never change parsing to Expr - except to fix bugs and any "minor changes" agreed as practically non-breaking. That one API has the same compat restrictions as Base independent of JuliaSyntax's version number.

c42f commented 4 months ago

Long term, I want some highly refined and improved version of JuliaSyntax.SyntaxNode to eventually be the standard syntax tree representation.

That's ultimately what changes like this are about: it's a long term project to refine our AST data model to make it more precise and easier to work with.

c42f commented 3 months ago

Writing the lowering of for, I think I can now say that this should be nested, somehow: For expression provenance purposes, I really do really want the individual clauses of a cartesian iteration to have their own nodes in the tree.

I wonder whether it'd be ok to nest these within a block. Or maybe a tuple. Neither of these are great for the purposes of semantic AST, but the surface syntax doesn't really carry strong semantics, due to it being, in some sense, an IR for macros. I vaguely feel that there are other cases where where block captures a sequence which isn't semantically a sequence of statements. Need to find these cases again and consider how they compare.

c42f commented 2 months ago

Managed to chat to @JeffBezanson again about this. We agreed that K"in" could be the best option. So the new tree would be something like (mock up!):

julia> parsestmt(SyntaxNode, "for i in is, j in js body end")
line:col│ tree                                   │ file_name
   1:1  │[for]
   1:4  │  [iteration]
             [in]
   1:5  │      i
   1:10 │      is
             [in]
   1:14 │      j
   1:19 │      js
   1:21 │  [block]
   1:22 │    body
codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 96.51%. Comparing base (835a527) to head (ddfb944). Report is 1 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #433 +/- ## ========================================== - Coverage 96.75% 96.51% -0.25% ========================================== Files 14 14 Lines 4259 4249 -10 ========================================== - Hits 4121 4101 -20 - Misses 138 148 +10 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.