gazayas / masamune-ast

A covenience wrapper around Prism, a Ruby source code parser
MIT License
13 stars 1 forks source link

Add the sorting method for position and token arrays #15

Closed gazayas closed 1 year ago

gazayas commented 1 year ago
[1, 2].sum.times {|n| puts n}

This code produces the following syntax tree with Ripper:

[:program,
 [[:method_add_block,
   [:call,
    [:call, [:array, [[:@int, "1", [1, 1]], [:@int, "2", [1, 4]]]], [:@period, ".", [1, 6]], [:@ident, "sum", [1, 7]]],
    [:@period, ".", [1, 10]],
    [:@ident, "times", [1, 11]]],
   [:brace_block,
    [:block_var, [:params, [[:@ident, "n", [1, 19]]], nil, nil, nil, nil, nil, nil], false],
    [[:command, [:@ident, "puts", [1, 22]], [:args_add_block, [[:var_ref, [:@ident, "n", [1, 27]]]], false]]]]]]]

You can see that a :call node is nested within the first :call node. Because each :call node's data node is the last element within it, we grab that element and save it to the list of data nodes first. THEN we go inside the nested call recursively and get all the remaining information. So, for example...

p msmn.data[1][0][1]

[:call,
 # ↓ We search this second call node later
 [:call, [:array, [[:@int, "1", [1, 1]], [:@int, "2", [1, 4]]]], [:@period, ".", [1, 6]], [:@ident, "sum", [1, 7]]],
 [:@period, ".", [1, 10]],
 # ↓ This is the initial data node we extract since
 # we're in the first :call node we came across in the AST.
 [:@ident, "times", [1, 11]]]

Then when we get to the nested :call...

msmn2.data[1][0][1][1]

[:call,
  [:array, [[:@int, "1", [1, 1]], [:@int, "2", [1, 4]]]],
  [:@period, ".", [1, 6]],
  [:@ident, "sum", [1, 7]] # Then we grab this data node
]

This means the data nodes are reversed when getting the positions and tokens:

msmn = Masamune::AbstractSyntaxTree.new("[1, 2].sum.times {|n| puts n}")
msmn.all_methods
#=> [[[1, 11], "times"], [[1, 7], "sum"]]

This PR ensures we sort the line positions and return them in the correct order

#=> [[[1, 7], "sum"], [[1, 11], "times"]]

Why I opted not to create each data node recursively to then register them to the data node list

I want to parse the tree and assign classes to each node linearly to keep it inline with the structure of the original abstract syntax tree. That means I assign each node as is as it shows up in the AST, and I don't want to have to worry about sorting the nodes themselves according to the line number, etc.

In that way we can let the AST just be what it is, and after we've created all the node class instances, we can perform any sorting on data we extract from it to keep those parts of the logic separate.