gazayas / masamune-ast

A covenience wrapper around Prism, a Ruby source code parser
MIT License
13 stars 1 forks source link

Migrate to Prism #60

Closed gazayas closed 1 year ago

gazayas commented 1 year ago

Closes #52.

This is a big overhaul of the entire code base, but the concept of Masamune is still the same. This time we're returning Prism nodes instead of hashes, and a NodeHelper is included within each Prism node returned.

NodeHelper has covenience methods for navigating Prism nodes easily. For example...

  1. token: Returns the token we are looking for.
  2. token_location: Finds the location for the token we're looking for.

You can find more information in Masamune::AbstractSyntax::NodeHelper to see why these methods make it easier to navigate any source code we're trying to analyze/edit.

Thoughts

I'm finding this is a lot more easier to manage than trying to make sense out of pure Ripper output. I haven't implemented Prism::MutationCompiler like what was mentioned in #52, but I can live with the current code for now. I'm all around very happy now that we can handle Prism nodes, especially since token locations have the starting line/column AND the ending line/column (i.e. - (1, 0)-(1, 3)) as opposed to JUST the starting line and column.

Versioning

@kaspth Because we're now returning Prism nodes by default, I'm thinking the next release should be v2.0.0. Does that sound accurate? Feel free to share any other thoughts if anything else seems off!

gazayas commented 1 year ago

Hello @kddnewton, I'm still looking over @kaspth's review to make the current PR better, but if you have time I would greatly appreciate some feedback to ensure we're using Prism correctly and to make Masamune a better tool!

By the way, we're using it in Bullet Train, so if you have any questions about practical implementation of the tool, I'd be glad to share that as well. Thank you!

kaspth commented 1 year ago

Interesting, it looks like Prism::Node has a type instance method which returns a symbol: https://github.com/ruby/prism/blob/3b70a9dfceb86d66dc46ba355fe8ec4e9c1e39a6/templates/lib/prism/node.rb.erb#L212

I wish it was a class method too, then we could do some of this writing we're doing a little easier.

kddnewton commented 1 year ago

Right now with your visitor you're effectively treating every node the same and comparing them against a list of values. But the values are already split up in the visit methods. So you're going from split up to joined to split up again. It would probably be simpler to have a couple of different visitors depending on what you're trying to find. For example,

class VariablesVisitor < Prism::Visitor
  attr_reader :token_value, :results

  def initialize(token_value)
    @token_value = token_value
    @results = []
  end

  def visit_local_variable_write_node(node)
    results << node if node.name.to_s == token_value
    super
  end

  def visit_local_variable_read_node(node)
    results << node if node.name.to_s == token_value
    super
  end

  def visit_required_parameter_node(node)
    results << node if node.name.to_s == token_value
    super
  end
end

def variables(token_value: nil)
  visitor = VariablesVisitor.new(token_value)
  @prism.value.accept(visitor)
  visitor.results
end

class ParametersVisitor < Prism::Visitor
  attr_reader :results

  def initialize
    @results = []
  end

  def visit_parameters_node(node)
    results << node
    super
  end
end

def parameters
  visitor = ParametersVisitor.new
  @prism.value.accept(visitor)
  visitor.results
end

class SymbolsVisitor < Prism::Visitor
  attr_reader :token_value, :results

  def initialize(token_value)
    @token_value = token_value
    @results = results
  end

  def visit_symbol_node(node)
    results << node if token_value.nil? || token_value == value
    super
  end
end

def symbols(token_value: nil)
  visitor = SymbolsVisitor.new(token_value)
  @prism.value.accept(visitor)
  visitor.results
end

Hopefully this makes things more clear, because it's more clear each API that you're calling, and you're not treating the nodes as all one entity.

This will also be far better for performance, because each callsite will be monomorphic (meaning the places where you call node.name will always have 1 type) which is far better for caching.

gazayas commented 1 year ago

@kddnewton I'm now seeing the benefit of implementing separate visitors for each type.

I'm really happy I found these two methods for CallNodes:

  1. variable_call?
  2. method_call?

I've added a MethodCallsVisitor and VariablesVisitor (among others) and I've set things up to return the results based on these two methods. Thank you so much for the feedback/insight, I feel much more comfortable with the progress of this tool!

@kaspth There's a lot going on here already and I don't want to add too much more before finishing the PR, so if you're okay with the state of things I'll go ahead and merge this and release 2.0!

kddnewton commented 1 year ago

Yeah this is looking good. At this point I'm not sure of the value of also lexing with ripper though, since you explicitly know the location of all of the nodes once you've found them. But maybe that should be a separate PR.