Closed gazayas closed 1 year ago
I've just been looking through Masamune's code and one thing that sticks out to me is the result_type:
that's passed all around. One of the cool things about Prism is that it takes full advantage of Ruby's pattern matching by implementing deconstruct
and deconstruct_keys
, and if we copy something like that to our nodes, we could remove the need for the Hash
result_type
.
Basically, the Hash
result_type
suggests to me that there's a leak in the Node API, and we improve them they'd be easier to work with than Hashes. Which could simplify calling code.
Basically, the Hash result_type suggests to me that there's a leak in the Node API, and we improve them they'd be easier to work with than Hashes. Which could simplify calling code.
After using hashes for a while I definitely think it's best to return the node classes themselves, so the next two things I want to prioritize is:
location
. Also for use with search methods (i.e - msmn.variables
)That way we can add methods to the nodes down the line which will be easier to manage as you mentioned instead of just adding values to a hash.
Concerning the first issue, I'm finding that Prism doesn't actually contain the token sometimes in the results:
Prism.parse("7")
@ ProgramNode (location: (1,0)-(1,1))
├── locals: []
└── statements:
@ StatementsNode (location: (1,0)-(1,1))
└── body: (length: 1)
└── @ IntegerNode (location: (1,0)-(1,1))
└── flags: decimal
So instead of completely replacing the tree parsing we have in place now, I plan on opening a PR to use Prism nodes alongside the current code and see what will work best long-term. If perchance I'm overlooking something with Prism, then we can implement it in the PR as well.
It also looks like location
itself is changing. Whereas before it was a Range, now it looks like (1,0)-(1,1))
(I have a PR to remove the trailing parenthesis here).
There are some helpful methods in the Location
class though:
[1] pry(main)> node.location
=> (1,0)-(2,6))
[2] pry(main)> node.location.start_line
=> 1
[3] pry(main)> node.location.start_column
=> 0
[4] pry(main)> node.location.end_line
=> 2
[5] pry(main)> node.location.end_column
=> 6
Hi there 👋! Author of prism
here.
I took a look at your code here and I would definitely suggest just passing around Prism
nodes. They're going to have all of the location that you need so you won't have to implement your own syntax tree.
The tokens themselves you can always fetch using the Prism::Node#slice
(which is an delegation for Prism::Node#location#slice
). Every node and location object can always fetch itself from the source. Prism::Location
itself stores a byte offset and a byte length, but you can use the query methods you discovered in the above comment to retrieve any additional information you might want from it.
You can use pattern matching expressions with anything that you might want, but I would also suggest checking out the Prism::Pattern
class, which is designed for finding specific patterns within a tree. The other thing you might want to check out is Prism::MutationCompiler
, which is designed for doing the kind of rewriting that you're doing. (For an example of how that might work, check out Prism::DesugarCompiler
.
As for Syntax Tree, the nodes that are a part of it are going to go away in the next major release, in favor of using Prism
s AST directly. They're essentially a less effective version of the same nodes with less useful structure and less information.
If you have anything that you feel like is missing from the Prism
AST structure or the nodes themselves, I would be more than happy to add it. You can see a couple of node-specific methods here: https://github.com/ruby/prism/blob/main/lib/prism/node_ext.rb. Anything else that you might want to add to make your use-case easier just let me know.
The point of prism
was to help the community come together on a single AST design so that we could build great tooling like masamune without having to relearn the parser. So if there's anything I can do to support you, please let me know!
@kddnewton Thank you so much for the feedback!
slice
I see, I was able to retrieve the token directly from the node like you mentioned:
[1] pry(main)> parse_result = Prism.parse("7")
=> #<Prism::ParseResult:0x00007f38fbe810a8
@comments=[],
@errors=[],
@source=#<Prism::Source:0x00007f38fbec2a80 @offsets=[0], @source="7">,
@value=
@ ProgramNode (location: (1,0)-(1,1))
├── locals: []
└── statements:
@ StatementsNode (location: (1,0)-(1,1))
└── body: (length: 1)
└── @ IntegerNode (location: (1,0)-(1,1))
└── flags: decimal,
@warnings=[]>
[2] pry(main)> parse_result.value.slice
=> "7"
That will definitely make it easier to standardize on Prism as opposed to having a hybrid of Prism and pure Ripper to populate Masamune results.
Prism::MutationCompiler
I'll be sure to take a look into this and Prism::DesugarCompiler
! Rewriting source code precisely in an easy-to-read manner is a big reason as to why Masamune came to be, so I'll see what I can implement here from Prism::MutationCompiler
.
I'm definitely interested in ASTs and Ruby source code manipulation, so I'd like to continue browsing through Prism. I'll be sure to open an issue for a feature request if I'd like to see something implemented, or I might open another PR if anything! Thanks again!
Prism (previously called YARP) is the new parser for Ruby. Looking at the different methods it has to offer, I'm thinking
#parse
and#parse_lex
could potentially be useful. Here's a look at#parse
.What's different here is that the location is displayed as a range within the entire program itself. So, for example, this program has code from
0...12
, and the integer1
shows up inside0...12
at4...5
.This doesn't display line number information like Ripper does (and like what we currently do with Masamune). However, having this information can be really useful for blocks. Check this out:
This gives us the exact ending location for the
if
statement!We do a lot of namespace/resource block editing in Bullet Train, so if we can manage to implement this in Masamune with Prism, we can be a lot more confident with finding/editing blocks instead of trying to find the line numbers based off of whitespace indentation.
Perhaps we can add a new attribute to Data Nodes in Masamune that have this location besides the one we already provide (currently we provide something like [4, 7], the line number and the index on the line where the identifier starts).
syntax_tree
I think we can still glean some things from #11 as well, but getting the exact location as a range like this seems more useful right now.