Open c42f opened 1 year ago
have a
@match
API
Available packages
Related issues
Related Discussions
Perhaps we could develop a new pattern matching package/module on top of the information provided by JuliaSyntax.jl. This might be a good start to adding pattern matching syntax to julia.
It's also roughly the approach that Ben is taking in SemanticAST.jl
And SemanticAST.jl use MLStyle.jl
MLStyle is very comprehensive and I know SemanticAST is using that. MacroTools is great for simple matching, but I personally never liked having the dependency and tend to write the "big pile of if
statements" by hand...
Personally I'd be satisfied with a really good special purpose tool for matching expressions. But if that eventually grew into something more general that's cool too.
The single biggest issue I've had with MLStyle is that the documentation on how to customize it isn't great. It's there, just not particularly apparent. I still don't entirely get how decons
works. If there was some pre-rolled SyntaxNode matcher that would address the problem. Other than that the experience has been pretty good.
I'm not sure what a more generic pattern matching system would look like, particularly if it's agnostic to order. Systems that let the user write a surface level syntax pattern and then compile it to a matcher are cool but beget issues understanding when they may or may not match some expression. As an example, I recently ran into a big annoyance with
foo(::T)::T where T<:Int
which parses as (:: (call foo (:: T)) (where T (<: T nothing Int))
(IIRC). This then means that if you match this against the naive term name_(args__)::retty_
the return type you'll get back is wrong because it includes the where
when it should really just be the type variable.
As a result of this without something more like SemanticAST that tries to produce a single AST node for each concept in the language I'm not sure if there's a great way to abstract over what the precise Expr tree looks like. The user needs to spend a frustrating amount of time reasoning about "what is the head that shows up here" exactly with Exprs and directly abstracting over that in a pattern matcher is I suspect not possible due to contextually-dependent definitions of what a head might mean. One alternative might be to supplant the Expr/SyntaxNode structure entirely with GreenNodes, but I'm not sure if that actually makes the problem easier?
This was the problem that ended up scuppering my plans for a more Expr-targeted AST matcher that had automatic completeness checking: I couldn't define a coherent grammar for Expr/SemanticAST because it's natively all over the place and context-dependent.
Maybe foo(::T)::T where T<:Int
was problematic because the precedence of ::
and where
is inconsistent in function definitions vs outside them?
julia> JuliaSyntax.parse(JuliaSyntax.SyntaxNode, "function f()::T where T end")
line:col│ tree │ file_name
1:1 │[function]
1:10 │ [where]
1:10 │ [::-i]
1:10 │ [call]
1:10 │ f
1:15 │ T
1:23 │ T
1:24 │ [block]
julia> JuliaSyntax.parse(JuliaSyntax.SyntaxNode, "f()::T where T")
line:col│ tree │ file_name
1:1 │[::-i]
1:1 │ [call]
1:1 │ f
1:6 │ [where]
1:6 │ T
1:14 │ T
I assumed it was intended because it has a special case in the reference parser, but it's an oddity for sure.
That's absolutely the reason for that specific case; my experience is that with function names in particular there's just an endless list of weird oddities you need to find out about through frustration that don't always have obvious mappings into Expr
-land.
In my opinion, what macro developers (who I presume are the target audience for an externally-facing tool) generally want is either:
Right now Julia's Exprs are sort of somewhere in between those two. They are fairly closely aligned with the source syntax and require that the input be... roughly... valid Julia, but they also have all sorts of weird and wonderful forms for the same semantic thing making it annoying to reason about "my input should be valid Julia."
In my opinion, what macro developers (who I presume are the target audience for an externally-facing tool) generally want is either [...]
Good summary! I'd been assuming that the "Julia semantic matching" option would be something like giving macros an API for the syntax desugaring part of lowering. Which is what SemanticAST does I guess :-)
I think there's two audiences:
Expr
compat layer.Right now Julia's Exprs are sort of somewhere in between those two
Yes. I've been trying to push them a little toward more alignment with the source syntax (see #88). But I assume they'll always require that the input be roughly valid Julia, otherwise there's not a lot of meaning in having a parser at all.
(In principle macros can work on tokenized not parsed input like they do in Rust, but that's is a huge change I don't see happening. String macros are the escape hatch to handle the case of "my DSL looks nothing like Julia".)
One idea is that a pattern matcher could go backwards from SemanticAST-like semantic concepts (like "a function definition") backwards to the full set of constructs that could have been used to create it. It could even round-trip, so for example we could take a match against a function declaration, extract the semantic concept ("a function declaration") and then generalize it to all the forms that could have been used to create that.
The idea is you could write a pattern like
FunctionDefinition(foo, [_..., Call(:bar, [args...], _), _...], _)
and then get all function definitions named foo that call bar in one of the positional argument positions without having to worry about if it has kwargs or type arguments or whatever. Then, that could be parsed from
function foo(_..., bar(args...), _...) _ end
and then converted back into the original matching form. This way you can write patterns in terms of the semantic meaning and not the precise syntactic forms.
This approach then makes the patterns independent of the Expr order because the ordering is now semantically defined rather than defined as part of Expr/SyntaxNode itself. The main disadvantage is that there might be some challenges escaping back into the underlying SyntaxNode structure?
The list of matchers should also include
Sure! (Note, IIUC, match.scm
isn't used for much if anything within the Julia codebase.)
Just a small note - I've noticed that match.scm
is used extensively in Base in macroexpand.scm
. (Though it's been argued elsewhere that code is not well-placed, and should be mostly removed and the remainder folded into the lowering resolve-scopes pass)
I wanted to collect some thoughts on tree data structures here. We've been discussing this over at https://github.com/julia-vscode/JuliaWorkspaces.jl/issues/7 but I'd like to have a link to that discussion here. And also a persistent central place to discuss the JuliaSyntax trees.
Trees we have
Green trees
Currently, we've got a
GreenNode
inspired by Roslyn and rust-analyzer. This acts as a raw syntax tree with subtrees which are "reusable" in the sense that they can be spliced into another parent green tree. The main benefit of this reusability would be if we implemented incremental parsing.As discussed in https://github.com/julia-vscode/JuliaWorkspaces.jl/issues/7#issuecomment-1483743325,
GreenNode
being immutable means that the GC pressure of materializing a full green tree is quite low because the leaves are stored inline in the child array of their parent nodes.Some numbers for base/abstractarray.jl which is a moderately large (~100 kb / 3000 line) Julia file:
GreenNode
be immutable.JuliaSyntax.parse!()
costs only 90 allocations.Convenience interface for syntax trivia
GreenNode
isn't very convenient to use. This is a big hole in the API right now, as there's no other way to access the syntax trivia. See also https://github.com/JuliaLang/JuliaSyntax.jl/issues/2We could
GreenNode
more convenient to useSyntaxNode
with an easier way to access trivia childrenAbstract Syntax -
SyntaxNode
We've currently got
SyntaxNode
for this - it's somewhat likeExpr
in the sense thatIn contrast to
Expr
, it has the same strict child source ordering andhead
s asGreenNode
.SyntaxNode
also has aparent
field, but I'm not sure this is worthwhile! A good iterator/cursor interface might be more efficient and convenient?Typed Syntax -
TypedSyntaxNode
The
TypedSyntax
package https://github.com/JuliaDebug/Cthulhu.jl/tree/master/TypedSyntax contains an analog toSyntaxNode
but with a bunch of type annotations and other useful type-related data in theTreeData
. Seehttps://github.com/JuliaDebug/Cthulhu.jl/blob/master/TypedSyntax/src/node.jl
Generalizing tree attributes - trees we want
What's the right way for analysis passes to attach annotations to a tree? One way to do this is by putting them in the tree data structure itself - this is what the parameterized
TreeNode{TreeData}
achieves (used bySyntaxNode
andTypedSyntaxNode
). But this makes it hard to compose different types of passes together without havingTreeData
become fat and unwieldy, or having different passes know about each other.A way around this is to store only an id per tree node and to store attributes in separate arrays or dictionaries indexed by this id. For sparse attributes an entity component system (ECS) setup Overseer.jl is excellent: attributes are the components, compiler passes are the systems, and tree nodes are the entities. @benchung has been trying this approach as noted in https://github.com/julia-vscode/JuliaWorkspaces.jl/issues/7#issuecomment-1483719069
For non-sparse attributes a set of parallel arrays is simpler though: think of a
DataFrame
orTypedTable
where the rows are indexed by node ID and the columns are attributes. If we looked into the Julia graphs ecosystem I'd guess they store their attributes this way.Tree API: traversal and child access
Currently, one traverses a tree with a "big pile of if statements" based on
kind(node)
andchildren(node)
- essentially theExpr
model of working with trees.However, this is pretty cumbersome
kind
snode[3]
- but what does that3
mean? It's different in each different situation.This situation is a mess!
I think the right way to fix this is pattern matching: have a
@match
API which expands to the big pile ofif
statements based onkind
which the user would otherwise have to write by hand. This the approach that rust-analyzer uses (admittedly Rust has language-native pattern matching but we can do it with a macro). It's also roughly the approach that Ben is taking in SemanticAST.jl, though that version still requires knowing the meaning of a given child ordering.