Currently, the JavaScript runtime does not have any ability to inspect or operate on the tree-sitter tree powering the analysis.
To partially address this, we currently preemptively serialize children and send them along with the captured nodes. This introduces two main inefficiencies:
Not every rule needs to inspect children
Less than 25% of our current rules access the children. And of those that do, nearly 1/3 of them are only querying the number of children, which means they don't need the actual serialized node.
We serialize children recursively
It would be unexpected for a user if they couldn't write a rule that accesses the children of a child, the children of that...and so on. So to address this, we currently serialize children recursively. This leads to v8 footguns, where a small change to a query's captures can reduce performance of a rule by an order of magnitude. For example, let's say a rule wants to count the number of functions within a Java class.
function visit(query, filename, code) {
const { body } = query.captures;
if (body.children.length > 10) { /* ... */ }
}
We currently serialize 395 nodes--almost the entire tree--for a rule that only needs to count the number of children.
What is your solution?
In prior PRs, we implemented all the plumbing necessary for a JavaScript rule to perform operations on the tree-sitter tree. This PR has two goals:
Add function parity with the existing runtime by allowing rules to access a node's children.
Provide a reference example for contributors for how they can implement Rust hooks to expose the tree-sitter tree to rules.
This example shows the efficiency of the ddsa runtime's lazy serialization: for the above rule, we only serialize 3 extra nodes instead of 395. (sidenote: this could be further optimized in a future PR to 0 by providing an API to request count instead of actual nodes).
What problem are you trying to solve?
Currently, the JavaScript runtime does not have any ability to inspect or operate on the tree-sitter tree powering the analysis.
To partially address this, we currently preemptively serialize children and send them along with the captured nodes. This introduces two main inefficiencies:
Not every rule needs to inspect children Less than 25% of our current rules access the children. And of those that do, nearly 1/3 of them are only querying the number of children, which means they don't need the actual serialized node.
We serialize children recursively It would be unexpected for a user if they couldn't write a rule that accesses the children of a child, the children of that...and so on. So to address this, we currently serialize children recursively. This leads to v8 footguns, where a small change to a query's captures can reduce performance of a rule by an order of magnitude. For example, let's say a rule wants to count the number of functions within a Java class.
Given a sample file: BenchmarkTest00012.java, and a query:
And a rule:
We currently serialize 395 nodes--almost the entire tree--for a rule that only needs to count the number of children.
What is your solution?
In prior PRs, we implemented all the plumbing necessary for a JavaScript rule to perform operations on the tree-sitter tree. This PR has two goals:
This example shows the efficiency of the ddsa runtime's lazy serialization: for the above rule, we only serialize 3 extra nodes instead of 395. (sidenote: this could be further optimized in a future PR to 0 by providing an API to request count instead of actual nodes).
Alternatives considered
What the reviewer should know