DataDog / datadog-static-analyzer

Datadog Static Analyzer
https://docs.datadoghq.com/static_analysis/
Apache License 2.0
100 stars 12 forks source link

[STAL-1960] Add ability to fetch tree-sitter node children from JavaScript #415

Closed jasonforal closed 3 months ago

jasonforal commented 3 months ago

What problem are you trying to solve?

Currently, the JavaScript runtime does not have any ability to inspect or operate on the tree-sitter tree powering the analysis.

To partially address this, we currently preemptively serialize children and send them along with the captured nodes. This introduces two main inefficiencies:

Not every rule needs to inspect children Less than 25% of our current rules access the children. And of those that do, nearly 1/3 of them are only querying the number of children, which means they don't need the actual serialized node.

We serialize children recursively It would be unexpected for a user if they couldn't write a rule that accesses the children of a child, the children of that...and so on. So to address this, we currently serialize children recursively. This leads to v8 footguns, where a small change to a query's captures can reduce performance of a rule by an order of magnitude. For example, let's say a rule wants to count the number of functions within a Java class.

Given a sample file: BenchmarkTest00012.java, and a query:

(class_declaration
    (class_body) @class_body
)

And a rule:

function visit(query, filename, code) {
  const { body } = query.captures;
  if (body.children.length > 10) { /* ... */ }
}

We currently serialize 395 nodes--almost the entire tree--for a rule that only needs to count the number of children.

What is your solution?

In prior PRs, we implemented all the plumbing necessary for a JavaScript rule to perform operations on the tree-sitter tree. This PR has two goals:

  1. Add function parity with the existing runtime by allowing rules to access a node's children.
  2. Provide a reference example for contributors for how they can implement Rust hooks to expose the tree-sitter tree to rules.

This example shows the efficiency of the ddsa runtime's lazy serialization: for the above rule, we only serialize 3 extra nodes instead of 395. (sidenote: this could be further optimized in a future PR to 0 by providing an API to request count instead of actual nodes).

Alternatives considered

What the reviewer should know

jasonforal commented 3 months ago

Rebased -- only change is on the serialization test because https://github.com/DataDog/datadog-static-analyzer/pull/407 changed how TreeSitterNode instances are console logged