DataDog / datadog-static-analyzer

Datadog Static Analyzer
https://docs.datadoghq.com/static_analysis/
Apache License 2.0
100 stars 12 forks source link

[STAL-1960] Implement "field name" for tree-sitter node children #424

Closed jasonforal closed 3 months ago

jasonforal commented 3 months ago

What problem are you trying to solve?

Rules in the stella runtime can access the field name of a child node. This is currently unimplemented in ddsa.

What is your solution?

Primer on field names

A tree-sitter node that is a child can be referred to be a field name, however, it's important to clarify that it is not "the node's field name" -- it is "a field name linked to a node (only because it's the child of another)".

Not every child has a field name: For the given JavaScript, we have the tree:

function echo(a, b, c) {
    // Implementation here
}
image

In this example, the (function_declaration) node has three children, and they all have field names associated with them:

The (formal_parameters) node likewise has three children, but none of them have field names

Serialization

ddsa is designed around passing around smis, and this use case is no different.

When we fetch the children of a node, we now need to pass in an additional id so that a rule can look up its string name. This is done by passing a tuple for each child, where it is now (NodeId, FieldId) instead of just NodeId. tree-sitter itself uses one-based field ids, so we can conveniently use 0 to indicate the absence of a field name.

(The contents of the Uint32Array that is sent to v8)

|             Node A              |             Node B              |
|    NodeId A    |    FieldId A   |    NodeId B    |    FieldId B   |
 ________________ ________________ ________________ ________________
|                |                |                |                |
0                1                2                3
     32 bits          32 bits          32 bits          32 bits

Deserialization

When a child has a field name, we effectively "wrap" the TreeSitterNode with another object (a newly-introduced class) that provides the field name. To callers, this has the exact same interface as a TreeSitterNode, but it additionally contains fieldName. For a visual representation, when console.logged:

console.log(tsNode);
// {"type":"identifier","start":{"line":1,"col":15},"end":{"line":1,"col":16},"text":"a"}
console.log(childNode);
// {"type":"identifier","fieldName":"name","start":{"line":1,"col":10},"end":{"line":1,"col":14},"text":"echo"}

Getting field ids

In order to performantly access the field ids, we need to perform child lookup manually using the cursor API instead of the provided iterators. What we end up doing here is copy/pasting the binding's iterator implementation, but adding a call to cursor.field_id() where required.

Alternatives considered

What the reviewer should know