Open ZedThree opened 1 month ago
I've been playing around and trying to understand how to edit a tree-sitter tree 'in-place'. I think I have a working example:
extern crate tree_sitter;
extern crate tree_sitter_fortran;
use textwrap::dedent;
use tree_sitter::{InputEdit, Node, Point};
/// Utility for printing the tree to terminal
fn print_tree(node: &Node, indent: usize) {
let mut cursor = node.walk();
for child in node.named_children(&mut cursor) {
let range = child.range();
println!(
"{}{}, pos={}, byte={}",
" ".repeat(indent),
child.kind(),
range.start_point,
range.start_byte,
);
print_tree(&child, indent + 1);
}
}
/// Parse some Fortran code, edit it with something a linter might do, and reparse.
fn main() {
// Set up tree sitter parser
let mut parser = tree_sitter::Parser::new();
parser
.set_language(&tree_sitter_fortran::LANGUAGE.into())
.expect("Error loading Fortran grammar");
// Parse some code
let source = dedent(
"
program myprog
integer, parameter :: i = 1
write (*,*) i
end program myprog
",
);
let mut tree = parser.parse(source.as_str(), None).unwrap();
// Print current state of code to the terminal
println!("{source}\n");
print_tree(&tree.root_node(), 0);
println!("\n");
// Add new line after line 2 to add 'implicit none'
let new_source = dedent(
"
program myprog
implicit none
integer, parameter :: i = 1
write (*,*) i
end program myprog
",
);
// First need to sync the old tree to account for new bytes added.
// We'll be adding the following snippet to the start of the variable_declaration:
let new_line = "implicit none\n ";
// By checking the result of print_tree() on the old tree, we see that the variable
// declaration begins at position (2,2), byte 18.
let edit = InputEdit {
start_byte: 18,
old_end_byte: 18,
new_end_byte: 18 + new_line.len(),
start_position: Point { row: 2, column: 2 },
old_end_position: Point { row: 2, column: 2 },
new_end_position: Point { row: 3, column: 2 },
};
tree.edit(&edit);
// Print the edited tree and check that the bytes and positions have been updated.
println!("Edited:");
print_tree(&tree.root_node(), 0);
println!("\n");
// Parse new code, using old tree as starting point
tree = parser.parse(new_source.as_str(), Some(&tree)).unwrap();
// Check that it's the same as the last tree, only with a new implicit_statement node
println!("{new_source}\n");
print_tree(&tree.root_node(), 0);
println!("\n");
}
I compiled this with the following Cargo.toml
:
[package]
name = "tree_sitter_edits"
version = "0.1.0"
edition = "2021"
[dependencies]
textwrap = "0.16.0"
tree-sitter = "~0.23.0"
tree-sitter-fortran = "0.1.0"
There are some outstanding questions I have:
start_byte
, old_end_byte
and new_end_byte
to efficiently insert new code into the source?When running in check --fix
mode, how do we iterate over the tree? I imagine we would have to do something like:
We could reduce the number of passes we would need to perform by collecting non-overlapping diffs during each pass, applying them all at once, and then repeating until no fixable violations are found. That algorithm could be quite complicated though.
Nice! That's a good start.
We should look at the two really good bits of prior art, ruff and clang-tidy, and steal their algorithm(s). Ruff's looks to be in lint_fix
in ruff_linter/src/linters.rs
. Clang stuff is a lot more complex, but this looks like a reasonable place to start: https://clang.llvm.org/docs/ClangTransformerTutorial.html
It does seem like ruff follows your suggested approach: repeatedly applying fixes and reparsing until the result is stable (or it fails).
Rules should give suggestions for fixes where possible, with a mode to automatically apply them