Auto-fixes - Githubissues

ZedThree commented 1 week ago

Rules should give suggestions for fixes where possible, with a mode to automatically apply them

LiamPattinson commented 4 days ago

I've been playing around and trying to understand how to edit a tree-sitter tree 'in-place'. I think I have a working example:

extern crate tree_sitter;
extern crate tree_sitter_fortran;
use textwrap::dedent;
use tree_sitter::{InputEdit, Node, Point};

/// Utility for printing the tree to terminal
fn print_tree(node: &Node, indent: usize) {
    let mut cursor = node.walk();
    for child in node.named_children(&mut cursor) {
        let range = child.range();
        println!(
            "{}{}, pos={}, byte={}",
            "  ".repeat(indent),
            child.kind(),
            range.start_point,
            range.start_byte,
        );
        print_tree(&child, indent + 1);
    }
}

/// Parse some Fortran code, edit it with something a linter might do, and reparse.
fn main() {
    // Set up tree sitter parser
    let mut parser = tree_sitter::Parser::new();
    parser
        .set_language(&tree_sitter_fortran::LANGUAGE.into())
        .expect("Error loading Fortran grammar");

    // Parse some code
    let source = dedent(
        "
        program myprog
          integer, parameter :: i = 1
          write (*,*) i
        end program myprog
        ",
    );
    let mut tree = parser.parse(source.as_str(), None).unwrap();

    // Print current state of code to the terminal
    println!("{source}\n");
    print_tree(&tree.root_node(), 0);
    println!("\n");

    // Add new line after line 2 to add 'implicit none'
    let new_source = dedent(
        "
        program myprog
          implicit none
          integer, parameter :: i = 1
          write (*,*) i
        end program myprog
        ",
    );
    // First need to sync the old tree to account for new bytes added.
    // We'll be adding the following snippet to the start of the variable_declaration:
    let new_line = "implicit none\n  ";
    // By checking the result of print_tree() on the old tree, we see that the variable
    // declaration begins at position (2,2), byte 18.
    let edit = InputEdit {
        start_byte: 18,
        old_end_byte: 18,
        new_end_byte: 18 + new_line.len(),
        start_position: Point { row: 2, column: 2 },
        old_end_position: Point { row: 2, column: 2 },
        new_end_position: Point { row: 3, column: 2 },
    };
    tree.edit(&edit);

    // Print the edited tree and check that the bytes and positions have been updated.
    println!("Edited:");
    print_tree(&tree.root_node(), 0);
    println!("\n");

    // Parse new code, using old tree as starting point
    tree = parser.parse(new_source.as_str(), Some(&tree)).unwrap();

    // Check that it's the same as the last tree, only with a new implicit_statement node
    println!("{new_source}\n");
    print_tree(&tree.root_node(), 0);
    println!("\n");
}

I compiled this with the following Cargo.toml:

[package]
name = "tree_sitter_edits"
version = "0.1.0"
edition = "2021"

[dependencies]
textwrap = "0.16.0"
tree-sitter = "~0.23.0"
tree-sitter-fortran = "0.1.0"

There are some outstanding questions I have:

How do we edit the source code in a general manner? Is there some way we can use the start_byte, old_end_byte and new_end_byte to efficiently insert new code into the source?
In cases like this where we're adding new lines, how do we keep indentation consistent?
When running in check --fix mode, how do we iterate over the tree? I imagine we would have to do something like:
- Search the tree until finding a fixable violation.
- Apply the fix: write new source code, build a new tree.
- Search the new tree starting again from the root node.
- Continue until no fixable violations are found.
- Perform a final pass to collect all non-fixable violations.
We could reduce the number of passes we would need to perform by collecting non-overlapping diffs during each pass, applying them all at once, and then repeating until no fixable violations are found. That algorithm could be quite complicated though.

ZedThree commented 4 days ago

Nice! That's a good start.

We should look at the two really good bits of prior art, ruff and clang-tidy, and steal their algorithm(s). Ruff's looks to be in lint_fix in ruff_linter/src/linters.rs. Clang stuff is a lot more complex, but this looks like a reasonable place to start: https://clang.llvm.org/docs/ClangTransformerTutorial.html

It does seem like ruff follows your suggested approach: repeatedly applying fixes and reparsing until the result is stable (or it fails).

PlasmaFAIR / fortitude

Auto-fixes #70