afnanenayet / diffsitter

A tree-sitter based AST difftool to get meaningful semantic diffs
MIT License
1.58k stars, 29 forks

[BUG] `"exclude-kinds": ["string"]` does not work for Python #840

Closed: ya0guang closed this issue 4 months ago

ya0guang commented 4 months ago

Describe the bug

I found that `"exclude-kinds": ["string"]` does not work for Python strings (e.g., docstrings and strings in assignments). Please correct me if my configuration is incorrect.

To Reproduce

a = "b"
a = "a"

Compare two Python files, each containing one of the lines above, using a config file that sets `"exclude-kinds": ["string"]`.

Expected behavior

No diff should be found.

Log output/screenshots

=================================================================

0:
--
+a = "a"

0:
--
-a = "b"

Platform: OS: Linux

Additional context

After digging into the source code, I added debug output to the function `should_include_node` in `input_processing.rs`. I added a line that prints each node's kind:

        debug!("node: kind: {:?}", node.kind());

The log looks like this:

 2024-03-05T04:23:16.289Z DEBUG libdiffsitter::input_processing > node: kind: "identifier"
 2024-03-05T04:23:16.289Z DEBUG libdiffsitter::input_processing > node: kind: "="
 2024-03-05T04:23:16.289Z DEBUG libdiffsitter::input_processing > node: kind: "string_start"
 2024-03-05T04:23:16.289Z DEBUG libdiffsitter::input_processing > node: kind: "string_content"
 2024-03-05T04:23:16.289Z DEBUG libdiffsitter::input_processing > node: kind: "string_end"
 2024-03-05T04:23:16.289Z INFO  TimerFinished                   > ast::process(), Elapsed=54.622µs
 2024-03-05T04:23:16.289Z INFO  TimerFinished                   > ast::from_ts_tree(), Elapsed=5.999µs
 2024-03-05T04:23:16.289Z DEBUG libdiffsitter::input_processing > node: kind: "identifier"
 2024-03-05T04:23:16.289Z DEBUG libdiffsitter::input_processing > node: kind: "="
 2024-03-05T04:23:16.289Z DEBUG libdiffsitter::input_processing > node: kind: "string_start"
 2024-03-05T04:23:16.289Z DEBUG libdiffsitter::input_processing > node: kind: "string_content"
 2024-03-05T04:23:16.289Z DEBUG libdiffsitter::input_processing > node: kind: "string_end"
 2024-03-05T04:23:16.289Z INFO  TimerFinished                   > ast::process(), Elapsed=35.604µs
 2024-03-05T04:23:16.289Z INFO  TimerFinished                   > diff::compute_edit_script(), Elapsed=12.949µs
 2024-03-05T04:23:16.289Z INFO  libdiffsitter::render::unified  > Using stack style vertical for title
 2024-03-05T04:23:16.289Z DEBUG libdiffsitter::render::unified  > Printing hunk (lines 0 - 0)
 2024-03-05T04:23:16.289Z DEBUG libdiffsitter::render::unified  > Title string has length of 3
 2024-03-05T04:23:16.289Z DEBUG libdiffsitter::render::unified  > Printing line 0
 2024-03-05T04:23:16.289Z DEBUG libdiffsitter::render::unified  > End line 0
 2024-03-05T04:23:16.289Z DEBUG libdiffsitter::render::unified  > End hunk (lines 0 - 0)
 2024-03-05T04:23:16.289Z DEBUG libdiffsitter::render::unified  > Printing hunk (lines 0 - 0)
 2024-03-05T04:23:16.289Z DEBUG libdiffsitter::render::unified  > Title string has length of 3
 2024-03-05T04:23:16.289Z DEBUG libdiffsitter::render::unified  > Printing line 0
 2024-03-05T04:23:16.289Z DEBUG libdiffsitter::render::unified  > End line 0
 2024-03-05T04:23:16.289Z DEBUG libdiffsitter::render::unified  > End hunk (lines 0 - 0)

It looks like none of the nodes has the kind `string`, and changing `string` to `string_content` in the config file gives me the expected output. I'm not sure whether this is the intended behavior, but I assume it isn't, since the README lists `string` as a kind that can be excluded.
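To make the observation concrete, here is a minimal Python sketch of exact-match kind filtering. It is not diffsitter's code: the leaf lists are transcribed by hand from the debug log above, and `filter_leaves` and `has_diff` are hypothetical helpers that only assume an excluded kind must equal the node kind exactly.

```python
# Leaf nodes as (kind, text) pairs, mirroring the kinds shown in the debug log.
# These lists are hand-written for illustration, not produced by tree-sitter.
OLD = [("identifier", "a"), ("=", "="), ("string_start", '"'),
       ("string_content", "b"), ("string_end", '"')]
NEW = [("identifier", "a"), ("=", "="), ("string_start", '"'),
       ("string_content", "a"), ("string_end", '"')]


def filter_leaves(leaves, exclude_kinds):
    """Drop leaves whose kind exactly equals an entry in exclude_kinds."""
    excluded = set(exclude_kinds)
    return [(kind, text) for kind, text in leaves if kind not in excluded]


def has_diff(old, new, exclude_kinds):
    """Report whether the two leaf lists still differ after filtering."""
    return filter_leaves(old, exclude_kinds) != filter_leaves(new, exclude_kinds)


# "string" matches no leaf kind, so nothing is filtered and the diff remains.
print(has_diff(OLD, NEW, ["string"]))          # True
# "string_content" removes the only leaf that differs, so no diff remains.
print(has_diff(OLD, NEW, ["string_content"]))  # False
```

Under this assumption, excluding `string` is a no-op for this input because no leaf carries that kind, which matches the behavior reported here.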

Please let me know if further information is needed. Thanks!

afnanenayet commented 4 months ago

The README is probably out of date then; tree-sitter grammar updates sometimes change node types. Diffsitter only checks whether the string you pass matches the node type string reported by the tree-sitter library. That can be annoying because of cases like this, but the alternative would be for me to maintain my own mappings/aliases, which I think would be a significant maintenance burden.

I'll update the docs to make a note of this.
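For illustration only, the mappings/aliases alternative mentioned above might look roughly like the following Python sketch. Nothing here exists in diffsitter: `KIND_ALIASES` and `expand_kinds` are hypothetical names, and the alias entries are taken from the node kinds seen in this issue's debug log.

```python
# Hypothetical alias table mapping a user-facing kind to the concrete node
# kinds a grammar version emits. Each entry would need updating whenever a
# grammar renames or splits a node type, which is the maintenance cost.
KIND_ALIASES = {
    "string": {"string_start", "string_content", "string_end"},
}


def expand_kinds(exclude_kinds, aliases=KIND_ALIASES):
    """Return the configured kinds plus any concrete kinds they alias."""
    expanded = set()
    for kind in exclude_kinds:
        expanded.add(kind)                      # keep the original name too
        expanded |= aliases.get(kind, set())    # add aliased kinds, if any
    return expanded


print(sorted(expand_kinds(["string"])))
```

Keeping the original name in the expanded set means the same config would still match grammar versions that emit a plain `string` kind for leaves, at the cost of one table entry per language and grammar change.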