afnanenayet / diffsitter

A tree-sitter based AST difftool to get meaningful semantic diffs
MIT License

Support ignoring differences that consist only of variable/function name changes (e.g. within minified JavaScript) #819

Open 0xdevalias opened 6 months ago

0xdevalias commented 6 months ago

Is your feature request related to a problem? Please describe.

Currently, when diffing minified, bundled JavaScript code, there's a significant amount of 'noise' because the bundler often changes the minified variable names between builds. This can obscure the real changes and make the diff output less useful for understanding what actually changed.

Describe the solution you'd like

I propose adding a feature to diffsitter that ignores changes in variable/function names within minified JavaScript code. This would greatly reduce the noise in diffs of minified builds, keeping the focus on the actual code changes rather than on the churn in variable names.

Describe alternatives you've considered

As workarounds, I've experimented with various git diff modes like patience, histogram, and minimal to somewhat reduce the diff size. For instance, changing the diff algorithm can alter the number of lines in the diff output significantly:

⇒ git diff --diff-algorithm=default -- unpacked/_next/static/chunks/pages/_app.js | wc -l
  116000

⇒ git diff --diff-algorithm=patience -- unpacked/_next/static/chunks/pages/_app.js | wc -l
   35826

Nonetheless, these approaches still capture variable name changes, which can introduce a substantial amount of 'noise', especially in larger files.

Other potential solutions include pre-processing the files to normalize variable/function names or post-processing the diff output to filter out sections where the only changes involve variable/function names.
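A minimal sketch of the pre-processing idea, assuming a regex-based approach rather than a real parser: every non-keyword identifier is renamed to a placeholder in order of first appearance, so two builds that differ only in minified names normalize to the same text. The function name `normalize_identifiers`, the `_vN` placeholder scheme, and the abbreviated keyword list are all hypothetical, and the sketch does not handle comments, template literals, regex literals, or object-literal keys.

```python
import re

# Abbreviated keyword list (an assumption for this sketch, not exhaustive).
JS_KEYWORDS = {
    "break", "case", "catch", "class", "const", "continue", "default",
    "delete", "do", "else", "export", "extends", "false", "finally", "for",
    "function", "if", "import", "in", "instanceof", "let", "new", "null",
    "return", "super", "switch", "this", "throw", "true", "try", "typeof",
    "undefined", "var", "void", "while", "yield",
}

TOKEN = re.compile(
    r"""
    (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')   # quoted strings, kept as-is
  | (?P<member>\.\s*[A-Za-z_$][\w$]*)                 # property access, kept as-is
  | (?P<number>\d[\w.]*)                              # numeric literals, kept as-is
  | (?P<ident>[A-Za-z_$][\w$]*)                       # candidate identifier
    """,
    re.VERBOSE,
)


def normalize_identifiers(source: str) -> str:
    """Rename non-keyword identifiers to _v0, _v1, ... by first appearance."""
    names: dict[str, str] = {}

    def replace(match: re.Match) -> str:
        name = match.group("ident")
        if name is None or name in JS_KEYWORDS:
            return match.group(0)  # strings, members, numbers, keywords
        if name not in names:
            names[name] = f"_v{len(names)}"
        return names[name]

    return TOKEN.sub(replace, source)


old_build = "function a(b){return b.length+c}"
new_build = "function x(y){return y.length+z}"
print(normalize_identifiers(old_build) == normalize_identifiers(new_build))
```

Running both builds through such a normalizer before `git diff` would hide pure renames, at the cost of showing placeholder names instead of the original ones in the output.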

Additional context

The ideal solution would provide diff output in text format, but the actual diffing would occur at the AST level, ignoring variable/function name changes.
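As a rough illustration of "compare on a name-blind representation, but print the original text", here is a line-based sketch using Python's standard `difflib`. It is not an AST diff and not how diffsitter works; the helper names `name_blind_key` and `diff_ignoring_names`, the `ID` placeholder, and the short keyword list are assumptions made for the example. Using one fixed placeholder is also coarser than consistent renaming, since `a+b` and `a+a` compare as equal.

```python
import difflib
import re

# Short keyword list for the sketch only (not exhaustive).
KEYWORDS = {"function", "return", "var", "let", "const", "if", "else",
            "for", "while", "new", "this", "true", "false", "null"}

TOKEN = re.compile(
    r"""
    ("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')    # group 1: quoted strings
  | (?<![\w$.])([A-Za-z_$][\w$]*)            # group 2: identifier, not a property
    """,
    re.VERBOSE,
)


def name_blind_key(line: str) -> str:
    """Replace each non-keyword identifier with the fixed placeholder ID."""
    def replace(match: re.Match) -> str:
        name = match.group(2)
        if name is None or name in KEYWORDS:
            return match.group(0)
        return "ID"
    return TOKEN.sub(replace, line)


def diff_ignoring_names(old: str, new: str) -> list[str]:
    """Diff on name-blind keys, but report the original lines."""
    old_lines, new_lines = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(
        a=[name_blind_key(line) for line in old_lines],
        b=[name_blind_key(line) for line in new_lines],
        autojunk=False,
    )
    out: list[str] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # lines that differ only in names land here
        out += [f"-{line}" for line in old_lines[i1:i2]]
        out += [f"+{line}" for line in new_lines[j1:j2]]
    return out


old_src = "function a(b){\n  return b.length;\n}\nvar q = 1;"
new_src = "function x(y){\n  return y.length;\n}\nvar z = 2;"
print("\n".join(diff_ignoring_names(old_src, new_src)))
```

Only the last line is reported, because the first three lines differ solely in identifier names while the last one also changes a literal.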

I suspect this might be possible already (at least to some degree) with the following; though I haven't found any good examples/docs to help explain how to use it better yet:

I'm going to hopefully play around with it a bit more now, but wanted to capture this while it was fresh in my mind.

See Also

afnanenayet commented 5 months ago

So this fits well with an idea I had before: allow users to supply tree-sitter queries to filter which nodes are diffed. That is general enough that you could filter for or against certain node types and, for example, ignore variable names.
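To make the node-filtering idea concrete, here is a toy model in plain Python. It uses neither tree-sitter nor diffsitter's actual API: the `Node` class, the `leaf_signature` helper, and the `ignored_types` parameter are hypothetical, and the node type names only mimic what a JavaScript grammar might emit. The point is that once leaf text is blanked for chosen node types, trees that differ only in those nodes compare as equal.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy syntax-tree node (hypothetical, not a tree-sitter type)."""
    type: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def leaf_signature(node: Node, ignored_types: set[str]) -> list[tuple[str, str]]:
    """Flatten a tree into (type, text) leaves, blanking text of ignored types."""
    if not node.children:
        text = "" if node.type in ignored_types else node.text
        return [(node.type, text)]
    signature: list[tuple[str, str]] = []
    for child in node.children:
        signature.extend(leaf_signature(child, ignored_types))
    return signature


def binary(name: str, number: str) -> Node:
    """Build a toy tree for the expression `<name> + <number>`."""
    return Node("binary_expression", children=[
        Node("identifier", name),
        Node("+", "+"),
        Node("number", number),
    ])


ignore = {"identifier"}
# A rename only: equal once identifiers are ignored.
print(leaf_signature(binary("a", "1"), ignore) == leaf_signature(binary("x", "1"), ignore))
# A literal change: still reported as different.
print(leaf_signature(binary("a", "1"), ignore) == leaf_signature(binary("x", "2"), ignore))
```

A user-supplied query would play the role of `ignored_types` here, selecting which nodes take part in the comparison.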