Wilfred / difftastic

a structural diff that understands syntax 🟥🟩
https://difftastic.wilfred.me.uk/
MIT License

Support ignoring differences that only consist of variable/function name changes (eg. within minified JavaScript) #631

Open 0xdevalias opened 9 months ago

0xdevalias commented 9 months ago

Currently, when diffing minified, bundled JavaScript code, there's a significant amount of 'noise' because the bundler often changes the minified variable names between builds. This can obscure the real changes and make the diff output less useful for understanding what actually changed.

Proposed Feature

I would like to suggest a feature in difftastic that ignores changes in variable/function names in minified JavaScript code. This would be immensely helpful in reducing the noise in diffs of minified builds, so the output focuses on the actual code changes rather than variable name churn.

Current Workarounds / Limitations

Alternative git diff algorithms like patience, histogram, and minimal can be used to reduce the size of the diff somewhat. For example:

⇒ git diff --diff-algorithm=default -- unpacked/_next/static/chunks/pages/_app.js | wc -l
  116000

⇒ git diff --diff-algorithm=patience -- unpacked/_next/static/chunks/pages/_app.js | wc -l
   35826

However, these methods still include variable name changes in their output, which can often lead to quite significant 'noise', particularly on larger files.

Other potential workarounds involve pre-processing the files to standardize their variable/function names or post-processing the diff output to detect and suppress chunks where the only changes are in variable/function names.
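
As a rough illustration of the pre-processing idea, the sketch below renames every non-keyword identifier to a placeholder (v0, v1, ...) in order of first appearance, so two builds that differ only in minified names normalize to the same text. Everything here is an assumption made for illustration: normalize_identifiers, JS_KEYWORDS and TOKEN_RE are hypothetical names, the keyword list is incomplete, and the regex tokenizer ignores scopes, comments, template literals, regex literals and object-literal keys. A real tool would need a proper JavaScript parser.

```python
import re

# Deliberately incomplete set of JavaScript keywords/literals (illustrative only).
JS_KEYWORDS = {
    "break", "case", "catch", "class", "const", "continue", "default",
    "delete", "do", "else", "false", "finally", "for", "function", "if",
    "in", "instanceof", "let", "new", "null", "return", "switch", "this",
    "throw", "true", "try", "typeof", "undefined", "var", "void", "while",
}

# Order matters: strings, numbers and property accesses are matched first
# so that their contents are never mistaken for renameable identifiers.
TOKEN_RE = re.compile(
    r"""
    (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<number>\d[\w.]*)
  | (?P<prop>(?<!\.)\.\s*[A-Za-z_$][\w$]*)
  | (?P<ident>[A-Za-z_$][\w$]*)
    """,
    re.VERBOSE,
)


def normalize_identifiers(source: str) -> str:
    """Replace each non-keyword identifier with v0, v1, ... by first appearance."""
    mapping = {}

    def rename(match):
        name = match.group("ident")
        if name is None or name in JS_KEYWORDS:
            # Strings, numbers, property names and keywords are left untouched.
            return match.group(0)
        if name not in mapping:
            mapping[name] = f"v{len(mapping)}"
        return mapping[name]

    return TOKEN_RE.sub(rename, source)


old_build = 'function a(b,c){return b+c.length+"b"}'
new_build = 'function x(y,z){return y+z.length+"b"}'
print(normalize_identifiers(old_build) == normalize_identifiers(new_build))
```

One known weakness of this approach: because placeholders are numbered across the whole file, a single newly introduced variable shifts every later placeholder, which can reintroduce noise. Numbering per function or per scope would reduce that, but requires real scope analysis.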

Expected Outcome

The ideal solution would provide diff output in text format, but the actual diffing would occur at the AST level, ignoring variable/function name changes.
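
To make "ignoring variable/function name changes" a bit more concrete, here is a minimal sketch that checks whether two snippets differ only by a consistent, one-to-one renaming of identifiers. It works on a crude token stream rather than a real AST, and it is not how difftastic works internally. The names tokenize and differs_only_by_renaming are hypothetical, the keyword list is incomplete, and comments, template literals and regex literals are not handled.

```python
import re

# Deliberately incomplete set of JavaScript keywords/literals (illustrative only).
JS_KEYWORDS = {
    "break", "case", "catch", "class", "const", "continue", "default",
    "delete", "do", "else", "false", "finally", "for", "function", "if",
    "in", "instanceof", "let", "new", "null", "return", "switch", "this",
    "throw", "true", "try", "typeof", "undefined", "var", "void", "while",
}

TOKEN_RE = re.compile(
    r"""
    (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<number>\d[\w.]*)
  | (?P<prop>(?<!\.)\.\s*[A-Za-z_$][\w$]*)
  | (?P<ident>[A-Za-z_$][\w$]*)
  | (?P<punct>[^\s\w$])
    """,
    re.VERBOSE,
)


def tokenize(source):
    """Return a list of (kind, text) pairs, skipping whitespace."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group(0)
        if kind == "ident" and text in JS_KEYWORDS:
            kind = "keyword"
        tokens.append((kind, text))
    return tokens


def differs_only_by_renaming(old, new):
    """True if the token streams match up to a one-to-one identifier renaming."""
    old_tokens, new_tokens = tokenize(old), tokenize(new)
    if len(old_tokens) != len(new_tokens):
        return False
    forward, backward = {}, {}
    for (old_kind, old_text), (new_kind, new_text) in zip(old_tokens, new_tokens):
        if old_kind != new_kind:
            return False
        if old_kind != "ident":
            # Keywords, literals, property names and punctuation must match exactly.
            if old_text != new_text:
                return False
            continue
        # Each old name must always map to the same new name, and vice versa.
        if forward.setdefault(old_text, new_text) != new_text:
            return False
        if backward.setdefault(new_text, old_text) != old_text:
            return False
    return True


print(differs_only_by_renaming("function a(b,c){return b+c}",
                               "function x(y,z){return y+z}"))
```

A check like this could also back the post-processing workaround: run it on the old and new side of each changed chunk and suppress chunks for which it returns True. Requiring the mapping to be one-to-one matters, since otherwise a real change such as `b+c` becoming `y+y` would be hidden.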

See Also

0xdevalias commented 9 months ago

Originally I thought that diffsitter might be a good answer to this need, but after playing with it, the diff output is fairly subpar compared to basic git diff/etc, and its performance seemed SUPER slow on a large file: ~12.29min, compared to seemingly 6.746sec for difftastic (though with a lot of "8.39 MiB exceeded DFT_BYTE_LIMIT" messages in the output).

You can see further context/screenshots/etc of the output here:


Edit: difftastic might not be a good fit either if I can't figure out how to get these parsing issues fixed up:

Edit: It seems when DFT_BYTE_LIMIT is exceeded difftastic falls back to a text diff, so that's not really a fair time comparison:

I tried overriding that in my .gitconfig:

# https://github.com/Wilfred/difftastic
[difftool "difftastic"]
  cmd = difft --byte-limit 20971520 "$LOCAL" "$REMOTE"

And then running it again, but then I just got a different set of errors:

 ⇒ time git difftool --tool difftastic HEAD~1 HEAD -- unpacked/_next/static/chunks/pages/_app.js | subl
git difftool --tool difftastic HEAD~1 HEAD --   12.42s user 1.10s system 79% cpu 17.043 total
subl  0.01s user 0.02s system 0% cpu 17.248 total
_app.js --- 1/674 --- Text (2 JavaScript parse errors, exceeded DFT_PARSE_ERROR_LIMIT)

Originally posted by @0xdevalias in https://github.com/afnanenayet/diffsitter/issues/149#issuecomment-1916248479