Open nobodyinperson opened 2 years ago
I have been thinking about this too, and I miss this functionality myself. I expect it is relatively difficult to implement, particularly in the current framework, and there are some subtleties in defining the desired behaviour. Do you know which algorithm is used to detect a move, as opposed to a deletion plus an addition? What happens if, in addition to the move, a small change is made within the moved block? Does that get highlighted as an addition or deletion within the moved block, or does it interfere so that the whole block is no longer recognised as moved? Is there a minimum size for a moved block?
> Do you know which algorithm is used to make the detection of moving as opposed to deleting and adding?
I browsed through the `git diff` source code and quickly got lost... 😅 Python's `difflib` also doesn't seem to have this functionality, which would have been a nice starting point...
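To illustrate the `difflib` point, here is a small sketch (the sample lines are made up): `SequenceMatcher.get_opcodes()` only ever reports `equal`, `insert`, `delete` and `replace`, so a moved line shows up as an unrelated insert and delete.

```python
from difflib import SequenceMatcher

# Made-up sample data: "gamma" has been moved up by two lines.
old = ["intro", "alpha", "beta", "gamma", "outro"]
new = ["intro", "gamma", "alpha", "beta", "outro"]

# autojunk=False switches off difflib's "popular element" heuristic.
matcher = SequenceMatcher(None, old, new, autojunk=False)
tags = [tag for tag, *_ in matcher.get_opcodes()]

# There is no opcode for a move; the moved line appears once as an
# insertion and once as a deletion.
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        print(tag, old[i1:i2], new[j1:j2])
```

So move detection would have to be layered on top, for example by pairing up deleted and inserted runs afterwards.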
> What happens if in addition to moving a small change is made in the moved block? Does this then get highlighted as an addition or deletion within the moved block, or does this somehow interfere with the whole block then no longer being recognised? Is there a minimum size for the block moved?
Good questions. The easy answer would be to „just make it configurable” and use sane defaults.
Off the top of my head I would introduce an absolute and a relative moving threshold. The absolute threshold would mean: „Don't consider sequences of 20 characters or less for moving”. The relative threshold would say: „If less than 10% of a moved block was also changed, still consider the whole block 'moved' and color the differences accordingly”.
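A rough sketch of how those two thresholds could combine, in Python purely for illustration. The function name `counts_as_moved`, its parameters and the use of `difflib`'s similarity ratio as the "fraction changed" are all assumptions for this example, not anything latexdiff provides.

```python
from difflib import SequenceMatcher


def counts_as_moved(removed: str, added: str,
                    min_chars: int = 20,
                    max_changed: float = 0.10) -> bool:
    """Decide whether a removed/added pair should be treated as a move.

    Hypothetical helper: min_chars is the absolute threshold and
    max_changed the relative threshold described above.
    """
    # Absolute threshold: sequences of min_chars characters or less
    # are never considered for moving.
    if min(len(removed), len(added)) <= min_chars:
        return False
    # Relative threshold: approximate the changed fraction as
    # 1 - similarity ratio (an assumption; other measures are possible).
    similarity = SequenceMatcher(None, removed, added, autojunk=False).ratio()
    return (1.0 - similarity) < max_changed


# A long sentence with one word changed still counts as moved.
print(counts_as_moved("The quick brown fox jumps over the lazy dog.",
                      "The quick brown fox jumps over the lazy cat."))
# A short snippet is ignored even if it is identical.
print(counts_as_moved("short text", "short text"))
```

Both defaults would presumably be user-configurable; the values here are just the ones suggested above.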
Thank you for looking into this so quickly. It will be an interesting feature but I will need a block of time to think about this and implement something, and those 'blocks of time' are hard to come by these days.
> 'blocks of time' are hard to come by these days.
Absolutely. No pressure, I just wanted to put this idea here so it is out there.
Just wanted to drop by and +1 to the idea of coloring block moves differently. It sounds like that will be non-trivial to do, so I'm not trying to add to the time pressure. But I wanted to fill in a couple of details that hopefully will make the process easier in the future.
After playing around with getting this to work on the command line for standard difftools, indeed `git diff --color-moved=zebra oldfile newfile` or `git diff --color-moved=plain oldfile newfile` (which removes the 20 character minimum block size) both get the job done. Here's the relevant section of the git diff docs, which answers a lot of your detail questions and could be a starting point for various design decisions.
As mentioned in "Detecting moved sections" (#162), the core algorithm seems to be the Heckel diff algorithm described on the wikEd diff implementation page. See also Paul Heckel, "A technique for isolating differences between files", Communications of the ACM 21(4):264 (1978). Here's an in-browser demo to try for the wikEd implementation.
Here are a couple of Python implementations: m-matelski/mdiff and lahwaacz/python-wikeddiff, plus a Stack Overflow discussion which might be helpful: "Difficulty understanding Paul Heckel's Diff Algorithm".
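For a feel of the idea, here is a heavily simplified Python sketch of the unique-line anchoring approach from Heckel's paper: lines occurring exactly once in both files become anchors, and matches are then extended to neighbouring identical lines. It is not taken from any of the implementations above; the function names are made up, the paper's virtual begin/end anchor lines are omitted, and the final "which block moved" rule is a naive heuristic of my own rather than part of the paper.

```python
from collections import Counter


def match_lines(old, new):
    """Map indices of `new` to indices of `old` (simplified Heckel passes)."""
    old_count, new_count = Counter(old), Counter(new)
    old_index = {line: i for i, line in enumerate(old)}
    # Anchor pass: lines that occur exactly once in both files.
    new_to_old = {}
    for j, line in enumerate(new):
        if old_count[line] == 1 and new_count[line] == 1:
            new_to_old[j] = old_index[line]
    used_old = set(new_to_old.values())

    def extend(j_range, step):
        # Grow matches onto identical, still unmatched neighbouring lines.
        for j in j_range:
            i = new_to_old.get(j)
            if i is None:
                continue
            ni, nj = i + step, j + step
            if (0 <= ni < len(old) and 0 <= nj < len(new)
                    and nj not in new_to_old and ni not in used_old
                    and old[ni] == new[nj]):
                new_to_old[nj] = ni
                used_old.add(ni)

    extend(range(len(new)), +1)              # ascending pass
    extend(range(len(new) - 1, -1, -1), -1)  # descending pass
    return new_to_old


def moved_blocks(old, new):
    """Return (new_start, old_start, length) for blocks that look moved."""
    mapping = match_lines(old, new)
    blocks = []
    for j in sorted(mapping):
        i = mapping[j]
        if blocks:
            ns, os_, n = blocks[-1]
            if ns + n == j and os_ + n == i:  # continues the previous block
                blocks[-1] = (ns, os_, n + 1)
                continue
        blocks.append((j, i, 1))
    # Naive heuristic (assumption): a block is "moved" if it comes from
    # earlier in the old file than a block already seen in the new file.
    moved, highest_old = [], -1
    for ns, os_, n in blocks:
        if os_ < highest_old:
            moved.append((ns, os_, n))
        highest_old = max(highest_old, os_)
    return moved


old = ["intro", "alpha", "beta", "gamma", "outro"]
new = ["intro", "gamma", "alpha", "beta", "outro"]
print(moved_blocks(old, new))
```

Note that the heuristic reports the `alpha`/`beta` block as moved here even though moving `gamma` is the smaller edit; choosing which side of a swap to call "moved" is exactly one of the design decisions mentioned above.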
Thanks again for working on this tool, and I hope these resources will be helpful at some time in the future.
`git diff --color-moved=zebra` colors moved lines differently so they don't show up as a huge amount of removed and added lines. Having `latexdiff` also color moved passages differently (e.g. in darkgreen) would add a lot of value to the output.