ftilmann / latexdiff

Compares two LaTeX files and marks up significant differences between them. Releases are on www.ctan.org and mirrors.
GNU General Public License v3.0

🎨 Color moved passages differently #259

Status: Open. nobodyinperson opened this issue 2 years ago.

nobodyinperson commented 2 years ago

`git diff --color-moved=zebra` colors moved lines differently, so they don't show up as a large number of removed and added lines. Having latexdiff also color moved passages differently (e.g. in dark green) would add a lot of value to the output.

ftilmann commented 2 years ago

I have been thinking about this too and miss this functionality myself, but I expect it is relatively difficult to implement, particularly within the current framework, and there are some subtleties in defining the desired behaviour. Do you know which algorithm is used to detect moves as opposed to deletions and additions? What happens if, in addition to the move, a small change is made in the moved block? Does this then get highlighted as an addition or deletion within the moved block, or does it interfere with the whole block being recognised as moved? Is there a minimum size for a moved block?

nobodyinperson commented 2 years ago

> Do you know which algorithm is used to make the detection of moving as opposed to deleting and adding?

I browsed through the git diff source code and quickly got lost... 😅

Python's difflib also doesn't seem to have this functionality, which would have been a nice starting point...
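As a rough illustration of that starting point, moved-line detection can be layered on top of difflib's ordinary diff: any line that difflib reports as deleted in one place and inserted in another is relabelled as moved. This is a minimal sketch under stated assumptions, not how git or latexdiff works. It compares whole lines exactly (similar in spirit to git's `plain` mode, with no minimum block size), and the function name `classify_lines` and the `<`/`>` markers are hypothetical. A moved line with even a one-character edit falls back to a plain deletion plus addition, which is exactly the subtlety raised above.

```python
import difflib
from collections import Counter


def classify_lines(old_lines, new_lines):
    """Return (marker, line) pairs in diff order (hypothetical helper).

    Markers: ' ' unchanged, '-' deleted, '+' added,
    '<' moved away from here, '>' moved to here.
    """
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    opcodes = matcher.get_opcodes()

    # Collect every deleted and every inserted line.
    deleted, inserted = Counter(), Counter()
    for tag, i1, i2, j1, j2 in opcodes:
        if tag != "equal":
            deleted.update(old_lines[i1:i2])
            inserted.update(new_lines[j1:j2])

    # A line deleted in one place and inserted in another counts as moved;
    # the multiset intersection pairs each deletion with at most one insertion.
    moved_from = deleted & inserted
    moved_to = Counter(moved_from)  # separate budget for the insertion side

    result = []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal":
            result.extend((" ", line) for line in old_lines[i1:i2])
            continue
        for line in old_lines[i1:i2]:
            if moved_from[line] > 0:
                moved_from[line] -= 1
                result.append(("<", line))
            else:
                result.append(("-", line))
        for line in new_lines[j1:j2]:
            if moved_to[line] > 0:
                moved_to[line] -= 1
                result.append((">", line))
            else:
                result.append(("+", line))
    return result


old = ["intro", "A", "B", "C", "D", "old"]
new = ["intro", "B", "C", "D", "A", "new"]
for marker, line in classify_lines(old, new):
    print(marker, line)
```

Here line `A` is reported as `<` where it used to be and `>` where it now sits, while `old` and `new` remain an ordinary deletion and addition.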

> What happens if in addition to moving a small change is made in the moved block? Does this then get highlighted as an addition or deletion within the moved block, or does this somehow interfere with the whole block then no longer being recognised? Is there a minimum size for the block moved?

Good questions. The easy answer would be to "just make it configurable" and use sane defaults.

Off the top of my head, I would introduce an absolute and a relative threshold for moves. The absolute threshold would mean: "Don't consider sequences of 20 characters or fewer as moves." The relative threshold would say: "If less than 10% of a moved block was also changed, still consider the whole block 'moved' and color the differences accordingly."
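The two proposed thresholds could be checked for a candidate pair of passages roughly as follows. This is a hedged sketch: the name `is_moved_block`, the use of character-level `difflib.SequenceMatcher` to measure how much of the block changed, and the defaults of 20 characters and 10% (taken from the comment above) are all assumptions, not an existing latexdiff option.

```python
import difflib


def is_moved_block(deleted_text, inserted_text, min_chars=20, max_changed=0.10):
    """Decide whether a deleted and an inserted passage form one move (hypothetical).

    min_chars:   absolute threshold; passages of min_chars characters or
                 fewer are never treated as moved.
    max_changed: relative threshold; the pair still counts as moved if less
                 than this fraction of the block was changed.
    """
    if min(len(deleted_text), len(inserted_text)) <= min_chars:
        return False
    matcher = difflib.SequenceMatcher(a=deleted_text, b=inserted_text, autojunk=False)
    # Number of characters the two passages share, in order.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    changed = 1 - matched / max(len(deleted_text), len(inserted_text))
    return changed < max_changed


old = "The quick brown fox jumps over the lazy dog."
new = "The quick brown fox jumps over the lazy cat."
print(is_moved_block(old, new))                       # True: 3 of 44 characters changed
print(is_moved_block("a short one", "a short one"))   # False: below min_chars
```

One design choice worth noting: this counts every character, whereas git's `blocks` and `zebra` modes count only alphanumeric characters toward their 20-character minimum, so whitespace-heavy LaTeX source would behave differently under the two rules.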

ftilmann commented 2 years ago

Thank you for looking into this so quickly. It will be an interesting feature but I will need a block of time to think about this and implement something, and those 'blocks of time' are hard to come by these days.

nobodyinperson commented 2 years ago

> 'blocks of time' are hard to come by these days.

Absolutely. No pressure, I just wanted to put this idea here so it is out there.

awillats commented 1 year ago

Just wanted to drop by and +1 to the idea of coloring block moves differently. It sounds like that will be non-trivial to do, so I'm not trying to add to the time pressure. But I wanted to fill in a couple of details that hopefully will make the process easier in the future.

After experimenting with getting this to work on the command line with standard diff tools, I found that both `git diff --color-moved=zebra oldfile newfile` and `git diff --color-moved=plain oldfile newfile` (which drops the minimum block size of 20 alphanumeric characters) get the job done. If both files are inside a Git working tree, add `--no-index` so that Git compares the two files directly. Here's the relevant section of the git diff docs, which answers many of your detailed questions and could be a starting point for various design decisions.

As mentioned in "Detecting moved sections" #162, the core algorithm seems to be the Heckel diff algorithm described on the wikEd diff "Implementation" page. See also Paul Heckel, "A technique for isolating differences between files", Communications of the ACM 21(4):264 (1978). Here's an in-browser demo of the wikEd implementation to try.
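For reference, the matching core of Heckel's algorithm fits in a few dozen lines: tokens that occur exactly once in both versions become anchors, and each anchor is then extended forward and backward over identical neighbours. The sketch below follows the five passes of the 1978 paper as commonly described. The move-labelling step (`label_moves`, which treats matches off a longest in-order run as moved) is an illustrative, hypothetical addition and is not part of Heckel's paper or of wikEd diff. Tokens could be lines or, closer to latexdiff, words.

```python
from bisect import bisect_left
from collections import defaultdict


def heckel_match(old, new):
    """Match tokens of `new` to tokens of `old` (Heckel 1978, passes 1-5).

    Returns new_to_old, where new_to_old[i] is the index of the old token
    matched to new[i], or None if new[i] is unmatched (i.e. added).
    """
    # Passes 1-2: symbol table with per-file counts and the old position.
    table = defaultdict(lambda: [0, 0, None])  # [old count, new count, old index]
    for j, tok in enumerate(old):
        table[tok][0] += 1
        table[tok][2] = j
    for tok in new:
        table[tok][1] += 1

    new_to_old = [None] * len(new)
    old_to_new = [None] * len(old)

    # Pass 3: tokens occurring exactly once in each file are anchors.
    for i, tok in enumerate(new):
        old_count, new_count, j = table[tok]
        if old_count == 1 and new_count == 1:
            new_to_old[i], old_to_new[j] = j, i

    # Pass 4: extend each match forward over identical, unmatched neighbours.
    for i in range(len(new) - 1):
        j = new_to_old[i]
        if (j is not None and j + 1 < len(old)
                and new_to_old[i + 1] is None and old_to_new[j + 1] is None
                and new[i + 1] == old[j + 1]):
            new_to_old[i + 1], old_to_new[j + 1] = j + 1, i + 1

    # Pass 5: the same, extending backward.
    for i in range(len(new) - 1, 0, -1):
        j = new_to_old[i]
        if (j is not None and j > 0
                and new_to_old[i - 1] is None and old_to_new[j - 1] is None
                and new[i - 1] == old[j - 1]):
            new_to_old[i - 1], old_to_new[j - 1] = j - 1, i - 1

    return new_to_old


def label_moves(old, new):
    """Label each new token 'same', 'moved' or 'added' (hypothetical step).

    Matched tokens on a longest increasing run of old positions are 'same';
    all other matched tokens are 'moved'.
    """
    new_to_old = heckel_match(old, new)
    matched = [(i, j) for i, j in enumerate(new_to_old) if j is not None]
    tails, tail_pos, prev = [], [], [None] * len(matched)
    for p, (_, j) in enumerate(matched):
        k = bisect_left(tails, j)  # old positions are distinct, so runs are strict
        if k == len(tails):
            tails.append(j)
            tail_pos.append(p)
        else:
            tails[k], tail_pos[k] = j, p
        prev[p] = tail_pos[k - 1] if k > 0 else None

    # Walk back from the end of the longest run to collect its members.
    in_order = set()
    p = tail_pos[-1] if tail_pos else None
    while p is not None:
        in_order.add(matched[p][0])
        p = prev[p]

    return ["added" if j is None else "same" if i in in_order else "moved"
            for i, j in enumerate(new_to_old)]


old = ["A", "", "B", ""]   # two paragraphs, each followed by a blank line
new = ["B", "", "A", ""]   # the same paragraphs, swapped
print(heckel_match(old, new))  # [2, 3, 0, 1]
print(label_moves(old, new))   # ['moved', 'moved', 'same', 'same']
```

In the example, the blank lines are not unique, so each one is matched only by the forward extension pass from its paragraph's anchor. The two paragraphs are equally long, so which one is reported as moved depends on tie-breaking; this code happens to flag the `B` paragraph.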

Here are a couple of Python implementations, m-matelski/mdiff and lahwaacz/python-wikeddiff, and a Stack Overflow discussion that might be helpful: "Difficulty understanding Paul Heckel's Diff Algorithm".

Thanks again for working on this tool, and I hope these resources will be helpful at some point in the future.