Detecting moved sections

ftilmann / latexdiff

Compares two latex files and marks up significant differences between them. Releases on www.ctan.org and mirrors

GNU General Public License v3.0


Detecting moved sections #162

Open briochemc opened 6 years ago

briochemc commented 6 years ago

Is there a way to tell latexdiff to figure out when whole sections are moved around?

ftilmann commented 5 years ago

I have thought about this for a long time, but it is quite difficult to do in a useful way. Unlike in Word, there is no access to the editing process (which can sometimes be a good thing), and if, for example, a whole paragraph was moved and then one or two words were changed, it should still appear as a moved paragraph with some edits.

So for now probably not feasible, unfortunately.

flying-sheep commented 4 years ago

It could work on a per-paragraph basis, trying to find 1:1 mappings of the closest corresponding paragraphs and calculating the differences between them.
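A minimal sketch of that per-paragraph idea, written in Python (not Perl) and not part of latexdiff. It assumes paragraphs are separated by blank lines and uses the standard library's `difflib.SequenceMatcher.ratio()` on word lists as the similarity score. The helper names (`split_paragraphs`, `match_paragraphs`, `find_moves`) and the 0.6 threshold are hypothetical. Paragraphs are paired 1:1, best match first, and any pair outside the longest order-preserving chain is reported as moved.

```python
from difflib import SequenceMatcher


def split_paragraphs(text):
    """Split LaTeX source into paragraphs at blank lines (a simplification)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def match_paragraphs(old_paras, new_paras, threshold=0.6):
    """Greedily pair old and new paragraphs 1:1 by word-level similarity.

    Returns sorted (old_index, new_index, ratio) tuples; paragraphs whose best
    ratio stays below `threshold` are left unmatched (plain deletions/additions).
    """
    candidates = []
    for i, old in enumerate(old_paras):
        for j, new in enumerate(new_paras):
            ratio = SequenceMatcher(None, old.split(), new.split()).ratio()
            if ratio >= threshold:
                candidates.append((ratio, i, j))
    candidates.sort(reverse=True)  # most similar pairs claim partners first
    used_old, used_new, pairs = set(), set(), []
    for ratio, i, j in candidates:
        if i not in used_old and j not in used_new:
            used_old.add(i)
            used_new.add(j)
            pairs.append((i, j, ratio))
    return sorted(pairs)


def find_moves(pairs):
    """Return matched pairs that fall outside the longest order-preserving chain."""
    n = len(pairs)
    best, prev = [1] * n, [-1] * n
    # Longest increasing subsequence of new indices (pairs are sorted by old index).
    for k in range(n):
        for m in range(k):
            if pairs[m][1] < pairs[k][1] and best[m] + 1 > best[k]:
                best[k], prev[k] = best[m] + 1, m
    keep = set()
    k = max(range(n), key=best.__getitem__, default=-1)
    while k != -1:
        keep.add(k)
        k = prev[k]
    return [p for idx, p in enumerate(pairs) if idx not in keep]


old = "Intro text here.\n\nMethods are described.\n\nResults are great."
new = "Intro text here.\n\nResults are really great.\n\nMethods are described."
pairs = match_paragraphs(split_paragraphs(old), split_paragraphs(new))
moves = find_moves(pairs)
for i, j, ratio in moves:
    print(f"old paragraph {i} -> new paragraph {j} (similarity {ratio:.2f})")
```

Here the "Results" paragraph is found as moved even though a word was added, which is the "moved paragraph with some edits" case. When two paragraphs simply swap places, which one gets labelled as moved is arbitrary. A real tool would then run its ordinary word diff inside each matched pair to mark the small edits.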

ftilmann commented 4 years ago

Thanks for the suggestion. It is still not so quick to do in practice (or do you know of an algorithm implemented in Perl that does fuzzy differencing of tokenized text?). I have another idea of how one could 'fake' such functionality: look for exact matches between added and deleted blocks of a certain length. This would probably work in many instances, but even implementing it requires changing several parts of the very core of latexdiff, so it is not something I will undertake very soon.
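The "exact match of added/deleted blocks" idea can be sketched on top of an ordinary diff. This is a hypothetical Python illustration, not latexdiff's own code: it uses `difflib.SequenceMatcher.get_opcodes()` as the base diff, and the name `detect_exact_moves` and the default `min_len=5` are assumptions. Every deleted block of at least `min_len` tokens that reappears verbatim as an added block is reported as a move.

```python
from difflib import SequenceMatcher


def detect_exact_moves(old_tokens, new_tokens, min_len=5):
    """Pair deleted and added blocks that are identical and at least `min_len` tokens long.

    Returns (old_start, new_start, length) tuples for each presumed move.
    """
    matcher = SequenceMatcher(None, old_tokens, new_tokens, autojunk=False)
    deleted, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.append((i1, old_tokens[i1:i2]))
        if tag in ("insert", "replace"):
            added.append((j1, new_tokens[j1:j2]))
    moves, claimed = [], set()
    for i1, block in deleted:
        if len(block) < min_len:
            continue  # short blocks coincide by chance too easily
        for k, (j1, candidate) in enumerate(added):
            if k not in claimed and candidate == block:
                claimed.add(k)  # each added block explains at most one deletion
                moves.append((i1, j1, len(block)))
                break
    return moves


# The plain diff keeps the longer block (q1..q6) in place, so the moved
# block m1..m5 shows up as a deletion plus an identical insertion.
old = "a b m1 m2 m3 m4 m5 q1 q2 q3 q4 q5 q6".split()
new = "a b q1 q2 q3 q4 q5 q6 m1 m2 m3 m4 m5".split()
print(detect_exact_moves(old, new))  # [(2, 8, 5)]
```

The minimum length is the main design knob. If it is set too low, common phrases get reported as moves. If it is set too high, a moved paragraph with even one edited word inside the block is missed, because the match must be exact.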

flying-sheep commented 4 years ago

do you know of an algorithm implemented in perl that does fuzzy differencing of tokenized text?

Sorry, I looked into Perl once in 2009 and decided to learn Python instead.

Doing what you said won’t be any more or less fake than what any diff tool does; they’re all heuristics by necessity.

apYdr6uxv commented 4 years ago

FWIW, I found one implemented in JS here. Apparently the algorithm is called the Heckel method; read more here.

Or am I off-track?
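For reference, here is a minimal Python sketch of the Heckel method as it is commonly described. It is not a port of the JavaScript version mentioned above, and the function name `heckel_links` is hypothetical. First, lines that occur exactly once in each file become anchors. Next, links are extended forward and then backward to neighbouring identical lines, which lets repeated lines attach.

```python
from collections import Counter


def heckel_links(old, new):
    """Link lines of `new` to lines of `old` with Heckel-style passes.

    Returns (new_to_old, old_to_new); None marks an unmatched (added/deleted) line.
    """
    old_count, new_count = Counter(old), Counter(new)
    old_index = {line: i for i, line in enumerate(old)}
    new_to_old = [None] * len(new)
    old_to_new = [None] * len(old)

    # Pass 1: a line occurring exactly once in each file is a safe anchor.
    for i, line in enumerate(new):
        if new_count[line] == 1 and old_count[line] == 1:
            j = old_index[line]
            new_to_old[i], old_to_new[j] = j, i

    def link_if_equal(i, j):
        """Link new[i] and old[j] if both are in range, still free, and identical."""
        if (0 <= i < len(new) and 0 <= j < len(old)
                and new_to_old[i] is None and old_to_new[j] is None
                and new[i] == old[j]):
            new_to_old[i], old_to_new[j] = j, i

    # Pass 2: extend each linked run forward.
    for i in range(len(new)):
        if new_to_old[i] is not None:
            link_if_equal(i + 1, new_to_old[i] + 1)
    # Pass 3: extend each linked run backward.
    for i in range(len(new) - 1, -1, -1):
        if new_to_old[i] is not None:
            link_if_equal(i - 1, new_to_old[i] - 1)
    return new_to_old, old_to_new


old = [r"\section{A}", "same", r"\section{B}", "same", "old only"]
new = [r"\section{B}", "same", r"\section{A}", "same", "new only"]
new_to_old, old_to_new = heckel_links(old, new)
print(new_to_old)  # [2, 3, 0, 1, None]
```

Lines left as `None` are plain additions or deletions. Linked lines whose old positions jump backwards (new line 2 points to old line 0 right after new line 1 points to old line 3) mark a block that changed place, which is the information moved-section markup would need. Because tokens must be exactly equal, a paragraph with a one-word edit would not link, so this would likely still need combining with some fuzzy matching.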

ftilmann commented 4 years ago

Thanks for leaving these hints. It looks like a promising approach, but it would replace the current diffing algorithm (at least optionally) and thus require quite a lot of coding to implement within the latexdiff context.

apYdr6uxv commented 4 years ago

Of course! I did not mean to imply it makes it any easier. 🙇