ftilmann / latexdiff

Compares two LaTeX files and marks up significant differences between them. Releases on www.ctan.org and mirrors.
GNU General Public License v3.0

latexdiff messes up spaces after periods #269

Closed: jameswhqi closed this issue 2 years ago

jameswhqi commented 2 years ago

latexdiff doesn't seem to distinguish between periods followed by a space or line break (as at the end of a sentence) and periods that are not (as in "i.e., something"). If these two kinds of period are matched between the old and the new file, the spacing of the old material gets messed up, presumably because latexdiff treats differences in whitespace as "insignificant differences".
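The suspected mechanism can be illustrated with a toy model. This is not latexdiff's code (latexdiff is written in Perl and uses Algorithm::Diff); the tokenizer, the use of Python's `difflib.SequenceMatcher`, and the function name `render_old_with_new_spacing` are assumptions made purely for illustration. The model splits text into words and single punctuation marks, compares only the token text (whitespace ignored), and, for tokens the diff pairs up, keeps the whitespace from the new file.

```python
import re
from difflib import SequenceMatcher

# Toy tokenizer (assumption, not latexdiff's): a word or a single
# punctuation character, paired with the whitespace that follows it.
TOKEN_RE = re.compile(r"(\w+|[^\w\s])(\s*)")


def tokenize(text):
    """Return a list of (token, trailing_whitespace) pairs."""
    return [(m.group(1), m.group(2)) for m in TOKEN_RE.finditer(text)]


def render_old_with_new_spacing(old, new):
    """Hypothetical helper: rebuild the OLD text, but for every token that
    the diff matched (compared by token text only), take the trailing
    whitespace from the NEW file, as if whitespace were insignificant."""
    old_toks, new_toks = tokenize(old), tokenize(new)
    sm = SequenceMatcher(a=[t for t, _ in old_toks],
                         b=[t for t, _ in new_toks],
                         autojunk=False)
    ws = [w for _, w in old_toks]
    for a, b, size in sm.get_matching_blocks():
        for k in range(size):
            # Matched token: its spacing now comes from the other file.
            ws[a + k] = new_toks[b + k][1]
    return "".join(tok + w for (tok, _), w in zip(old_toks, ws))
```

With the MWE strings, `render_old_with_new_spacing("i.e., something", "one. two.\nthree.")` yields `"i. e.\n, something"`, and swapping the arguments yields `"one.two.three."`, i.e. the same spacing as the deleted blocks in the outputs reported below. The only tokens the diff pairs up are the lone periods, which is exactly where the spacing leaks across.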


MWE:

one.tex

i.e., something

two.tex

one. two.
three.

latexdiff one.tex two.tex

\DIFdelbegin \DIFdel{i. e.
, something }\DIFdelend \DIFaddbegin \DIFadd{one. two.
three. }\DIFaddend

Expected output:

\DIFdelbegin \DIFdel{i.e., something }\DIFdelend \DIFaddbegin \DIFadd{one. two.
three. }\DIFaddend

latexdiff two.tex one.tex

\DIFdelbegin \DIFdel{one.two.three. }\DIFdelend \DIFaddbegin \DIFadd{i.e., something }\DIFaddend

Expected output:

\DIFdelbegin \DIFdel{one. two.
three. }\DIFdelend \DIFaddbegin \DIFadd{i.e., something }\DIFaddend

latexdiff --version

This is LATEXDIFF 1.3.2 (Algorithm::Diff 1.15 so, Perl v5.34.0)
  (c) 2004-2021 F J Tilmann
ftilmann commented 2 years ago

Your assumption about why this goes wrong is correct. It's a somewhat pathological case, but I appreciate that it's probably not extremely uncommon. You can force the correct behaviour with --config MINWORDSBLOCK=0, but this will very likely have undesirable effects in longer text. A proper fix is actually not trivial at all, so instead I have hardcoded some common abbreviations ("i.e.", "e.g." and "z.B."; the last one occurs in German texts) to be treated atomically. It's kind of ugly but works; the list is easily extensible in the source code but currently not configurable.
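The idea of treating listed abbreviations atomically can be sketched with the same kind of toy model. Again this is an assumption-laden Python illustration, not latexdiff's actual Perl implementation: `ABBREVIATIONS`, `tokenize` and `matched_tokens` are hypothetical names, and the list only mirrors the three abbreviations named in the comment above. Once "i.e." is a single token, its periods can no longer be paired with sentence-ending periods in the other file.

```python
import re
from difflib import SequenceMatcher

# Hypothetical hardcoded list, mirroring the abbreviations mentioned above.
ABBREVIATIONS = ["i.e.", "e.g.", "z.B."]

# Abbreviations are tried first, so each becomes one atomic token; otherwise
# fall back to a word or a single punctuation character. Group 2 captures
# the trailing whitespace.
TOKEN_RE = re.compile(
    "(" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r"|\w+|[^\w\s])(\s*)"
)


def tokenize(text):
    """Return a list of (token, trailing_whitespace) pairs."""
    return [(m.group(1), m.group(2)) for m in TOKEN_RE.finditer(text)]


def matched_tokens(old, new):
    """Token texts that a whitespace-insensitive diff pairs up."""
    a = [t for t, _ in tokenize(old)]
    b = [t for t, _ in tokenize(new)]
    sm = SequenceMatcher(a=a, b=b, autojunk=False)
    return [a[i + k] for i, _, n in sm.get_matching_blocks() for k in range(n)]
```

Here `tokenize("i.e., something")` gives `[("i.e.", ""), (",", " "), ("something", "")]`, and `matched_tokens("i.e., something", "one. two.\nthree.")` returns an empty list, so no spacing from one file can leak into the other. As the comment notes, the trade-off is that only the listed abbreviations are protected; anything not in the list still falls back to single-period tokens.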