ftilmann / latexdiff

Compares two LaTeX files and marks up significant differences between them. Releases on www.ctan.org and mirrors.
GNU General Public License v3.0

tabular diff garbled when same item is added first in one column and deleted from a later column #281

Open hernot opened 1 year ago

hernot commented 1 year ago

I just observed that in a simple table of numbers one of the rows is garbled: the deleted and added entries are shifted to the right. Looking at the resulting diff for this row, the new number is added with `\DIFaddbeginFL \DIFaddFL`, but the `\DIFaddendFL` is placed after the `&`, and the `\DIFdelbegin \DIFdel \DIFdelend` sequence is shifted even one column further. It seems as if splitting into multiple columns does not work properly in this row. Is the diff output wrong, or is the input parsed badly?

See the attached examples. These are plain, simple tabular tables, with no fancy multicolumn or multirow.

The affected row is row three, with Subject ID 4 in the first column (attachment: garbeled-tabular.zip).

The diff comes from the same project as issue #280 and was therefore generated with the method described there. I have verified that it is not the filter I use that distorts the content.

The table currently looks as follows:

[Screenshot of the garbled table]

As can be seen, the 1.8 is added as it should be, but the 2.2 that it should replace instead appears as unchanged in the next column, where the 2.5 should have been replaced by the 2.2. From there on, every deletion is shifted by one column. The same happens again in the third-to-last column with the 1.9, which should have been replaced by the 1.5 two columns earlier. The deletions, now shifted twice, end up in the second-to-last column, which therefore shows two deleted numbers plus an unchanged number, the latter being correct.

All other rows show additions and deletions correctly.

EDIT: It looks as if splitting tabular (or, more generally, any environment where & is used as a column/alignment separator) into individual cells, processing each cell separately, and joining them again with & afterwards either does not work properly or is not foreseen at all. This might be difficult when columns or rows are added while the adjacent ones also have modified content, even in header lines or more complex material; such cases will not be uniquely solvable. This is especially so because multicolumn spans k columns, and multirow makes things even worse.
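A minimal sketch of the cell-wise approach described above, written in Python rather than latexdiff's Perl: split each row at unescaped `&`, pair the cells by position, and wrap only changed cells in markup, so that no added or deleted span can cross a column boundary. The helper names `split_cells` and `diff_row` are hypothetical and not part of latexdiff. The sketch assumes both rows have the same number of cells, ignores `&` inside nested environments, and reuses the `\DIFdelFL{...}`/`\DIFaddFL{...}` commands only to mimic latexdiff's float markup.

```python
import re

# Hypothetical helper regex, not part of latexdiff:
# matches '&' that is not escaped as '\&'.
_CELL_SEP = re.compile(r"(?<!\\)&")


def split_cells(row: str) -> list[str]:
    """Split one tabular row (without its trailing \\\\) into stripped cells."""
    return [cell.strip() for cell in _CELL_SEP.split(row)]


def diff_row(old_row: str, new_row: str) -> str:
    """Diff two rows cell by cell so markup never crosses a column boundary."""
    old_cells, new_cells = split_cells(old_row), split_cells(new_row)
    if len(old_cells) != len(new_cells):
        # Added or removed columns make positional pairing ambiguous.
        raise ValueError("column count changed; cell-wise pairing is ambiguous")
    out = []
    for old, new in zip(old_cells, new_cells):
        if old == new:
            out.append(new)  # unchanged cell is copied verbatim
            continue
        parts = []
        if old:
            parts.append(rf"\DIFdelFL{{{old}}}")  # old content marked deleted
        if new:
            parts.append(rf"\DIFaddFL{{{new}}}")  # new content marked added
        out.append("".join(parts))
    return " & ".join(out)
```

With a made-up row modeled on the description above, `diff_row("4 & 2.2 & 2.5", "4 & 1.8 & 2.2")` yields `4 & \DIFdelFL{2.2}\DIFaddFL{1.8} & \DIFdelFL{2.5}\DIFaddFL{2.2}`: each replacement stays in its own column, instead of the shared 2.2 being matched across columns as a token-level diff would do. The price of pairing by position is that inserted or deleted columns cannot be detected, which is why the sketch raises in that case.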

I am not sure whether there is a solution that also helps with other issues like #72 or #5. Maybe a combination would work: splitting at &, processing each piece as an individual row, and counting embedded multicolumns, together with a `%DIFcomplex` table comment by which authors can indicate which column was added and which has vanished when this cannot be reliably guessed from the inputs. If that does not help either, implicitly falling back to #275 could at least mitigate the situation.
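Counting embedded multicolumn spans, as suggested above, would let cells be paired by logical column rather than by position. A rough sketch, assuming the span count is the first argument of a `\multicolumn` that opens the cell; `logical_columns` and `column_starts` are hypothetical helpers, not latexdiff functions, and multirow, nested braces and nested tabulars are not handled.

```python
import re

# Hypothetical helper regexes, not part of latexdiff.
CELL_SEP = re.compile(r"(?<!\\)&")  # '&' not escaped as '\&'
MULTICOL = re.compile(r"^\\multicolumn\s*\{\s*(\d+)\s*\}")  # \multicolumn{k}...


def logical_columns(row: str) -> list[int]:
    """Number of logical columns spanned by each physical cell of a row."""
    spans = []
    for cell in CELL_SEP.split(row):
        match = MULTICOL.match(cell.strip())
        # A \multicolumn{k} cell counts as k columns, anything else as 1.
        spans.append(int(match.group(1)) if match else 1)
    return spans


def column_starts(row: str) -> list[int]:
    """Logical column index at which each physical cell begins."""
    starts, col = [], 0
    for span in logical_columns(row):
        starts.append(col)
        col += span
    return starts
```

For a made-up header `ID & \multicolumn{2}{c}{Visit 1} & Visit 2`, the spans are `[1, 2, 1]` and the cells start at logical columns `[0, 1, 3]`, so a cell-wise diff could pair old and new cells by their start column even when a multicolumn cell sits in between.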

EDIT 2: Like LaTeX, latexdiff thinks in boxes. I tried to figure out myself how to make latexdiff split content at & into individual sub-blocks. Sadly, there are many obstacles, among others a very inconsistent mix of tab and space indentation, with tabs replaced by anywhere from 2 to 8 spaces, which makes the code really hard to understand, even for somebody who just needs to brush up on and modernize their Perl knowledge. I gave up. Sorry for not being of more help. Maybe I will give the filter idea another try.

EDIT 3: I have now tried pre-filtering so that the content between & separators is enclosed in an mbox command, and alternatively replacing & with a command. No change. I also tried the version from GitHub master; no change either. By now I rather suspect that pass 2 garbles and mixes things up, rather than tokenization in pass 1. I give up and will resort to manually editing the table diff before creating the submission build.