curran / google-diff-match-patch

Automatically exported from code.google.com/p/google-diff-match-patch
Apache License 2.0

diff_cleanupSemantic doesn't always cleanup #11

Status: Closed (closed by GoogleCodeExporter 9 years ago)

GoogleCodeExporter commented 9 years ago
Usually diff_cleanupSemantic does an excellent job :-).
However, in this case it appears to have a problem.

What steps will reproduce the problem?
In JavaScript (with default settings):
var a='\tS += "</table><pre style=\'display:none\'>";\n'+
  '\tS += text.replace(/>/g, \'&gt;\');\n'+
  '\tS += "</pre></li></ul></div>\\n";';
var b='\n'+'\tt = lines.join(\'\\n\').replace(/>/g, \'&gt;\');\n'+
  '\tS += "</table><pre style=\'display:none\'>".concat(t, "</pre></li></ul></div>\\n");';
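var DMP = new diff_match_patch();
var d = DMP.diff_main(a, b);
DMP.diff_cleanupSemantic(d);
// print() is the JavaScript shell built-in; it separates its arguments with
// spaces, which is why each field in the output below carries extra padding.
for (var i = 0; i < d.length; i++) print('{', d[i][0], ', "', d[i][1], '"}');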

The output is as follows (essentially the same as with no cleanup at all):
{ -1 , "    S "}
{ 1 , " 
    t "}
{ 0 , "   "}
{ -1 , " + "}
{ 0 , " =  "}
{ -1 , " "</tab "}
{ 0 , " l "}
{ 1 , " in "}
{ 0 , " e "}
{ -1 , " ><pre  "}
{ 0 , " s "}
{ -1 , " tyle='d "}
{ 1 , " .jo "}
{ 0 , " i "}
{ -1 , " splay: "}
{ 0 , " n "}
{ -1 , " o "}
{ 1 , " ('\ "}
{ 0 , " n "}
{ -1 , " e "}
{ 0 , " ' "}
{ -1 , " >";
    S += text "}
{ 1 , " ) "}
{ 0 , " .replace(/>/g, '&gt;');
    S +=  "}
{ 1 , " "</table><pre style='display:none'>".concat(t,  "}
{ 0 , " "</pre></li></ul></div>\n" "}
{ 1 , " ) "}
{ 0 , " ; "}
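Each element printed above is an `[op, text]` pair, where `-1` marks a deletion, `1` an insertion and `0` an equality. A cleaned-up diff should still reconstruct both inputs: equalities plus deletions give the original text, equalities plus insertions give the new one. The standalone sketch below assumes only that a diff is an array of such pairs; the helpers `sourceText`, `targetText` and `countOps` and the sample diff are hypothetical, not part of diff_match_patch.

```javascript
// Operation codes, matching the values printed in the report.
const DELETE = -1;
const INSERT = 1;
const EQUAL = 0;

// Rebuild the original text: keep equalities and deletions.
function sourceText(diffs) {
  return diffs.filter((d) => d[0] !== INSERT).map((d) => d[1]).join("");
}

// Rebuild the new text: keep equalities and insertions.
function targetText(diffs) {
  return diffs.filter((d) => d[0] !== DELETE).map((d) => d[1]).join("");
}

// Count how many pairs of each kind a diff contains.
function countOps(diffs) {
  const counts = { equal: 0, insert: 0, delete: 0 };
  for (const d of diffs) {
    if (d[0] === EQUAL) counts.equal++;
    else if (d[0] === INSERT) counts.insert++;
    else if (d[0] === DELETE) counts.delete++;
  }
  return counts;
}

// Hypothetical sample diff (not taken from the report above).
const diffs = [
  [EQUAL, "\t"],
  [DELETE, "S += x;"],
  [INSERT, "t = y;"],
  [EQUAL, "\n"],
];

console.log(sourceText(diffs) === "\tS += x;\n"); // true
console.log(targetText(diffs) === "\tt = y;\n"); // true
console.log(countOps(diffs)); // { equal: 2, insert: 1, delete: 1 }
```

Recent versions of the library offer the same reconstruction as `diff_text1` and `diff_text2`; the sketch just makes the tuple layout explicit.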

What version of the product are you using? On what operating system?
diff_match_patch_20080520.zip on Mac OS X 10.4.7

Please provide any additional information below.
Deleting the '\n' at the beginning of var b reduces the diff from 31 to 9
elements, as follows:
{ 0 , "      "}
{ -1 , " S += "</table><pre style='display:none'>";
    S += text.replace(/>/g, '&gt;');
    S += "}
{ 1 , " t = lines.join('\n').replace(/>/g, '&gt;');
    S += "</table><pre style='display:none'>".concat(t, "}
{ 0 , "  "</pre></li></ul></div>\n" "}
{ 1 , " ) "}
{ 0 , " ; "}

Original issue reported on code.google.com by TheDoc...@nerdshack.com on 25 May 2008 at 12:42

GoogleCodeExporter commented 9 years ago
Yes, you are right! diff_cleanupSemantic and diff_cleanupEfficiency bail out
after the cleanup of a single term if that term is right at the beginning.
This only affects the JavaScript version of the library.

I have fixed these two bugs, added unit tests and uploaded a fresh copy of the
library.  Thank you!

Original comment by neil.fra...@gmail.com on 27 May 2008 at 9:40
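The failure described in the last comment concerns the backtracking step of the cleanup loop. The sketch below is a simplified, hypothetical equality-elimination pass, not the library's code: it measures total edit length on each side of an equality rather than the library's exact rule, and the names `eliminateSmallEqualities` and `mergeAdjacent` are made up. It shows the case that matters here. After an equality is eliminated and no earlier equality is left to backtrack to, the scan has to restart at index 0 rather than stop; a loop that stopped at that point would leave later equalities (`y` in the example) untouched, much like the half-cleaned output in the report.

```javascript
// Operation codes, matching the values printed in the report.
const DELETE = -1;
const INSERT = 1;
const EQUAL = 0;

// Coalesce each run of edits into one delete followed by one insert,
// join neighbouring equalities and drop empty equalities.
function mergeAdjacent(diffs) {
  const out = [];
  let del = "";
  let ins = "";
  const flush = () => {
    if (del) out.push([DELETE, del]);
    if (ins) out.push([INSERT, ins]);
    del = "";
    ins = "";
  };
  for (const [op, text] of diffs) {
    if (op === DELETE) {
      del += text;
    } else if (op === INSERT) {
      ins += text;
    } else if (text) {
      flush();
      const last = out[out.length - 1];
      if (last && last[0] === EQUAL) last[1] += text;
      else out.push([EQUAL, text]);
    }
  }
  flush();
  return out;
}

// Simplified rule (an assumption): an equality no longer than the total edit
// length on both sides of it becomes a delete + insert of the same text.
function eliminateSmallEqualities(input) {
  const diffs = input.map(([op, text]) => [op, text]); // work on a copy
  const equalities = []; // stack of indices of candidate equalities
  let lastEquality = null;
  let lenBefore = 0; // edit length before the last candidate equality
  let lenAfter = 0; // edit length after it
  let changes = false;
  let pointer = 0;
  while (pointer < diffs.length) {
    const [op, text] = diffs[pointer];
    if (op === EQUAL) {
      equalities.push(pointer);
      lenBefore = lenAfter;
      lenAfter = 0;
      lastEquality = text;
    } else {
      lenAfter += text.length;
      if (lastEquality !== null &&
          lastEquality.length <= lenBefore &&
          lastEquality.length <= lenAfter) {
        const idx = equalities[equalities.length - 1];
        diffs.splice(idx, 1, [DELETE, lastEquality], [INSERT, lastEquality]);
        equalities.pop(); // the equality just eliminated
        equalities.pop(); // the previous one must be re-evaluated
        // Backtrack to the previous equality; if none is left, resume from
        // the very start (-1, then pointer++ gives 0) instead of stopping.
        pointer = equalities.length > 0 ? equalities[equalities.length - 1] : -1;
        lenBefore = 0;
        lenAfter = 0;
        lastEquality = null;
        changes = true;
      }
    }
    pointer++;
  }
  return changes ? mergeAdjacent(diffs) : diffs;
}

// Hypothetical input with two small equalities, "x" and "y".
const example = [
  [DELETE, "ab"], [INSERT, "cd"], [EQUAL, "x"],
  [DELETE, "ef"], [INSERT, "gh"], [EQUAL, "y"],
  [DELETE, "ij"], [INSERT, "kl"],
];
console.log(JSON.stringify(eliminateSmallEqualities(example)));
// [[-1,"abxefyij"],[1,"cdxghykl"]]
```

Both equalities are absorbed, and the result still reconstructs the original ("abxefyij") and new ("cdxghykl") texts.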