Closed GoogleCodeExporter closed 8 years ago
I agree. Thanks for bringing this up (and pinging me by email as requested).
This would be an issue with diff_cleanupSemanticLossless which does currently
slide
differences to line up with whitespace, but doesn't rank line breaks as having
a
higher value than a space.
There would seem to be five things to attempt to slide edit boundaries to (in
order
of preference):
1. Start/End of entire text.
2. Blank lines.
3. Line breaks.
4. Whitespace.
5. Non alpha-numeric.
I'll get on this. If you want you can let me know what language you are using
(C++,
JavaScript, Java, Python) and I'll do that one first and send you a prerelease
copy
before I get the others updated. (Yay for 20% time at Google.)
Original comment by neil.fra...@gmail.com
on 20 Jun 2008 at 10:34
I just managed to get what I expected by changing a single character in the
source code :-)
I'm working with the javascript version.
In function diff_cleanupSemanticScore(), I changed :
var whitespace = /(\s)/;
var whitespace = /(\n)/;
Of course the clean up obviously no longer works with words, just lines, but
that's a start.
Original comment by dpere...@gmail.com
on 20 Jun 2008 at 10:35
Aye, a quick and dirty (and 90% effective) solution is to use the following:
function diff_cleanupSemanticScore(one, two, three) {
if (!one || !three) {
// Edges are the best.
return 10;
}
var score = 0;
var whitespace = /\s/;
if (one.charAt(one.length - 1).match(whitespace) ||
two.charAt(0).match(whitespace)) {
score++;
}
if (two.charAt(two.length - 1).match(whitespace) ||
three.charAt(0).match(whitespace)) {
score++;
}
var linebreak = /[\r\n]/;
if (one.charAt(one.length - 1).match(linebreak ) ||
two.charAt(0).match(linebreak )) {
score++;
}
if (two.charAt(two.length - 1).match(linebreak ) ||
three.charAt(0).match(linebreak )) {
score++;
}
return score;
}
This probably even passes the existing unit tests.
Original comment by neil.fra...@gmail.com
on 20 Jun 2008 at 10:42
Hehe, you're fast ;-)
Original comment by dpere...@gmail.com
on 20 Jun 2008 at 10:51
What about :
var linebreak = /[\r*\n]/;
Instead of
var linebreak = /[\r\n]/;
Original comment by dpere...@gmail.com
on 20 Jun 2008 at 10:55
A new version of diff, match, patch has just been posted in all four languages
which
solves this problem thoroughly using the five-point list in comment 1.
Original comment by neil.fra...@gmail.com
on 25 Jun 2008 at 12:22
Fixed?
then try:
diff_match_patch_20110217
-------------------------------left:
/**
* init。
*
*/
public void init() {
this.start();
}
-------------------------------right:
/**
* init。
*
* @throws InterruptedException
* @throws SQLException
*
*/
@SuppressWarnings("unused")
private void init() throws SQLException, InterruptedException {
this.start();
}
Original comment by btp...@gmail.com
on 24 Mar 2011 at 7:05
The current behaviour is correct. I assume you are using the cleanupSemantic
function? In that case you get a diff with the following sequence:
EQUAL, DELETION, INSERTION, EQUAL
And I assume that you are expecting that the " */" line should be preserved
(which it currently is not). If that were the case, then the diff would have
the following sequence:
EQUAL, INSERTION, EQUAL(" */"), DELETION, INSERTION, EQUAL
This is a more complicated diff sequence, just to save three characters (two of
which match well on an earlier line).
Any change to make diff preserve this line at the expense of a more complicated
diff sequence would also cause it to preserve coincidental matches as well --
three characters is a pretty small match.
Original comment by neil.fra...@gmail.com
on 24 Mar 2011 at 5:45
Original issue reported on code.google.com by
dpere...@gmail.com
on 20 Jun 2008 at 9:35