Closed GoogleCodeExporter closed 8 years ago
Word-by-word diffing is not a native function of this library. However, it is
easy to do. You need to break your text into words (how you define a word is a
more interesting problem than you might think), create a lookup table of
Unicode characters to words, build two strings made up of the Unicode
characters associated with each word, diff those two strings, then convert the
diff back into the text.
Sounds complicated, but it's not -- because the code has already been written
for you. Just look at the diff_linesToChars and diff_charsToLines functions.
Copy them and make them split on words instead of lines. Then your code will
just be:
Object b[] = diff_wordsToChars(text1, text2);
String wordText1 = (String) b[0];
String wordText2 = (String) b[1];
ArrayList<String> wordarray = (ArrayList<String>) b[2];
LinkedList<Diff> diffs = diff_main(wordText1, wordText2, false);
diff_charsToWords(diffs, wordarray);
Have fun defining what a "word" is. Been there, done that on another project. :)
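For anyone reading later, the steps described above can be sketched roughly as follows. This is a minimal standalone sketch, not code from the library: the class name WordDiffSketch and the helper names encode and charsToWords are made up for illustration, and it assumes a "word" ends at the first space or newline, with that delimiter kept attached to the word. A real diff_charsToWords copy would apply the charsToWords step to the text of each Diff.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical word-level encoding helpers, modelled on the line-based ones. */
final class WordDiffSketch {

  /** Returns {encoded text1, encoded text2, word table}, as in the snippet above. */
  static Object[] diff_wordsToChars(String text1, String text2) {
    List<String> wordArray = new ArrayList<>();
    Map<String, Integer> wordHash = new HashMap<>();
    // Reserve index 0 so that no word is ever encoded as the NUL character.
    wordArray.add("");
    String chars1 = encode(text1, wordArray, wordHash);
    String chars2 = encode(text2, wordArray, wordHash);
    return new Object[] {chars1, chars2, wordArray};
  }

  /** Replaces each word of text with one char whose value is the word's table index. */
  private static String encode(String text, List<String> wordArray,
                               Map<String, Integer> wordHash) {
    StringBuilder chars = new StringBuilder();
    int wordStart = 0;
    while (wordStart < text.length()) {
      int wordEnd = wordStart;
      // Assumption: a word runs up to the first space or newline.
      while (wordEnd < text.length()
          && text.charAt(wordEnd) != ' ' && text.charAt(wordEnd) != '\n') {
        wordEnd++;
      }
      if (wordEnd < text.length()) {
        wordEnd++; // keep the delimiter attached to the word
      }
      String word = text.substring(wordStart, wordEnd);
      Integer index = wordHash.get(word);
      if (index == null) {
        if (wordArray.size() > Character.MAX_VALUE) {
          throw new IllegalStateException("more distinct words than a char can index");
        }
        index = wordArray.size();
        wordArray.add(word);
        wordHash.put(word, index);
      }
      chars.append((char) index.intValue());
      wordStart = wordEnd;
    }
    return chars.toString();
  }

  /** Expands an encoded string back into the words it stands for. */
  static String charsToWords(String chars, List<String> wordArray) {
    StringBuilder text = new StringBuilder();
    for (int i = 0; i < chars.length(); i++) {
      text.append(wordArray.get(chars.charAt(i)));
    }
    return text.toString();
  }
}
```

The two encoded strings are what you would hand to diff_main. Because each distinct word maps to exactly one char, a character-level diff of the encoded strings is a word-level diff of the original text.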
Original comment by neil.fra...@gmail.com
on 4 Jul 2009 at 2:24
Thanks for your input Neil. I will try that.
Original comment by pratapma...@yahoo.com
on 6 Jul 2009 at 5:24
I need this enhancement too:)
Original comment by chunhaic...@gmail.com
on 17 Apr 2010 at 12:31
I implemented the word-by-word diff (yes, it was easy), and it does a pretty
good job just tokenizing on spaces and newlines.
Something like:
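wordEndSpace = text.indexOf(' ', wordStart);
wordEndNewline = text.indexOf('\n', wordStart);
// indexOf returns -1 when nothing is found; treat that as "runs to the end".
if (wordEndSpace == -1) wordEndSpace = text.length() - 1;
if (wordEndNewline == -1) wordEndNewline = text.length() - 1;
wordEnd = Math.min(wordEndSpace, wordEndNewline);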
That will do the trick effectively. You could of course do a nicer array-based
version if you have more delimiters to match (punctuation etc.). Or perhaps
use a regexp instead.
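A regexp version of the tokenizer might look something like the sketch below. The class name WordTokenizerSketch and the token rule are assumptions for illustration only: it treats a run of word characters, a run of whitespace, or any single other character (punctuation and so on) as one token, which is just one of many reasonable definitions of a "word".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical regex tokenizer; the token rule here is an assumption. */
final class WordTokenizerSketch {

  // One token is a run of word characters, a run of whitespace,
  // or a single character that is neither (e.g. punctuation).
  private static final Pattern TOKEN =
      Pattern.compile("\\w+|\\s+|[^\\w\\s]", Pattern.UNICODE_CHARACTER_CLASS);

  /** Splits text into tokens that concatenate back to the original text. */
  static List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<>();
    Matcher matcher = TOKEN.matcher(text);
    while (matcher.find()) {
      tokens.add(matcher.group());
    }
    return tokens;
  }
}
```

Since every character falls into exactly one of the three alternatives, no text is dropped, so the diff can still be converted back into the full original text.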
I guess the reason the simple version works well for me is that the text is
preprocessed (from HTML) in a rather cool way, so whitespace in the text
matches the HTML rendering quite closely.
Thanks for a great little piece of code, Neil.
Regards,
Mads Buus Westmark
Original comment by madsbuus...@gmail.com
on 20 Apr 2010 at 11:45
I need this feature also.
Original comment by g33.ad...@gmail.com
on 25 Jun 2011 at 11:26
Hi,
The diff_linesToChars function has LinesToCharsResult as its return type.
Are any changes required for diff_wordsToChars()?
Object b[] = diff_wordsToChars(text1, text2);
String wordText1 = (String) b[0];
String wordText2 = (String) b[1];
wordarray = (ArrayList<String>) b[2];
LinkedList<Diff> diffs = diff_main(wordText1, wordText2, false);
diff_charsToWords(diffs, wordarray);
Using the diff-match-patch class, we get a character comparison, not a word
comparison. What changes are required for a word comparison?
Thanks in advance.
Original comment by monigov...@gmail.com
on 17 Nov 2011 at 7:06
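On the return-type question: if your copy of the library returns a result object from diff_linesToChars instead of an Object[], the word version can do the same. The sketch below is only an illustration under that assumption. WordsToCharsResult, TypedWordDiffSketch and munge are hypothetical names, not part of the library, and the fields simply mirror what the snippet above pulls out of b[0], b[1] and b[2]. Words are again assumed to end at a space or newline.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TypedWordDiffSketch {

  /** Hypothetical typed result, standing in for the Object[] of the older API. */
  static final class WordsToCharsResult {
    final String chars1;
    final String chars2;
    final List<String> wordArray;

    WordsToCharsResult(String chars1, String chars2, List<String> wordArray) {
      this.chars1 = chars1;
      this.chars2 = chars2;
      this.wordArray = wordArray;
    }
  }

  static WordsToCharsResult diff_wordsToChars(String text1, String text2) {
    List<String> wordArray = new ArrayList<>();
    wordArray.add(""); // reserve index 0 so no word is encoded as NUL
    Map<String, Integer> wordHash = new HashMap<>();
    String chars1 = munge(text1, wordArray, wordHash);
    String chars2 = munge(text2, wordArray, wordHash);
    return new WordsToCharsResult(chars1, chars2, wordArray);
  }

  /** Encodes each word of text as one char holding its index in wordArray. */
  private static String munge(String text, List<String> wordArray,
                              Map<String, Integer> wordHash) {
    if (text.isEmpty()) {
      return "";
    }
    StringBuilder chars = new StringBuilder();
    // The lookbehind splits after each space or newline, keeping it on the word.
    for (String word : text.split("(?<=[ \\n])")) {
      Integer index = wordHash.get(word);
      if (index == null) {
        if (wordArray.size() > Character.MAX_VALUE) {
          throw new IllegalStateException("more distinct words than a char can index");
        }
        index = wordArray.size();
        wordArray.add(word);
        wordHash.put(word, index);
      }
      chars.append((char) index.intValue());
    }
    return chars.toString();
  }
}
```

With a result object like this, the calling code no longer needs casts: pass the result's chars1 and chars2 to diff_main, then pass the diffs and the result's wordArray to your diff_charsToWords copy.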
Original issue reported on code.google.com by
pratapma...@yahoo.com
on 4 Jul 2009 at 1:51