FYI,
I've just been performance testing various implementations of text/binary delta
generation algorithms. I timed your Java implementation against GNU diff. On
average your Java implementation was 2-3X slower. It does not appear to be a
file IO issue: Java file IO is marginally slower, but not 2-3X. I haven't looked
inside the GNU diff implementation, which is supposed to be based on the same
algorithm.
Original comment by olivera...@gmail.com
on 25 Sep 2007 at 4:33
GNU diff is line-by-line. This implementation is character-by-character. So if
you had more than two characters per line, then I think I just beat GNU diff.
Original comment by neil.fra...@gmail.com
on 25 Sep 2007 at 6:35
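To illustrate the difference in granularity described above, here is a minimal sketch using Python's standard `difflib` module. It is not GNU diff and not this library's algorithm; the sample texts and the helper name `changed_units` are made up for the illustration. The same one-character edit is reported as a whole replaced line when the inputs are lists of lines, and as a single inserted character when the inputs are raw strings.

```python
import difflib

# Hypothetical sample texts: one character is added to the second line.
old = "alpha\nbeta\ngamma\n"
new = "alpha\nbeta!\ngamma\n"


def changed_units(a, b):
    """Return the non-equal opcodes between two sequences (lines or characters)."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Line-by-line: the units being compared are whole lines (3 per text here).
line_changes = changed_units(
    old.splitlines(keepends=True), new.splitlines(keepends=True)
)

# Character-by-character: the units are single characters (17 and 18 here).
char_changes = changed_units(old, new)

print(line_changes)
print(char_changes)
```

A character-level diff has many more units to compare than a line-level diff of the same files, so raw timings of the two approaches are not directly comparable.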
Hi Neil,
Great Work!!!!!!!!
I am analyzing your code to assess its suitability for use in our project.
When I do a diff calculation, I get a java.lang.OutOfMemoryError: Java heap
space error. The following stack trace might help you understand the issue.
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:216)
at java.lang.StringBuilder.toString(StringBuilder.java:430)
at name.fraser.neil.plaintext.diff_match_patch.diff_map(diff_match_patch.java:454)
at name.fraser.neil.plaintext.diff_match_patch.diff_compute(diff_match_patch.java:227)
at name.fraser.neil.plaintext.diff_match_patch.diff_main(diff_match_patch.java:140)
at name.fraser.neil.plaintext.diff_match_patch.diff_compute(diff_match_patch.java:269)
at name.fraser.neil.plaintext.diff_match_patch.diff_main(diff_match_patch.java:140)
Min heap size is 256M and max heap size is 1024M.
Can you please suggest any possible fix/solution to avoid this problem...
regards
Prasad Ganguri
Original comment by ganguri....@gmail.com
on 21 Apr 2008 at 8:29
What's the size of the text you are diffing?
Original comment by neil.fra...@gmail.com
on 21 Apr 2008 at 8:39
Neil,
Thanks for the immediate response.
new text: 155978 characters
old text: 49278 characters
Original comment by ganguri....@gmail.com
on 22 Apr 2008 at 12:35
Neil,
Do you know of any project that does a "word" by "word" diff instead of char by
char? For example, with "test" as text 1 and "tall" as text 2, it should show a
complete delete of the old word and an add of the new word. Thanks.
Original comment by 07.tama...@gmail.com
on 6 Oct 2008 at 4:04
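A word-by-word diff of the kind asked about above can be sketched by splitting each text into tokens and diffing the token lists instead of the characters. The following is a minimal sketch using Python's standard `difflib`, not this library's API; the function name `word_diff` and the tokenizing regular expression are assumptions made for the illustration.

```python
import difflib
import re


def word_diff(text1, text2):
    """Diff two texts token by token; returns a list of (operation, text) pairs."""
    # Split into runs of word characters and runs of everything else,
    # so that joining the tokens reproduces the original text exactly.
    tokens1 = re.findall(r"\w+|\W+", text1)
    tokens2 = re.findall(r"\w+|\W+", text2)
    matcher = difflib.SequenceMatcher(None, tokens1, tokens2, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.append(("equal", "".join(tokens1[i1:i2])))
        else:
            # "replace" is reported as a delete followed by an insert.
            if i2 > i1:
                result.append(("delete", "".join(tokens1[i1:i2])))
            if j2 > j1:
                result.append(("insert", "".join(tokens2[j1:j2])))
    return result


print(word_diff("test", "tall"))
print(word_diff("the test passed", "the tall passed"))
```

Because "test" and "tall" are each a single token, the shared leading "t" is never considered a match: the whole word is deleted and the new word inserted.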
Thanks for this work!
Sorry, I don't have a suggested code change, but is this the expected behaviour?
If an inserted line starts with the same character as the next matching line,
the first character is deemed to be a match, but the rest of the line and the
first character of the next line are not.
Attached is some code that reproduces the problem.
Original comment by n_old...@hotmail.com
on 16 Dec 2008 at 5:36
Attachments:
To answer my own question: Yes, this is the expected behaviour of a
character-match algorithm (cf. line-by-line).
Original comment by n_old...@hotmail.com
on 16 Dec 2008 at 5:45
Yes, that's a valid diff, a minimal diff and the expected behaviour. However,
you are right that this is semantically unusual behaviour. If you want a diff
which makes sense to a human, run it through diff_cleanupSemantic. That will
shift the diff sideways so that the end points line up with the line breaks,
word breaks or other logical boundaries.
Original comment by neil.fra...@gmail.com
on 16 Dec 2008 at 6:04
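The idea of shifting a diff sideways can be sketched in a few lines of Python. This is not the library's diff_cleanupSemantic implementation, only a simplified illustration for a single insertion; the function name `shift_insertion_to_line_start` and the sample text are hypothetical. An insertion sitting between two equal regions can slide left whenever the last character before it equals its own last character, without changing either text.

```python
def shift_insertion_to_line_start(prefix, inserted, suffix):
    """Slide an insertion left, one character at a time, until it starts on a line break.

    The old text is prefix + suffix and the new text is prefix + inserted + suffix;
    both stay unchanged by the shift.
    """
    while prefix and inserted and prefix[-1] == inserted[-1]:
        if prefix.endswith("\n"):
            break  # The insertion now begins at the start of a line.
        ch = prefix[-1]
        prefix = prefix[:-1]
        inserted = ch + inserted[:-1]
        suffix = ch + suffix
    return prefix, inserted, suffix


# A minimal character diff for inserting the line "blueberry" before "banana":
# the leading "b" is matched, and the insertion ends with the next line's "b".
before = ("apple\nb", "lueberry\nb", "anana\n")
after = shift_insertion_to_line_start(*before)
print(after)
```

In this example the result is an insertion of the whole line, which is the form a human reader expects.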
Yep, thanks very much.
Good work!
Original comment by n_old...@hotmail.com
on 16 Dec 2008 at 6:26
Perhaps a very dumb question, but I'm stumped:
When I create a Python snippet modelled on the JavaScript patch demo, I get very
different results. The patch looks very different and only the first element
applies. Can you tell me what I'm doing wrong?
Obviously I've searched the web for answers, but no Python programs I've seen
seem to use this pattern.
Thanks in advance.
Original comment by pieterj....@gmail.com
on 20 Apr 2009 at 6:47
Attachments:
Hello p.j.kers,
Found it, sort of. The issue is Python's treatment of whitespace. Your code's
line breaks are entirely \n. If I execute your code, I get the same output as
you do. But if I add a single blank line anywhere in your code, a \r\n line
break is added to the source code (I'm on Windows). With that line in place,
Python returns the same verbose output as JavaScript.
I'll take a closer look tomorrow to see what's going on. But it looks like
Python's whitespace sensitivity, not the library.
Original comment by neil.fra...@gmail.com
on 20 Apr 2009 at 7:45
Hi Neil,
Thanks for your rapid response.
Noting the %0A elements in the patch text, I too suspected whitespace issues.
However, using U*X '\n' or DOS '\r\n' makes no difference on my platform
(Linux). After your response I tried mixed lines too, but I cannot confirm your
observation: the results stay unchanged.
After that, I tried raw strings, unicode strings, loading from files with
different line encodings, even binary loading... All results are the same if
you ignore the %0A to %0D%0A changes in some patch texts. For now I've run out
of smart ideas.
FYI, I'm using Python 2.5.1 r251:54863 on Fedora 9, i386 (standard package).
Good luck tomorrow. I'm very curious about what this turns out to be.
Original comment by pieterj....@gmail.com
on 20 Apr 2009 at 8:44
> Good luck tomorrow. I'm very curious about what this turns out to be.
Got it. Whitespace was a complete red herring. The timeout value in diff match
patch is in seconds, but in Python it was being treated as ms. On my computer
it is right on the edge, so whether the diff algorithm timed out or not was
influenced by everything from the presence of a pyc file, to the kind of music
iTunes was playing. That was fun to debug.
I've just uploaded a new version to SVN and to the download page which fixes
the Python timeout. All other languages use milliseconds and do the conversion
properly.
Thanks!
I'm also going to close issue #3, since Google has fixed this issue tracker to
email me on new issues.
Original comment by neil.fra...@gmail.com
on 20 Apr 2009 at 11:12
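The kind of unit mismatch described above can be sketched as follows. This is not the library's actual code; both function names are hypothetical, and the sketch only assumes that Python's `time.time()` returns seconds. If a timeout given in seconds is handled as though it were milliseconds, the deadline ends up a thousand times closer than intended, so a diff that needs a few milliseconds may or may not finish in time depending on machine load.

```python
import time


def deadline_from_timeout(timeout_seconds, now=None):
    """Correct: time.time() is in seconds, so a seconds timeout is added directly."""
    now = time.time() if now is None else now
    return now + timeout_seconds


def deadline_if_misread_as_ms(timeout_value, now=None):
    """Mismatch: the same value is treated as milliseconds and divided by 1000."""
    now = time.time() if now is None else now
    return now + timeout_value / 1000.0


# With a nominal 1.0 second timeout and a fixed reference time of 100.0:
correct = deadline_from_timeout(1.0, now=100.0)      # one second of budget
mistaken = deadline_if_misread_as_ms(1.0, now=100.0)  # about one millisecond of budget
print(correct - 100.0, mistaken - 100.0)
```

Passing `now` explicitly keeps the example deterministic; real code would normally read the clock once at the start of the diff and compare against the deadline inside the main loop.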
Congratulations, it works! It wouldn't even have crossed my mind to think about
timeout issues. Now I can use your code to build upon. En route you expanded my
grasp of English idiom too: I had never heard of a red herring before. Funny how
politics can influence language.
Thanks a lot!
Original comment by pieterj....@gmail.com
on 21 Apr 2009 at 9:14
Neil: Do you still update this project? There are a few issues open in this
project and you haven't been "pinged" likely because this issue is closed and
is not immediately visible in the issue list.
Original comment by Cboisjol...@gmail.com
on 24 Aug 2010 at 5:28
Original issue reported on code.google.com by
neil.fra...@gmail.com
on 29 Jun 2007 at 1:32