Important: Ping me if you file a bug.

GoogleCodeExporter commented 9 years ago

Google Code does not currently send any notification to me when a new issue
is added to this list.  Since there are virtually no issues with my code
(gloat), I don't visit this page often.

So if you don't want to be ignored, send me an email to let me know that
you've filed a new bug:
  http://neil.fraser.name

Thanks!

Original issue reported on code.google.com by neil.fra...@gmail.com on 29 Jun 2007 at 1:32

GoogleCodeExporter commented 9 years ago

FYI,

I've just been performance testing various implementations of text/binary
delta generation algorithms. I timed your Java implementation against GNU
diff. On average, your Java implementation was 2-3X slower. It does not appear
to be a file IO issue: Java file IO is marginally slower, but not 2-3X. I
haven't looked inside the GNU diff implementation, which is supposed to be
based on the same algorithm.

Original comment by olivera...@gmail.com on 25 Sep 2007 at 4:33

GoogleCodeExporter commented 9 years ago

GNU diff is line-by-line.  This implementation is character-by-character.  So
if you had more than two characters per line, then I think I just beat GNU
diff.

Original comment by neil.fra...@gmail.com on 25 Sep 2007 at 6:35
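
A rough illustration of the granularity point above, assuming Python's standard `difflib` as a stand-in for both tools (it is neither GNU diff nor this library, and the sample strings are made up). The same edit is a whole-line replacement at line granularity but a five-character insertion at character granularity, at the cost of comparing many more units:

```python
import difflib

def changed_spans(old_units, new_units):
    """Return the non-equal opcodes between two sequences of units."""
    matcher = difflib.SequenceMatcher(None, old_units, new_units, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]

# Hypothetical sample texts, two lines each.
old = "the quick brown fox\njumps over the dog\n"
new = "the quick brown fox\njumps over the lazy dog\n"

# Line granularity: few units to compare, but the whole second line changes.
line_changes = changed_spans(old.splitlines(), new.splitlines())

# Character granularity: many more units, but only the new characters change.
char_changes = changed_spans(old, new)
inserted = "".join(
    new[j1:j2] for tag, i1, i2, j1, j2 in char_changes if tag == "insert"
)

print("units per text:", len(old.splitlines()), "lines vs", len(old), "chars")
print("line-level changes:", line_changes)
print("char-level insertion:", repr(inserted))
```

Timing comparisons between a line-based and a character-based tool are only meaningful once the number of units each one has to compare is taken into account.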

GoogleCodeExporter commented 9 years ago

Hi Neil,

Great Work!!!!!!!!

I am analyzing your code to find out its suitability to use in our project.

When I do a diff calculation, I get a java.lang.OutOfMemoryError: Java heap
space error. The following stack trace might help you understand the issue.

java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:3209)
        at java.lang.String.<init>(String.java:216)
        at java.lang.StringBuilder.toString(StringBuilder.java:430)
        at name.fraser.neil.plaintext.diff_match_patch.diff_map(diff_match_patch.java:454)
        at name.fraser.neil.plaintext.diff_match_patch.diff_compute(diff_match_patch.java:227)
        at name.fraser.neil.plaintext.diff_match_patch.diff_main(diff_match_patch.java:140)
        at name.fraser.neil.plaintext.diff_match_patch.diff_compute(diff_match_patch.java:269)
        at name.fraser.neil.plaintext.diff_match_patch.diff_main(diff_match_patch.java:140)

Min heap size is 256M and max heap size is 1024M.

Can you please suggest any possible fix/solution to avoid this problem...

regards
Prasad Ganguri

Original comment by ganguri....@gmail.com on 21 Apr 2008 at 8:29

GoogleCodeExporter commented 9 years ago

What's the size of the text you are diffing?

Original comment by neil.fra...@gmail.com on 21 Apr 2008 at 8:39

GoogleCodeExporter commented 9 years ago

Neil, 

Thanks for the immediate response.

new text: 155978 characters
old text: 49278 characters

Original comment by ganguri....@gmail.com on 22 Apr 2008 at 12:35

GoogleCodeExporter commented 9 years ago

Neil,

Do you know of any project that does a "word" by "word" diff instead of char
by char? For example, with "test" as text 1 and "tall" as text 2, it should
show a complete delete of the old word and an add of the new word. Thanks.

Original comment by 07.tama...@gmail.com on 6 Oct 2008 at 4:04
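
One possible way to get the word-by-word behaviour asked about above, sketched with Python's standard `difflib` rather than with this library. The function name `word_diff` and the tokenizing regular expression are assumptions made for the illustration. Each word and each run of whitespace is treated as one indivisible unit, so "test" and "tall" can never share a partial match:

```python
import difflib
import re

def word_diff(old, new):
    """Diff two strings with whole words (and whitespace runs) as the units."""
    # Split into alternating runs of whitespace and non-whitespace.
    old_tokens = re.findall(r"\s+|\S+", old)
    new_tokens = re.findall(r"\s+|\S+", new)
    matcher = difflib.SequenceMatcher(None, old_tokens, new_tokens, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.append(("equal", "".join(old_tokens[i1:i2])))
            continue
        # A "replace" is reported as a delete followed by an insert.
        if i2 > i1:
            result.append(("delete", "".join(old_tokens[i1:i2])))
        if j2 > j1:
            result.append(("insert", "".join(new_tokens[j1:j2])))
    return result

print(word_diff("test", "tall"))
print(word_diff("a test case", "a tall case"))
```

A character-level diff of the same two words would instead keep the shared "t" and report the rest as changed.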

GoogleCodeExporter commented 9 years ago

Thanks for this work!

Sorry, I don't have a suggested code change, but is this the expected behaviour?

If an inserted line starts with the same character as the next matching line,
the first character is deemed to be a match, but the rest of the line and the
first character of the next line are not.

Attached is some code that reproduces the problem.

Original comment by n_old...@hotmail.com on 16 Dec 2008 at 5:36

Attachments:

DiffTest.java

GoogleCodeExporter commented 9 years ago

To answer my own question: Yes, expected behaviour of a character-match
algorithm (cf. line-by-line).

Original comment by n_old...@hotmail.com on 16 Dec 2008 at 5:45

GoogleCodeExporter commented 9 years ago

Yes, that's a valid diff, a minimal diff and the expected behaviour.  However,
you are right that this is semantically unusual behaviour.  If you want a diff
which makes sense to a human, run it through diff_cleanupSemantic.  That will
shift the diff sideways so that the end points line up with the line breaks,
word breaks or other logical boundaries.

Original comment by neil.fra...@gmail.com on 16 Dec 2008 at 6:04

GoogleCodeExporter commented 9 years ago

Yep, thanks very much.

Good work!

Original comment by n_old...@hotmail.com on 16 Dec 2008 at 6:26

GoogleCodeExporter commented 9 years ago

Perhaps a very dumb question, but I'm stumped:

When I create a Python snippet modelled on the JavaScript patch demo, I get
very different results. The patch looks very different and only the first
element applies. Can you tell me what I'm doing wrong?

Obviously I've searched the web for answers, but no Python programs I've seen
seem to use this pattern.

Thanks in advance.

Original comment by pieterj....@gmail.com on 20 Apr 2009 at 6:47

GoogleCodeExporter commented 9 years ago

Hello p.j.kers,

Found it, sort of.  The issue is Python's treatment of whitespace.  Your
code's line breaks are entirely \n.  If I execute your code, I get the same
output as you do.  But if I add a single blank line anywhere in your code, a
\r\n line break is added to the source code (I'm on Windows).  With that line
in place, Python returns the same verbose output as JavaScript.

I'll take a closer look tomorrow to see what's going on.  But it looks like
Python's whitespace sensitivity, not the library.

Original comment by neil.fra...@gmail.com on 20 Apr 2009 at 7:45

GoogleCodeExporter commented 9 years ago

Hi Neil,

Thanks for your rapid response.

Noting the %0A elements in the patch text, I too suspected whitespace issues.
However, using U*X '\n' or DOS '\r\n' makes no difference on my platform
(Linux). After your response I tried mixed line endings too, but I cannot
reproduce your observation - the results stay unchanged.

After that, I tried raw strings, unicode strings, loading from files with
different line endings, even binary loading... All results are the same if you
ignore the %0A to %0D%0A changes in some patch texts. For now I've run out of
smart ideas.

FYI, I'm using Python 2.5.1 r251:54863 on Fedora 9, i386 (standard package).

Good luck tomorrow. I'm very curious about what this turns out to be.

Original comment by pieterj....@gmail.com on 20 Apr 2009 at 8:44

GoogleCodeExporter commented 9 years ago

> Good luck tomorrow. I'm very curious about what this turns out to be.

Got it.  Whitespace was a complete red herring.  The timeout value in diff
match patch is in seconds, but in Python it was being treated as ms.  On my
computer it is right on the edge, so whether the diff algorithm timed out or
not was influenced by everything from the presence of a pyc file, to the kind
of music iTunes was playing.  That was fun to debug.

I've just uploaded a new version to SVN and to the download page which fixes
the Python timeout.  All other languages use milliseconds and do the
conversion properly.  Thanks!

I'm also going to close issue #3, since Google has fixed this issue tracker to
email me on new issues.

Original comment by neil.fra...@gmail.com on 20 Apr 2009 at 11:12

Changed state: Fixed
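
For readers unfamiliar with this class of bug, here is a hypothetical sketch of a seconds-versus-milliseconds mix-up. It is not the library's actual code; the function names are invented for the illustration. Python's `time.time()` counts in seconds, so a timeout given in seconds can be added to it directly, while treating the same number as milliseconds shrinks the time budget by a factor of 1000:

```python
import time

def deadline_seconds(timeout_seconds, now=None):
    """Correct: time.time() is in seconds, so add the timeout as it is."""
    if now is None:
        now = time.time()
    return now + timeout_seconds

def deadline_mistaken_ms(timeout_seconds, now=None):
    """Illustrative bug: the setting is treated as if it were milliseconds."""
    if now is None:
        now = time.time()
    return now + timeout_seconds / 1000.0

def timed_out(deadline, now=None):
    """Return True once the clock has passed the deadline."""
    if now is None:
        now = time.time()
    return now > deadline

# With a fixed, made-up clock value the difference is easy to see.
start = 100.0
good = deadline_seconds(1.0, now=start)      # one second of budget
bad = deadline_mistaken_ms(1.0, now=start)   # one millisecond of budget

# Half a second in, only the mistaken deadline has expired.
print(timed_out(good, now=start + 0.5), timed_out(bad, now=start + 0.5))
```

With a budget that small, whether a diff finishes in time depends on incidental machine load, which matches the flaky behaviour described above.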

GoogleCodeExporter commented 9 years ago

Congratulations, it works! It wouldn't even have crossed my mind to think
about timeout issues. Now I can use your code to build upon. En route you
expanded my grasp of English idiom too - never heard of a red-herring before.
Funny how politics can influence language.

Thanks a lot!

Original comment by pieterj....@gmail.com on 21 Apr 2009 at 9:14

GoogleCodeExporter commented 9 years ago

Neil: Do you still update this project? There are a few issues open in this
project and you likely haven't been "pinged" because this issue is closed and
is not immediately visible in the issue list.

Original comment by Cboisjol...@gmail.com on 24 Aug 2010 at 5:28
