code-google-com / google-diff-match-patch

Automatically exported from code.google.com/p/google-diff-match-patch
Apache License 2.0

Semantic cleanup: "eliminate equality" and "extract overlap" passes conflict #73

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Because both passes use a non-strict inequality, they end up doing double work
in some cases.

Take, for example, the following case:
...<del>abc</del>def<ins>ghi</ins>...

On the first "eliminate equality" pass, the equality is merged into the
surrounding edits because the equality length (3) is <= the edit length on both sides (3).
So it will become:
...<del>abcdef</del><ins>defghi</ins>...

Then, on the second "extract overlap" pass, the overlap between the two edits
above is extracted again because the overlap length (3) is >= the edit length (6) / 2.
So everything will be reverted to:
...<del>abc</del>def<ins>ghi</ins>...

It seems the quick and easy fix for this is to make the second pass comparison
strict: use > instead of >=
Here:
if (overlap_length1 > deletion.length() / 2 || overlap_length1 > insertion.length() / 2)
and here:
if (overlap_length2 > deletion.length() / 2 || overlap_length2 > insertion.length() / 2)
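
To make the round trip concrete, here is a minimal Java sketch of just the two comparisons, applied to the example above. It is a simplified model of the thresholds as described in this report, not the library's actual cleanup code: the class name CleanupThresholds and the helpers eliminatesEquality, extractsOverlap and commonOverlap are hypothetical names made up for illustration, and the integer "/ 2" simply mirrors the snippets quoted above.

```java
class CleanupThresholds {
    // Pass 1 ("eliminate equality"), simplified: the equality is merged when
    // its length is <= the edit length on both sides (non-strict).
    static boolean eliminatesEquality(int equalityLen, int editsBefore, int editsAfter) {
        return equalityLen <= editsBefore && equalityLen <= editsAfter;
    }

    // Pass 2 ("extract overlap"), simplified: the overlap is pulled back out
    // when it reaches half of either edit. 'strict' switches >= to >.
    static boolean extractsOverlap(int overlapLen, int deletionLen, int insertionLen,
                                   boolean strict) {
        int halfDeletion = deletionLen / 2;
        int halfInsertion = insertionLen / 2;
        if (strict) {
            return overlapLen > halfDeletion || overlapLen > halfInsertion;
        }
        return overlapLen >= halfDeletion || overlapLen >= halfInsertion;
    }

    // Length of the longest suffix of 'a' that is also a prefix of 'b'.
    static int commonOverlap(String a, String b) {
        int max = Math.min(a.length(), b.length());
        for (int n = max; n > 0; n--) {
            if (a.endsWith(b.substring(0, n))) {
                return n;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        String deletion = "abc", equality = "def", insertion = "ghi";

        // Pass 1 on <del>abc</del>def<ins>ghi</ins>
        boolean merged = eliminatesEquality(
                equality.length(), deletion.length(), insertion.length());
        System.out.println("pass 1 merges equality: " + merged);

        // After merging: <del>abcdef</del><ins>defghi</ins>
        String mergedDeletion = deletion + equality;
        String mergedInsertion = equality + insertion;
        int overlap = commonOverlap(mergedDeletion, mergedInsertion);
        System.out.println("overlap length: " + overlap);

        // Pass 2 with the current (>=) and the proposed (>) comparison
        System.out.println("pass 2 extracts with >=: " + extractsOverlap(
                overlap, mergedDeletion.length(), mergedInsertion.length(), false));
        System.out.println("pass 2 extracts with >: " + extractsOverlap(
                overlap, mergedDeletion.length(), mergedInsertion.length(), true));
    }
}
```

With the non-strict comparison both passes fire on this input, so the second pass undoes the first. With the strict comparison only the first pass fires and the merged edits stay as they are.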

Original issue reported on code.google.com by 2sa...@gmail.com on 20 Jun 2012 at 7:25