Open laeubi opened 2 years ago
That would be an awesome enhancement, I moved the issue to team repo, as this is there compare is located.
@laeubi would you please test the proposed fix if it fits your request?
won't fix
@jukzi why won't fix?
well, no one seems to take the time to implement or even test.
I'm a bit short of time atm, but even if not two days are really not an indication of no interest, I think its still worth and your code should not be lost ...
The diff algorithm is based on finding longest common subsequences (LCS) between both files. By design, that will always make the common (unchanged) parts as large as possible. To make the result more human readable, the changes created by the LCS would have to be post-processed to create more balanced (instead of longest) common subsequences between the changes. A simple heuristic would be to modify all changes where there are identical lines at the end of the change and immediately before the change, and where the first line of a change has different indentation than the last line, and where that does not overlap with another change. I'll check if I can get that working...
The new diff algorithm introduced in #323 does not really solve the problem.
It just got lucky for that one scenario.
E.g. compare the following scenarios:
1) 6 lines added Before: After:
2) 5 lines added + additional whitespace line Before: After:
3) 5 lines added without additional whitespace line Before: After:
So it's really just very specific situations that benefit from the balancing algorithm. In (most) others, it doesn't bring the desired result.
I propose to revert #323 for now seeing as this has some side effects (#458) and also the algorithm is quite expensive for very large diffs with repeating lines.
The whole diff calculator probably needs to be re-designed somehow.
Seems like line based diff like LCS is unsuitable for tree structure documents. There are more advanced algorithms for xml. See for example: X-Diff: An Effective Change Detection Algorithm for XML
I recently noticed that text-compare seems to present a undesirable changeset if the following happen:
<dependency>
in a pom.xml or@Test
in a Unit test) and ending (e.g.</dependency>
in a pom.xml or}
in a Unit testWhat would be expected is that the whole block is marked as an insertion (here line 74 - 78)
Github for example displays this more nice