divec / ll

Two-way parallel translation
GNU Affero General Public License v3.0

Use more granular diffing and smarter rebasing, to reduce conflicts. #17

Closed: divec closed this issue 5 years ago

divec commented 5 years ago

To calculate text diffs we just use ve.countEdgeMatches, i.e. the diff between x and y replaces everything except their common start and end sequences. This can replace large chunks of unchanged content: for example, if only the first and last words have changed, the entire text is replaced.
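A minimal sketch of that edge-match behaviour, to make the problem concrete. `edgeMatchDiff` is a hypothetical stand-in, not the real ve.countEdgeMatches signature; it works on plain strings and returns a single `{ start, removeLength, insert }` replacement, a shape assumed here purely for illustration.

```javascript
// Hypothetical sketch of an edge-match diff: keep the longest common
// start and end of the two texts, and replace everything in between
// with a single replacement.
function edgeMatchDiff(before, after) {
  const maxEdge = Math.min(before.length, after.length);
  // Length of the common start.
  let start = 0;
  while (start < maxEdge && before[start] === after[start]) {
    start++;
  }
  // Length of the common end, not allowed to overlap the common start.
  let end = 0;
  while (
    end < maxEdge - start &&
    before[before.length - 1 - end] === after[after.length - 1 - end]
  ) {
    end++;
  }
  return {
    start: start, // offset of the replaced range in `before`
    removeLength: before.length - start - end,
    insert: after.slice(start, after.length - end)
  };
}
```

For "abcdef" → "abXdef" this gives a one-character replacement at offset 2, which is fine. But for "the quick brown fox" → "a quick brown dog" there is no common start or end at all, so all 19 characters are replaced even though " quick brown " is unchanged.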

Furthermore, our rebasing succeeds only if the replaced ranges do not intersect. Combined with large replacement ranges, this means conflicts happen quite often.
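The conservative rebase can be sketched the same way. `rebaseReplacement` and `applyReplacement` are hypothetical helpers, not LL or VisualEditor APIs; replacements are assumed to be plain `{ start, removeLength, insert }` objects over a string. As an extra assumption, ranges that merely touch are also treated as a conflict, which avoids deciding whose insertion goes first.

```javascript
// Hypothetical sketch: rebase replacement `b` over replacement `a`, where
// both were made concurrently against the same text. Returns the adjusted
// `b`, or null (a conflict) unless the two ranges are disjoint.
// Assumption: ranges that merely touch also count as a conflict.
function rebaseReplacement(a, b) {
  const aEnd = a.start + a.removeLength;
  const bEnd = b.start + b.removeLength;
  if (bEnd < a.start) {
    // b lies wholly before a: its offsets are unaffected.
    return b;
  }
  if (b.start > aEnd) {
    // b lies wholly after a: shift it by a's net change in length.
    const delta = a.insert.length - a.removeLength;
    return { start: b.start + delta, removeLength: b.removeLength, insert: b.insert };
  }
  // The ranges intersect (or touch): give up.
  return null;
}

// Applies a single replacement to a string.
function applyReplacement(text, r) {
  return text.slice(0, r.start) + r.insert + text.slice(r.start + r.removeLength);
}
```

Replacing "quick" (offset 4) and "fox" (offset 16) concurrently in "the quick brown fox" rebases cleanly, with the second edit shifted to offset 15, giving "the slow brown dog". But if one side's diff was a whole-text replacement (offset 0, length 19), every concurrent edit intersects it and the rebase returns null.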

This very conservative approach to diffing and rebasing is appropriate in ve.dm.Change#rebaseTransactions, because it ensures tree balance is preserved. However, we don't need that at all in LL, where we diff only inside ContentBranchNodes, so the content is linear.
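Because the content is linear, a finer-grained diff needs no tree bookkeeping. The sketch below is a plain longest-common-subsequence (LCS) diff over word and whitespace tokens, swapped in for illustration; it is not ve.DiffMatchPatch. `tokenize`, `granularDiff` and `applyAll` are hypothetical names, and the regex tokenizer is only a crude stand-in for proper Unicode word breaking.

```javascript
// Naive tokenizer: alternating runs of non-whitespace and whitespace.
function tokenize(text) {
  return text.split(/(\s+)/).filter((t) => t.length > 0);
}

// Hypothetical granular diff: align token sequences with an LCS table and
// emit one small { start, removeLength, insert } replacement per changed run.
// Offsets are character positions in `before`.
function granularDiff(before, after) {
  const x = tokenize(before);
  const y = tokenize(after);
  // lcs[i][j] = LCS length of x.slice(i) and y.slice(j).
  const lcs = Array.from({ length: x.length + 1 }, () => new Array(y.length + 1).fill(0));
  for (let i = x.length - 1; i >= 0; i--) {
    for (let j = y.length - 1; j >= 0; j--) {
      lcs[i][j] = x[i] === y[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const replacements = [];
  let i = 0;
  let j = 0;
  let offset = 0;      // current character position in `before`
  let current = null;  // replacement being accumulated for this changed run
  const flush = () => {
    if (current) {
      replacements.push(current);
      current = null;
    }
  };
  while (i < x.length || j < y.length) {
    if (i < x.length && j < y.length && x[i] === y[j]) {
      // Common token: closes any open run.
      flush();
      offset += x[i].length;
      i++;
      j++;
    } else {
      if (!current) current = { start: offset, removeLength: 0, insert: "" };
      if (j < y.length && (i === x.length || lcs[i][j + 1] >= lcs[i + 1][j])) {
        current.insert += y[j]; // token only in `after`
        j++;
      } else {
        current.removeLength += x[i].length; // token only in `before`
        offset += x[i].length;
        i++;
      }
    }
  }
  flush();
  return replacements;
}

// Applies non-overlapping replacements (offsets relative to the original
// text) from last to first, so earlier offsets stay valid.
function applyAll(text, replacements) {
  return replacements
    .slice()
    .sort((p, q) => q.start - p.start)
    .reduce((t, r) => t.slice(0, r.start) + r.insert + t.slice(r.start + r.removeLength), text);
}
```

For "the quick brown fox" → "a quick brown dog" this yields two small replacements ("the" → "a" at offset 0, "fox" → "dog" at offset 16) instead of one whole-text replacement, so a concurrent edit to "quick" or "brown" no longer intersects either of them. The LCS table costs O(n·m) in the token counts, which is probably fine for paragraph-sized content; diff-match-patch style algorithms are faster on long texts.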

edg2s commented 5 years ago

We should be able to use ve.DiffMatchPatch from Thal's visual differ. It even uses UnicodeJS to do word breaking.

Example usage: https://github.com/wikimedia/VisualEditor/blob/master/src/dm/ve.dm.VisualDiff.js#L773