When a diff reports deleted or unchanged text, it gives the span length as a count of characters; unfortunately, there is no common definition of "character".
diff-match-patch could resolve this in a backwards-compatible way by adding a preamble to its patches that indicates which definition is in use, through the use of semantically empty diff groups.
For example, a leading group of zero length or an empty insert operation should have no effect on the diffed files, so it can be used to communicate very small amounts of information.
Consider:
EQUAL(0), EQUAL(0), ...rest of diff: indices/counts represent UTF-16 code units.
EQUAL(0), ...rest of diff: indices/counts represent Unicode code points.
...rest of diff (no preamble): indices/counts represent whatever they did before in their respective libraries.
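The preamble scheme above could be sketched roughly as follows. This is only an illustration under stated assumptions: diffs are modeled as `[op, text]` pairs, `EQUAL(0)` is modeled as an EQUAL pair with empty text, and the names `writeUnitsPreamble`, `readUnitsPreamble`, and `UNITS_BY_MARKER_COUNT` are hypothetical, not part of diff-match-patch.

```javascript
// Hypothetical sketch; none of these functions exist in diff-match-patch.
const DELETE = -1;
const INSERT = 1;
const EQUAL = 0;

// Index = number of leading empty EQUAL groups, as in the proposal:
// 0 markers -> legacy behavior, 1 -> code points, 2 -> UTF-16 code units.
const UNITS_BY_MARKER_COUNT = ['legacy', 'unicode', 'utf16'];

// Prepend the semantically empty groups that announce the unit in use.
function writeUnitsPreamble(diffs, units) {
  const count = UNITS_BY_MARKER_COUNT.indexOf(units);
  if (count < 0) throw new Error(`unknown units: ${units}`);
  const markers = Array.from({ length: count }, () => [EQUAL, '']);
  return [...markers, ...diffs];
}

// Count the leading empty EQUAL groups and strip them from the diff.
function readUnitsPreamble(diffs) {
  let count = 0;
  while (
    count < diffs.length &&
    diffs[count][0] === EQUAL &&
    diffs[count][1] === ''
  ) {
    count++;
  }
  if (count >= UNITS_BY_MARKER_COUNT.length) {
    throw new Error(`unrecognized preamble of ${count} empty groups`);
  }
  return { units: UNITS_BY_MARKER_COUNT[count], rest: diffs.slice(count) };
}
```

A reader that does not know about the preamble would simply see zero-length groups and apply the rest of the diff as before, which is what makes the approach backwards-compatible.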
A new parameter to the diffing functions could set a mode so that clients can request counts in a specific unit. For example:
diff_main(a, b, {units: 'unicode'})
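To make the difference between the two units concrete, here is a minimal sketch of how span lengths could be computed for each mode. The helpers `spanLength` and `toCounts` are hypothetical and only illustrate what a `units` option might select; they are not an existing diff-match-patch API, and diffs are again assumed to be `[op, text]` pairs.

```javascript
// Hypothetical helpers; not part of diff-match-patch.
function spanLength(text, units) {
  switch (units) {
    case 'utf16':
      // String.prototype.length counts UTF-16 code units.
      return text.length;
    case 'unicode':
      // String iteration yields one entry per Unicode code point.
      return Array.from(text).length;
    default:
      throw new Error(`unknown units: ${units}`);
  }
}

// Turn [op, text] pairs into [op, count] pairs in the requested unit.
function toCounts(diffs, units) {
  return diffs.map(([op, text]) => [op, spanLength(text, units)]);
}

// U+1F642 lies outside the Basic Multilingual Plane, so it occupies
// two UTF-16 code units but is a single code point.
const sample = 'a\u{1F642}';
console.log(spanLength(sample, 'utf16'));   // 3
console.log(spanLength(sample, 'unicode')); // 2
```

The same text therefore produces different counts depending on the mode, which is why a patch needs to say which unit it uses.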