dominictarr / adiff

diff and patch operations on arrays.

adiff - Array diff tools in JavaScript

adiff is a minimal implementation of the diff tools diff, patch, and diff3 in JavaScript.

I initially started writing this to understand how git works. then I got totally carried away. adiff is a central component in snob, a self-hosting port of git to JavaScript.

how git works.

if you want to know what is the difference between two files, you must first know what is the same. this is called the Longest Common Subsequence problem. if you have two sequences x = "ABDCEF" and y = "ABCXYZF" then LCS(x, y) is clearly "ABCF".

lcs

function lcs (a, b)
  if a is empty or b is empty
    then return the empty sequence
  if head(a) == head(b)
    then return head(a) + lcs(tail(a), tail(b))
  else return max(lcs(tail(a), b), lcs(a, tail(b)))

(where max returns the longer list, head returns the first element, and tail returns the rest of the sequence minus the head)

this is very simple, but with exponential time complexity. however, it can easily be made sufficiently performant by caching the return value of each call to lcs(), since there are only (length of a + 1) * (length of b + 1) distinct calls.

see js implementation, index.js#L64-94
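
as a rough illustration of the caching idea, here is a minimal sketch. it is not the code in index.js: the inner function rec and the cache keyed on the two head positions are assumptions made for this example. instead of slicing off tails, it tracks the head of each sequence as an index.

```javascript
// memoized LCS over strings or arrays (illustrative sketch only)
function lcs (a, b) {
  var cache = {}
  // i and j are the positions of the current "heads" of a and b
  function rec (i, j) {
    // base case: one sequence is exhausted, nothing left in common
    if (i === a.length || j === b.length) return []
    var key = i + ',' + j
    if (cache[key]) return cache[key]
    var result
    if (a[i] === b[j]) {
      // heads match: keep the head, recurse on both tails
      result = [a[i]].concat(rec(i + 1, j + 1))
    } else {
      // heads differ: drop one head or the other, keep the longer result
      var skipA = rec(i + 1, j)
      var skipB = rec(i, j + 1)
      result = skipA.length >= skipB.length ? skipA : skipB
    }
    return (cache[key] = result)
  }
  return rec(0, 0)
}

console.log(lcs('ABDCEF', 'ABCXYZF').join('')) // ABCF
```

each (i, j) pair is computed once, so the number of calls is bounded by the product of the two lengths rather than growing exponentially.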

chunking

now, we can see where the strings differ, by comparing them to the lcs. the next step is dividing them into 'stable' chunks where they match the lcs, and 'unstable' chunks where they differ.

basically, to go from chunk("ABDCEF", "ABCXYZF") to ["AB", ["D", ""], "C", ["E", "XYZ"], "F"]

note that stable and unstable chunks always alternate.

basically, you iterate over the sequences and while the heads match the head of the lcs, shift that value to a stable chunk. then, while the heads do not match the next head of the lcs, collect those items into an unstable chunk.
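
the loop described above can be sketched like this. it is an illustration, not adiff's own implementation: the name chunk comes from the example above, but its body, the small lcs helper bundled with it, and the choice to join chunks back into strings (which assumes string inputs) are all assumptions.

```javascript
// hypothetical helper: memoized LCS, returns an array of common elements
function lcs (a, b) {
  var cache = {}
  function rec (i, j) {
    if (i === a.length || j === b.length) return []
    var key = i + ',' + j
    if (cache[key]) return cache[key]
    var result
    if (a[i] === b[j]) result = [a[i]].concat(rec(i + 1, j + 1))
    else {
      var x = rec(i + 1, j), y = rec(i, j + 1)
      result = x.length >= y.length ? x : y
    }
    return (cache[key] = result)
  }
  return rec(0, 0)
}

// split a and b into alternating stable and unstable chunks
function chunk (a, b) {
  var common = lcs(a, b)
  var chunks = []
  var i = 0, j = 0, k = 0 // heads of a, b and the lcs
  while (i < a.length || j < b.length) {
    // stable chunk: both heads match the head of the lcs
    var stable = []
    while (k < common.length && a[i] === common[k] && b[j] === common[k]) {
      stable.push(common[k]); i++; j++; k++
    }
    if (stable.length) chunks.push(stable.join(''))
    // unstable chunk: take items from each side until its head
    // matches the next head of the lcs (or the side runs out)
    var fromA = [], fromB = []
    while (i < a.length && (k === common.length || a[i] !== common[k])) fromA.push(a[i++])
    while (j < b.length && (k === common.length || b[j] !== common[k])) fromB.push(b[j++])
    if (fromA.length || fromB.length) chunks.push([fromA.join(''), fromB.join('')])
  }
  return chunks
}

console.log(JSON.stringify(chunk('ABDCEF', 'ABCXYZF')))
// ["AB",["D",""],"C",["E","XYZ"],"F"]
```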

diff

once you have the chunks, getting a list of changes that you can apply is easy...

making a diff from a to b, we want to know what changes to make to a to get b. the way I have done this is to express each change as the arguments to Array#splice. so, for ["AB", ["D", ""], "C", ["E", "XYZ"], "F"] we want:

  var changes = [
    [4, 1, 'X', 'Y', 'Z'], //delete 1 item ("E") at index 4, then insert "X", "Y", "Z"
    [2, 1] //delete 1 item at index 2 ("D")
  ]
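
one way to get from the chunks to that list is sketched below. the function name diffFromChunks is hypothetical, and the sketch assumes the string-based chunk format shown above (stable chunks are strings, unstable chunks are [removed, inserted] pairs).

```javascript
// turn alternating chunks into Array#splice argument lists, last change first
function diffFromChunks (chunks) {
  var changes = []
  var index = 0 // current position in the original sequence
  chunks.forEach(function (c) {
    if (typeof c === 'string') {
      // stable chunk: nothing to change, just advance
      index += c.length
    } else {
      var removed = c[0]
      var inserted = c[1]
      // [start, deleteCount, ...itemsToInsert], the same shape splice takes;
      // unshift puts later changes at the front of the list
      changes.unshift([index, removed.length].concat(inserted.split('')))
      index += removed.length
    }
  })
  return changes
}

var chunks = ['AB', ['D', ''], 'C', ['E', 'XYZ'], 'F']
console.log(JSON.stringify(diffFromChunks(chunks)))
// [[4,1,"X","Y","Z"],[2,1]]
```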

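note that the changes are listed from the end of the array towards the start. applying a change near the end does not alter the indexes nearer the start, so each change can be applied in turn without adjusting the ones that follow.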

this makes the function to apply the patch very simple

patch

  function patch (orig, changes) {
    var ary = orig.split('') //assuming that orig is just a string
    changes.forEach(function (ch) {
      [].splice.apply(ary, ch)
    })
    return ary.join('')
  }
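
the version above assumes a string. since adiff works on arrays, here is a hedged variant that takes an array and leaves the caller's copy untouched. the name patchArray is made up for this example and is not part of adiff's API.

```javascript
// apply a list of splice-style changes to a copy of an array
function patchArray (orig, changes) {
  var ary = orig.slice() // copy so the original array is not mutated
  changes.forEach(function (ch) {
    // ch is [start, deleteCount, ...items], exactly the arguments of splice
    Array.prototype.splice.apply(ary, ch)
  })
  return ary
}

var changes = [
  [4, 1, 'X', 'Y', 'Z'], // delete "E" at index 4, insert "X", "Y", "Z"
  [2, 1] // delete "D" at index 2
]
var original = ['A', 'B', 'D', 'C', 'E', 'F']

console.log(patchArray(original, changes).join('')) // ABCXYZF
console.log(original.join('')) // ABDCEF
```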

diff3

if we want a distributed version management system, then we need to be able to make changes in parallel. this is only a slightly more complicated problem. given a string "ABDCEF", suppose I changed it to "ABCXYZF" and meanwhile you changed it to "AXBCEFG". we must compare each of our changes to the original string, the Concestor (the most recent common ancestor of both versions).

merging rules

TODO: worked example with chunks, resolve.

license

MIT / Apache2