benjamine / jsondiffpatch

Diff & patch JavaScript objects
MIT License
4.86k stars 472 forks source link

Fix incorrect matrix initialization #291

Closed rexxars closed 1 year ago

rexxars commented 4 years ago

Unless I am missing something, the initialization of the matrix in the LCS filter is wrong. It currently puts a number (length of first array + 1) into the matrix, then immediately overwrites it.

While this shouldn't produce any bugs, I found it a little confusing when reading the code. As I see it there are two potential fixes:

  1. Start with an empty array
  2. Use the new Array(length) syntax

I assume the latter was the intention, so that's what the PR implements, but happy to change it if you want.

amiller-gh commented 4 years ago

You'll want to do the same on line 21 🙂

I've noticed some performance lag in this implementation of LCS with exceptionally long lists as well. In part because it creates so many objects, forcing a number of garbage collections throughout, and because of a number of repeat comparisons, nested loops, and duplicated logic.

Screen Shot 2020-07-06 at 7 35 58 PM

The entire matrix can be represented in a single array of

const matrix = new Array((len1+1) * (len2+1));

You can access any index (computed as needed) in the nested for loops by using:

const idx = (x * len1) + y;
const above = (x * len1) + y - 1;
const left = ((x - 1) * len1) + y ;

And most importantly, the new Array call does not need any instantiation. Instead, start the loops from 0 and use:

if (x === 0) { matrix[idx] = len1; }
else if (y === 0) { matrix[idx] = len2; )

before an other logic to fill default values in the same nested loop as the rest of the LCS logic.

Math.max can also be a hair slower than a simple ternary operation too and should be replaced.

Experimenting with the above changes, I was able to eke out an half second of performance gains from a ~3s) LCS comparison of two 10,000 item string arrays. Would be great to see it formalized and integrated!

rexxars commented 4 years ago

@amiller-gh Why don't you send a pull request for these changes? That sounds like a great improvement!