benjamine / jsondiffpatch

Diff & patch JavaScript objects
MIT License
4.77k stars 464 forks source link

[Perf] Experiencing slow diffs on large arrays #365

Closed soumilbaldota closed 4 months ago

soumilbaldota commented 4 months ago

Running it on chrome browser in the frontend Chrome version : Latest Ram: 16 GB Processor: Ryzen 5 5000

I am trying to diff two string arrays. These arrays are 20k in length. It is taking ~5s to find the diff

Are there any options that I can modify to make the diff quicker?

Please find the minimal reproducible sample here: https://github.com/soumilbaldota/jsondiffpatch-reprod

samirotiv commented 4 months ago

Thanks for the great work, @benjamine ! And thanks for posting this issue, @soumilbaldota Posting here for users who need to solve this:

I made a small change - I ran the example in a loop for 10 iterations to bring out the problem more. I then inspected it using the flamegraph profiler in Chrome.

image

This is what the left-heavy view of speedscope showed. There is one function called arraysHaveMatchByRef which is extremely expensive.

Going into the implementation of this function, it seems like an extremely expensive O(n^2) function only to check whether arrays have atleast one matching element which has changed index.

image

Looking further, this is called by one place called DiffFilter which decides whether to perform a match by position.

image

Refer to this link for more info on matching on position - https://github.com/benjamine/jsondiffpatch/blob/master/docs/arrays.md

To get rid of this, there are two ways:

To perform the diff: result = differ.diff(idArr, updatedData);

I was not able to find the matchByPosition option documented anywhere in the README.md - if this is documented, it will help users avoid this O(n^2) trap.

samirotiv commented 4 months ago

Looks like this was added to fix #157

samirotiv commented 4 months ago

@benjamine I would suggest as an improvement for a better out-of-the-box experience, we can enable matchByPosition if any of the two arrays contain atleast one non-primitive type (i.e. object). If all elements of the two arrays are strings, numbers, booleans or nulls, then matchByPosition can be false. This can be checked in O(n) time.