Closed brandonros closed 8 years ago
Whoa, crazy. I assume you're using the current version of rfc6902
?
I have very little to go on without seeing the data you're pushing through, but my best guess is at some point you have two objects that look like a pair of arrays (Array.isArray(x)
returns true), one or both of which have a .length
property that returns a negative / undefined / string value. And so it goes running way off above the Levenshtein memoization table, out of bounds and out of control.
Can you figure out precisely what pair of objects triggers this error, and paste them here?
I do have an object with dimensions (length, width, height).
var rfc6902 = require('rfc6902');
var o = [{
length: 1
}];
var n = [{
length: 2
}];
rfc6902.createPatch(o, n);
This didn't bomb out though.
Those objects are probably what's breaking it. The width / height properties are unimportant.
What kinds of values does length have? Is it ever undefined / null / not a number?
And apparently that object with the weird length property is responding true
when you call Array.isArray
on it, which is unfortunate. You will probably want to figure out why that is the case, and fix it:
n = [{ length: 2 }]
[Object]
Array.isArray(n)
true
Array.isArray(n[0])
false
Array.isArray(n[0]['length'])
false
var t = n[0];
Array.isArray(t);
false
Straight out of Chrome's console.
Yes, that's all what I would expect, but I take it your production data on your production servers is different, which is why these don't bomb out, and your production data does.
For example, I'm not sure if anybody still uses mongoose
, but it fucks with native objects in all sorts of weird ways. I would not be surprised if running the results of a mongoose query through a reflective mechanism like rfc6902
blows the stack.
I use Mongoose. -_-
Don't remind me.
Give me a second. I'll query from Mongoose into this thing.
I thought the lean call turns all of their bullshit off...
Well there you go. You're gonna need to call toObject()
or toPlainObject()
or whatever it's called on anything you pass / use somewhere else. You wanna put it in a template? toObject() first or all kinds of weird shit's gonna happen. You want to diff two of them? x.toObject(), y.toObject().
I've been there, and mongoose isn't worth it. (MongoDB isn't worth it, either, but one step at a time.)
I tried to move over to Postgres. Trust me.
require('../globals.js');
var rfc6902 = require('rfc6902');
var mongo = require('../lib/mongo.js');
mongo.init()
.then(function () {
var query = {
'_id': '56ef606d366cddef56d08a1a'
};
return mongo.find('box', query);
})
.then(function (o) {
var n = [{
length: 0
}];
var patch = rfc6902.createPatch(o, n);
console.log(patch);
rfc6902.applyPatch(o, patch);
console.log(o);
})
.catch(function (err) {
console.log(err);
});
[ { op: 'remove', path: '/0/_id' },
{ op: 'remove', path: '/0/account_id' },
{ op: 'remove', path: '/0/nickname' },
{ op: 'remove', path: '/0/location' },
{ op: 'remove', path: '/0/width' },
{ op: 'remove', path: '/0/height' },
{ op: 'remove', path: '/0/weight_major' },
{ op: 'remove', path: '/0/weight_minor' },
{ op: 'remove', path: '/0/__v' },
{ op: 'replace', path: '/0/length', value: 0 } ]
[ { length: 0 } ]
Doesn't cause it. -_-
o = [{
"_id": "56ef606d366cddef56d08a1a",
"nickname": "BX1",
"location": "A",
"length": 24,
"width": 18,
"height": 9,
"weight_major": 0,
"weight_minor": 0
}]
According to the documentation, the .lean() I already call keeps it a plain Object, so that isn't it.
I found it.
mongo.init()
.then(function () {
var query = {
'_id': '56ef606d366cddef56d08a1a'
};
return mongo.find('box', query);
})
.then(function (o) {
var n = [o,
{
length: 0
}];
var patch = rfc6902.createPatch(o, n);
console.log(patch);
rfc6902.applyPatch(o, patch);
console.log(o);
})
.catch(function (err) {
console.log(err);
});
Outputs
[ { op: 'replace', path: '/0', value: [ [Object] ] },
{ op: 'add', path: '/-', value: { length: 0 } } ]
[ [Circular], { length: 0 } ]
Then goes crazy.
Wait, I'm wrong?
Yeah, that's not it. Sorry.
I found it. It has to do to when both the new and old objects are equal. I will paste now.
Unfortunately, o = JSON.parse(JSON.stringify(o));
fixes it, so it is 100% Mongoose -_-
Ah, it is probably comparing the ObjectId() returned...
I did
function compare(left, right) {
console.log(left, right);
// strict equality handles literals, numbers, and strings (a sufficient but not necessary cause)
and right before the stack, it is doing
567a0636ee17c3060c1f0525 '567a09d9ee17c3060c1f1991'
(you can tell that is the ObjectId because it doesn't print wrapped in single quotes, kill me)
and then
undefined { _id: '567a09d9ee17c3060c1f1991',
account_id: '56733bc7ed37353343145af9',
name: 'yazaki terminal',
sku: '7115-1614',
upc: '7115-1614',
location: 'N/A',
listings: [],
__v: 0,
fulfillment_sku: null,
notes: null,
unit: null,
cost: null,
wholesale: null,
retail: null,
active: true,
tags: [] }
until it exceeds the maximum call stack size.
build-16 (out): <--- Last few GCs --->
build-16 (out):
build-16 (out): 363560 ms: Scavenge 1328.7 (1457.1) -> 1328.7 (1457.1) MB, 0.8 / 0 ms [allocation failure].
build-16 (out): 363561 ms: Scavenge 1328.7 (1457.1) -> 1328.7 (1457.1) MB, 0.8 / 0 ms [allocation failure].
build-16 (out): 363562 ms: Scavenge 1328.7 (1457.1) -> 1328.7 (1457.1) MB, 0.7 / 0 ms [allocation failure].
build-16 (out): 369598 ms: Mark-sweep 1328.7 (1457.1) -> 1328.7 (1457.1) MB, 6036.6 / 0 ms [last resort gc].
build-16 (out): 375504 ms: Mark-sweep 1328.7 (1457.1) -> 1328.7 (1457.1) MB, 5905.4 / 0 ms [last resort gc].
build-16 (out):
build-16 (out):
build-16 (out): <--- JS stacktrace --->
build-16 (out):
build-16 (out): ==== JS stack trace =========================================
build-16 (out):
build-16 (out): Security context: 0xfec271b4629 <JS Object>
build-16 (out): 1: ArrayConcatJS [native array.js:~398] [pc=0x234c17d54aa9] (this=0x1ed6d0d70e21 <JS Array[791]>,ah=0x3c30e3097e61 <an Object with map 0x36954af97409>)
build-16 (out): 5: dist(aka dist) [/home/f/node_modules/rfc6902/rfc6902.js:~106] [pc=0x234c17c2a7dc] (this=0xfec271041b9 <undefined>,i=281,j=797)
build-16 (out): 6: dist(aka dist) [/home/f/node_modules/rfc6902/rfc6902.js:~106] [pc=0x234c17c2a88a] (this=0...
build-16 (out):
Even with the stringify/parse code to convert everything to plain objects.
I'm not sure what I'm supposed to gather from the GC stacktrace.
Regarding the comment before that, it seems lean() or whatever is lying to you, as including an ObjectId instance isn't what I would consider a plain old javascript object (aka. 'POJO').
If you really want to see what's going wrong, stick a console.log(input, output, i, j);
right after the function dist(i, j) {
line.
I can't fix mongoose for you, but I'll look into adding a secondary check so that even if an object purports to be an Array, but has an invalid length, it won't overflow the stack.
The error still happens even after this:
var patch = rfc6902.createPatch(JSON.parse(JSON.stringify(o)), JSON.parse(JSON.stringify(n)));
I am working on getting you test data now.
Excellent. If you can get createPatch to break on JSON.parse output, I'll be glad to fix it.
Do you know if this behaves well with lots of objects? Like, 10k objects, each having arrays and objects within them? I think that might be the issue.
diff()
tries to be smart with arrays, so if you have really long arrays, that might be a concern, but that shouldn't cause the errors you're running into, which I'm inclined to blame entirely on mongoose.
If you're worried about load, try it out with 10k of your objects, and see if it runs fast enough for you. If not, you can keep looking or do some profiling. Up to you.
I can't get any meaningful output out of the console.log
line you told me due to the sheer number of objects involved here.
Ah, okay, then cut it down to just console.log(i, j)
. If my original guess at the underlying problem is correct, you'll see some NaN or null or negative numbers right before it stack overflows.
The old and new elements are equal. There are 1,006 elements.
A test element looks like this:
{
"store_id": "57057eb7c9f0dc2a53cf8896",
"order_number": "3730",
"status": "unstarted",
"stash": {
"_id": "5722d707287c498021052d46",
"store_id": "57057eb7c9f0dc2a53cf8896",
"shipping_information": {
"address_type": null,
"method": "None",
"address_2": "xxx",
"address": "zzz"
},
"shipping": 0,
"tax": 21.93,
"discount": 0,
"total": 460.5,
"order_id": "56721",
"id": "3730",
"items": [
{
"dropshipped": false,
"quantity": 25,
"price": 417.75,
"_id": "5722d707287c498021052d48"
},
{
"dropshipped": false,
"quantity": 1,
"price": 20.82,
"_id": "5722d707287c498021052d47"
}
]
},
"batch_number": null,
"shipping_memory_hash": null,
"combined_order_numbers": [],
"tags": [],
"notes": [],
"shipments": [],
"log": []
}
i is creeping up to over 250
Is this normal?
For what it is worth, the package jsondiffpatch currently handles this data fine (and in under 100ms)
I'm moving off of it because my users experience intermittent issues that I assume have to be related to the patch protocol.
{"i":213,"j":884}
{"i":214,"j":883}
{"i":213,"j":883}
{"i":214,"j":882}
{"i":213,"j":882}
{"i":214,"j":881}
{"i":213,"j":881}
<--- Last few GCs --->
176852 ms: Scavenge 1083.3 (1457.6) -> 1083.3 (1457.6) MB, 0.6 / 0 ms [allocation failure].
176853 ms: Scavenge 1083.3 (1457.6) -> 1083.3 (1457.6) MB, 0.6 / 0 ms [allocation failure].
176853 ms: Scavenge 1083.3 (1457.6) -> 1083.3 (1457.6) MB, 0.6 / 0 ms [allocation failure].
181665 ms: Mark-sweep 1083.3 (1457.6) -> 1083.1 (1457.6) MB, 4811.7 / 0 ms [last resort gc].
186550 ms: Mark-sweep 1083.1 (1457.6) -> 1083.3 (1457.6) MB, 4884.7 / 0 ms [last resort gc].
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 0x1ade35eb4629 <JS Object>
1: ArrayConcatJS [native array.js:~398] [pc=0xfb736320849] (this=0x336f0becdd11 <JS Array[881]>,ah=0x36a4bacdb6b1 <an Object with map 0x18d7a4c50029>)
5: dist(aka dist) [/home/skulabs/node_modules/rfc6902/rfc6902.js:~106] [pc=0xfb73632567d] (this=0x1ade35e041b9 <undefined>,i=214,j=881)
6: dist(aka dist) [/home/skulabs/node_modules/rfc6902/rfc6902.js:~106] [pc=0xfb73632574f] (this=0x...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Aborted (core dumped)
Just published v1.2.0, which shouldn't stack overflow even when you give it weird mongoose objects, but since I never got a replicable test case from you, I haven't been able to test that.
I also generated an array of pretty complex 1K objects via http://www.json-generator.com/, and mucked around with making a few changes to a copy of the array, trying to get createPatch()
to break, but couldn't.
If you can come up with a breaking test case that has the following form, I'll fix it:
var input = <some object literal>;
var output = <another object literal>;
createPatch(input, output);
Otherwise I'm going to consider this issue closed / fixed, because I can't infer what's going wrong from your stack traces.
How long did it take to generate the patch for the large test, approximately?
It didn't trigger mocha's "slow test" warning, so, < 75ms.
Could it have been that my input and output were equal? Could it have anything to do with a common key like _id and their sorted order?
I will try to provide more information.
I tried that (along with other mutations). Using 1k_array.json, the test below runs and passes in < 75ms.
describe('issues/10', () => {
var input = JSON.parse(readFileSync(`${__dirname}/1k_array.json`));
var output = JSON.parse(JSON.stringify(input));
var expected_patch = [];
var actual_patch = createPatch(input, output);
it('should produce patch equal to expectation', () => {
assert.deepEqual(actual_patch, expected_patch);
});
});
I'm able to get it to recreate but the stringified JSON is 24mb containing sensitive data.
function dist(i, j) {
console.log(i, j);
29698 29698
29697 29697
RangeError: Maximum call stack size exceeded
at Object.stylizeNoColor [as stylize] (util.js:158:24)
at formatPrimitive (util.js:439:16)
at formatValue (util.js:227:19)
at inspect (util.js:109:10)
at exports.format (util.js:17:20)
at Console.log (console.js:39:34)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:107:17)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
Where would you like me to add debugging information?
If I turn off that console.log, I get this
RangeError: Maximum call stack size exceeded
at /Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:283:45
at Array.every (native)
at compareArrays (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:283:30)
at compareObjects (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:293:10)
at compare (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:309:16)
at /Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:284:16
at Array.every (native)
at compareArrays (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:283:30)
at compare (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:305:16)
at /Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:296:16
at Array.every (native)
at compareObjects (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:295:23)
at compare (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:309:16)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:110:17)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:112:28)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:112:28)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:112:28)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:112:28)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:112:28)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:112:28)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:112:28)
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:112:28)
Any reason you roll if (Object(left) === left && Object(right) === right) {
instead of if (typeof left === 'object' && typeof right === 'object') {
?
The test data is an array of objects with a length of 38,350. I wish I didn't have data this disgusting, trust me.
If I slice it to the first 8,500, it takes 288ms but does not overflow (which is faster than jsondiffpatch).
At around a length of around 9,000 for input and output, the overflows start (plain JSON.parsed objects).
The expected patch should be [].
Alright, this really isn't an error with this library at all.
I ran node --stack-size=65500 scratchpads/diff_test.js
and it blew every other diffing library out of the water.
Nasty as hell. Is there anything that can be done? Would any abuse of Promises combined with setTimeout(..., 0) or setImmediate or process.nextTick aid?
You'll get stack overflows at slightly different places that don't seem to correspond with what you added or removed because the call site that finally overflows the stack is just the feather that broke the camel's back. The interesting bit, in this case, is the cycle that terminates somewhere just below in the stack trace, i.e.,
...
at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:107:17)
Object(x) === x
is a better test for non-atomic types than typeof x === 'object'
. I can't recall where, but I don't think it matters unless you're worried about someone redefining the Object
global. Either way, I don't think this is an issue.diffArrays
uses a cost / patch accumulation function that calls itself recursively, namely, the dist
that features in your stack traces. Even in the best case, where all your objects are identical, it is going to build a stack that is yourArray.length
tall.
Math.max(yourArray.length, yourOtherArray.length)
if they're of different lengths) in all cases—but if there are changes, it will branch out into multiple stacks that are that tall.A recursive implementation is, at least to my sensibilities, the most elegant in terms of solving the edit distance problem in sub-O(n^2) time and memory, in the non-pathological case (the pathological / worst cases will require O(n^2) time and memory). If you Google around, you'll likely find some Levenshtein algorithms that do it with a couple nested for
loops, but keep in mind that in this use case, I'm not just calculating the edit distance, but accumulating the list of operations that carry out those edits.
--stack-size
option, the stack limit is specified in total kB, not the number of frames. I could probably thin down the size of the frames accumulated by the dist()
function, but that would only linearly increase the maximum size of the arrays it supports, and you'd still be in trouble if your stack were already pretty heavy by the time you get to the createPatch()
call.I have an idea for a way to transform the recursive dist()
algorithm into a plain Array based stack that queues up edit cost lookups, but it will take O(2n) time in the best case, and I may decide not to replace the current recursive dist()
implementation if my loop idea 1) takes unreasonably longer for small arrays, or 2) turns out to be too messy / inelegant / unmaintainable.
So if you want a quick and sure solution, I'd recommend defining your own diffArrays
function, then importing diffObjects
and diffValues
from require('rfc6902/diff')
(now exported as of v1.2.1
), and then, using require('rfc6902/diff').diffAny
as a template, wrapping those into your own custom createPatch(...)
. I don't understand why you're diffing arrays that are apparently always equal anyway, but if you envision, someday, diffing inequal arrays, finding the sequence of operations corresponding to a/the minimum edit distance may not be what you want out of it. I.e., a different algorithm might suit your needs better. Writing your own diffArrays
would allow you to customize that.
Thanks for the response.
The arrays will not be equal. They can have elements removed, added, or any field (which can also be an array of objects, yada yada) changed. It's the client side database for a SPA that I patch on any changes.
I've gotten this to pass my local tests, but as soon as I put it under any real load, I have to roll back to jsondiffpatch.
Please follow up with any progress you make. I am happy to help in any way possible.
jsondiffpatch failed me and now this...