chbrown / rfc6902

Complete implementation of RFC6902 in TypeScript
https://chbrown.github.io/rfc6902/
322 stars 39 forks source link

Some kind of crazy recursion going on in my production servers -_- #10

Closed brandonros closed 8 years ago

brandonros commented 8 years ago

jsondiffpatch failed me and now this...

build-16 (err): RangeError: Maximum call stack size exceeded
build-16 (err):     at dist (/home/foofoo/node_modules/rfc6902/rfc6902.js:110:30)
build-16 (err):     at dist (/home/foofoo/node_modules/rfc6902/rfc6902.js:127:43)
build-16 (err):     at dist (/home/foofoo/node_modules/rfc6902/rfc6902.js:127:43)
build-16 (err):     at dist (/home/foofoo/node_modules/rfc6902/rfc6902.js:127:43)
build-16 (err):     at dist (/home/foofoo/node_modules/rfc6902/rfc6902.js:127:43)
build-16 (err):     at dist (/home/foofoo/node_modules/rfc6902/rfc6902.js:127:43)
build-16 (err):     at dist (/home/foofoo/node_modules/rfc6902/rfc6902.js:127:43)
build-16 (err):     at dist (/home/foofoo/node_modules/rfc6902/rfc6902.js:127:43)
build-16 (err):     at dist (/home/foofoo/node_modules/rfc6902/rfc6902.js:127:43)
build-16 (err):     at dist (/home/foofoo/node_modules/rfc6902/rfc6902.js:127:43)
build-16 (err):     at dist (/home/foofoo/node_modules/rfc6902/rfc6902.js:127:43)
build-16 (err):     at dist (/home/foofoo/node_modules/rfc6902/rfc6902.js:127:43)
build-16 (err):     at dist (/home/foofoo/node_modules/rfc6902/rfc6902.js:127:43)
build-16 (err):     at dist (/home/foofoo/node_modules/rfc6902/rfc6902.js:127:43)
build-16 (err):     at dist (/home/foofoo/node_modules/rfc6902/rfc6902.js:127:43)
build-16 (err):     at dist (/home/foofoo/node_modules/rfc6902/rfc6902.js:127:43)
build-16 (err):     at dist (/home/foofoo/node_modules/rfc6902/rfc6902.js:127:43)
build-16 (err):     at dist (/home/foofoo/node_modules/rfc6902/rfc6902.js:127:43)
build-16 (err):     at dist (/home/foofoo/node_modules/rfc6902/rfc6902.js:127:43)
build-16 (err):     at dist (/home/foofoo/node_modules/rfc6902/rfc6902.js:127:43)
build-16 (err):     at dist (/home/foofoo/node_modules/rfc6902/rfc6902.js:127:43)
build-16 (err):     at dist (/home/foofoo/node_modules/rfc6902/rfc6902.js:127:43)
chbrown commented 8 years ago

Whoa, crazy. I assume you're using the current version of rfc6902?

I have very little to go on without seeing the data you're pushing through, but my best guess is at some point you have two objects that look like a pair of arrays (Array.isArray(x) returns true), one or both of which have a .length property that returns a negative / undefined / string value. And so it goes running way off above the Levenshtein memoization table, out of bounds and out of control.

Can you figure out precisely what pair of objects triggers this error, and paste them here?

brandonros commented 8 years ago

I do have an object with dimensions (length, width, height).

brandonros commented 8 years ago
var rfc6902 = require('rfc6902');

var o = [{
    length: 1
}];

var n = [{
    length: 2
}];

rfc6902.createPatch(o, n);

This didn't bomb out though.

chbrown commented 8 years ago

Those objects are probably what's breaking it. The width / height properties are unimportant.

What kinds of values does length have? Is it ever undefined / null / not a number?

And apparently that object with the weird length property is responding true when you call Array.isArray on it, which is unfortunate. You will probably want to figure out why that is the case, and fix it:

brandonros commented 8 years ago
n = [{ length: 2 }]
[Object]
Array.isArray(n)
true
Array.isArray(n[0])
false
Array.isArray(n[0]['length'])
false
brandonros commented 8 years ago
var t = n[0];
Array.isArray(t);
false

Straight out of Chrome's console.

chbrown commented 8 years ago

Yes, that's all what I would expect, but I take it your production data on your production servers is different, which is why these don't bomb out, and your production data does.

For example, I'm not sure if anybody still uses mongoose, but it fucks with native objects in all sorts of weird ways. I would not be surprised if running the results of a mongoose query through a reflective mechanism like rfc6902 blows the stack.

brandonros commented 8 years ago

I use Mongoose. -_-

brandonros commented 8 years ago

Don't remind me.

brandonros commented 8 years ago

Give me a second. I'll query from Mongoose into this thing.

I thought the lean call turns all of their bullshit off...

chbrown commented 8 years ago

Well there you go. You're gonna need to call toObject() or toPlainObject() or whatever it's called on anything you pass / use somewhere else. You wanna put it in a template? toObject() first or all kinds of weird shit's gonna happen. You want to diff two of them? x.toObject(), y.toObject().

I've been there, and mongoose isn't worth it. (MongoDB isn't worth it, either, but one step at a time.)

brandonros commented 8 years ago

I tried to move over to Postgres. Trust me.

brandonros commented 8 years ago
require('../globals.js');

var rfc6902 = require('rfc6902');

var mongo = require('../lib/mongo.js');

mongo.init()
.then(function () {
    var query = {
        '_id': '56ef606d366cddef56d08a1a'
    };

    return mongo.find('box', query);
})
.then(function (o) {
    var n = [{
        length: 0
    }];

    var patch = rfc6902.createPatch(o, n);

    console.log(patch);

    rfc6902.applyPatch(o, patch);

    console.log(o);
})
.catch(function (err) {
    console.log(err);
});
[ { op: 'remove', path: '/0/_id' },
  { op: 'remove', path: '/0/account_id' },
  { op: 'remove', path: '/0/nickname' },
  { op: 'remove', path: '/0/location' },
  { op: 'remove', path: '/0/width' },
  { op: 'remove', path: '/0/height' },
  { op: 'remove', path: '/0/weight_major' },
  { op: 'remove', path: '/0/weight_minor' },
  { op: 'remove', path: '/0/__v' },
  { op: 'replace', path: '/0/length', value: 0 } ]
[ { length: 0 } ]

Doesn't cause it. -_-

brandonros commented 8 years ago
o = [{
    "_id": "56ef606d366cddef56d08a1a",
    "nickname": "BX1",
    "location": "A",
    "length": 24,
    "width": 18,
    "height": 9,
    "weight_major": 0,
    "weight_minor": 0
  }]
brandonros commented 8 years ago

According to the documentation, the .lean() I already call keeps it a plain Object, so that isn't it.

http://mongoosejs.com/docs/api.html#query_Query-lean

brandonros commented 8 years ago

I found it.

mongo.init()
.then(function () {
    var query = {
        '_id': '56ef606d366cddef56d08a1a'
    };

    return mongo.find('box', query);
})
.then(function (o) {
    var n = [o,
    {
        length: 0
    }];

    var patch = rfc6902.createPatch(o, n);

    console.log(patch);

    rfc6902.applyPatch(o, patch);

    console.log(o);
})
.catch(function (err) {
    console.log(err);
});

Outputs

[ { op: 'replace', path: '/0', value: [ [Object] ] },
  { op: 'add', path: '/-', value: { length: 0 } } ]
[ [Circular], { length: 0 } ]

Then goes crazy.

brandonros commented 8 years ago

Wait, I'm wrong?

brandonros commented 8 years ago

Yeah, that's not it. Sorry.

brandonros commented 8 years ago

I found it. It has to do to when both the new and old objects are equal. I will paste now.

brandonros commented 8 years ago

Unfortunately, o = JSON.parse(JSON.stringify(o)); fixes it, so it is 100% Mongoose -_-

brandonros commented 8 years ago

Ah, it is probably comparing the ObjectId() returned...

brandonros commented 8 years ago

I did

function compare(left, right) {
    console.log(left, right);

    // strict equality handles literals, numbers, and strings (a sufficient but not necessary cause)

and right before the stack, it is doing

567a0636ee17c3060c1f0525 '567a09d9ee17c3060c1f1991'

(you can tell that is the ObjectId because it doesn't print wrapped in single quotes, kill me)

and then

undefined { _id: '567a09d9ee17c3060c1f1991',
  account_id: '56733bc7ed37353343145af9',
  name: 'yazaki terminal',
  sku: '7115-1614',
  upc: '7115-1614',
  location: 'N/A',
  listings: [],
  __v: 0,
  fulfillment_sku: null,
  notes: null,
  unit: null,
  cost: null,
  wholesale: null,
  retail: null,
  active: true,
  tags: [] }

until it exceeds the maximum call stack size.

brandonros commented 8 years ago
build-16 (out): <--- Last few GCs --->
build-16 (out):
build-16 (out):   363560 ms: Scavenge 1328.7 (1457.1) -> 1328.7 (1457.1) MB, 0.8 / 0 ms [allocation failure].
build-16 (out):   363561 ms: Scavenge 1328.7 (1457.1) -> 1328.7 (1457.1) MB, 0.8 / 0 ms [allocation failure].
build-16 (out):   363562 ms: Scavenge 1328.7 (1457.1) -> 1328.7 (1457.1) MB, 0.7 / 0 ms [allocation failure].
build-16 (out):   369598 ms: Mark-sweep 1328.7 (1457.1) -> 1328.7 (1457.1) MB, 6036.6 / 0 ms [last resort gc].
build-16 (out):   375504 ms: Mark-sweep 1328.7 (1457.1) -> 1328.7 (1457.1) MB, 5905.4 / 0 ms [last resort gc].
build-16 (out):
build-16 (out):
build-16 (out): <--- JS stacktrace --->
build-16 (out):
build-16 (out): ==== JS stack trace =========================================
build-16 (out):
build-16 (out): Security context: 0xfec271b4629 <JS Object>
build-16 (out):     1: ArrayConcatJS [native array.js:~398] [pc=0x234c17d54aa9] (this=0x1ed6d0d70e21 <JS Array[791]>,ah=0x3c30e3097e61 <an Object with map 0x36954af97409>)
build-16 (out):     5: dist(aka dist) [/home/f/node_modules/rfc6902/rfc6902.js:~106] [pc=0x234c17c2a7dc] (this=0xfec271041b9 <undefined>,i=281,j=797)
build-16 (out):     6: dist(aka dist) [/home/f/node_modules/rfc6902/rfc6902.js:~106] [pc=0x234c17c2a88a] (this=0...
build-16 (out):

Even with the stringify/parse code to convert everything to plain objects.

chbrown commented 8 years ago

I'm not sure what I'm supposed to gather from the GC stacktrace.

Regarding the comment before that, it seems lean() or whatever is lying to you, as including an ObjectId instance isn't what I would consider a plain old javascript object (aka. 'POJO').

If you really want to see what's going wrong, stick a console.log(input, output, i, j); right after the function dist(i, j) { line.

I can't fix mongoose for you, but I'll look into adding a secondary check so that even if an object purports to be an Array, but has an invalid length, it won't overflow the stack.

brandonros commented 8 years ago

The error still happens even after this:

var patch = rfc6902.createPatch(JSON.parse(JSON.stringify(o)), JSON.parse(JSON.stringify(n))); 

I am working on getting you test data now.

chbrown commented 8 years ago

Excellent. If you can get createPatch to break on JSON.parse output, I'll be glad to fix it.

brandonros commented 8 years ago

Do you know if this behaves well with lots of objects? Like, 10k objects, each having arrays and objects within them? I think that might be the issue.

chbrown commented 8 years ago

diff() tries to be smart with arrays, so if you have really long arrays, that might be a concern, but that shouldn't cause the errors you're running into, which I'm inclined to blame entirely on mongoose.

If you're worried about load, try it out with 10k of your objects, and see if it runs fast enough for you. If not, you can keep looking or do some profiling. Up to you.

brandonros commented 8 years ago

I can't get any meaningful output out of the console.log line you told me due to the sheer number of objects involved here.

chbrown commented 8 years ago

Ah, okay, then cut it down to just console.log(i, j). If my original guess at the underlying problem is correct, you'll see some NaN or null or negative numbers right before it stack overflows.

brandonros commented 8 years ago

The old and new elements are equal. There are 1,006 elements.

A test element looks like this:

{
  "store_id": "57057eb7c9f0dc2a53cf8896",
  "order_number": "3730",
  "status": "unstarted",
  "stash": {
    "_id": "5722d707287c498021052d46",
    "store_id": "57057eb7c9f0dc2a53cf8896",
    "shipping_information": {
      "address_type": null,
      "method": "None",
      "address_2": "xxx",
      "address": "zzz"
    },
    "shipping": 0,
    "tax": 21.93,
    "discount": 0,
    "total": 460.5,
    "order_id": "56721",
    "id": "3730",
    "items": [
      {
        "dropshipped": false,
        "quantity": 25,
        "price": 417.75,
        "_id": "5722d707287c498021052d48"
      },
      {
        "dropshipped": false,
        "quantity": 1,
        "price": 20.82,
        "_id": "5722d707287c498021052d47"
      }
    ]
  },
  "batch_number": null,
  "shipping_memory_hash": null,
  "combined_order_numbers": [],
  "tags": [],
  "notes": [],
  "shipments": [],
  "log": []
}

i is creeping up to over 250

Is this normal?

brandonros commented 8 years ago

For what it is worth, the package jsondiffpatch currently handles this data fine (and in under 100ms)

I'm moving off of it because my users experience intermittent issues that I assume have to be related to the patch protocol.

brandonros commented 8 years ago
{"i":213,"j":884}
{"i":214,"j":883}
{"i":213,"j":883}
{"i":214,"j":882}
{"i":213,"j":882}
{"i":214,"j":881}
{"i":213,"j":881}

<--- Last few GCs --->

  176852 ms: Scavenge 1083.3 (1457.6) -> 1083.3 (1457.6) MB, 0.6 / 0 ms [allocation failure].
  176853 ms: Scavenge 1083.3 (1457.6) -> 1083.3 (1457.6) MB, 0.6 / 0 ms [allocation failure].
  176853 ms: Scavenge 1083.3 (1457.6) -> 1083.3 (1457.6) MB, 0.6 / 0 ms [allocation failure].
  181665 ms: Mark-sweep 1083.3 (1457.6) -> 1083.1 (1457.6) MB, 4811.7 / 0 ms [last resort gc].
  186550 ms: Mark-sweep 1083.1 (1457.6) -> 1083.3 (1457.6) MB, 4884.7 / 0 ms [last resort gc].

<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x1ade35eb4629 <JS Object>
    1: ArrayConcatJS [native array.js:~398] [pc=0xfb736320849] (this=0x336f0becdd11 <JS Array[881]>,ah=0x36a4bacdb6b1 <an Object with map 0x18d7a4c50029>)
    5: dist(aka dist) [/home/skulabs/node_modules/rfc6902/rfc6902.js:~106] [pc=0xfb73632567d] (this=0x1ade35e041b9 <undefined>,i=214,j=881)
    6: dist(aka dist) [/home/skulabs/node_modules/rfc6902/rfc6902.js:~106] [pc=0xfb73632574f] (this=0x...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Aborted (core dumped)
chbrown commented 8 years ago

Just published v1.2.0, which shouldn't stack overflow even when you give it weird mongoose objects, but since I never got a replicable test case from you, I haven't been able to test that.

I also generated an array of pretty complex 1K objects via http://www.json-generator.com/, and mucked around with making a few changes to a copy of the array, trying to get createPatch() to break, but couldn't.

If you can come up with a breaking test case that has the following form, I'll fix it:

var input = <some object literal>;
var output = <another object literal>;
createPatch(input, output);

Otherwise I'm going to consider this issue closed / fixed, because I can't infer what's going wrong from your stack traces.

brandonros commented 8 years ago

How long did it take to generate the patch for the large test, approximately?

chbrown commented 8 years ago

It didn't trigger mocha's "slow test" warning, so, < 75ms.

brandonros commented 8 years ago

Could it have been that my input and output were equal? Could it have anything to do with a common key like _id and their sorted order?

I will try to provide more information.

chbrown commented 8 years ago

I tried that (along with other mutations). Using 1k_array.json, the test below runs and passes in < 75ms.

describe('issues/10', () => {
  var input = JSON.parse(readFileSync(`${__dirname}/1k_array.json`));
  var output = JSON.parse(JSON.stringify(input));
  var expected_patch = [];
  var actual_patch = createPatch(input, output);
  it('should produce patch equal to expectation', () => {
    assert.deepEqual(actual_patch, expected_patch);
  });
});
brandonros commented 8 years ago

I'm able to get it to recreate but the stringified JSON is 24mb containing sensitive data.

 function dist(i, j) {
        console.log(i, j);
29698 29698
29697 29697
RangeError: Maximum call stack size exceeded
    at Object.stylizeNoColor [as stylize] (util.js:158:24)
    at formatPrimitive (util.js:439:16)
    at formatValue (util.js:227:19)
    at inspect (util.js:109:10)
    at exports.format (util.js:17:20)
    at Console.log (console.js:39:34)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:107:17)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:113:28)

Where would you like me to add debugging information?

brandonros commented 8 years ago

If I turn off that console.log, I get this

RangeError: Maximum call stack size exceeded
    at /Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:283:45
    at Array.every (native)
    at compareArrays (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:283:30)
    at compareObjects (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:293:10)
    at compare (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:309:16)
    at /Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:284:16
    at Array.every (native)
    at compareArrays (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:283:30)
    at compare (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:305:16)
    at /Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:296:16
    at Array.every (native)
    at compareObjects (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:295:23)
    at compare (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:309:16)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:110:17)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:112:28)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:112:28)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:112:28)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:112:28)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:112:28)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:112:28)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:112:28)
    at dist (/Users/Marketing/Desktop/foo/node_modules/rfc6902/rfc6902.js:112:28)
brandonros commented 8 years ago

Any reason you roll if (Object(left) === left && Object(right) === right) { instead of if (typeof left === 'object' && typeof right === 'object') {?

brandonros commented 8 years ago

The test data is an array of objects with a length of 38,350. I wish I didn't have data this disgusting, trust me.

If I slice it to the first 8,500, it takes 288ms but does not overflow (which is faster than jsondiffpatch).

At around a length of around 9,000 for input and output, the overflows start (plain JSON.parsed objects).

The expected patch should be [].

brandonros commented 8 years ago

Alright, this really isn't an error with this library at all.

I ran node --stack-size=65500 scratchpads/diff_test.js and it blew every other diffing library out of the water.

Nasty as hell. Is there anything that can be done? Would any abuse of Promises combined with setTimeout(..., 0) or setImmediate or process.nextTick aid?

chbrown commented 8 years ago

I have an idea for a way to transform the recursive dist() algorithm into a plain Array based stack that queues up edit cost lookups, but it will take O(2n) time in the best case, and I may decide not to replace the current recursive dist() implementation if my loop idea 1) takes unreasonably longer for small arrays, or 2) turns out to be too messy / inelegant / unmaintainable.

So if you want a quick and sure solution, I'd recommend defining your own diffArrays function, then importing diffObjects and diffValues from require('rfc6902/diff') (now exported as of v1.2.1), and then, using require('rfc6902/diff').diffAny as a template, wrapping those into your own custom createPatch(...). I don't understand why you're diffing arrays that are apparently always equal anyway, but if you envision, someday, diffing inequal arrays, finding the sequence of operations corresponding to a/the minimum edit distance may not be what you want out of it. I.e., a different algorithm might suit your needs better. Writing your own diffArrays would allow you to customize that.

brandonros commented 8 years ago

Thanks for the response.

The arrays will not be equal. They can have elements removed, added, or any field (which can also be an array of objects, yada yada) changed. It's the client side database for a SPA that I patch on any changes.

I've gotten this to pass my local tests, but as soon as I put it under any real load, I have to roll back to jsondiffpatch.

Please follow up with any progress you make. I am happy to help in any way possible.