dominictarr / JSONStream

How to handle large JSON objects #152

Closed: WilfredTA closed this issue 6 years ago

WilfredTA commented 6 years ago

I am trying to stream large JSON objects through TCP streams.

Server Side Code

const stream = JSONStream.parse();
serverConnection.pipe(stream);

stream.on('data', (receivedData) => { store(receivedData.fileData) });

Client side code

fs.readFile(`${filePath}`, (err, fileData) => {
  let { port, host } = nodeInfo;
  let client = this.connect(port, host);

  let message = {
    messageType: "STORE_FILE",
    fileName: shard,
    fileContent: fileData
  };

  client.write(JSON.stringify(message));
});

Error

<--- Last few GCs --->

[37937:0x103003600]   178603 ms: Mark-sweep 770.2 (776.2) -> 404.2 (410.2) MB, 170.2 / 0.0 ms  allocation failure GC in old space requested

<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x2eee5d225ee1 <JSObject>
    1: toJSON [buffer.js:~938] [pc=0x2039a68c17dd](this=0x2eeefd54ad21 <Uint8Array map = 0x2eee168c1d91>)
    2: arguments adaptor frame: 1->0
    5: builtin exit frame: stringify(this=0x2eee5d209041 <Object map = 0x2eee16882ba1>,0x2eee9ef02311 <undefined>,0x2eee9ef02311 <undefined>,0x2eeefd54ac39 <Object map = 0x2eeeee94b529>)

    6: arguments adaptor frame: 1->3
    8: sendShardToNode [/Users...

FATAL ERROR: invalid array length Allocation failed - JavaScript heap out of memory

It appears that this happens because the object is too large to stringify (the fileData property of the message object is too large).

How can I change this to take advantage of JSONStream so that these large JSON objects can be streamed through?

Further, if I call store on receivedData.fileData with the server code (which writes the data to a file), will all the data be written to a file?

doowb commented 6 years ago

This seems like a better question for StackOverflow, since it's more about how to write your client and server code and not a bug with JSONStream. However, I'll give some suggestions, though I'm not sure of the exact way to implement them.

One suggestion is to implement the client and server code in a way that opens two streams: one for the metadata {messageType: "STORE_FILE", fileName: shard} and another that streams the file to the server instead of reading the entire buffer:

fs.createReadStream(`${filePath}`).pipe(client);

Then on the server side, you'd be able to stream the file contents directly to store (I'll use fs in this case):

serverConnection.pipe(fs.createWriteStream(`${filePath}`));
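
A rough, untested sketch of that two-connection idea (metaPort, dataPort, host, shard, and filePath are placeholders standing in for the values from the snippets above):

// client.js (hypothetical two-connection version)
const fs = require('fs');
const net = require('net');

// 1) a short-lived connection that only carries the metadata
const meta = net.connect(metaPort, host, () => {
  meta.end(JSON.stringify({ messageType: 'STORE_FILE', fileName: shard }));
});

// 2) a second connection that carries the raw file bytes, never buffering the whole file in memory
const data = net.connect(dataPort, host, () => {
  fs.createReadStream(filePath).pipe(data);
});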

Since JSON.stringify will turn a Buffer into an Array, I'm not sure exactly if the following will work, but you might be able to try it. Instead of stringifying the entire message, you can try to send it through the client in chunks and on the server side read the chunks into a buffer:

// client.js
fs.readFile(`${filePath}`, (err, fileData) => {
  let { port, host } = nodeInfo;
  let client = this.connect(port, host);

  let message = `{
  "messageType": "STORE_FILE",
  "fileName": "${shard}",
  "fileContent": [
    `;
  client.write(message);

  let len = fileData.length;
  let size = 128; // pick something here for the size of the chunks
  for (let i = 0; i < len; i += size) {
    client.write(JSON.stringify({chunk: fileData.slice(i, i + size)}));
    if (i + size < len) {
      client.write(',\n');
    }
  }
  client.write(']}');
  client.end(); // signal to the server that the message is complete
});
// server.js
const stream = JSONStream.parse('fileContent.*.chunk'); // '*' matches each element of the fileContent array
serverConnection.pipe(stream);

let buffer = Buffer.alloc(0);
// JSON.stringify serializes a Buffer as {type: 'Buffer', data: [...]}, so rebuild each chunk from .data
stream.on('data', (chunk) => { buffer = Buffer.concat([buffer, Buffer.from(chunk.data)]) });
stream.on('end', () => store(buffer));

I didn't test this code and just thought of it, so I'm not sure how well it will work.

WilfredTA commented 6 years ago

Okay, that's interesting, thanks for the reply.

Does this library handle concatenated JSON?

WilfredTA commented 6 years ago

I came up with this solution for the client side:


class Jsonify extends Transform {
  constructor(options){
    super(options)
    this._fileName = options.fileName;
    this._messageType = options.messageType;
  }

  _transform(chunk, encoding, callback){
    this.push(JSON.stringify({
      fileName: this._fileName,
      _messageType: this._messageType,
      fileData: chunk
    }))

    callback()
  }

}

let readStream = fs.createReadStream(`./shards/${storedShardName}`);
let jsonify = new Jsonify({fileName: storedShardName, messageType: "STORE_FILE"});
let client = this.connect(port, host);
readStream.pipe(jsonify).pipe(client);

Making use of pipes ensures that memory usage at any point in time stays within acceptable limits. fs.readFile loads all of the data into memory before executing the callback, which is not a good approach for large files.

If JSONStream is passed multiple JSON objects, can it delimit them? If the data event fires and only, say, 3/4 of a JSON object has arrived, does it wait for the rest to make it over before parsing? I can't see how else it could handle such large JSON objects. However, that seems quite problematic for large files, because it means larger and larger amounts of data are kept in memory.

doowb commented 6 years ago

You can try it out, but I think the entire stream needs to be one valid JSON document. It is parsed as it streams through, and if the document is an array of objects, you can get each object by using JSONStream.parse('*'). From your client example above, this would give each object as {fileName: '', _messageType: '', fileData: ''}.
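
For example, if the client wraps everything in one top-level array, the server side could look something like this (untested sketch; serverConnection and the object shape come from the snippets above):

const JSONStream = require('JSONStream');

// parse('*') emits each element of the root-level array as soon as it is complete
const stream = JSONStream.parse('*');
serverConnection.pipe(stream);

stream.on('data', (obj) => {
  // obj is one complete {fileName, _messageType, fileData} object
  console.log(obj.fileName, obj._messageType);
});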

You may look into JSONStream.stringify to see if that might be used like this:

let readStream = fs.createReadStream(`./shards/${storedShardName}`);
let jsonify = new Jsonify({fileName: storedShardName, messageType: "STORE_FILE"});
let client = this.connect(port, host);
readStream
  .pipe(jsonify)
  .pipe(JSONStream.stringify())
  .pipe(client);

Then, instead of stringifying the object in the _transform method, you could just push the object itself. I'm not sure of all the options necessary to make sure the Transform stream is in object mode, or whether that will work with fs.createReadStream...

You could change Jsonify to work similarly to what JSONStream.stringify does: write an opening array bracket [ before any other content, write a , between objects (but not after the last one), and write a closing ] at the end. This is the format that JSONStream.parse would expect.
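
Roughly something like this (untested sketch; it reuses the constructor options from your Jsonify above and base64-encodes each chunk, since a raw Buffer doesn't round-trip cleanly through JSON):

const { Transform } = require('stream');

// Sketch: frame the output as a single JSON array so JSONStream.parse('*') can consume it
class Jsonify extends Transform {
  constructor(options) {
    super(options);
    this._fileName = options.fileName;
    this._messageType = options.messageType;
    this._first = true;
  }

  _transform(chunk, encoding, callback) {
    // '[' before the first object, ',' between the rest
    this.push(this._first ? '[\n' : ',\n');
    this._first = false;
    this.push(JSON.stringify({
      fileName: this._fileName,
      messageType: this._messageType,
      // base64 keeps the binary chunk JSON-safe; decode it again on the server
      fileData: chunk.toString('base64')
    }));
    callback();
  }

  _flush(callback) {
    this.push(this._first ? '[]' : '\n]'); // close the array, even if no chunks came through
    callback();
  }
}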

Then on the server, when you get the object with the fileData chunk, you can write that to an fs.createWriteStream.
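
On the server, that might look roughly like this (untested; it assumes the base64-encoded fileData from the sketch above):

const fs = require('fs');
const JSONStream = require('JSONStream');

const parser = JSONStream.parse('*');
serverConnection.pipe(parser);

let out; // write stream, opened when the first object arrives
parser.on('data', (obj) => {
  if (!out) out = fs.createWriteStream(`./shards/${obj.fileName}`);
  out.write(Buffer.from(obj.fileData, 'base64')); // decode each chunk back to binary
});
parser.on('end', () => { if (out) out.end(); });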

doowb commented 6 years ago

I'm closing this since it's not for a bug in JSONStream.