Open richardscarrott opened 4 years ago
Okay, I've found the problematic character by logging the buffer before the error is thrown; it's a right double quote and it only happens when at position 0 of a given chunk:
Any idea why this character would be misinterpreted? Is it an issue on our side or a bug here?
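As a rough illustration of why a chunk boundary can matter here (this is not the logged buffer from the report, and the payload and split point are made up): a right double quote is three bytes in UTF-8, so a boundary can land inside it. Decoding each chunk on its own then corrupts the character, while a streaming decoder carries the partial sequence over to the next chunk.

```javascript
// U+201D (right double quotation mark) encodes to three bytes in UTF-8.
const quote = Buffer.from('\u201d', 'utf8');
console.log(quote); // <Buffer e2 80 9d>

// Hypothetical payload and chunk boundary that falls inside the character.
const payload = Buffer.from('{"a":"x\u201d"}', 'utf8');
const splitAt = payload.indexOf(quote) + 1; // one byte into the quote
const chunks = [payload.subarray(0, splitAt), payload.subarray(splitAt)];

// Decoding each chunk independently yields U+FFFD replacement characters.
const naive = chunks.map((c) => c.toString('utf8')).join('');

// A streaming decoder buffers the incomplete sequence between calls.
const decoder = new TextDecoder('utf-8');
const streamed =
  chunks.map((c) => decoder.decode(c, { stream: true })).join('') +
  decoder.decode(); // final call flushes any leftover bytes

console.log(naive.includes('\ufffd')); // true
console.log(streamed === payload.toString('utf8')); // true
```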
It looks like this library (and JSONStream) is no longer actively maintained, so for anybody else who is unfortunate enough to run into this issue: I ended up using stream-json, which hasn't presented the same problem, e.g.
Before:
import _ from 'highland';
import JSONStream from 'JSONStream';

// { data: [{}, {}, {}] }
_(readableStream)
  .through(JSONStream.parse('data.*'))
  .toArray((result) => console.log('DONE', result));
After:
import _ from 'highland';
import { parser } from 'stream-json';
import { pick } from 'stream-json/filters/Pick';
import { streamArray } from 'stream-json/streamers/StreamArray';

// { data: [{}, {}, {}] }
_(readableStream)
  .through(parser())
  .through(pick({ filter: 'data' }))
  .through(streamArray())
  .map(({ value }) => value)
  .toArray((result) => console.log('DONE', result));
For others who stumble across this but still want to use this library, I think changing https://github.com/creationix/jsonparse/blob/b2d8bc6db4f6be3f276752b3b9f882b1945afede/jsonparse.js#L166-L171 can fix this.
Only emit the new character if the buffer contains at least as many bytes as are remaining in the sequence:
// Take only as many bytes as this chunk can actually supply.
var toConsume = Math.min(this.bytes_remaining, buffer.length);
for (var j = 0; j < toConsume; j++) {
  this.temp_buffs[this.bytes_in_sequence][this.bytes_in_sequence - this.bytes_remaining + j] = buffer[j];
}
this.bytes_remaining -= toConsume;
// Emit the character only once the whole sequence has arrived.
if (this.bytes_remaining === 0) {
  this.appendStringBuf(this.temp_buffs[this.bytes_in_sequence]);
  this.bytes_in_sequence = 0;
}
My fork is pretty far removed from this one, otherwise I'd publish this in a more useful format. Still, hope it helps someone!
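To show the same idea outside of jsonparse, here is a minimal standalone sketch. `Utf8Carry`, `begin`, and `feed` are hypothetical names invented for this example and are not part of jsonparse. It copies at most as many bytes as the chunk provides and only produces the character once nothing is missing.

```javascript
// Hypothetical helper: collects one multi-byte UTF-8 character across chunks.
class Utf8Carry {
  constructor() {
    this.pending = Buffer.alloc(4); // the longest UTF-8 sequence is 4 bytes
    this.needed = 0;                // total bytes in the current sequence
    this.remaining = 0;             // bytes still missing
  }

  // Start a sequence whose first bytes arrived at the end of a chunk.
  begin(partial, totalBytes) {
    partial.copy(this.pending, 0);
    this.needed = totalBytes;
    this.remaining = totalBytes - partial.length;
  }

  // Feed the next chunk; `char` stays null until the sequence is complete.
  feed(chunk) {
    const toConsume = Math.min(this.remaining, chunk.length);
    chunk.copy(this.pending, this.needed - this.remaining, 0, toConsume);
    this.remaining -= toConsume;
    const char = this.remaining === 0
      ? this.pending.toString('utf8', 0, this.needed)
      : null;
    return { char, consumed: toConsume };
  }
}

// A 4-byte emoji delivered in pieces of 1, 1, and 2 bytes.
const emoji = Buffer.from('\u{1F600}', 'utf8'); // f0 9f 98 80
const carry = new Utf8Carry();
carry.begin(emoji.subarray(0, 1), 4);
const first = carry.feed(emoji.subarray(1, 2));  // still incomplete
const second = carry.feed(emoji.subarray(2, 4)); // completes the character
console.log(first.char, second.char === '\u{1F600}'); // null true
```

The `consumed` count tells the caller how far to advance in the chunk before resuming normal parsing.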
We're indirectly using jsonparse via JSONStream to stream in JSON data stored in Google Cloud Storage and we're intermittently seeing the following error:
99% of the time the data is parsed successfully, so I'm guessing it's related to where the chunks of data are split over HTTP -- I believe it could be related to emoji or Japanese characters, as both exist in our JSON, but I'm struggling to pinpoint exactly where it's failing.
Is there perhaps a way to log more information re: the string value it failed on?
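One possible way to get more context, sketched under assumptions: `describeChunk` is a hypothetical helper, not something jsonparse or JSONStream provides. It summarizes a chunk's first and last bytes as hex and flags whether the chunk begins with a UTF-8 continuation byte, which would mean the boundary fell inside a multi-byte character.

```javascript
// Hypothetical logging helper, not part of jsonparse or JSONStream.
function describeChunk(chunk, previewBytes = 8) {
  const head = chunk.subarray(0, previewBytes);
  const tail = chunk.subarray(Math.max(0, chunk.length - previewBytes));
  return {
    length: chunk.length,
    headHex: head.toString('hex'),
    tailHex: tail.toString('hex'),
    // Bytes of the form 0b10xxxxxx are UTF-8 continuation bytes, so a chunk
    // that starts with one begins in the middle of a character.
    startsMidCharacter: chunk.length > 0 && (chunk[0] & 0xc0) === 0x80,
  };
}

// Example: a chunk that begins after the lead byte of a right double quote.
const sample = Buffer.from('\u201dabc', 'utf8').subarray(1);
console.log(describeChunk(sample));
```

Recording this summary for each chunk just before it reaches the parser, and printing the most recent ones when the error fires, should narrow down which boundary triggers the failure.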