dscape / clarinet

SAX based evented streaming JSON parser in JavaScript

Very large stringified JSON #61

Open hrdwdmrbl opened 5 years ago

hrdwdmrbl commented 5 years ago

I'm interested in parsing very large stringified JSON strings (up to 2GB). I have the string in memory, but the problem is that JSON.parse takes too long to parse it. So I'm looking at solutions like clarinet to parse progressively. What I'm wondering is whether clarinet blocks while parsing. Does it ever use setImmediate or yield or something like that, so that the Node event loop can run something else and then come back to clarinet?

dscape commented 5 years ago

Any reason why you wouldn’t just try it? Also, why do you have the whole string in memory? Why are you not reading it as a stream (e.g. from a file)?



hrdwdmrbl commented 5 years ago

@dscape Because it was the end of the day and I was hoping for an answer before returning to work the next day.

From my investigations, though, the answer is that it does block. The likely reason is that the entire input has already been written, so the parser never needs to wait for more of the stream and simply keeps going until the end.

The reason for having the entire string is that it comes from the socket.io library, which AFAIK does not have a streaming mode. So the entire string is ready for parsing immediately.

So it seems like I can either switch to a streaming websocket library or feed clarinet the string in chunks.
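A minimal sketch of the second option, under some assumptions: `writeInChunks` is a hypothetical helper (not part of clarinet), `sink` is any object exposing `write(chunk)` and `close()` (the object returned by `clarinet.parser()` has both), and the 64 KiB default slice size is an arbitrary guess that would need tuning. Each slice is still parsed synchronously; the only change is that `setImmediate` gives the event loop a turn between slices.

```javascript
// Feed a large in-memory string to a streaming parser slice by slice,
// yielding to the event loop between slices.
// ASSUMPTIONS: `sink` has write(chunk) and close(); `chunkSize` is arbitrary.
function writeInChunks(sink, text, chunkSize = 64 * 1024) {
  return new Promise((resolve, reject) => {
    let offset = 0;

    function step() {
      try {
        if (offset >= text.length) {
          sink.close(); // signal end of input
          resolve();
          return;
        }
        // Parse one slice synchronously...
        sink.write(text.slice(offset, offset + chunkSize));
        offset += chunkSize;
        // ...then schedule the next slice so other callbacks can run first.
        setImmediate(step);
      } catch (err) {
        reject(err);
      }
    }

    step();
  });
}
```

With clarinet this might be used roughly as `const parser = require('clarinet').parser(); parser.onvalue = (v) => { /* ... */ }; await writeInChunks(parser, hugeString);`. Smaller slices keep the event loop more responsive at the cost of more scheduling overhead, so the total parse time will likely go up rather than down.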

dscape commented 5 years ago

There's a sample folder; try using that code.
