buger / jsonparser

One of the fastest alternative JSON parsers for Go that does not require a schema
MIT License

Streaming Parser? #132

Open · ex3ndr opened this issue 6 years ago

ex3ndr commented 6 years ago

This is an awesome library, but is it possible to stream byte arrays somehow, so that very large JSON files can be parsed?

buger commented 6 years ago

Well, in theory this is possible, but I'm not sure what the best way to implement the right interface would be. It would be really cool to see how you envision this working in the context of jsonparser.

buger commented 6 years ago

Also note that even now it should be possible to implement something like this, because jsonparser operates on a []byte, and basic functions like Get and ObjectEach return an offset, which is just an index into the original array. So you read until you get an error, remember the last offset, and when more data arrives you continue by calling the function again with a []byte slice containing the new data.
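
A rough sketch of that retry-with-more-data idea (the chunk boundaries and the "name" key are made up for illustration; only Get's documented return values are relied on):

```go
package main

import (
	"fmt"

	"github.com/buger/jsonparser"
)

func main() {
	// Pretend these are two chunks arriving from a stream.
	chunks := [][]byte{
		[]byte(`{"name": "Ar`),
		[]byte(`ticle1", "id": 42}`),
	}

	var buf []byte
	for _, chunk := range chunks {
		buf = append(buf, chunk...)

		// Try to extract the value; on an error (e.g. the value is not
		// complete yet) keep the buffer and wait for more data.
		value, _, offset, err := jsonparser.Get(buf, "name")
		if err != nil {
			continue
		}

		fmt.Printf("name=%s, parsed up to offset %d\n", value, offset)
		// A fuller streaming loop would remember offset, drop buf[:offset],
		// and keep parsing subsequent values from the remainder.
	}
}
```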

rocaltair commented 6 years ago

I ran into the same problem.

There's one very large string value (base64 encoded) in a dict, and I need to stream that string to a socket peer while parsing.

Generally speaking, keys and nesting depth are bounded and usually short; if we could stream just the values (especially strings), that would be enough.

Do you have any ideas for such a scenario?

buger commented 6 years ago

@rocaltair there is an internal function called searchKey, whose main job is to return the offset pointing to the value of a key. So if we made this function public, you could use it to find the value's offset and, from there, use standard Go bytes tools to iterate through the data until you hit the closing " (the end of the value). If you think it makes sense, create a simple PR that exports this function (converts its name to upper case).

Cheers!
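
If that function were exported, the value-streaming half is just a scan for the closing quote. A minimal sketch of that step (the offset is hard-coded purely for illustration; in practice it would come from something like the searchKey function discussed above, or from Get/ObjectEach):

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// writeStringValue copies the JSON string value that starts at data[offset]
// (the opening '"') to w, stopping at the closing quote and skipping over
// escaped quotes (\") inside the value.
func writeStringValue(data []byte, offset int, w io.Writer) error {
	if offset >= len(data) || data[offset] != '"' {
		return fmt.Errorf("no string value at offset %d", offset)
	}
	for i := offset + 1; i < len(data); i++ {
		switch data[i] {
		case '\\':
			i++ // skip the escaped character
		case '"':
			_, err := w.Write(data[offset+1 : i])
			return err
		}
	}
	return fmt.Errorf("unterminated string value")
}

func main() {
	data := []byte(`{"payload": "aGVsbG8gd29ybGQ="}`)
	// 12 is the offset of the value's opening quote, hard-coded for the
	// example; a real caller would obtain it from the parser.
	if err := writeStringValue(data, 12, os.Stdout); err != nil {
		fmt.Println("error:", err)
	}
}
```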

rocaltair commented 6 years ago

That's not my point.

How about making the first parameter (data []byte) of each function an io.Reader, and writing the value we get to an io.Writer when it is a string?

That way we would use less memory, in my opinion.
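
For what it's worth, the API shape being proposed here might look something like this (purely hypothetical; nothing like it exists in jsonparser today, and all names are made up):

```go
// Package jsonstream sketches the kind of API being proposed in this thread.
package jsonstream

import "io"

// StreamGetter reads JSON from an io.Reader instead of taking a []byte,
// and writes large string values to an io.Writer instead of returning them,
// so neither the document nor the value has to fit in memory at once.
type StreamGetter interface {
	// GetString locates the value at the given key path and streams its
	// bytes into w as they are read from the underlying reader.
	GetString(w io.Writer, keys ...string) error
}
```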

igor0 commented 3 years ago

Thanks for developing and maintaining this library! One thought I'd add to this.

> [...] is it possible to stream byte arrays somehow, so that very large JSON files can be parsed?

> Well, in theory this is possible, but I'm not sure what the best way to implement the right interface would be.

I'd just want a simple JSON lexer/tokenizer that pulls bytes from an io.Reader. I can easily write a simple parser on top of that. There would be two benefits:

  1. Always single-pass, even for large, complex nested structures
  2. Doesn't require loading all JSON into memory at once

I can't find any golang library that can do this today. Admittedly, this level of efficiency doesn't matter for most use cases.
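
A sketch of what such a tokenizer's surface could look like (all names here are hypothetical, not an existing library; the lexing itself is left out):

```go
// Package jsontok sketches the pull-style tokenizer described above.
package jsontok

import "io"

// Kind identifies the kind of JSON token the tokenizer produced.
type Kind int

const (
	ObjectStart Kind = iota
	ObjectEnd
	ArrayStart
	ArrayEnd
	Key
	String
	Number
	Bool
	Null
)

// Token is a single lexical element of the input.
type Token struct {
	Kind  Kind
	Value []byte // raw bytes for Key, String, Number and Bool tokens
}

// Tokenizer pulls bytes from an io.Reader through a small internal buffer
// and emits tokens one at a time, so a caller can build a single-pass
// parser on top without loading the whole document into memory.
type Tokenizer interface {
	// Next returns the next token, or io.EOF once the input is exhausted.
	Next() (Token, error)
}

// NewTokenizer would wrap r; the actual lexing is omitted from this sketch.
func NewTokenizer(r io.Reader) Tokenizer {
	panic("sketch only: not implemented")
}
```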

G2G2G2G commented 2 years ago

@igor0 like this https://pkg.go.dev/github.com/valyala/fastjson?utm_source=godoc#Scanner ?

igor0 commented 2 years ago

@G2G2G2G That's still not quite what I'm talking about. Let's say that I have this input:

{"article": "Article1", "sections": [{"id": "1"}, {"id": 2}]}
{"article": "Article2"}

As I understand it, fastjson Scanner will give me two strings, one for each article record. Then, I need subsequent scans to parse each of those strings. So, I end up with multiple scans in order to parse the JSON. In the worst case, this is quadratic complexity (although admittedly that would have to be pretty horrible JSON).
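
Roughly, the per-record pattern being described looks like this with fastjson's Scanner (illustrative only, using the input above):

```go
package main

import (
	"fmt"

	"github.com/valyala/fastjson"
)

func main() {
	// The newline-delimited input from the comment above.
	input := `{"article": "Article1", "sections": [{"id": "1"}, {"id": 2}]}
{"article": "Article2"}`

	// First step: Scanner hands back one record at a time...
	var sc fastjson.Scanner
	sc.Init(input)
	for sc.Next() {
		v := sc.Value()
		// ...second step: each record still has to be walked separately,
		// which is the extra per-record work being described here.
		fmt.Printf("article=%s\n", v.GetStringBytes("article"))
	}
	if err := sc.Error(); err != nil {
		fmt.Println("scan error:", err)
	}
}
```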

I'd just want to get a stream of tokens so that I can parse in a single pass. At least at the time I looked, there wasn't such a Go library available.

G2G2G2G commented 2 years ago

Oh I see, yeah, I understand why nothing like that is around. I guess it would have to piece things together like the Scanner does, otherwise malformed or incorrect JSON could be sitting in the middle of the stream.