JuliaIO / JSON.jl

JSON parsing and printing

Parse bytes directly #356

Open robsmith11 opened 1 year ago

robsmith11 commented 1 year ago

It would be nice if JSON.Parser.parse could be passed a vector of bytes and parse it assuming UTF-8 encoding without having to manually allocate a new String. My most common use case (probably for many other people too?) is downloading a JSON file with HTTP.get("...").body, which returns bytes.

KristofferC commented 1 year ago

You could maybe use https://github.com/JuliaStrings/StringViews.jl.

robsmith11 commented 1 year ago

StringViews.jl does look good for use in projects, but would it make sense for more casual interactive use to have JSON.jl do something automatically when passed bytes?

KristofferC commented 1 year ago

One issue with that is that, arguably, anything that accepts a string should then also accept a byte buffer. The best way to do that would probably be to add StringViews as a dependency and wrap the bytes in a StringView. So it would be roughly equivalent, except that every function would have to define this itself instead of the caller just doing it.
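
To make the trade-off concrete, here is a minimal sketch using only Base. `count_open_braces` is a hypothetical stand-in for any string-accepting function such as a parser; it is not part of JSON.jl. The sketch converts with `String(copy(bytes))`, which allocates, purely to stay dependency-free; a view type like the one StringViews.jl provides would be the non-copying alternative mentioned above.

```julia
# Hypothetical stand-in for any function that accepts a string (e.g. a parser).
count_open_braces(s::AbstractString) = count(==('{'), s)

# Option A: the library adds a byte-accepting method to every such function.
# String(copy(bytes)) allocates a new String; it is used here only so that the
# sketch needs nothing beyond Base.
count_open_braces(bytes::AbstractVector{UInt8}) =
    count_open_braces(String(copy(bytes)))

# Example input: UTF-8 bytes of a small JSON document.
bytes = collect(codeunits("{\"a\": {\"b\": 1}}"))

# Option A in use: the caller passes bytes directly.
via_library = count_open_braces(bytes)

# Option B: the caller converts once and uses the existing string method.
via_caller = count_open_braces(String(copy(bytes)))
```

Both calls do the same work; the difference is only whether each function or each caller carries the conversion.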

kpa28-git commented 1 year ago

I've noticed that using StringViews instead of String does not improve performance for me (actually slightly worse performance and more allocations). The following is from the docs for String (Julia 1.8.5). If I'm understanding right, strings produced from UTF-8 bytes already act like views.

String(v::AbstractVector{UInt8})

Create a new String object from a byte vector v containing UTF-8 encoded characters. ... When possible, the memory of v will be used without copying when the String object is created. This is guaranteed to be the case for byte vectors returned by take! on a writable IOBuffer and by calls to read(io, nb). This allows zero-copy conversion of I/O data to strings. In other cases, Vector{UInt8} data may be copied, but v is truncated anyway to guarantee consistent behavior.
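
A small Base-only sketch of the behavior that docstring describes, for anyone who wants to try it. The JSON text and variable names are made up for illustration. The point to notice is that `String(bytes)` leaves `bytes` empty, so a copy is needed if the bytes are still wanted afterwards.

```julia
# Bytes produced by take! on a writable IOBuffer: one of the cases the
# docstring lists as converted without copying.
io = IOBuffer()
write(io, "{\"key\": \"value\"}")
bytes = take!(io)
n = length(bytes)

# Keep a separate copy if the raw bytes are still needed after conversion.
kept = copy(bytes)

# String takes over the vector; per the docstring, `bytes` is truncated.
s = String(bytes)

bytes_now_empty = isempty(bytes)
kept_intact = length(kept) == n
```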

KristofferC commented 1 year ago

"When possible"

This is not that often the case; the array needs to have been allocated in a special way for this.

And copying a chunk of memory like a string tends to be quite fast, so it isn't implausible that you wouldn't notice it. And maybe StringViews has some issue which makes it slower than it should be.