Open ScottPJones opened 6 years ago
We should put back any characters we are eating. Is there a reason peek
doesn't work on TCPStream
?
For one thing, peek
isn't even exported, and it is not implemented for TCPStream
in v0.6.2.
I haven't tried yet on master.
(The peek implementation looked like it might hurt performance a lot, also, at least one of the methods, that does a try / finally and saves / restores the position in the stream, after reading a single byte)
A better approach in that case would be to only restore the position in the case of reading a JSON number when finding something that isn't a digit (after a -
, .
, e
, E
or +
(in an exponent), one or more digits are always required).
What I mean is, is there a reason why peek
is not implemented (in Base), as it seems this would be a really useful feature overall?
I'm not sure why it was left unexported in master, but at least there it is better supported (instead of just 2 methods,
it has 7, including peek(s::Base.LibuvStream) in Base at stream.jl:1182
, which should handle TCPSocket
.
(I misremembered the name before, it's TCPSocket
instead of TCPStream
)
doesn't work for the type TCPStream.
You'll likely also run into trouble with wrappers like SSLContext
(almost every HTTP connection is a HTTPS connection nowdays).
To me having a peek
that only returns one character is a bit arbitrary. I think it would be more useful to have a peek
that returns a view of however many bytes are available in whatever buffer the IO
implementation has. e.g. it would be handy to peek
in the buffer for presence of e.g. \r\n\r\n
before reading.
One character (really, 1 byte) look-ahead is all that the JSON format needs, and that is only for numbers (everything else is terminated by a byte that is known in advance, i.e. '}'
, ']'
, ':'
or '"'
).
Having a function that could read the entire buffer of ready to be read bytes would also be quite useful, for other things.
I hit this bug today while accidentally reading some SOAP responses. Having the response wiped while doing a JSON read feels like a bug. In this example the original string is preserved, but in my debugging session, I was dealing with the array output initially not a string.
julia> body = "<XML>This is not JSON</XML>\n"
"<XML>This is not JSON</XML>\n"
julia> array = Vector{UInt8}(body)
28-element Array{UInt8,1}:
0x3c
0x58
0x4d
0x4c
0x3e
0x54
0x68
0x69
0x73
0x20
0x69
0x73
0x20
0x6e
0x6f
0x74
0x20
0x4a
0x53
0x4f
0x4e
0x3c
0x2f
0x58
0x4d
0x4c
0x3e
0x0a
julia> JSON.parse(String(array))
ERROR: Unexpected character
Line: 0
Around: ...<XML>This is not JSON<...
^
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] _error(::String, ::JSON.Parser.MemoryParserState) at /Users/user/.julia/packages/JSON/3rsiS/src/Parser.jl:140
[3] parse_jsconstant(::JSON.Parser.ParserContext{Dict{String,Any},Int64,true,nothing}, ::JSON.Parser.MemoryParserState) at /Users/user/.julia/packages/JSON/3rsiS/src/Parser.jl:193
[4] parse_value(::JSON.Parser.ParserContext{Dict{String,Any},Int64,true,nothing}, ::JSON.Parser.MemoryParserState) at /Users/user/.julia/packages/JSON/3rsiS/src/Parser.jl:170
[5] #parse#1(::Type, ::Type{Int64}, ::Bool, ::Nothing, ::typeof(JSON.Parser.parse), ::String) at /Users/user/.julia/packages/JSON/3rsiS/src/Parser.jl:460
[6] parse(::String) at /Users/user/.julia/packages/JSON/3rsiS/src/Parser.jl:458
[7] top-level scope at REPL[19]:1
julia> array
0-element Array{UInt8,1}
julia> body
"<XML>This is not JSON</XML>\n"
Basically, reading off the end of the array causes the array to get dropped. I would rather it not drop the array.
It looks like this thread here is related.
How is this related to JSON.jl
?
julia> body = "<XML>This is not JSON</XML>\n"
"<XML>This is not JSON</XML>\n"
julia> array = Vector{UInt8}(body);
julia> String(array);
julia> array
UInt8[]
While investigating an issue with my changes for better performance and to fix numeric parsing issues, I realized that parsing a number can "eat" the next character in a stream. The way that the
async.jl
unit test works, by writing out a stream of JSON values, and then reading them on another process, breaks if there is not another character to separate the values, and just a number is being read. I attempted to fix this usingBase.peek
, however on v0.6.2, it doesn't work for the typeTCPStream
. I'm not sure what the behavior of a similar test in JavaScript or Python would be.