Open dsnet opened 3 years ago
I agree that the current definition is a bit ambiguous, happy to review a CL. I assume we should just document the current behavior.
I love the third idea, how about:
"The offset points to the first character of the next token that may be returned. (except colons or commas)"
The current comments and implement at src/encoding/json/stream.go didn't treat colons or commas as Token:
// A Token holds a value of one of these types: // // Delim, for the four JSON delimiters [ ] { } // bool, for JSON booleans // float64, for JSON numbers // Number, for JSON numbers // string, for JSON string literals // nil, for JSON null
It looks like we have to change a lot of comments if we want to fix it.
"The offset points to the first character of the next token that may be returned. (except colons or commas)"
This is the most user friendly semantic, but breaks in other ways since it requires reading from the input stream until it locates the next token. This would cause the Decoder
to block on a read (e.g., from a network socket) in cases when it otherwise would not have.
It looks like we have to change a lot of comments if we want to fix it.
A JSON token is a well defined in RFC section 2. I don't think we need to enumerate every possible case.
The documentation on
Decoder.InputOffset
currently says:This explanation is somewhat incorrect since the offset cannot be a single value if there are intervening whitespace in-between two JSON tokens. Furthermore, the
json
package implicitly handles colons and commas, which are technically JSON tokens according to RFC 7159, section 2. May the offset ever be past these implicit colons and commas, or will it always be before them? The "token" in this context is a somewhat ambiguous.The documentation should be clarified to say one of the following:
I chose the phrase "may be returned" to indicate that colons and commas aren't included since they aren't ever returned by the
json
API. I haven't checked the implementation yet to see what it actually returns.The first definition gives more flexibility to how the
Decoder
may actually be implemented. The latter two definitions gives the user more assurance about whatInputOffset
actually means.Updates #29688