Error in parsing JSON document containing Chinese characters.

GoogleCodeExporter commented 9 years ago

JSON document:

{
  "username": "検索jan5検索.8@test.relay.symantec.com",
  "password": "12341234",
  "display_name": "1231212",
  "country": "US"
}

The username field is parsed as: [“jan5”.8@test.relay.symantec.com]

Original issue reported on code.google.com by aant...@gmail.com on 27 Apr 2011 at 10:19

GoogleCodeExporter commented 9 years ago

What version did you use (or SVN revision)?

Original comment by philippe...@gmail.com on 2 May 2011 at 2:11

GoogleCodeExporter commented 9 years ago

The version used is 1.1.1.

This is the email I got from the person using it:

------------------

Hi Alex,

I used your code to convert my inputStream to a String before calling 
JsonFormat.merge, but I still have the same problem with the returned object. 
I verified that the string argument contains the Chinese characters that were 
in the input stream.

I stepped through some of the JsonFormat code; it processes the token in 
consumeByteString:

        /**
         * If the next token is a string, consume it and return its (unescaped) value. Otherwise,
         * throw a {@link ParseException}.
         */
        public String consumeString() throws ParseException {
            return consumeByteString().toStringUtf8();
        }

        /**
         * If the next token is a string, consume it, unescape it as a
         * {@link com.google.protobuf.ByteString}, and return it. Otherwise, throw a
         * {@link ParseException}.
         */
        public ByteString consumeByteString() throws ParseException {
            char quote = currentToken.length() > 0 ? currentToken.charAt(0) : '\0';
            if ((quote != '\"') && (quote != '\'')) {
                throw parseException("Expected string.");
            }

            if ((currentToken.length() < 2)
                || (currentToken.charAt(currentToken.length() - 1) != quote)) {
                throw parseException("String missing ending quote.");
            }

            try {
                String escaped = currentToken.substring(1, currentToken.length() - 1);
                ByteString result = unescapeBytes(escaped);
                // ... (remainder of the method not included in the email)

The function unescapeBytes treats the token as a byte string, so the Chinese 
characters are lost because they do not fit in a single byte. Do you know why 
it treats the token as a byte string? I think this is the essence of the 
problem.
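The truncation described above can be reproduced in isolation. This Java sketch assumes (the email does not show it) that the old unescapeBytes copied each char into a byte with a narrowing (byte) cast; narrowEachChar is a hypothetical stand-in for that step, not the library's code. Under that assumption, 検 (U+691C) becomes 0x1C, an invisible control character, and 索 (U+7D22) becomes 0x22, a double quote. That would be consistent with the quoted "jan5" in the output reported at the top of this issue.

```java
import java.nio.charset.StandardCharsets;

final class ByteNarrowingDemo {
    // Hypothetical stand-in for a byte-oriented unescape (escape handling omitted):
    // each UTF-16 char is narrowed to its low 8 bits by a (byte) cast.
    static byte[] narrowEachChar(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i); // high byte is silently discarded
        }
        return out;
    }

    // Encoding the whole string as UTF-8 keeps multi-byte characters intact.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same username as in the report, written with escapes to avoid source-encoding issues
        String username = "\u691C\u7D22jan5\u691C\u7D22.8@test.relay.symantec.com";

        // Decode as UTF-8, which is what ByteString.toStringUtf8() does
        String narrowed = new String(narrowEachChar(username), StandardCharsets.UTF_8);
        String roundTripped = new String(encodeUtf8(username), StandardCharsets.UTF_8);

        System.out.println("narrowed:      " + narrowed);
        System.out.println("round-tripped: " + roundTripped);
        System.out.println("intact after narrowing? " + narrowed.equals(username));
        System.out.println("intact after UTF-8?     " + roundTripped.equals(username));
    }
}
```

Under the stated assumption, the narrowed value is 0x1C, a double quote, jan5, 0x1C, a double quote, then .8@test.relay.symantec.com, while the UTF-8 round trip returns the original username unchanged.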

Original comment by aant...@gmail.com on 3 May 2011 at 4:37

GoogleCodeExporter commented 9 years ago

This was fixed by the patch for issue 11. The method "unescapeBytes" is no 
longer used for parsing strings.

Either use trunk or wait for the next release to get the fix.
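The patch for issue 11 is not reproduced in this thread. As a rough illustration of parsing a string token without a byte round trip, the sketch below unescapes the text between the quotes character by character into a StringBuilder. StringUnescapeSketch and unescapeString are hypothetical names, and only the common JSON escapes (plus \' for single-quoted tokens) are handled.

```java
final class StringUnescapeSketch {
    // Value of one hex digit, or -1 if the char is not [0-9a-fA-F].
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Hypothetical helper: unescapes the text between the quotes of a string
    // token without converting it to bytes, so non-ASCII chars survive.
    static String unescapeString(String escaped) {
        StringBuilder out = new StringBuilder(escaped.length());
        for (int i = 0; i < escaped.length(); i++) {
            char c = escaped.charAt(i);
            if (c != '\\') {
                out.append(c); // ordinary char (CJK included) copied unchanged
                continue;
            }
            if (++i >= escaped.length()) {
                throw new IllegalArgumentException("Trailing backslash");
            }
            char e = escaped.charAt(i);
            switch (e) {
                case '"':  out.append('"');  break;
                case '\'': out.append('\''); break;
                case '\\': out.append('\\'); break;
                case '/':  out.append('/');  break;
                case 'b':  out.append('\b'); break;
                case 'f':  out.append('\f'); break;
                case 'n':  out.append('\n'); break;
                case 'r':  out.append('\r'); break;
                case 't':  out.append('\t'); break;
                case 'u': {
                    // Four hex digits encode one UTF-16 code unit
                    if (i + 4 >= escaped.length()) {
                        throw new IllegalArgumentException("Truncated unicode escape");
                    }
                    int code = 0;
                    for (int k = 1; k <= 4; k++) {
                        int d = hexValue(escaped.charAt(i + k));
                        if (d < 0) {
                            throw new IllegalArgumentException("Invalid unicode escape");
                        }
                        code = code * 16 + d;
                    }
                    out.append((char) code);
                    i += 4;
                    break;
                }
                default:
                    throw new IllegalArgumentException("Unknown escape: \\" + e);
            }
        }
        return out.toString();
    }
}
```

Because the result stays a Java String throughout, there is no narrowing step and no second UTF-8 decode; a surrogate pair written as two escapes is also reassembled correctly, since each escape appends one UTF-16 code unit.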

Original comment by philippe...@gmail.com on 3 May 2011 at 12:39

Changed state: Duplicate

GoogleCodeExporter commented 9 years ago

Philippe / Alex, please add a unit test to verify the fix for this issue on trunk.

I'm reopening this issue until the fix is verified.

Original comment by eliran.bivas on 3 May 2011 at 12:49

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

Alex can confirm, but he added a unit test in r61. I'll let Alex close the 
issue when he confirms this.

Original comment by philippe...@gmail.com on 3 May 2011 at 12:54

GoogleCodeExporter commented 9 years ago

Code reviewed. Alex, close it if you think this issue is fixed.

Original comment by eliran.bivas on 3 May 2011 at 1:23

GoogleCodeExporter commented 9 years ago

It looks like the latest trunk has fixed the issue. A unit test is in place to 
verify that future changes won't break it.

Original comment by aant...@gmail.com on 4 May 2011 at 4:18

Changed state: Fixed

carlomedas / protobuf-java-format, issue #32