chezinut / as3corelib

Automatically exported from code.google.com/p/as3corelib

JSON.decode allows unescaped carriage-returns #104

Closed. GoogleCodeExporter closed this issue 9 years ago.

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Add this test case to com.adobe.serialization.json.JSONTest:

    public function testLF():void {
        var str:String = '{"test":"line1\nline2\\nline3"}'; // contains escaped and unescaped LF
        var jsonObj:Object = JSON.decode(str);

        trace("raw json:\n'" + str + "'\n");
        trace("test field:\n'" + jsonObj.test + "'\n");
    }

2. Note the trace output:
    raw json:
    '{"test":"line1
    line2\nline3"}'

    test field:
    'line1
    line2
    line3'

What is the expected output? What do you see instead?

This test should fail (JSON.decode should reject the input), but it succeeds. According to the JSON specification (http://www.json.org/), unescaped control characters such as line feeds and carriage returns are not allowed inside strings.

What version of the product are you using? On what operating system?
   as3corelib 0.92.1 on Mac OS X 10.5 with Flash Player 10.

Please provide any additional information below.

Original issue reported on code.google.com by paleozogt on 30 Apr 2009 at 5:39

GoogleCodeExporter commented 9 years ago
Marking as not a bug because I believe this is working as expected.  Consider 
the following test:

    public function testDecodeStringWithEscapedBackslashAndNewLines():void
    {
        // Contains escaped and unescaped LF
        var innerString:String = "line1\nline2\\nmore line2";
        // Encode the string to JSON, then decode the result back
        var o:* = JSON.decode( JSON.encode( innerString ) );

        assertEquals( innerString, o );
    }

This test case passes: innerString is exactly the same after an encode/decode round trip.
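The same round trip can be sketched outside ActionScript. The TypeScript below is an illustrative analogue, not as3corelib code: it assumes the built-in `JSON.stringify` and `JSON.parse` stand in for `JSON.encode` and `JSON.decode`. It shows that the encoder escapes the newline as `\n` and the backslash as `\\`, so decoding restores the original string.

```typescript
// Analogue of testDecodeStringWithEscapedBackslashAndNewLines, using the
// built-in JSON object in place of as3corelib's JSON.encode/JSON.decode.
const innerString: string = "line1\nline2\\nmore line2"; // one real LF, one backslash + "n"

// Encoding escapes the LF as \n and the backslash as \\.
const encoded: string = JSON.stringify(innerString);

// Decoding the properly escaped text restores the exact original string.
const decoded: string = JSON.parse(encoded);

console.log(encoded);                 // "line1\nline2\\nmore line2"
console.log(decoded === innerString); // true
```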

In your example, you're seeing "strange" behavior because the escape sequences in the string literal are processed by the ActionScript compiler before JSON.decode ever sees the string. The string you're actually passing into JSON.decode looks like this:

    "line1{newline character}line2{backslash character}nline3"

When the JSON decoder runs, it finds the backslash character immediately followed by an "n" and converts that pair to a {newline character}. The raw newline after "line1" is passed through unchanged, so after decoding the result is "line1{newline}line2{newline}line3". That explains why the result differs from the original string you specified.
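To see which characters actually reach the decoder, the string from the original test can be inspected character by character. This TypeScript sketch is an illustration, not part of as3corelib; it assumes a TypeScript string literal treats `\n` and `\\` the same way the ActionScript literal does: `\n` becomes a real line feed (code 10), while `\\n` becomes a backslash (code 92) followed by `n`.

```typescript
// The literal from testLF: a raw LF after "line1", a backslash + "n" after "line2".
const str: string = '{"test":"line1\nline2\\nline3"}';

// Character codes immediately after "line1" and "line2" in the text passed to the decoder.
const afterLine1: number = str.charCodeAt(str.indexOf("line1") + "line1".length);
const afterLine2: number = str.charCodeAt(str.indexOf("line2") + "line2".length);

console.log(afterLine1); // 10: a raw line feed, invalid inside a JSON string
console.log(afterLine2); // 92: a backslash, which starts the valid escape \n
console.log(str.charAt(str.indexOf("line2") + "line2".length + 1)); // n
```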

Remember that a string must be properly encoded in order for it to be decoded correctly.

Original comment by darron.schall on 8 Jul 2009 at 7:02

GoogleCodeExporter commented 9 years ago
So what you're saying is that JSON.decode will accept malformed JSON? Is that how it should work? I think it's important that the decoder reject bad JSON.

We had a process that was emitting bad JSON (just as described in this issue), and JSON.decode happily accepted it. It wasn't until we hooked up a Java JSON decoder (which rejected the bad JSON) that we even realized there was a problem.
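A strict parser catches this. As an illustration, assuming a JavaScript engine's built-in `JSON.parse` (which follows the JSON grammar and, like the Java decoder mentioned above, rejects raw control characters inside strings), the malformed text throws a `SyntaxError`, while the same text with the line feed escaped parses to a value containing two newlines. The helper `tryParse` is hypothetical and exists only for this example.

```typescript
const badJson: string = '{"test":"line1\nline2\\nline3"}';   // raw LF inside the string
const goodJson: string = '{"test":"line1\\nline2\\nline3"}'; // LF escaped as \n

// Hypothetical helper: returns the decoded "test" field (re-encoded so escapes
// are visible), or the name of the error if parsing fails.
function tryParse(text: string): string {
  try {
    const obj = JSON.parse(text) as { test: string };
    return JSON.stringify(obj.test);
  } catch (e) {
    return e instanceof SyntaxError ? "SyntaxError" : "other error";
  }
}

console.log(tryParse(badJson));  // SyntaxError
console.log(tryParse(goodJson)); // "line1\nline2\nline3"
```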

Original comment by paleozogt on 8 Jul 2009 at 7:11

GoogleCodeExporter commented 9 years ago
I understand what you're saying now. The JSON string "line1{newline character}line2" is technically malformed because the {newline character} is not escaped as "\n", yet the JSON decoder does not throw an error when processing it.

I'll update the tokenizer to throw a parse error in strict mode when unescaped control characters (\u0000 through \u001F) are found in strings.

In non-strict mode, I won't change the behavior, because the purpose of non-strict mode is to parse the JSON as well as possible even when it is malformed.

Original comment by darron.schall on 8 Jul 2009 at 7:22

GoogleCodeExporter commented 9 years ago
Fixed in r91. Modified readString in the tokenizer to check for unescaped control characters.

Commit message: In JSON strict mode, when a string contains an unescaped control
character (0x00-0x1F) a parse error is now thrown because the spec indicates
that strings cannot contain unescaped control characters.

In non-strict mode, the error is ignored and the control character is "passed
through" to the decoded string value.
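The behavior described in the commit message can be sketched as a small string reader. This is a hypothetical TypeScript illustration, not the actual `readString` from r91: the function name `readJsonString`, its `strict` parameter, and its error messages are assumptions made for the example. It reads from just after an opening quote, throws on an unescaped character in the range 0x00-0x1F when `strict` is true, and otherwise copies the character through.

```typescript
// Hypothetical sketch (not the as3corelib code) of a JSON string reader that,
// in strict mode, rejects unescaped control characters (0x00-0x1F).
const ESCAPES: Record<string, string> = {
  '"': '"', "\\": "\\", "/": "/", b: "\b", f: "\f", n: "\n", r: "\r", t: "\t",
};

// `text` is the JSON source; `start` is the index just after the opening quote.
// Returns the decoded string value, or throws SyntaxError on malformed input.
function readJsonString(text: string, start: number, strict: boolean): string {
  let result = "";
  let i = start;
  while (i < text.length) {
    const ch = text.charAt(i);
    if (ch === '"') {
      return result; // closing quote reached
    }
    if (ch === "\\") {
      const next = text.charAt(i + 1);
      if (next === "u") {
        // \uXXXX escape: exactly four hex digits
        const hex = text.slice(i + 2, i + 6);
        if (!/^[0-9a-fA-F]{4}$/.test(hex)) {
          throw new SyntaxError(`Invalid \\u escape at index ${i}`);
        }
        result += String.fromCharCode(parseInt(hex, 16));
        i += 6;
        continue;
      }
      if (!Object.prototype.hasOwnProperty.call(ESCAPES, next)) {
        throw new SyntaxError(`Invalid escape at index ${i}`);
      }
      result += ESCAPES[next];
      i += 2;
      continue;
    }
    if (strict && ch.charCodeAt(0) <= 0x1f) {
      // Strict mode: the spec forbids unescaped control characters in strings.
      throw new SyntaxError(`Unescaped control character at index ${i}`);
    }
    // Ordinary characters (and, in non-strict mode, control characters) are copied as-is.
    result += ch;
    i += 1;
  }
  throw new SyntaxError("Unterminated string");
}

// Raw LF after "line1", escaped \n after "line2".
const source: string = '"line1\nline2\\nline3"';
console.log(JSON.stringify(readJsonString(source, 1, false))); // "line1\nline2\nline3"
try {
  readJsonString(source, 1, true);
} catch (e) {
  console.log(e instanceof SyntaxError); // true
}
```

Keeping the check behind a flag mirrors the design described in this thread: strict mode enforces the spec, while non-strict mode stays as permissive as before.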

Original comment by darron.schall on 8 Jul 2009 at 7:46