amazon-ion / ion-go

A Go implementation of Amazon Ion.
https://amazon-ion.github.io/ion-docs/
Apache License 2.0

String/Symbol/CLOB Literal Escape Processing #17

Open almann opened 4 years ago

almann commented 4 years ago

While investigating #3, I noticed that escapes pass through verbatim in text parsing: they are recognized by the lexer, but not replaced with the runes (code points) they represent.

E.g. adding the following to lex_test.go:

        {
            name:     "quoted string with escapes",
            input:    []byte(`"\x41"`),
            expected: []Item{doubleQuote("A"), tEOF},
        },

Fails as follows:

    --- FAIL: TestLex/quoted_string_with_escapes (0.00s)
        lex_test.go:738: Expected: [<A> EOF]
        lex_test.go:739: Found:    [<\x41> EOF]
        lex_test.go:740: (-expected, +found)   []lex.Item{
            -   s"<A>",
            +   s`<\x41>`,
                {Type: s"EOF"},
              }

Either I misunderstand the responsibility of the lexer, or this is a bug and we should make sure escapes are being processed correctly.

almann commented 4 years ago

parse_text_simple.go#L65-L170 shows where the processing of escapes is happening. It duplicates the lexing code a bit, since the lexing code effectively has to parse the escapes when doing the normalization. That said, I don't see it processing all of the escapes (particularly \u, \U, and \x), so I think we need to take a closer look at this logic and its factoring.
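
For discussion, here is a minimal sketch of what complete escape replacement for string and symbol bodies could look like, including \x, \u, and \U. It is not the code in parse_text_simple.go: the name unescapeText and both lookup tables are hypothetical, and it assumes the escape set from the Ion text format. It also simplifies in three ways: the escaped-newline line continuation is not handled, surrogate code points are rejected rather than paired, and CLOBs would need a byte-oriented variant without \u and \U.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode/utf8"
)

// simpleEscapes maps single-character escapes to the rune they stand for.
// Assumption: this mirrors the escape set of the Ion text format.
var simpleEscapes = map[byte]rune{
	'0': 0x00, 'a': 0x07, 'b': 0x08, 't': 0x09, 'n': 0x0A,
	'f': 0x0C, 'r': 0x0D, 'v': 0x0B, '"': '"', '\'': '\'',
	'?': '?', '\\': '\\', '/': '/',
}

// hexLen gives the number of hex digits that follow \x, \u, and \U.
var hexLen = map[byte]int{'x': 2, 'u': 4, 'U': 8}

// unescapeText (hypothetical name) replaces escape sequences in the body of
// a quoted string or symbol with the code points they represent.
func unescapeText(s string) (string, error) {
	var b strings.Builder
	for i := 0; i < len(s); {
		if s[i] != '\\' {
			// Copy non-escape bytes through unchanged; this keeps
			// multi-byte UTF-8 sequences intact.
			b.WriteByte(s[i])
			i++
			continue
		}
		if i+1 >= len(s) {
			return "", fmt.Errorf("dangling backslash at offset %d", i)
		}
		c := s[i+1]
		if r, ok := simpleEscapes[c]; ok {
			b.WriteRune(r)
			i += 2
			continue
		}
		n, ok := hexLen[c]
		if !ok {
			return "", fmt.Errorf("unknown escape \\%c at offset %d", c, i)
		}
		if i+2+n > len(s) {
			return "", fmt.Errorf("truncated \\%c escape at offset %d", c, i)
		}
		v, err := strconv.ParseUint(s[i+2:i+2+n], 16, 32)
		if err != nil {
			return "", fmt.Errorf("bad hex digits in \\%c escape at offset %d", c, i)
		}
		// Simplification: reject surrogates instead of pairing them.
		if v > utf8.MaxRune || (v >= 0xD800 && v <= 0xDFFF) {
			return "", fmt.Errorf("invalid code point U+%X at offset %d", v, i)
		}
		b.WriteRune(rune(v))
		i += 2 + n
	}
	return b.String(), nil
}
```

With something along these lines, the lexer could keep emitting the raw text and the parser would call the helper once per string or symbol token, which might also remove the duplicated escape handling mentioned above. For the failing test case, unescapeText(`\x41`) yields "A".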