go-pg / pg

Golang ORM with focus on PostgreSQL features and performance
https://pg.uptrace.dev/
BSD 2-Clause "Simplified" License
5.65k stars 400 forks

[v9] json array columns have serialization edge cases with special characters #1776

Open DeviateFish opened 3 years ago

DeviateFish commented 3 years ago

There are a couple of cases where values inserted into JSONB[] columns are not properly escaped. Specifically, this applies to double quotes (") and unicode escape sequences (e.g. \u0026) when they appear inside the JSON payload itself.

In the case of double quotes, insufficient escaping leaves the quote bare, prematurely closing the quoted entry in the array literal. In the case of unicode escape sequences, the backslash is not escaped, so the sequence is stored without it (i.e. \u0026 becomes the literal u0026).

The latter manifests if the payload being json encoded contains characters that json.Marshal explicitly converts to unicode escape sequences.

Expected Behavior

Both of the above should work: JSON payloads containing double quotes in keys or values should not break the generated SQL statements, and unicode escape sequences should keep their backslashes so they can be properly decoded when read back.

Current Behavior

In the case of the double quote, any statement that includes the literal value (e.g. INSERT or UPDATE) fails due to early termination of the quoted string. In the case of unicode escape sequences, the values read back are missing the backslash escape characters (e.g. & becomes \u0026 during JSON encoding, but is persisted to the database as u0026).

Possible Solution

I believe this line should be `b = append(b, c)` instead, but this might break other edge cases during encoding, and may need to apply only when the value being encoded is part of an array. Making this change resolves the issues I ran into, but I didn't exercise any other test cases to see whether anything new broke.
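For context, here is a minimal sketch (not go-pg's actual code; the helper name is hypothetical) of the escaping a double-quoted element of a Postgres array literal requires: embedded double quotes and backslashes must each be preceded by a backslash, which is exactly what gets lost for the payloads above.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeArrayElem is a hypothetical helper illustrating the escaping a
// double-quoted Postgres array element needs: embedded `"` and `\`
// must each be preceded by a backslash.
func escapeArrayElem(s string) string {
	var b strings.Builder
	b.WriteByte('"')
	for i := 0; i < len(s); i++ {
		if s[i] == '"' || s[i] == '\\' {
			b.WriteByte('\\')
		}
		b.WriteByte(s[i])
	}
	b.WriteByte('"')
	return b.String()
}

func main() {
	// A JSON document containing both quotes and a unicode escape sequence.
	fmt.Println(escapeArrayElem(`{"key": "\u0026value"}`))
}
```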

Steps to Reproduce

See this gist for a test case and a workaround. Be sure to swap out relevant db information where applicable.

The TL;DR here is:

// Some fake thing to exercise json stuff
// Table is just `CREATE TABLE json_test (foo JSONB[] not null);`
type JSONBArrayModel struct {
    tableName struct{}                 `pg:"json_test"` //nolint
    Foo       []map[string]interface{} `pg:"foo,array,use_zero"`
}

...

withEscapedQuote := JSONBArrayModel{
    Foo: []map[string]interface{}{
        {"foo": "\"bar"},
    },
}

if _, err := conn.Model(&withEscapedQuote).Insert(); err != nil {
    // this will panic with "ERROR #22P02 invalid input syntax for type json":
    panic(err)
}

Context (Environment)

Not much to add here except I know how much of a pain it is to serialize/deserialize JSONB arrays in particular 😞 In general I would avoid them, but can't in this case. I'm also pretty new to working with Go, so it's also possible I'm just missing something very obvious.

This is running against PG11 (11.1 specifically), using go 1.15 and pg v9.2.0. I believe this also affects v10, given that the encoding methods seem to be unchanged across the versions.

Note that I was able to work around this (see example) by hinting that the column was actually an array of strings, and then handling the json encoding myself. This results in properly escaped json strings.
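A sketch of that workaround (type and function names are hypothetical, not from the gist): hint the column as an array of strings so the driver only has to escape plain text, and JSON-encode each element yourself before inserting.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical variant of the model above: the JSONB[] column is
// declared as []string, so go-pg treats elements as opaque text.
type JSONBArrayWorkaround struct {
	tableName struct{} `pg:"json_test"` //nolint
	Foo       []string `pg:"foo,array,use_zero"`
}

// encodeElems JSON-encodes each map into a string element for Foo.
func encodeElems(elems []map[string]interface{}) ([]string, error) {
	out := make([]string, 0, len(elems))
	for _, m := range elems {
		b, err := json.Marshal(m)
		if err != nil {
			return nil, err
		}
		out = append(out, string(b))
	}
	return out, nil
}

func main() {
	encoded, err := encodeElems([]map[string]interface{}{{"foo": "\"bar"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(encoded[0]) // {"foo":"\"bar"}
}
```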

Detailed Description

See the example; this is easy to reproduce with a table that has a JSONB[] column by attempting to insert either of the following two JSON payloads:

[
  {"\"key": "value"}
]
[
  {"key": "&value"}
]

The improperly escaped double quote in the former causes the INSERT to fail, while the unicode escape sequence that json.Marshal generates for the latter inserts successfully but is read back as `[{"key": "u0026value"}]`.

nirsht commented 1 year ago

I encountered the same issue

nirsht commented 1 year ago

I think this issue that I opened is the same one https://github.com/go-pg/pg/issues/1979