eulerto / wal2json

JSON output plugin for changeset extraction
BSD 3-Clause "New" or "Revised" License
1.33k stars 161 forks source link

Size unhandled in pg_decode_message #28

Closed rjuju closed 7 years ago

rjuju commented 7 years ago

Hello again,

While testing some code path, I found that the content_size isn't used in every code path of pg_decode_mesage(), which can lead to erreoneous out. For instance:

rjuju=# SELECT pg_logical_emit_message(false, 'wal2json', 'a very long message');
 pg_logical_emit_message 
-------------------------
 0/629A5358
(1 row)

rjuju=# SELECT pg_logical_emit_message(false, 'wal2json', 'meh');
 pg_logical_emit_message 
-------------------------
 0/629A5398
(1 row)

rjuju=# select * from pg_logical_slot_peek_changes('wal2json', null, null, 'pretty-print', '1');
    lsn     | xid |                           data                           
------------+-----+----------------------------------------------------------
 0/629A5358 |   0 | {                                                       +
            |     |         "change": [                                     +
            |     |                 {                                       +
            |     |                         "kind": "message",              +
            |     |                         "transactional": false,         +
            |     |                         "prefix": "wal2json",           +
            |     |                         "content": "a very long message"+
            |     |                 }                                       +
            |     |         ]                                               +
            |     | }
 0/629A5398 |   0 | {                                                       +
            |     |         "change": [                                     +
            |     |                 {                                       +
            |     |                         "kind": "message",              +
            |     |                         "transactional": false,         +
            |     |                         "prefix": "wal2json",           +
            |     |                         "content": "meh"                +
            |     |                 }                                       +
            |     |         ]                                               +
            |     | }
(2 rows)

rjuju=# select * from pg_logical_slot_peek_changes('wal2json', null, null, 'pretty-print', '0');
    lsn     | xid |                                                   data                                                    
------------+-----+-----------------------------------------------------------------------------------------------------------
 0/629A5358 |   0 | {"change":[{"kind":"message","transactional":false,"prefix":"wal2json","content":"a very long message"}]}
 0/629A5398 |   0 | {"change":[{"kind":"message","transactional":false,"prefix":"wal2json","content":"mehery long message"}]}
(2 rows)

(see last two messages content).

I didn't read upstream infrastructure, but I assume that for performance issue content isn't zeroed on every call, so you can't rely on it being NULL-terminated.

I wanted to provide a fix for this, but I have some questions.

I'm not sure why when pretty-print is asked you do a appendBinaryStringInfo() and otherwise call quote_escape_json(). I assume that quote_escape_json() should be called in both cases, or is there something I missed?

If quote_escape_json() has to be called() for both, I think we can ad a Size parameter, and iterate as an array instead of currrent pointer loop. We could also t manually call enlargeStringInfo() with the given size, instead of relying on it doubling it's size autmatically. That probably won't be a huge win, but it might still win some cycles.

What do you thing?

eulerto commented 7 years ago

Oh, crap. I committed a experimental code in the original commit. I ripped it out now. A future commit will add code for escaping prefix and content.

rjuju commented 7 years ago

Ah, yes that makes sense. Thanks for fixing it!