eulerto / wal2json

JSON output plugin for changeset extraction
BSD 3-Clause "New" or "Revised" License
1.33k stars 161 forks source link

pg_recvlogical very unstable #100

Closed maipeng closed 5 years ago

maipeng commented 5 years ago

Hello, We use Wal2Json as a service. It begins to create a slot if not exists and then starts to listen to wal by using pg_recvlogical —start -o include-types=0 -f - When 2 or 3 long queries are running on the database, it seems the pg_recvlogical stuck in state = startup and can not create the slot. Did you ever heard of this issue ? Moreover, when we need to use the option --write-in-chunks, because pg_recvlogical can not catchup, the json is not well formated ( in fact, it is well formated few times, also no data is provided) so this option is unstable.

Invalid output format cases Case A Objective Read the data from the Wal2Json. Connection method pg_recvlogical -h host -d database --slot slot_name -U user --start -o add-tables=test.* -o include-types=0 -f - 
Query UPDATE test.act SET action = action WHERE id=1000000033 
Wal2Json output result We get the following output from the Wal2Json : a valid json string. """ {"change":[\n {"kind":"update","schema":"test","table":"act","columnnames":["active","created_at","id_created_by","updated_at","id_updated_by","origin","data","id_brand","id","id_user","gidentity","action"],"columnvalues":[true,"2019-01-28 16:54:36.64227",null,"2019-01-28 16:54:36.64227",null,"LEGACY","{}","AC",1000000033,6,"series.series..1","test.Action.Subscribe"],"oldkeys":{"keynames":["id_brand","id"],"keyvalues":["AC",1000000033]}}\n ]}\n """

And we are able to parse it right. Case B Objective Read the data from the Wal2Json in “chunk” mode. Connection method We add -o write-in-chunks=true 2-pg_recvlogical -h host -d database --slot slot_name -U user --start -o add-tables=test.* -o include-types=0 -o write-in-chunks=true -f - Query Same query as in Case A UPDATE test.act SET action = action WHERE id=1000000033 Wal2Json output result We get 3 differents behaviours.

  1. Resulting behaviour 1 : Nothing No data is provided.

  2. Resulting behaviour 2 : Non-parsable / incomplete data

We get partial data, first : "{"change":[\n" 
Then that : {"kind":"update","schema":"test","table":"act","columnnames":["active","created_at","id_created_by","updated_at","id_updated_by","origin","data","id_brand","id","id_user","gidentity","action"],"columnvalues":[true,"2019-01-28 16:54:36.64227",null,"2019-01-28 16:54:36.64227",null,"LEGACY","{}","AC",1000000033,6,"series.series..1","test.Action.Subscribe"],"oldkeys":{"keynames":["id_brand","id"],"keyvalues":["AC",1000000033]}}\n ]}\n """

  1. Resulting behaviour 3 : Parsable data Raw output

 """ {"change":[\n {"kind":"update","schema":"test","table":"act","columnnames":["active","created_at","id_created_by","updated_at","id_updated_by","origin","data","id_brand","id","id_user","gidentity","action"],"columnvalues":[true,"2019-01-28 16:54:36.64227",null,"2019-01-28 16:54:36.64227",null,"LEGACY","{}","AC",1000000033,6,"series.series..1","test.Action.Subscribe"],"oldkeys":{"keynames":["id_brand","id"],"keyvalues":["AC",1000000033]}}\n ]}\n """ Resulting object representation : that is fine.

{#1511  +"change": array:1 [     0 => {#1442      +"kind": "update"      +"schema": "test"      +"table": "act"      +"columnnames": array:12 [        0 => "active"        1 => "created_at"        2 => "id_created_by"        3 => "updated_at"        4 => "id_updated_by"        5 => "origin"        6 => "data"        7 => "id_brand"        8 => "id"        9 => "id_user"        10 => "gidentity"        11 => "action"      ]      +"columnvalues": array:12 [        0 => true        1 => "2019-01-28 16:54:36.64227"        2 => null        3 => "2019-01-28 16:54:36.64227"        4 => null        5 => "LEGACY"        6 => "{}"        7 => "AC"        8 => 1000000033        9 => 6        10 => "series.series..1"        11 => "test.Action.Subscribe"      ]      +"oldkeys": {#1497        +"keynames": array:2 [          0 => "id_brand"          1 => "id"        ]        +"keyvalues": array:2 [          0 => "AC"          1 => 1000000033        ] }     }  ] } Conclusion In this “chunk mode “, the output results seems unpredictable to us. What is this mode output logic we can rely on?

Thank you for your help.

eulerto commented 5 years ago

When a lot of processes are running on the database, it seems the pg_recvlogical stuck in state = startup and can not create the slot.

You don't provide enough information. Do you have big transactions? Logical decoding does not behave well with big transactions (that is why I designed --write-in-chunks). What is the error message? Could you get a stack trace?

Moreover, when we need to use the option --write-in-chunks, because pg_recvlogical can not catchup, the json is not well formated ( in fact, it is well formated few times) so this option is unstable.

The option --write-in-chunks was designed to fix a memory limit with big transactions. By default, wal2json emits the whole transaction at once and it didn't work with big transaction because PostgreSQL limits memory allocation to 1 GB (issue #4 ). If you use this option, the client needs to control the JSON start/end (because it emits pieces). I hope the new format version can cover these use cases.

maipeng commented 5 years ago

So, are you going to push a new release, for a better control of json output ? :)

eulerto commented 5 years ago

Yes, I designed another format. It fixes a lot of problems that users reported over the years.

maipeng commented 5 years ago

Nice to hear that. When will you release it ? For the PostgreSQL limits memory allocation to 1 GB, is it possible to change this limit, is it in the conf file? thx

taybin commented 5 years ago

I look forward to seeing the changes. Will both schemas be available at the same time? How will someone switch between them?

eulerto commented 5 years ago

@taybin see commit f81bf7af09324da656be87dfd53d20741c01e1e0