pg_recvlogical very unstable

maipeng commented 5 years ago

Hello, We use Wal2Json as a service. It begins to create a slot if not exists and then starts to listen to wal by using pg_recvlogical —start -o include-types=0 -f - When 2 or 3 long queries are running on the database, it seems the pg_recvlogical stuck in state = startup and can not create the slot. Did you ever heard of this issue ? Moreover, when we need to use the option --write-in-chunks, because pg_recvlogical can not catchup, the json is not well formated ( in fact, it is well formated few times, also no data is provided) so this option is unstable.

Invalid output format cases Case A Objective Read the data from the Wal2Json. Connection method pg_recvlogical -h host -d database --slot slot_name -U user --start -o add-tables=test.* -o include-types=0 -f -  Query UPDATE test.act SET action = action WHERE id=1000000033  Wal2Json output result We get the following output from the Wal2Json : a valid json string. """ {"change":[\n {"kind":"update","schema":"test","table":"act","columnnames":["active","created_at","id_created_by","updated_at","id_updated_by","origin","data","id_brand","id","id_user","gidentity","action"],"columnvalues":[true,"2019-01-28 16:54:36.64227",null,"2019-01-28 16:54:36.64227",null,"LEGACY","{}","AC",1000000033,6,"series.series..1","test.Action.Subscribe"],"oldkeys":{"keynames":["id_brand","id"],"keyvalues":["AC",1000000033]}}\n ]}\n """

And we are able to parse it right. Case B Objective Read the data from the Wal2Json in “chunk” mode. Connection method We add -o write-in-chunks=true 2-pg_recvlogical -h host -d database --slot slot_name -U user --start -o add-tables=test.* -o include-types=0 -o write-in-chunks=true -f - Query Same query as in Case A UPDATE test.act SET action = action WHERE id=1000000033 Wal2Json output result We get 3 differents behaviours.

Resulting behaviour 1 : Nothing No data is provided.
Resulting behaviour 2 : Non-parsable / incomplete data

We get partial data, first : "{"change":[\n"  Then that : {"kind":"update","schema":"test","table":"act","columnnames":["active","created_at","id_created_by","updated_at","id_updated_by","origin","data","id_brand","id","id_user","gidentity","action"],"columnvalues":[true,"2019-01-28 16:54:36.64227",null,"2019-01-28 16:54:36.64227",null,"LEGACY","{}","AC",1000000033,6,"series.series..1","test.Action.Subscribe"],"oldkeys":{"keynames":["id_brand","id"],"keyvalues":["AC",1000000033]}}\n ]}\n """

Resulting behaviour 3 : Parsable data Raw output   """ {"change":[\n {"kind":"update","schema":"test","table":"act","columnnames":["active","created_at","id_created_by","updated_at","id_updated_by","origin","data","id_brand","id","id_user","gidentity","action"],"columnvalues":[true,"2019-01-28 16:54:36.64227",null,"2019-01-28 16:54:36.64227",null,"LEGACY","{}","AC",1000000033,6,"series.series..1","test.Action.Subscribe"],"oldkeys":{"keynames":["id_brand","id"],"keyvalues":["AC",1000000033]}}\n ]}\n """ Resulting object representation : that is fine.

{#1511 +"change": array:1 [ 0 => {#1442 +"kind": "update" +"schema": "test" +"table": "act" +"columnnames": array:12 [ 0 => "active" 1 => "created_at" 2 => "id_created_by" 3 => "updated_at" 4 => "id_updated_by" 5 => "origin" 6 => "data" 7 => "id_brand" 8 => "id" 9 => "id_user" 10 => "gidentity" 11 => "action" ] +"columnvalues": array:12 [ 0 => true 1 => "2019-01-28 16:54:36.64227" 2 => null 3 => "2019-01-28 16:54:36.64227" 4 => null 5 => "LEGACY" 6 => "{}" 7 => "AC" 8 => 1000000033 9 => 6 10 => "series.series..1" 11 => "test.Action.Subscribe" ] +"oldkeys": {#1497 +"keynames": array:2 [ 0 => "id_brand" 1 => "id" ] +"keyvalues": array:2 [ 0 => "AC" 1 => 1000000033 ] } } ] } Conclusion In this “chunk mode “, the output results seems unpredictable to us. What is this mode output logic we can rely on?

Thank you for your help.

eulerto commented 5 years ago

When a lot of processes are running on the database, it seems the pg_recvlogical stuck in state = startup and can not create the slot.

You don't provide enough information. Do you have big transactions? Logical decoding does not behave well with big transactions (that is why I designed --write-in-chunks). What is the error message? Could you get a stack trace?

Moreover, when we need to use the option --write-in-chunks, because pg_recvlogical can not catchup, the json is not well formated ( in fact, it is well formated few times) so this option is unstable.

The option --write-in-chunks was designed to fix a memory limit with big transactions. By default, wal2json emits the whole transaction at once and it didn't work with big transaction because PostgreSQL limits memory allocation to 1 GB (issue #4 ). If you use this option, the client needs to control the JSON start/end (because it emits pieces). I hope the new format version can cover these use cases.

maipeng commented 5 years ago

So, are you going to push a new release, for a better control of json output ? :)

eulerto commented 5 years ago

Yes, I designed another format. It fixes a lot of problems that users reported over the years.

maipeng commented 5 years ago

Nice to hear that. When will you release it ? For the PostgreSQL limits memory allocation to 1 GB, is it possible to change this limit, is it in the conf file? thx

taybin commented 5 years ago

I look forward to seeing the changes. Will both schemas be available at the same time? How will someone switch between them?

eulerto commented 5 years ago

@taybin see commit f81bf7af09324da656be87dfd53d20741c01e1e0

eulerto / wal2json

pg_recvlogical very unstable #100