Open · andreagaffiero opened this issue 1 year ago
Probably a bit late to the party, but I think I was able to reproduce this bug.
Setup:
databaseManagementSystem=postgres
singleTopicMode=false
whitelistedTables=postgres.public.table
inMemoryOffsetStorage=false
Steps to reproduce:
From my limited understanding, the issue is that after a restart, the connector tries to derive the table schema from the "after" value emitted by Debezium. However, if the first event it encounters is a delete event, the "after" value is null.
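To make the failure mode above concrete, here is a minimal Java sketch of a null-safe fallback: take the schema from the "before" row image when "after" is null, as it is for delete events. The ChangeEvent record and the rowForSchema method are hypothetical stand-ins. They are not the Debezium or DataflowTemplates API, and this is not the template's actual code. Also note that for a Postgres delete, the "before" image may carry only the primary-key columns, depending on the table's REPLICA IDENTITY setting. A real fix would have to account for that.

```java
import java.util.Map;

public class SchemaSource {

    /** Hypothetical, simplified stand-in for a change event (NOT the real Debezium type). */
    public record ChangeEvent(String op, Map<String, Object> before, Map<String, Object> after) {}

    /**
     * Returns the row image to use for schema inference.
     * Inserts, updates and snapshot reads carry an "after" image; deletes do not,
     * so fall back to "before". Returns null if neither image is present.
     */
    public static Map<String, Object> rowForSchema(ChangeEvent event) {
        if (event.after() != null) {
            return event.after();
        }
        return event.before();
    }

    public static void main(String[] args) {
        // A delete event: "after" is null, so the "before" image is used instead.
        ChangeEvent delete = new ChangeEvent("d", Map.of("id", 1), null);
        System.out.println(rowForSchema(delete).keySet());
    }
}
```

With this fallback, a delete arriving as the first event after a restart no longer yields a null row image. Whether "before" contains enough columns to rebuild the full schema is a separate question, as noted above.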
A hacky way to circumvent the issue could be to set snapshot.mode to always (if the table is small enough that re-snapshotting doesn't slow replication too much). Debezium would then re-snapshot the table on every start, so the first event the connector receives after a restart would be a snapshot read rather than a delete event.
This issue has been marked as stale due to 180 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the issue at any time. Thank you for your contributions.
Related Template(s)
cdc-embedded-connector
What happened?
The CDC connector was deployed successfully and replicates CDC changes through Pub/Sub -> Dataflow -> BigQuery as expected.
I then started adding new tables to the whitelistedTables parameter in the Kubernetes ConfigMap. After some time, with data changes being captured, the cdc-embedded-connector deployment starts failing.
The error shown under "Relevant log output" was found in the logs.
Is there any configuration that can be added to the cdc-embedded-connector to fix this? How can I work around it? Is this a known issue?
Thanks in advance.
Note: I started working on this implementation last December, using DataflowTemplates-2022-12-13-00_RC01.
Beam Version
Newer than 2.43.0
Relevant log output