GoogleCloudPlatform / DataflowTemplates

Cloud Dataflow Google-provided templates for solving in-Cloud data tasks
https://cloud.google.com/dataflow/docs/guides/templates/provided-templates
Apache License 2.0
1.11k stars 929 forks source link

cdc-embedded-connector giving java.lang.NullPointerException: null\n\tat com.google.cloud.dataflow.cdc.connector.DebeziumSourceRecordToDataflowCdcFormatTranslator.translate #597

Open andreagaffiero opened 1 year ago

andreagaffiero commented 1 year ago

Related Template(s)

cdc-embedded-connector

What happened?

cdc-connector has been deployed successfully and replicates cdc changes through pub/sub -> dataflow -> bigquery as needed.

I then started adding new tables to the whitelistedTables= parameter within the k8 configmap. After some time, with data changes being captured the cdc-embedded connector deployment then starts failing.

Within the logs the error placed in the log output was found.

Is there any config that can be added within the cdc-embedded-connector to fix this please? how can I get round this please? is this a known issue?

Thanks in advance.

Note: started working on this implementation last December, using DataflowTemplates-2022-12-13-00_RC01.

Beam Version

Newer than 2.43.0

Relevant log output

{"message":"io.debezium.embedded.EmbeddedEngine - Stopping connector after error in the application\u0027s handler method: null\njava.lang.NullPointerException: null\n\tat com.google.cloud.dataflow.cdc.connector.DebeziumSourceRecordToDataflowCdcFormatTranslator.translate(DebeziumSourceRecordToDataflowCdcFormatTranslator.java:91)\n\tat com.google.cloud.dataflow.cdc.connector.PubSubChangeConsumer.handleBatch(PubSubChangeConsumer.java:118)\n\tat io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:826)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n","severity":"ERROR"}

{"message":"com.google.cloud.dataflow.cdc.connector.DebeziumToPubSubDataSender - Requesting embedded engine to shut down","severity":"INFO"}
grdavor commented 6 months ago

Probably a bit late to the party but I think I was able to replicate this bug.

Setup:

databaseManagementSystem=postgres
singleTopicMode=false
whitelistedTables=postgres.public.table
inMemoryOffsetStorage=false

Steps to reproduce:

  1. Have the cdc-embedded-connector running and caught up with WAL
  2. stop the connector
  3. delete a row from the whitelistedTable
  4. restart the connector

From my limited understanding, the issue is that after restart, the connector tries to get the table schema from the after value emitted by debezium. However, if the first event it encounters is a delete event, the after value is null.

A hacky way to cricumvent the issue could be to set snapshot.mode to always (if the table is small enough so that it doesn't slow the replication too much). This would ensure that the first event the connector receives after restart isn't a delete event.

github-actions[bot] commented 6 days ago

This issue has been marked as stale due to 180 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the issue at any time. Thank you for your contributions.