GoogleCloudPlatform / DataflowTemplates

Cloud Dataflow Google-provided templates for solving in-Cloud data tasks
https://cloud.google.com/dataflow/docs/guides/templates/provided-templates
Apache License 2.0
1.16k stars 977 forks source link

[Bug]: DataStream To BigQuery Template - Not working with Postgres #2025

Open SargisPlusPlus opened 2 days ago

SargisPlusPlus commented 2 days ago

Related Template(s)

datastream-to-bigquery

Template Version

latest / 2024-11-06-01_rc00

What happened?

I have been following this Google Guide to export data from Postgres to BigQuery. The guide mentions that Postgres is a supported DB for this guide.

I followed the guide and as per guide used the template Datastream to BigQuery Template to setup datastream to bigquery.

After my setup was complete, there are two BQ datasets I set up: Staging and a real one. I noticed that staging dataset is getting populated with data. Unfortunately, real data set does not get any data. Both datasets correctly created tables and columns. But both also had additional columns

  "_metadata_timestamp",
  "_metadata_read_timestamp",
  "_metadata_read_method",
  "_metadata_source_type",
  "_metadata_deleted",
  "_metadata_change_type",
  "_metadata_row_id",
  "_metadata_scn",
  "_metadata_ssn",
  "_metadata_rs_id"

Looking at logs, I see merge step failed for every table:

Merge Job Failed With BigQuery Exception: com.google.cloud.bigquery.BigQueryException: Query error: Unrecognized name: _metadata_lsn; Did you mean _metadata_ssn? at [1:627] Statement: BEGIN BEGIN TRANSACTION; MERGE .....

Once I manually added _metadata_lsn and _metadata_uuid to all tables in all datasets, the merge is succeeding. But this approach seems wrong.

Why isn't _metadata_lsn auto-created? and _metadata_scn is a Oracle column. Why does the data flow create Oracle columns?

SargisPlusPlus commented 2 days ago

Might be related to: https://github.com/GoogleCloudPlatform/DataflowTemplates/pull/610