GoogleCloudPlatform / DataflowTemplates

Cloud Dataflow Google-provided templates for solving in-Cloud data tasks
https://cloud.google.com/dataflow/docs/guides/templates/provided-templates
Apache License 2.0

Issue while creating Docker Image for cdc-embedded-connector #337

Closed shreekarwaghela closed 4 months ago

shreekarwaghela commented 2 years ago

Related Template(s)

cdc-parent

What happened?

Hi all,

We want to create a streaming pipeline that reads changes from PostgreSQL and reflects them in our BigQuery data warehouse table in real time. For this purpose, we are trying to use the cdc-embedded-connector solution.

Implementation: We did a POC following the steps in the open source documentation. We used a Cloud SQL instance with a PostgreSQL database as the source and deployed the cdc-connector solution from source code on a Google Compute Engine instance. In this setup, cdc-connector detects any changes made to the Cloud SQL database tables and publishes them to the Pub/Sub topics. In the second part of the solution, we deployed the Dataflow pipeline to read the changes from the Pub/Sub topics and apply them to our BigQuery tables. During the POC, the solution worked perfectly and we were able to propagate changes from the PostgreSQL table to the BigQuery table in real time.

Command used for POC:

mvn -Ppostgres exec:java -pl cdc-embedded-connector \
  -Dexec.args="path/to/your/properties/file.properties [path/to/password/file.properties]"

Next, we want to deploy the cdc-connector solution in Kubernetes. Following the steps in the documentation, we tried to create a Docker image for the cdc-connector with this command:

mvn compile -pl cdc-embedded-connector jib:dockerBuild -Ppostgres

But this step fails. The log is shared below.

Beam Version

2.31.0

Relevant log output

[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Detecting the operating system and CPU architecture
[INFO] ------------------------------------------------------------------------
[INFO] os.detected.name: osx
[INFO] os.detected.arch: x86_64
[INFO] os.detected.version: 10.15
[INFO] os.detected.version.major: 10
[INFO] os.detected.version.minor: 15
[INFO] os.detected.classifier: osx-x86_64
[INFO] 
[INFO] --------< com.google.cloud.dataflow.cdc:cdc-embedded-connector >--------
[INFO] Building cdc-embedded-connector 0.1
[INFO] --------------------------------[ jar ]---------------------------------
Downloading from confluent: https://packages.confluent.io/maven/com/google/cloud/teleport/v2/dynamic-templates/1.0-SNAPSHOT/maven-metadata.xml
Downloading from pentaho: https://public.nexus.pentaho.org/repository/proxy-public-3rd-party-release/com/google/cloud/teleport/v2/dynamic-templates/1.0-SNAPSHOT/maven-metadata.xml
[INFO] 
[INFO] --- maven-enforcer-plugin:3.0.0-M1:enforce (enforce) @ cdc-embedded-connector ---
[INFO] 
[INFO] --- maven-enforcer-plugin:3.0.0-M1:enforce (enforce-banned-dependencies) @ cdc-embedded-connector ---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ cdc-embedded-connector ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 4 resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.6.2:compile (default-compile) @ cdc-embedded-connector ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 5 source files to /Users/shreekar.waghela/Desktop/DataflowTemplates/v2/cdc-parent/cdc-embedded-connector/target/classes
[INFO] /Users/shreekar.waghela/Desktop/DataflowTemplates/v2/cdc-parent/cdc-embedded-connector/src/main/java/com/google/cloud/dataflow/cdc/connector/DebeziumSourceRecordToDataflowCdcFormatTranslator.java: /Users/shreekar.waghela/Desktop/DataflowTemplates/v2/cdc-parent/cdc-embedded-connector/src/main/java/com/google/cloud/dataflow/cdc/connector/DebeziumSourceRecordToDataflowCdcFormatTranslator.java uses or overrides a deprecated API.
[INFO] /Users/shreekar.waghela/Desktop/DataflowTemplates/v2/cdc-parent/cdc-embedded-connector/src/main/java/com/google/cloud/dataflow/cdc/connector/DebeziumSourceRecordToDataflowCdcFormatTranslator.java: Recompile with -Xlint:deprecation for details.
[INFO] 
[INFO] --- jib-maven-plugin:1.5.1:dockerBuild (default-cli) @ cdc-embedded-connector ---
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  27.262 s
[INFO] Finished at: 2022-01-28T17:45:03+05:30
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal com.google.cloud.tools:jib-maven-plugin:1.5.1:dockerBuild (default-cli) on project cdc-embedded-connector: Execution default-cli of goal com.google.cloud.tools:jib-maven-plugin:1.5.1:dockerBuild failed: environment map contains null values -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException

flayman commented 2 years ago

I've got the same problem. I thought I was doing something wrong, but the dependency for dynamic templates simply isn't there. Maven is trying to download https://public.nexus.pentaho.org/repository/proxy-public-3rd-party-release/com/google/cloud/teleport/v2/dynamic-templates/1.0-SNAPSHOT/dynamic-templates-1.0-SNAPSHOT.pom, but that does not exist and there is no teleport directory. You expect the master branch to compile. It makes me wonder whether anyone is actually building this and why I should bother, which is a shame. I suppose it's not this project's fault that a dependency has gone missing if it used to be somewhere. But where is it?

picassio commented 2 years ago

I've got the same problem with the new main branch. After I switch to revert-257-master branch, I can successfully build the image.

grdavor commented 2 years ago

I've just finished setting up the cdc-embedded-connector in our k8s cluster and we've had the same problem when building the docker image. Commenting out the DATAFLOW_JAVA_COMMAND_SPEC element inside v2/pom.xml got the build to complete, but the container still wouldn't run.

Commenting out the entrypoint element in v2/pom.xml as well, and adding <mainClass>com.google.cloud.dataflow.cdc.connector.App</mainClass> after line 65 in the pom.xml of cdc-embedded-connector, seems to have done the trick for us.
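For readers trying to reproduce this, the workaround can be sketched roughly as follows. This is only an illustration that assumes the standard jib-maven-plugin `<container>` configuration; the exact layout of the plugin section in v2/pom.xml and in the cdc-embedded-connector pom.xml is not shown in this thread, so the placement below is a guess and is not taken from the repository. The error in the log ("environment map contains null values") suggests that an environment entry, presumably DATAFLOW_JAVA_COMMAND_SPEC, resolves to null when no value is supplied.

```xml
<!-- Hypothetical excerpt: the real plugin section in the repository may differ. -->
<plugin>
  <groupId>com.google.cloud.tools</groupId>
  <artifactId>jib-maven-plugin</artifactId>
  <configuration>
    <container>
      <!-- Per the workaround above, the DATAFLOW_JAVA_COMMAND_SPEC environment
           entry and the entrypoint element are commented out in v2/pom.xml,
           so neither appears here. -->
      <!-- Main class named in the comment above; with no explicit entrypoint,
           Jib starts the container with this class. -->
      <mainClass>com.google.cloud.dataflow.cdc.connector.App</mainClass>
    </container>
  </configuration>
</plugin>
```

After a change along these lines, the same command from the original report (mvn compile -pl cdc-embedded-connector jib:dockerBuild -Ppostgres) would be re-run to rebuild the image.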

Not sure what the intended way of building the docker image is supposed to be. Maybe you're meant to add a spec.json file somewhere, but the README doesn't mention this.

jerrt2003 commented 10 months ago

Hi, just wondering if there is any update on this ticket. I'm trying to follow the README to build a connector image, but the build fails. Even after following @grdavor's trick (e.g. commenting out the entrypoint, etc.), the container won't start. Any suggestions? Thanks

github-actions[bot] commented 4 months ago

This issue has been marked as stale due to 180 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the issue at any time. Thank you for your contributions.

github-actions[bot] commented 4 months ago

This issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.