DataLinkDC / dinky

Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.
http://www.dinky.org.cn
Apache License 2.0

[Bug] [Oracle whole-database sync] Flink error during Oracle -> StarRocks whole-database sync #1650

Closed radioliu92 closed 1 year ago

radioliu92 commented 1 year ago

Search before asking

What happened

dinky:0.7.1 starrocks:2.4.3 flink:1.5.3 cdc:2.3

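Scenario:

  1. Single-table sync from Oracle to StarRocks has already been debugged and works.
  2. During whole-database sync, Flink reports an error, while the Dinky log shows no error. oracle_整库同步测试01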

Sync SQL:

EXECUTE CDCSOURCE jobname2 WITH (
  'connector' = 'oracle-cdc',
  'hostname' = '*',
  'port' = '1521',
  'username' = 'flink',
  'password'='****',
  'checkpoint' = '12000',
  'scan.startup.mode' = 'initial',
  'parallelism' = '1',
  'database-name' = 'ZZMESDB',
  --'schema-name' = 'TEST',
  'table-name' = 'TEST.FLINK_TEST03',
  'debezium.log.mining.strategy' = 'online_catalog',
  'sink.connector' = 'starrocks',
  'sink.jdbc-url' = 'jdbc:mysql://*:2030',
  'sink.load-url' = ':1030',
  'sink.username' = 'root',
  'sink.password' = '',
  'sink.sink.db' = 'flink_test',
  'sink.table.lower' = 'true',
  'sink.database-name' = 'flink_test',
  'sink.table-name' = '${tableName}',
  'sink.sink.properties.format' = 'json',
  'sink.sink.properties.strip_outer_array' = 'true',
  'sink.sink.max-retries' = '10',
  'sink.sink.buffer-flush.interval-ms' = '15000',
  'sink.sink.parallelism' = '1'
)

What you expected to happen

Flink error:

2023-02-15 08:41:40
com.ververica.cdc.connectors.shaded.org.apache.kafka.connect.errors.ConnectException: An exception occurred in the change event producer. This connector will be stopped.
    at io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:42)
    at io.debezium.connector.oracle.logminer.LogMinerStreamingChangeEventSource.execute(LogMinerStreamingChangeEventSource.java:325)
    at io.debezium.connector.oracle.logminer.LogMinerStreamingChangeEventSource.execute(LogMinerStreamingChangeEventSource.java:71)
    at io.debezium.pipeline.ChangeEventSourceCoordinator.streamEvents(ChangeEventSourceCoordinator.java:160)
    at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:122)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.ververica.cdc.connectors.shaded.org.apache.kafka.connect.errors.SchemaBuilderException: Invalid default value
    at com.ververica.cdc.connectors.shaded.org.apache.kafka.connect.data.SchemaBuilder.defaultValue(SchemaBuilder.java:131)
    at io.debezium.relational.TableSchemaBuilder.addField(TableSchemaBuilder.java:374)
    at io.debezium.relational.TableSchemaBuilder.lambda$create$2(TableSchemaBuilder.java:119)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at io.debezium.relational.TableSchemaBuilder.create(TableSchemaBuilder.java:117)
    at io.debezium.relational.RelationalDatabaseSchema.buildAndRegisterSchema(RelationalDatabaseSchema.java:130)
    at io.debezium.connector.oracle.OracleDatabaseSchema.lambda$applySchemaChange$0(OracleDatabaseSchema.java:73)
    at java.lang.Iterable.forEach(Iterable.java:75)
    at io.debezium.connector.oracle.OracleDatabaseSchema.applySchemaChange(OracleDatabaseSchema.java:72)
    at io.debezium.pipeline.EventDispatcher$SchemaChangeEventReceiver.schemaChangeEvent(EventDispatcher.java:522)
    at io.debezium.connector.oracle.OracleSchemaChangeEventEmitter.emitSchemaChangeEvent(OracleSchemaChangeEventEmitter.java:113)
    at io.debezium.pipeline.EventDispatcher.dispatchSchemaChangeEvent(EventDispatcher.java:297)
    at io.debezium.connector.oracle.logminer.LogMinerQueryResultProcessor.dispatchSchemaChangeEventAndGetTableForNewCapturedTable(LogMinerQueryResultProcessor.java:336)
    at io.debezium.connector.oracle.logminer.LogMinerQueryResultProcessor.getTableForDmlEvent(LogMinerQueryResultProcessor.java:323)
    at io.debezium.connector.oracle.logminer.LogMinerQueryResultProcessor.processResult(LogMinerQueryResultProcessor.java:257)
    at io.debezium.connector.oracle.logminer.LogMinerStreamingChangeEventSource.execute(LogMinerStreamingChangeEventSource.java:280)
    ... 8 more
Caused by: com.ververica.cdc.connectors.shaded.org.apache.kafka.connect.errors.DataException: Invalid Java object for schema type INT8: class java.lang.String for field: "null"
    at com.ververica.cdc.connectors.shaded.org.apache.kafka.connect.data.ConnectSchema.validateValue(ConnectSchema.java:245)
    at com.ververica.cdc.connectors.shaded.org.apache.kafka.connect.data.ConnectSchema.validateValue(ConnectSchema.java:213)
    at com.ververica.cdc.connectors.shaded.org.apache.kafka.connect.data.SchemaBuilder.defaultValue(SchemaBuilder.java:129)
    ... 31 more
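Reading the innermost "Caused by" in the trace: the schema builder rejected a column default value because it arrived as a String while the declared schema type was INT8. The sketch below only imitates that kind of check to make the failure mode concrete. It is not Kafka Connect's or Debezium's code; `EXPECTED_PYTHON_TYPES`, `DataError` and `validate_default` are hypothetical names made up for illustration.

```python
# Hypothetical stand-in for a "default value must match the schema type" check.
# The type table and all names here are illustrative assumptions.
EXPECTED_PYTHON_TYPES = {
    "INT8": int,
    "STRING": str,
}


class DataError(Exception):
    """Raised when a value does not fit the declared schema type."""


def validate_default(schema_type, default_value, field=None):
    """Return default_value if it fits schema_type, otherwise raise DataError."""
    expected = EXPECTED_PYTHON_TYPES[schema_type]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(default_value, bool) or not isinstance(default_value, expected):
        raise DataError(
            f"Invalid object for schema type {schema_type}: "
            f'{type(default_value).__name__} for field: "{field}"'
        )
    return default_value


# An integer default passes; the same default delivered as text does not.
validate_default("INT8", 1)
try:
    validate_default("INT8", "1")
except DataError as err:
    print(err)
```

If this reading is right, a useful next step would be to check which column of the captured table carries a default value that is read back as text.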

How to reproduce

Dinky log:

[dlink] 2023-02-15 08:38:44 CST INFO com.dlink.executor.Executor 274 loginFromKeytabIfNeed - Simple authentication mode
[dlink] 2023-02-15 08:38:44 CST INFO com.dlink.executor.Executor 274 loginFromKeytabIfNeed - Simple authentication mode
[dlink] 2023-02-15 08:38:44 CST INFO com.dlink.executor.Executor 274 loginFromKeytabIfNeed - Simple authentication mode
[dlink] 2023-02-15 08:39:15 CST INFO com.dlink.trans.ddl.CreateCDCSourceOperation 78 build - Start build CDCSOURCE Task...
[dlink] 2023-02-15 08:39:15 CST INFO com.dlink.trans.ddl.CreateCDCSourceOperation 165 build - A total of 0 tables were detected...
[dlink] 2023-02-15 08:39:15 CST INFO com.dlink.trans.ddl.CreateCDCSourceOperation 174 build - Set parallelism: 1
[dlink] 2023-02-15 08:39:15 CST INFO com.dlink.trans.ddl.CreateCDCSourceOperation 178 build - Set checkpoint: 12000
[dlink] 2023-02-15 08:39:15 CST INFO com.dlink.trans.ddl.CreateCDCSourceOperation 181 build - Build oracle-cdc successful...
[dlink] 2023-02-15 08:39:15 CST INFO com.dlink.cdc.sql.SQLSinkBuilder 220 build - Build deserialize successful...
[dlink] 2023-02-15 08:39:15 CST INFO com.dlink.cdc.sql.SQLSinkBuilder 277 build - A total of 0 table cdc sync were build successfull...
[dlink] 2023-02-15 08:39:15 CST INFO com.dlink.trans.ddl.CreateCDCSourceOperation 190 build - Build CDCSOURCE Task successful!
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.configuration.GlobalConfiguration 213 loadYAMLResource - Loading configuration property: jobmanager.rpc.address, localhost
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.configuration.GlobalConfiguration 213 loadYAMLResource - Loading configuration property: jobmanager.rpc.port, 6123
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.configuration.GlobalConfiguration 213 loadYAMLResource - Loading configuration property: jobmanager.bind-host, 0.0.0.0
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.configuration.GlobalConfiguration 213 loadYAMLResource - Loading configuration property: jobmanager.memory.process.size, 1600m
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.configuration.GlobalConfiguration 213 loadYAMLResource - Loading configuration property: taskmanager.bind-host, localhost
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.configuration.GlobalConfiguration 213 loadYAMLResource - Loading configuration property: taskmanager.host, localhost
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.configuration.GlobalConfiguration 213 loadYAMLResource - Loading configuration property: taskmanager.memory.process.size, 1728m
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.configuration.GlobalConfiguration 213 loadYAMLResource - Loading configuration property: taskmanager.numberOfTaskSlots, 1
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.configuration.GlobalConfiguration 213 loadYAMLResource - Loading configuration property: parallelism.default, 1
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.configuration.GlobalConfiguration 213 loadYAMLResource - Loading configuration property: high-availability, zookeeper
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.configuration.GlobalConfiguration 213 loadYAMLResource - Loading configuration property: high-availability.storageDir, hdfs:///flink/ha/
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.configuration.GlobalConfiguration 213 loadYAMLResource - Loading configuration property: high-availability.zookeeper.quorum, tbddn1:2181,tbddn2:2181,tbddn3:2181
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.configuration.GlobalConfiguration 213 loadYAMLResource - Loading configuration property: state.checkpoints.dir, hdfs:///flink/checkpoints
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.configuration.GlobalConfiguration 213 loadYAMLResource - Loading configuration property: state.savepoints.dir, hdfs:///flink/savepoints
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.configuration.GlobalConfiguration 213 loadYAMLResource - Loading configuration property: jobmanager.execution.failover-strategy, region
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.configuration.GlobalConfiguration 213 loadYAMLResource - Loading configuration property: rest.port, 8085
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.configuration.GlobalConfiguration 213 loadYAMLResource - Loading configuration property: rest.address, localhost
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.configuration.GlobalConfiguration 213 loadYAMLResource - Loading configuration property: rest.bind-address, 0.0.0.0
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.configuration.GlobalConfiguration 213 loadYAMLResource - Loading configuration property: classloader.check-leaked-classloader, false
[dlink] 2023-02-15 08:39:15 CST WARN org.apache.flink.yarn.configuration.YarnLogConfigUtil 73 discoverLogConfigFile - The configuration directory ('/usr/local/flink/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.hadoop.yarn.client.RMProxy 133 newProxyInstance - Connecting to ResourceManager at TBDCM1/10.10.20.86:8032
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.yarn.YarnClusterDescriptor 208 getLocalFlinkDistPath - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
[dlink] 2023-02-15 08:39:15 CST WARN org.apache.flink.yarn.YarnClusterDescriptor 481 deployJobCluster - Job Clusters are deprecated since Flink 1.15. Please use an Application Cluster/Application Mode instead.
[dlink] 2023-02-15 08:39:15 CST WARN org.apache.flink.yarn.YarnClusterDescriptor 351 isReadyForDeployment - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.yarn.YarnClusterDescriptor 728 logIfComponentMemNotIntegerMultipleOfYarnMinAllocation - The configured JobManager memory is 1600 MB. YARN will allocate 2048 MB to make up an integer multiple of its minimum allocation memory (1024 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 448 MB may not be used by Flink.
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.yarn.YarnClusterDescriptor 728 logIfComponentMemNotIntegerMultipleOfYarnMinAllocation - The configured TaskManager memory is 1728 MB. YARN will allocate 2048 MB to make up an integer multiple of its minimum allocation memory (1024 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 320 MB may not be used by Flink.
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.yarn.YarnClusterDescriptor 605 deployInternal - Cluster specification: ClusterSpecification{masterMemoryMB=1600, taskManagerMemoryMB=1728, slotsPerTaskManager=1}
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.yarn.YarnClusterDescriptor 1300 lambda$removeLocalhostBindHostSetting$9 - Removing 'localhost' Key: 'taskmanager.bind-host' , default: null (fallback keys: []) setting from effective configuration; using '0.0.0.0' instead.
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils 330 capToMinMax - The derived from fraction jvm overhead memory (160.000mb (167772162 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.yarn.YarnClusterDescriptor 1239 startAppMaster - Submitting application master application_1676339038783_0005
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl 348 submitApplication - Submitted application application_1676339038783_0005
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.yarn.YarnClusterDescriptor 1242 startAppMaster - Waiting for the cluster to be allocated
[dlink] 2023-02-15 08:39:15 CST INFO org.apache.flink.yarn.YarnClusterDescriptor 1277 startAppMaster - Deploying cluster, current state ACCEPTED
[dlink] 2023-02-15 08:39:20 CST INFO org.apache.flink.yarn.YarnClusterDescriptor 1270 startAppMaster - YARN application has been deployed successfully.
[dlink] 2023-02-15 08:39:20 CST INFO org.apache.flink.yarn.YarnClusterDescriptor 1866 logDetachedClusterInformation - The Flink YARN session cluster has been started in detached mode. In order to stop Flink gracefully, use the following command:
$ echo "stop" | ./bin/yarn-session.sh -id application_1676339038783_0005
If this should not be possible, then you can also kill Flink via YARN's web interface or via:
$ yarn application -kill application_1676339038783_0005
Note that killing Flink might not clean up all job artifacts and temporary files.
[dlink] 2023-02-15 08:39:20 CST INFO org.apache.flink.yarn.YarnClusterDescriptor 1843 setClusterEntrypointInfoToConfig - Found Web Interface tbdcm1:8085 of application 'application_1676339038783_0005'.
[dlink] 2023-02-15 08:39:20 CST INFO org.apache.flink.runtime.util.ZooKeeperUtils 251 startCuratorFramework - Enforcing default ACL for ZK connections
[dlink] 2023-02-15 08:39:20 CST INFO org.apache.flink.runtime.util.ZooKeeperUtils 257 startCuratorFramework - Using '/flink/application_1676339038783_0005' as Zookeeper namespace.
[dlink] 2023-02-15 08:39:20 CST INFO org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl 338 start - Starting
[dlink] 2023-02-15 08:39:20 CST INFO org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ZooKeeper 868 - Initiating client connection, connectString=tbddn1:2181,tbddn2:2181,tbddn3:2181 sessionTimeout=60000 watcher=org.apache.flink.shaded.curator5.org.apache.curator.ConnectionState@4ce3ef08
[dlink] 2023-02-15 08:39:20 CST INFO org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxnSocket 237 initProperties - jute.maxbuffer value is 4194304 Bytes
[dlink] 2023-02-15 08:39:20 CST INFO org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn 1653 initRequestTimeout - zookeeper.request.timeout value is 0. feature enabled=
[dlink] 2023-02-15 08:39:20 CST INFO org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$SendThread 1112 logStartConnect - Opening socket connection to server tbddn3/10.10.20.89:2181. Will not attempt to authenticate using SASL (unknown error)
[dlink] 2023-02-15 08:39:20 CST INFO org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl 386 start - Default schema
[dlink] 2023-02-15 08:39:20 CST INFO org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$SendThread 959 primeConnection - Socket connection established, initiating session, client: /10.10.20.86:56032, server: tbddn3/10.10.20.89:2181
[dlink] 2023-02-15 08:39:20 CST INFO org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService 98 start - Starting DefaultLeaderRetrievalService with ZookeeperLeaderRetrievalDriver{connectionInformationPath='/leader/rest_server/connection_info'}.
[dlink] 2023-02-15 08:39:20 CST INFO org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$SendThread 1394 onConnected - Session establishment complete on server tbddn3/10.10.20.89:2181, sessionid = 0x500891c0f9f0006, negotiated timeout = 40000
[dlink] 2023-02-15 08:39:20 CST INFO org.apache.flink.shaded.curator5.org.apache.curator.framework.state.ConnectionStateManager 250 postState - State change: CONNECTED
[dlink] 2023-02-15 08:39:20 CST INFO org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker 201 processConfigData - New config event received: {server.1=TBDCM1:2888:3888:participant, version=0, server.5=TBDDN3:2888:3888:participant, server.4=TBDDN2:2888:3888:participant, server.3=TBDDN1:2888:3888:participant, server.2=TBDCM2:2888:3888:participant}
[dlink] 2023-02-15 08:39:20 CST INFO org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker 201 processConfigData - New config event received: {server.1=TBDCM1:2888:3888:participant, version=0, server.5=TBDDN3:2888:3888:participant, server.4=TBDDN2:2888:3888:participant, server.3=TBDDN1:2888:3888:participant, server.2=TBDCM2:2888:3888:participant}

Anything else

No response

Version

0.7.0

Are you willing to submit PR?

Code of Conduct

aiwenmo commented 1 year ago

The type conversion for some special column type may be failing.

radioliu92 commented 1 year ago

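Quoting: "The type conversion for some special column type may be failing."

No. I tried it with a table that has a single column of type varchar2, and it still reports this error.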

aiwenmo commented 1 year ago

'table-name' = 'TEST\.FLINK_TEST03',

radioliu92 commented 1 year ago

'table-name' = 'TEST.FLINK_TEST03',

When I paste it here, it gets converted (the backslash is dropped when the comment is displayed).

radioliu92 commented 1 year ago

'table-name' = 'TEST.FLINK_TEST03',

See, as soon as I reply to you it gets converted.
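A small sketch of what the backslash changes, assuming the 'table-name' value is treated as a regular expression matched against the full SCHEMA.TABLE name. That matching behaviour is an assumption here, and Python's `re` module stands in for the Java regex engine the connector would actually use; `table_matches` is a hypothetical helper, not a Dinky or Flink CDC function.

```python
import re


def table_matches(pattern: str, qualified_name: str) -> bool:
    """Return True if the whole 'SCHEMA.TABLE' name matches the pattern."""
    return re.fullmatch(pattern, qualified_name) is not None


escaped = r"TEST\.FLINK_TEST03"   # the dot is a literal dot
unescaped = "TEST.FLINK_TEST03"   # the dot matches any single character

print(table_matches(escaped, "TEST.FLINK_TEST03"))    # True
print(table_matches(unescaped, "TEST.FLINK_TEST03"))  # True
print(table_matches(unescaped, "TESTXFLINK_TEST03"))  # True
print(table_matches(escaped, "TESTXFLINK_TEST03"))    # False
```

Under that assumption both spellings match the intended table, and the unescaped one is merely looser, so the missing backslash on its own would probably not explain a job that finds no tables.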

aiwenmo commented 1 year ago

[dlink] 2023-02-15 08:39:15 CST INFO com.dlink.trans.ddl.CreateCDCSourceOperation 165 build - A total of 0 tables were detected...

No tables were retrieved from the Oracle metadata. You can set a breakpoint in IDEA and debug to find out why no tables are returned.
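When stepping through with the debugger, it may help to separate two ways a count of 0 can arise: the metadata lookup returns no tables at all, or it returns tables whose names the configured pattern does not match. The sketch below illustrates only that distinction. `detect_tables` and the sample table names are hypothetical and do not reflect Dinky's actual implementation.

```python
import re


def detect_tables(metadata_tables, table_name_pattern):
    """Return the 'SCHEMA.TABLE' names that fully match the pattern."""
    regex = re.compile(table_name_pattern)
    return [name for name in metadata_tables if regex.fullmatch(name)]


pattern = r"TEST\.FLINK_TEST03"

# Case 1: the metadata lookup itself returned nothing.
print(len(detect_tables([], pattern)))

# Case 2: tables were listed, but under names the pattern does not match
# (here a made-up lower-case variant).
print(len(detect_tables(["test.flink_test03"], pattern)))

# Case 3: the expected table is listed and matches.
print(len(detect_tables(["TEST.FLINK_TEST03", "TEST.OTHER"], pattern)))
```

Checking the list of table names before the filter is applied shows which of the first two cases is in play.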

aiwenmo commented 1 year ago