datavane / tis

Support agile DataOps Based on Flink, DataX and Flink-CDC, Chunjun with Web-UI
https://tis.pub
Apache License 2.0
1.04k stars 222 forks source link

mysql实时同步clickhouse的问题 #250

Open Berzrk opened 1 year ago

Berzrk commented 1 year ago

日志


org.apache.flink.runtime.JobException: The failure is not recoverable
        at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:119) ~[flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82) ~[flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:207) ~[flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:197) ~[flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:188) ~[flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:677) ~[flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79) ~[flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:435) ~[flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_281]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_281]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_281]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_281]
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:305) ~[flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:212) ~[flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) ~[flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158) ~[flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-tis-1.13.1.jar:tis-1.13.1]
Caused by: com.dtstack.chunjun.throwable.NoRestartException: The dirty consumer shutdown, due to the consumed count exceed the max-consumed [0]
        at com.dtstack.chunjun.dirty.consumer.DirtyDataCollector.addConsumed(DirtyDataCollector.java:105) ~[?:?]
        at com.dtstack.chunjun.dirty.consumer.DirtyDataCollector.offer(DirtyDataCollector.java:79) ~[?:?]
        at com.dtstack.chunjun.dirty.manager.DirtyManager.collect(DirtyManager.java:140) ~[?:?]
        at com.dtstack.chunjun.sink.format.BaseRichOutputFormat.writeSingleRecord(BaseRichOutputFormat.java:486) ~[?:?]
        at com.dtstack.chunjun.sink.format.BaseRichOutputFormat.lambda$writeRecordInternal$2(BaseRichOutputFormat.java:504) ~[?:?]
        at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_281]
Berzrk commented 1 year ago

如果需要提供什么数据,后续我会贴出,情况就是:我启动flink之后,配置实时同步任务,测试单个记录的增删改,增改没问题,但是删不了,后续运行了一大时间之后莫名就出现这个问题,监控页面也显示“服务端获取不到该Job状态信息,可能是因为Flink-Cluster重启导致”,目前的日志无法判断到底是什么问题导致的,请求大佬帮忙解决

baisui1981 commented 1 year ago

目前调用的 chunjun的 clickhouse sink 组件 确实还没有支持物理删除,后续版本会支持物理删除的 https://github.com/datavane/tis/issues/209

Berzrk commented 1 year ago

除了删除这个,还有就是flink_tis实时同步会有报错信息,就是上面的报错日志,这个问题我不确定是什么导致,因为是运行了一段时间之后就会出现的问题,tis的监控平台 实时同步也会出现"服务端获取不到该Job状态信息,可能是因为Flink-Cluster重启导致,请手动 ",但是查询flink_tis还是运行状态,但是已经无法实时同步了

baisui1981 commented 1 year ago

除了删除这个,还有就是flink_tis实时同步会有报错信息,就是上面的报错日志,这个问题我不确定是什么导致,因为是运行了一段时间之后就会出现的问题,tis的监控平台 实时同步也会出现"服务端获取不到该Job状态信息,可能是因为Flink-Cluster重启导致,请手动 ",但是查询flink_tis还是运行状态,但是已经无法实时同步了

上面的这个错误日志,不是直接导致错误的异常信息,你看到 以下这种错误信息要把,日志往上拉一下,真正的错误的信息在上面的位置

Caused by: com.dtstack.chunjun.throwable.NoRestartException: The dirty consumer shutdown, due to the consumed count exceed the max-consumed [0]