Open zdl11111 opened 11 months ago
When I try to restart the job, other exceptions appear:
2023-12-12 11:23:25
org.apache.flink.util.FlinkException: Global failure triggered by OperatorCoordinator for 'stream_write: test1' (operator 8b0eff726c52aac1276bd5cfcb9bf178).
at org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder$LazyInitializedCoordinatorContext.failJob(OperatorCoordinatorHolder.java:545)
at org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$start$0(StreamWriteOperatorCoordinator.java:191)
at org.apache.hudi.sink.utils.NonThrownExecutor.handleException(NonThrownExecutor.java:142)
at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:133)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieException: Executor executes action [commits the instant 20231212110524061] error
... 6 more
Caused by: org.apache.hudi.exception.HoodieException: Heartbeat for instant 20231212110524061 has expired, last heartbeat 0
at org.apache.hudi.client.heartbeat.HeartbeatUtils.abortIfHeartbeatExpired(HeartbeatUtils.java:95)
at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:225)
at org.apache.hudi.client.HoodieFlinkWriteClient.commit(HoodieFlinkWriteClient.java:111)
at org.apache.hudi.client.HoodieFlinkWriteClient.commit(HoodieFlinkWriteClient.java:74)
at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:199)
at org.apache.hudi.sink.StreamWriteOperatorCoordinator.doCommit(StreamWriteOperatorCoordinator.java:540)
at org.apache.hudi.sink.StreamWriteOperatorCoordinator.commitInstant(StreamWriteOperatorCoordinator.java:516)
at org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2(StreamWriteOperatorCoordinator.java:246)
at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130)
... 3 more
Are there any other exceptions? This one does not look like the root cause; it looks like another exception relayed its message and interrupted these tasks.
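For reference while digging for the root cause, here is a rough sketch of how I understand the heartbeat expiry check behind the "Heartbeat for instant ... has expired, last heartbeat 0" message. It is not the Hudi source: the class name, method name, and the interval and tolerable-miss values are hypothetical, and it assumes that a last heartbeat of 0 means no heartbeat timestamp could be found for the instant.

```java
public class HeartbeatExpirySketch {

    /**
     * Returns true when the gap since the last heartbeat exceeds the allowed window.
     * Assumption: the allowed window is (heartbeat interval) x (tolerable misses).
     */
    public static boolean isExpired(long nowMs, long lastHeartbeatMs,
                                    long intervalMs, int tolerableMisses) {
        long maxAllowedGapMs = intervalMs * tolerableMisses;
        return nowMs - lastHeartbeatMs > maxAllowedGapMs;
    }

    public static void main(String[] args) {
        long intervalMs = 60_000L;   // illustrative value, not taken from the report
        int tolerableMisses = 2;     // illustrative value, not taken from the report
        long now = System.currentTimeMillis();

        // A last heartbeat of 0 (as in the trace) is always far outside the window.
        System.out.println("lastHeartbeat=0 expired: "
            + isExpired(now, 0L, intervalMs, tolerableMisses));

        // A heartbeat written 30 seconds ago is still inside the window.
        System.out.println("lastHeartbeat=now-30s expired: "
            + isExpired(now, now - 30_000L, intervalMs, tolerableMisses));
    }
}
```

If this reading is right, the expiry reported at commit time is a symptom: something earlier stopped or removed the heartbeat for instant 20231212110524061, so the earlier logs are the place to look.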
Describe the problem you faced
When I set metadata.enabled to true in the Flink job, Hudi cannot complete a delta_commit and the job keeps restarting.
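The report does not include the table definition, so the following is only a minimal sketch of what a setup like this might look like. It builds a Flink SQL DDL string with metadata.enabled switched on. The table name test1 comes from the stack trace; the columns, the path, and the MERGE_ON_READ table type are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class MetadataEnabledSketch {

    /** Builds a CREATE TABLE statement for a Hudi sink with the given options. */
    public static String buildDdl(String tableName, String path, boolean metadataEnabled) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("connector", "hudi");
        options.put("path", path);                   // hypothetical location
        options.put("table.type", "MERGE_ON_READ");  // assumption, not stated in the report
        options.put("metadata.enabled", String.valueOf(metadataEnabled));

        // Render the options as 'key' = 'value' pairs for the WITH clause.
        String with = options.entrySet().stream()
            .map(e -> "  '" + e.getKey() + "' = '" + e.getValue() + "'")
            .collect(Collectors.joining(",\n"));

        // The schema below is hypothetical; the real columns are unknown.
        return "CREATE TABLE " + tableName + " (\n"
            + "  id STRING,\n"
            + "  name STRING,\n"
            + "  ts TIMESTAMP(3),\n"
            + "  PRIMARY KEY (id) NOT ENFORCED\n"
            + ") WITH (\n" + with + "\n)";
    }

    public static void main(String[] args) {
        // Prints the DDL; it could be passed to TableEnvironment.executeSql in a real job.
        System.out.println(buildDdl("test1", "hdfs:///tmp/hudi/test1", true));
    }
}
```

Running the same job with the flag set to false would be a quick way to confirm that the restarts are tied to the metadata table.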
To Reproduce
Steps to reproduce the behavior:
Expected behavior
A clear and concise description of what you expected to happen.
Environment Description
Hudi version : 0.13.1
Flink version : 0.14
Hive version :
Hadoop version :
Storage (HDFS/S3/GCS..) : HDFS
Running on Docker? (yes/no) : no
Stacktrace