apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

flinksql uses hudi to write to hdfs and synchronize to hive #11862

Open biao-lvwan opened 2 weeks ago

biao-lvwan commented 2 weeks ago

env: 1. flink 1.17.1 2. hudi 0.15.0 3. hadoop 3.3.4 4. hive 3.1.3. Using COW mode works correctly, but using MOR mode causes an error when compacting. Please help.

2024-08-29 15:11:28,630 ERROR org.apache.hudi.sink.compact.CompactOperator [] - Executor executes action [Execute compaction for instant 20240829151125857 from task 0] error
org.apache.hudi.exception.HoodieIOException: Could not check if hdfs://hdfs-name-node:9820/flink/ceshi9 is a valid table
    at org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:59) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:135) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.common.table.HoodieTableMetaClient.newMetaClient(HoodieTableMetaClient.java:680) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.common.table.HoodieTableMetaClient.access$100(HoodieTableMetaClient.java:81) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:772) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.table.HoodieFlinkTable.create(HoodieFlinkTable.java:61) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.client.HoodieFlinkWriteClient.getHoodieTable(HoodieFlinkWriteClient.java:463) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.sink.compact.CompactOperator.reloadWriteConfig(CompactOperator.java:158) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.sink.compact.CompactOperator.lambda$processElement$0(CompactOperator.java:124) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_392]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_392]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_392]
Caused by: java.io.InterruptedIOException: Interrupted waiting to send RPC request to server
    at org.apache.hadoop.ipc.Client.call(Client.java:1513) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.ipc.Client.call(Client.java:1455) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129) ~[hadoop-common-3.3.4.jar:?]
    at com.sun.proxy.$Proxy34.getFileInfo(Unknown Source) ~[?:?]
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:965) ~[hadoop-hdfs-client-3.3.4.jar:?]
    at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_392]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_392]
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) ~[hadoop-common-3.3.4.jar:?]
    at com.sun.proxy.$Proxy35.getFileInfo(Unknown Source) ~[?:?]
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1739) ~[hadoop-hdfs-client-3.3.4.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1753) ~[hadoop-hdfs-client-3.3.4.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1750) ~[hadoop-hdfs-client-3.3.4.jar:?]
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1765) ~[hadoop-hdfs-client-3.3.4.jar:?]
    at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.lambda$getFileStatus$17(HoodieWrapperFileSystem.java:414) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.executeFuncWithTimeMetrics(HoodieWrapperFileSystem.java:118) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.getFileStatus(HoodieWrapperFileSystem.java:408) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.storage.hadoop.HoodieHadoopStorage.getPathInfo(HoodieHadoopStorage.java:169) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:51) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    ... 12 more
Caused by: java.lang.InterruptedException
    at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404) ~[?:1.8.0_392]
    at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:1.8.0_392]
    at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1213) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.ipc.Client.call(Client.java:1508) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.ipc.Client.call(Client.java:1455) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129) ~[hadoop-common-3.3.4.jar:?]
    at com.sun.proxy.$Proxy34.getFileInfo(Unknown Source) ~[?:?]
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:965) ~[hadoop-hdfs-client-3.3.4.jar:?]
    at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_392]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_392]
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) ~[hadoop-common-3.3.4.jar:?]
    at com.sun.proxy.$Proxy35.getFileInfo(Unknown Source) ~[?:?]
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1739) ~[hadoop-hdfs-client-3.3.4.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1753) ~[hadoop-hdfs-client-3.3.4.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1750) ~[hadoop-hdfs-client-3.3.4.jar:?]
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1765) ~[hadoop-hdfs-client-3.3.4.jar:?]
    at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.lambda$getFileStatus$17(HoodieWrapperFileSystem.java:414) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.executeFuncWithTimeMetrics(HoodieWrapperFileSystem.java:118) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.getFileStatus(HoodieWrapperFileSystem.java:408) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.storage.hadoop.HoodieHadoopStorage.getPathInfo(HoodieHadoopStorage.java:169) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:51) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    ... 12 more
2024-08-29 15:11:28,633 INFO org.apache.hudi.common.table.HoodieTableMetaClient [] - Loading HoodieTableMetaClient from hdfs://hdfs-name-node:9820/flink/ceshi9
2024-08-29 15:11:28,634 INFO org.apache.flink.runtime.taskmanager.Task [] - compact_task -> Sink: compact_commit (1/1)#0 (c037ded52da6b623fb862c69e540631a_97fb0facebfcc4b2c659eb3fc53740f3_0_0) switched from RUNNING to FINISHED.
2024-08-29 15:11:28,634 INFO org.apache.flink.runtime.taskmanager.Task [] - Freeing task resources for compact_task -> Sink: compact_commit (1/1)#0 (c037ded52da6b623fb862c69e540631a_97fb0facebfcc4b2c659eb3fc53740f3_0_0).
2024-08-29 15:11:28,634 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Un-registering task and sending final execution state FINISHED to JobManager for task compact_task -> Sink: compact_commit (1/1)#0 c037ded52da6b623fb862c69e540631a_97fb0facebfcc4b2c659eb3fc53740f3_0_0.
2024-08-29 15:11:28,633 ERROR org.apache.flink.runtime.util.ClusterUncaughtExceptionHandler [] - WARNING: Thread 'pool-14-thread-1' produced an uncaught exception. If you want to fail on uncaught exceptions, then configure cluster.uncaught-exception-handling accordingly
org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could not forward element to next operator
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:92) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:50) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.hudi.adapter.MaskingOutputAdapter.collect(MaskingOutputAdapter.java:60) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.adapter.MaskingOutputAdapter.collect(MaskingOutputAdapter.java:30) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.flink.table.runtime.util.StreamRecordCollector.collect(StreamRecordCollector.java:44) ~[flink-table-runtime-1.17.1.jar:1.17.1]
    at org.apache.hudi.sink.compact.CompactOperator.lambda$processElement$1(CompactOperator.java:125) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.sink.utils.NonThrownExecutor.handleException(NonThrownExecutor.java:142) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:133) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_392]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_392]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_392]
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieIOException: Could not check if hdfs://hdfs-name-node:9820/flink/ceshi9 is a valid table
    at org.apache.hudi.sink.compact.CompactionCommitSink.lambda$commitIfNecessary$1(CompactionCommitSink.java:130) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at java.util.HashMap.computeIfAbsent(HashMap.java:1128) ~[?:1.8.0_392]
    at org.apache.hudi.sink.compact.CompactionCommitSink.commitIfNecessary(CompactionCommitSink.java:125) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.sink.compact.CompactionCommitSink.invoke(CompactionCommitSink.java:114) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.sink.compact.CompactionCommitSink.invoke(CompactionCommitSink.java:59) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:54) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:75) ~[flink-dist-1.17.1.jar:1.17.1]
    ... 11 more
Caused by: org.apache.hudi.exception.HoodieIOException: Could not check if hdfs://hdfs-name-node:9820/flink/ceshi9 is a valid table
    at org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:59) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:135) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.common.table.HoodieTableMetaClient.newMetaClient(HoodieTableMetaClient.java:680) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.common.table.HoodieTableMetaClient.access$100(HoodieTableMetaClient.java:81) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:772) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.table.HoodieFlinkTable.create(HoodieFlinkTable.java:61) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.client.HoodieFlinkWriteClient.getHoodieTable(HoodieFlinkWriteClient.java:463) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.sink.compact.CompactionCommitSink.lambda$commitIfNecessary$1(CompactionCommitSink.java:128) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at java.util.HashMap.computeIfAbsent(HashMap.java:1128) ~[?:1.8.0_392]
    at org.apache.hudi.sink.compact.CompactionCommitSink.commitIfNecessary(CompactionCommitSink.java:125) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.sink.compact.CompactionCommitSink.invoke(CompactionCommitSink.java:114) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.sink.compact.CompactionCommitSink.invoke(CompactionCommitSink.java:59) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:54) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:75) ~[flink-dist-1.17.1.jar:1.17.1]
    ... 11 more
Caused by: java.io.InterruptedIOException: Interrupted waiting to send RPC request to server
    at org.apache.hadoop.ipc.Client.call(Client.java:1513) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.ipc.Client.call(Client.java:1455) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129) ~[hadoop-common-3.3.4.jar:?]
    at com.sun.proxy.$Proxy34.getFileInfo(Unknown Source) ~[?:?]
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:965) ~[hadoop-hdfs-client-3.3.4.jar:?]
    at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_392]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_392]
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) ~[hadoop-common-3.3.4.jar:?]
    at com.sun.proxy.$Proxy35.getFileInfo(Unknown Source) ~[?:?]
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1739) ~[hadoop-hdfs-client-3.3.4.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1753) ~[hadoop-hdfs-client-3.3.4.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1750) ~[hadoop-hdfs-client-3.3.4.jar:?]
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1765) ~[hadoop-hdfs-client-3.3.4.jar:?]
    at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.lambda$getFileStatus$17(HoodieWrapperFileSystem.java:414) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.executeFuncWithTimeMetrics(HoodieWrapperFileSystem.java:118) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.getFileStatus(HoodieWrapperFileSystem.java:408) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.storage.hadoop.HoodieHadoopStorage.getPathInfo(HoodieHadoopStorage.java:169) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:51) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:135) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.common.table.HoodieTableMetaClient.newMetaClient(HoodieTableMetaClient.java:680) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.common.table.HoodieTableMetaClient.access$100(HoodieTableMetaClient.java:81) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:772) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.table.HoodieFlinkTable.create(HoodieFlinkTable.java:61) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.client.HoodieFlinkWriteClient.getHoodieTable(HoodieFlinkWriteClient.java:463) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.sink.compact.CompactionCommitSink.lambda$commitIfNecessary$1(CompactionCommitSink.java:128) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at java.util.HashMap.computeIfAbsent(HashMap.java:1128) ~[?:1.8.0_392]
    at org.apache.hudi.sink.compact.CompactionCommitSink.commitIfNecessary(CompactionCommitSink.java:125) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.sink.compact.CompactionCommitSink.invoke(CompactionCommitSink.java:114) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.sink.compact.CompactionCommitSink.invoke(CompactionCommitSink.java:59) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:54) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:75) ~[flink-dist-1.17.1.jar:1.17.1]
    ... 11 more
Caused by: java.lang.InterruptedException
    at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404) ~[?:1.8.0_392]
    at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:1.8.0_392]
    at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1213) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.ipc.Client.call(Client.java:1508) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.ipc.Client.call(Client.java:1455) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129) ~[hadoop-common-3.3.4.jar:?]
    at com.sun.proxy.$Proxy34.getFileInfo(Unknown Source) ~[?:?]
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:965) ~[hadoop-hdfs-client-3.3.4.jar:?]
    at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_392]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_392]
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) ~[hadoop-common-3.3.4.jar:?]
    at com.sun.proxy.$Proxy35.getFileInfo(Unknown Source) ~[?:?]
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1739) ~[hadoop-hdfs-client-3.3.4.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1753) ~[hadoop-hdfs-client-3.3.4.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1750) ~[hadoop-hdfs-client-3.3.4.jar:?]
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1765) ~[hadoop-hdfs-client-3.3.4.jar:?]
    at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.lambda$getFileStatus$17(HoodieWrapperFileSystem.java:414) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.executeFuncWithTimeMetrics(HoodieWrapperFileSystem.java:118) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.getFileStatus(HoodieWrapperFileSystem.java:408) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.storage.hadoop.HoodieHadoopStorage.getPathInfo(HoodieHadoopStorage.java:169) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:51) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:135) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.common.table.HoodieTableMetaClient.newMetaClient(HoodieTableMetaClient.java:680) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.common.table.HoodieTableMetaClient.access$100(HoodieTableMetaClient.java:81) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:772) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.table.HoodieFlinkTable.create(HoodieFlinkTable.java:61) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.client.HoodieFlinkWriteClient.getHoodieTable(HoodieFlinkWriteClient.java:463) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.sink.compact.CompactionCommitSink.lambda$commitIfNecessary$1(CompactionCommitSink.java:128) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at java.util.HashMap.computeIfAbsent(HashMap.java:1128) ~[?:1.8.0_392]
    at org.apache.hudi.sink.compact.CompactionCommitSink.commitIfNecessary(CompactionCommitSink.java:125) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.sink.compact.CompactionCommitSink.invoke(CompactionCommitSink.java:114) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.hudi.sink.compact.CompactionCommitSink.invoke(CompactionCommitSink.java:59) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
    at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:54) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:75) ~[flink-dist-1.17.1.jar:1.17.1]
    ... 11 more
2024-08-29 15:11:29,534 INFO org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl [] - Free slot TaskSlot(index:2, state:ACTIVE, resource profile: ResourceProfile{cpuCores=1, taskHeapMemory=1.141gb (1224736768 bytes), taskOffHeapMemory=0 bytes, managedMemory=16.000mb (16777216 bytes), networkMemory=256.000mb (268435456 bytes)}, allocationId: 4008d4285e8ab6ce6bde583a9f0373f7, jobId: fff078e034e31ac7adfd40774dc31dc2).

danny0405 commented 2 weeks ago

2024-08-29 23:11:28 org.apache.hudi.exception.HoodieIOException: Could not check if hdfs://hdfs-name-node:9820/flink/ceshi9 is a valid table

Did you write into the table successfully?

biao-lvwan commented 2 weeks ago

2024-08-29 23:11:28 org.apache.hudi.exception.HoodieIOException: Could not check if hdfs://hdfs-name-node:9820/flink/ceshi9 is a valid table

Did you write into the table successfully?

Yes, Flink reads and writes normally, but syncing to Hive does not work.
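A quick way to double-check the reported path really is a readable Hudi table: the checkTableValidity call in the trace is looking for the table's .hoodie metadata directory, so one can list it directly. A sketch only; it assumes the hdfs CLI is available on a node configured for this cluster, and uses the path from the report:

```shell
# A valid Hudi table must contain a .hoodie directory with hoodie.properties.
# Path taken from the error message in this thread; adjust for your table.
hdfs dfs -ls hdfs://hdfs-name-node:9820/flink/ceshi9/.hoodie
hdfs dfs -cat hdfs://hdfs-name-node:9820/flink/ceshi9/.hoodie/hoodie.properties
```

Note that in the trace above the "is a valid table" message is triggered by an underlying InterruptedIOException on the NameNode RPC, so this check may well succeed even while the compaction task fails.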

biao-lvwan commented 2 weeks ago

2024-08-29 23:11:28 org.apache.hudi.exception.HoodieIOException: Could not check if hdfs://hdfs-name-node:9820/flink/ceshi9 is a valid table

Did you write into the table successfully?

I downgraded the versions: with flink 1.16 + hudi 0.13 syncing still did not work, but running flink run -c org.apache.hudi.sink.compact.HoodieFlinkCompactor /opt/flink/lib/hudi-flink1.16-bundle-0.13.1.jar --path s3a://ceshi/hudi9/ compacts successfully. What is the possible reason for this?
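For readability, the offline compaction command from the comment above, cleaned up (the jar filename originally contained a stray space; the jar location and table path are from this report and will differ per deployment):

```shell
# Run Hudi's standalone compaction job against the table path.
flink run -c org.apache.hudi.sink.compact.HoodieFlinkCompactor \
  /opt/flink/lib/hudi-flink1.16-bundle-0.13.1.jar \
  --path s3a://ceshi/hudi9/
```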

danny0405 commented 2 weeks ago

This is not the write job; it's the separate compaction job. I still believe it is related to the classloader.

biao-lvwan commented 2 weeks ago

compaction

Although this version can perform manual compaction, it still cannot compact automatically. The DDL is:

CREATE TABLE test_hudi_flink4 (
  id int PRIMARY KEY NOT ENFORCED,
  name VARCHAR(10),
  price int,
  ts int,
  dt VARCHAR(10)
) PARTITIONED BY (dt) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://hdfs-name-node:9820/flink/ceshi4',
  'table.type' = 'COPY_ON_WRITE',
  'hoodie.datasource.write.keygenerator.class' = 'org.apache.hudi.keygen.ComplexAvroKeyGenerator',
  'hoodie.datasource.write.recordkey.field' = 'id',
  'hoodie.datasource.write.hive_style_partitioning' = 'true',
  'hive_sync.enable' = 'true',
  'hive_sync.mode' = 'hms',
  'hive_sync.metastore.uris' = 'thrift://hive-metastore-server:9083',
  'hive_sync.conf.dir' = '/opt/hive/conf',
  'hive_sync.db' = 'hudi',
  'hive_sync.table' = 'test_hudi_flink4',
  'hive_sync.partition_fields' = 'dt',
  'hive_sync.partition_extractor_class' = 'org.apache.hudi.hive.HiveStylePartitionValueExtractor'
);

Same error.

danny0405 commented 2 weeks ago

COW table does not need compaction, only MOR needs that.

biao-lvwan commented 2 weeks ago

COW table does not need compaction, only MOR needs that.

When syncing to Hive, with 'changelog.enabled' = 'true', 'compaction.async.enabled' = 'true', and 'compaction.delta_commits' = '2' set, automatic compaction still does not happen?

danny0405 commented 2 weeks ago

You declared the table as COW; COW does not generate log files, so compaction is needless.

'table.type' = 'COPY_ON_WRITE'
biao-lvwan commented 1 week ago

You declared the table as COW; COW does not generate log files, so compaction is needless.

'table.type' = 'COPY_ON_WRITE'

Sorry, the screenshot was wrong. The actual DDL is:

CREATE TABLE test_hudi_flink9 (
  id int PRIMARY KEY NOT ENFORCED,
  name VARCHAR(10),
  price int,
  ts int,
  dt VARCHAR(10)
) PARTITIONED BY (dt) WITH (
  'connector' = 'hudi',
  'path' = 's3a://ceshi/hudi9/',
  'table.type' = 'MERGE_ON_READ',
  'hoodie.datasource.write.keygenerator.class' = 'org.apache.hudi.keygen.ComplexAvroKeyGenerator',
  'hoodie.datasource.write.recordkey.field' = 'id',
  'hoodie.datasource.write.hive_style_partitioning' = 'true',
  'changelog.enabled' = 'true',
  'compaction.async.enabled' = 'true',
  'compaction.delta_commits' = '2',
  'compaction.trigger.strategy' = 'num_commits',
  'hive_sync.enable' = 'true',
  'hive_sync.table' = 't_hdm',
  'hive_sync.db' = 'default',
  'hive_sync.mode' = 'hms',
  'hive_sync.metastore.uris' = 'thrift://hive-metastore:9083'
);

I use MERGE_ON_READ. Whether the storage is S3 or MinIO, the data is never automatically compacted and merged into storage. What could be the cause?
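One way to check whether compaction was ever scheduled is to look at the timeline files under the table's `.hoodie` directory: a scheduled-but-unexecuted compaction appears as `<instant>.compaction.requested` (or `.inflight`). A minimal local-filesystem sketch; for S3/HDFS substitute `hdfs dfs -ls` or your S3 client, and note the function name is mine, not a Hudi tool:

```shell
# Sketch: list compaction instants on a Hudi table's timeline.
# A scheduled-but-unexecuted compaction shows up as
# <instant>.compaction.requested / .inflight under <base>/.hoodie.
list_compaction_instants() {
  local base="$1"
  ls "$base/.hoodie" 2>/dev/null | grep -E '\.compaction\.' \
    || echo "no compaction instants on the timeline"
}
```

If this prints nothing but `.commit`/`.deltacommit` files exist, compaction was never scheduled; if `.requested`/`.inflight` entries linger, the plan exists but execution is failing.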

danny0405 commented 1 week ago

you can check the compaction scheduling log in the JM log file, if the plan exists, maybe there are some parquet write errors.
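The suggestion above can be scripted: scan the JobManager log for compaction-related entries to see whether a plan was generated at all. A minimal sketch; the helper name is mine and the log path depends on your Flink deployment:

```shell
# Sketch: print JobManager log lines mentioning compaction, so you can
# tell whether a compaction plan was scheduled.
find_compaction_entries() {
  local jm_log="$1"   # e.g. under $FLINK_HOME/log/ for a standalone cluster
  grep -iE 'compact' "$jm_log" || echo "no compaction entries in $jm_log"
}
```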

biao-lvwan commented 1 week ago

you can check the compaction scheduling log in the JM log file, if the plan exists, maybe there are some parquet write errors.

I have tried many versions. Both the COW and MOR modes run normally in Flink SQL, but an error occurs during compaction; the log is attached below. What could be the cause? I see no error in the JM; the error is in the TM. jm.txt tm.txt

danny0405 commented 1 week ago

The table seems invalid:

 Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieIOException: Could not check if hdfs://hdfs-name-node:9000/user/hive/warehouse/demo3.db is a valid table
2024-09-02 15:26:44     at org.apache.hudi.sink.compact.CompactionCommitSink.lambda$commitIfNecessary$1(CompactionCommitSink.java:130) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
2024-09-02 15:26:44     at java.util.HashMap.computeIfAbsent(HashMap.java:1128) ~[?:1.8.0_392]
2024-09-02 15:26:44     at org.apache.hudi.sink.compact.CompactionCommitSink.commitIfNecessary(CompactionCommitSink.java:125) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
2024-09-02 15:26:44     at org.apache.hudi.sink.compact.CompactionCommitSink.invoke(CompactionCommitSink.java:114) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
2024-09-02 15:26:44     at org.apache.hudi.sink.compact.CompactionCommitSink.invoke(CompactionCommitSink.java:59) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
2024-09-02 15:26:44     at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:54) ~[flink-dist-1.17.1.jar:1.17.1]
2024-09-02 15:26:44     at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:75) ~[flink-dist-1.17.1.jar:1.17.1]
2024-09-02 15:26:44     ... 11 more
2024-09-02 15:26:44 Caused by: org.apache.hudi.exception.HoodieIOException: Could not check if hdfs://hdfs-name-node:9000/user/hive/warehouse/demo3.db is a valid table
biao-lvwan commented 1 week ago

The table seems invalid:

 Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieIOException: Could not check if hdfs://hdfs-name-node:9000/user/hive/warehouse/demo3.db is a valid table
2024-09-02 15:26:44     at org.apache.hudi.sink.compact.CompactionCommitSink.lambda$commitIfNecessary$1(CompactionCommitSink.java:130) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
2024-09-02 15:26:44     at java.util.HashMap.computeIfAbsent(HashMap.java:1128) ~[?:1.8.0_392]
2024-09-02 15:26:44     at org.apache.hudi.sink.compact.CompactionCommitSink.commitIfNecessary(CompactionCommitSink.java:125) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
2024-09-02 15:26:44     at org.apache.hudi.sink.compact.CompactionCommitSink.invoke(CompactionCommitSink.java:114) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
2024-09-02 15:26:44     at org.apache.hudi.sink.compact.CompactionCommitSink.invoke(CompactionCommitSink.java:59) ~[hudi-flink1.17-bundle-0.15.0.jar:0.15.0]
2024-09-02 15:26:44     at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:54) ~[flink-dist-1.17.1.jar:1.17.1]
2024-09-02 15:26:44     at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:75) ~[flink-dist-1.17.1.jar:1.17.1]
2024-09-02 15:26:44     ... 11 more
2024-09-02 15:26:44 Caused by: org.apache.hudi.exception.HoodieIOException: Could not check if hdfs://hdfs-name-node:9000/user/hive/warehouse/demo3.db is a valid table

Yes, I noticed this, but the error only occurs during compaction. When querying with Flink SQL there is no error.

danny0405 commented 1 week ago

Is hdfs://hdfs-name-node:9000/user/hive/warehouse/demo3.db a valid table path?
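For reference, the check that throws here (`TableNotFoundException.checkTableValidity`) essentially verifies that the base path contains a `.hoodie` metadata directory. It can be replicated by hand; it may also be worth noting that the path in the error, `.../demo3.db`, looks like a Hive database directory rather than a table directory. A local-filesystem sketch (for HDFS, use `hdfs dfs -test -d <path>/.hoodie`; the function name is mine):

```shell
# Sketch: a Hudi table base path is valid when <base>/.hoodie exists.
check_hudi_table() {
  local base="$1"
  if [ -d "$base/.hoodie" ]; then
    echo "valid hudi table: $base"
  else
    echo "not a hudi table: $base"
  fi
}
```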

biao-lvwan commented 1 week ago

Is hdfs://hdfs-name-node:9000/user/hive/warehouse/demo3.db a valid table path?

Yes, this is a valid path (screenshots attached).