apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

Enable Hudi Metadata Table and Multi-Modal Index bug #9672

Open · MorningGlow opened this issue 1 year ago

MorningGlow commented 1 year ago

Versions: Flink 1.15.2, Hudi 0.12.3, Hive 3.1.2, Hadoop 3.2.4.

CREATE TABLE WF_UNITINFOTRAVEL_HUDI (
    ID STRING,
    WORKORDER STRING COMMENT 'work order',
    UNITID STRING COMMENT 'serial number',
    PARTID STRING COMMENT 'finished-product part ID',
    PARTNAME STRING,
    ROUTEID STRING COMMENT 'route ID',
    ROUTENAME STRING,
    LINEID STRING COMMENT 'line ID',
    LINENAME STRING,
    CURPROCESSID STRING COMMENT 'current process ID',
    CURPROCESSNAME STRING,
    EQPID STRING COMMENT 'workstation (equipment ID)',
    NEXTPROCESSID STRING COMMENT 'backing,input',
    NEXTPROCESSNAME STRING,
    CURRENTSTATUS STRING COMMENT 'current status (Pss, Fail, Scrap)',
    INPROCESSTIME STRING COMMENT 'time the unit entered the process',
    OUTPROCESSTIME STRING COMMENT 'time the unit left the process',
    INPDLINETIME STRING COMMENT 'time the unit entered the production line',
    OUTPDLINETIME STRING COMMENT 'time the unit left the production line',
    PALLETNO STRING COMMENT 'pallet number',
    CONTAINER STRING COMMENT 'container',
    QCNO STRING COMMENT 'sampling inspection number',
    QCRESULT STRING COMMENT 'sampling inspection result',
    REWORKWO STRING COMMENT 'rework order number',
    BOXNO STRING COMMENT 'packing box/bag',
    PANELNO STRING COMMENT 'panel serial number',
    BOARDNO DECIMAL(10,0) COMMENT 'cavity position',
    CARRIER STRING,
    COVER STRING,
    BASE STRING,
    MAGAZINE STRING,
    ACTION STRING COMMENT 'operation action, e.g. InDryBox, OutDryBox, InOvenBox, OutOvenBox',
    OPT2 STRING,
    OPT3 STRING,
    OPT4 STRING,
    OPT5 STRING,
    SORTCODE DECIMAL(20,0),
    ENABLED DECIMAL(1,0),
    CREATEDATE STRING,
    CREATEUSERID STRING,
    CREATEUSERNAME STRING,
    MODIFYDATE STRING,
    MODIFYUSERID STRING,
    MODIFYUSERNAME STRING,
    REMARK STRING,
    RULENAME STRING,
    CARTONNO STRING COMMENT 'carton number',
    XFLAG STRING COMMENT 'X-board flag: OK/NG',
    PASSCOUNT DECIMAL(10,0) COMMENT 'number of units that passed the current process',
    SPLITFLAG STRING COMMENT 'board-split flag: Y/N',
    LEDBIN STRING,
    PROCESSGRADE STRING,
    HOLDREASON STRING,
    UNITID56 STRING,
    ELAPSEDMILLISECONDS DECIMAL(19,0) COMMENT 'elapsed time (ms)',
    BATCHID STRING COMMENT 'batch ID',
    DT STRING,
    PRIMARY KEY (ID) NOT ENFORCED
) PARTITIONED BY (DT) WITH (
    -- 'read.streaming.enabled' = 'true',
    'path' = 'hdfs://ks2p-hadoop01:9000/data/hive/warehouse/test.db/TEST_WF_UNITINFOTRAVEL',
    'hoodie.parquet.small.file.limit' = '125829120',
    'hoodie.parquet.max.file.size' = '134217728',
    'hive_sync.enable' = 'true',
    'connector' = 'hudi',
    'read.streaming.check-interval' = '3',
    'hive_sync.metastore.uris' = 'thrift://ks2p-hadoop01:9083',
    'hive_sync.table' = 'TEST_WF_UNITINFOTRAVEL',
    'hive_sync.db' = 'demo',
    'compaction.trigger.strategy' = 'num_commits',
    'changelog.enabled' = 'true',
    'write.rate.limit' = '90000',
    'hive_sync.support_timestamp' = 'true',
    'compaction.async.enabled' = 'true',
    'write.operation' = 'upsert',
    'hoodie.datasource.write.recordkey.field' = 'ID',
    'hoodie.datasource.write.precombine.field' = 'MODIFYDATE',
    'compaction.delta_commits' = '2',
    'table.type' = 'MERGE_ON_READ',
    'hive_sync.mode' = 'hms',
    'hoodie.metadata.index.bloom.filter.enable' = 'true',
    'hoodie.metadata.enable' = 'true',
    'hoodie.metadata.index.column.stats.enable' = 'true',
    'hoodie.enable.data.skipping' = 'true',
    'hoodie.metadata.compact.max.delta.commits' = '2',
    'hoodie.metadata.index.column.stats.file.group.count' = '8',
    'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control',
    'hoodie.cleaner.policy.failed.writes' = 'LAZY',
    'hoodie.write.lock.provider' = 'org.apache.hudi.client.transaction.lock.InProcessLockProvider'
    -- 'hoodie.metadata.index.column.stats.column.list' = 'ID,WORKORDER'
);

Error:

java.io.FileNotFoundException: File does not exist: /data/hive/warehouse/test.db/TEST_WF_UNITINFOTRAVEL/.hoodie/metadata/column_stats/.hoodie_partition_metadata_0 (inode 44976335) [Lease.  Holder: DFSClient_NONMAPREDUCE_-1936765705_74, pending creates: 1]
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2929)
    at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:599)
    at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2808)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:910)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:577)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:549)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:518)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1035)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:963)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2960)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_261]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_261]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_261]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_261]
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121) ~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88) ~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
    at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1842) ~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
    at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1638) ~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:704) ~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
Caused by: org.apache.hadoop.ipc.RemoteException: File does not exist: /data/hive/warehouse/test.db/TEST_WF_UNITINFOTRAVEL/.hoodie/metadata/column_stats/.hoodie_partition_metadata_0 (inode 44976335) [Lease.  Holder: DFSClient_NONMAPREDUCE_-1936765705_74, pending creates: 1]
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2929)
    at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:599)
    at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2808)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:910)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:577)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:549)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:518)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1035)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:963)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2960)

    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1489) ~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
    at org.apache.hadoop.ipc.Client.call(Client.java:1435) ~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
    at org.apache.hadoop.ipc.Client.call(Client.java:1345) ~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) ~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) ~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
    at com.sun.proxy.$Proxy40.addBlock(Unknown Source) ~[?:?]
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:444) ~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
    at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_261]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_261]

(Screenshots attached.)

If possible, could you please suggest a solution? Thank you.

danny0405 commented 1 year ago

CC @codope to take a look at this.

ad1happy2go commented 1 year ago

@MorningGlow Can you post the timeline, please?

MorningGlow commented 1 year ago

@ad1happy2go I have now switched to version 0.13.1, and an error is still reported during the first merge. Could there be a problem with my configuration?

org.apache.flink.util.FlinkException: Global failure triggered by OperatorCoordinator for 'stream_write: WF_UNITINFOTRAVEL_HUDI' (operator 3be69e0bbe7ef4739ffdb41eadc976f5).
    at org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder$LazyInitializedCoordinatorContext.failJob(OperatorCoordinatorHolder.java:556)
    at org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$start$0(StreamWriteOperatorCoordinator.java:191)
    at org.apache.hudi.sink.utils.NonThrownExecutor.handleException(NonThrownExecutor.java:142)
    at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:133)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: Executor executes action [commits the instant 20230911193845109] error
    ... 6 more
Caused by: org.apache.hudi.exception.HoodieException: Failed to update metadata
    at org.apache.hudi.client.HoodieFlinkTableServiceClient.writeTableMetadata(HoodieFlinkTableServiceClient.java:184)
    at org.apache.hudi.client.HoodieFlinkWriteClient.writeTableMetadata(HoodieFlinkWriteClient.java:279)
    at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:282)
    at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:233)
    at org.apache.hudi.client.HoodieFlinkWriteClient.commit(HoodieFlinkWriteClient.java:111)
    at org.apache.hudi.client.HoodieFlinkWriteClient.commit(HoodieFlinkWriteClient.java:74)
    at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:199)
    at org.apache.hudi.sink.StreamWriteOperatorCoordinator.doCommit(StreamWriteOperatorCoordinator.java:540)
    at org.apache.hudi.sink.StreamWriteOperatorCoordinator.commitInstant(StreamWriteOperatorCoordinator.java:516)
    at org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2(StreamWriteOperatorCoordinator.java:246)
    at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130)
    ... 3 more
Caused by: org.apache.hudi.exception.HoodieUpsertException: Error upsetting bucketType UPDATE for partition :files
    at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.handleUpsertPartition(BaseFlinkCommitActionExecutor.java:203)
    at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.execute(BaseFlinkCommitActionExecutor.java:107)
    at org.apache.hudi.table.action.commit.delta.FlinkUpsertPreppedDeltaCommitActionExecutor.execute(FlinkUpsertPreppedDeltaCommitActionExecutor.java:52)
    at org.apache.hudi.table.HoodieFlinkMergeOnReadTable.upsertPrepped(HoodieFlinkMergeOnReadTable.java:81)
    at org.apache.hudi.client.HoodieFlinkWriteClient.lambda$upsertPreppedRecords$4(HoodieFlinkWriteClient.java:167)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1628)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:747)
    at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:721)
    at java.util.stream.AbstractTask.compute(AbstractTask.java:316)
    at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:172)
Caused by: java.lang.ExceptionInInitializerError
    at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:142)
    at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:115)
    at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:159)
    at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:464)
    at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:437)
    at org.apache.hudi.table.action.commit.delta.BaseFlinkDeltaCommitActionExecutor.handleUpdate(BaseFlinkDeltaCommitActionExecutor.java:54)
    at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.handleUpsertPartition(BaseFlinkCommitActionExecutor.java:195)
    ... 16 more
Caused by: java.lang.RuntimeException: Could not create  interface org.apache.hudi.org.apache.hadoop.hbase.regionserver.MetricsRegionServerSourceFactory Is the hadoop compatibility jar on the classpath?
    at org.apache.hudi.org.apache.hadoop.hbase.CompatibilitySingletonFactory.getInstance(CompatibilitySingletonFactory.java:75)
    at org.apache.hudi.org.apache.hadoop.hbase.io.MetricsIO.<init>(MetricsIO.java:32)
    at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile.<clinit>(HFile.java:176)
    ... 23 more
Caused by: java.util.NoSuchElementException
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:365)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
    at org.apache.hudi.org.apache.hadoop.hbase.CompatibilitySingletonFactory.getInstance(CompatibilitySingletonFactory.java:61)
    ... 25 more
(Screenshots attached.)

MorningGlow commented 1 year ago

Version 0.13.1 also fails.

ad1happy2go commented 1 year ago

@MorningGlow The configurations look okay. I guess you are facing this issue due to stale column stats; probably a bug. Can you provide steps to reproduce?

MorningGlow commented 1 year ago

@ad1happy2go Actually, the OGG data enters Kafka and is then written to Hudi. Every time I test, I first clear all of the data on HDFS.
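
For context, a minimal Flink SQL sketch of that pipeline might look like the following; the Kafka topic, broker address, group id, format, and column subset here are assumptions for illustration, not taken from the actual job:

    -- Hypothetical Kafka source for the OGG change stream (topic, servers and format are assumed)
    CREATE TABLE OGG_UNITINFOTRAVEL_KAFKA (
        ID STRING,
        WORKORDER STRING,
        MODIFYDATE STRING,
        DT STRING
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'ogg_unitinfotravel',
        'properties.bootstrap.servers' = 'ks2p-hadoop01:9092',
        'properties.group.id' = 'hudi_test',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    );

    -- Continuously upsert the change records into the Hudi table defined above;
    -- only a subset of columns is shown here, the real job maps the full schema.
    INSERT INTO WF_UNITINFOTRAVEL_HUDI (ID, WORKORDER, MODIFYDATE, DT)
    SELECT ID, WORKORDER, MODIFYDATE, DT
    FROM OGG_UNITINFOTRAVEL_KAFKA;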

codope commented 1 year ago

@MorningGlow The latest error stack trace (version 0.13.1) suggests that, while writing to the metadata table, HFile writer instantiation failed due to an incompatible class on the classpath. My guess is this would still happen even after disabling column stats or the bloom filter index. Can you disable the other metadata configs and keep only hoodie.metadata.enable set to true? This will help us narrow down the issue.
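
Concretely, that would mean reducing the metadata-related options in the WITH clause of the DDL above to just the base switch, for example (a sketch of this suggestion; all other table options stay as in the original):

    -- keep only the base metadata table on
    'hoodie.metadata.enable' = 'true'
    -- multi-modal index options disabled while narrowing down the issue:
    -- 'hoodie.metadata.index.bloom.filter.enable' = 'true',
    -- 'hoodie.metadata.index.column.stats.enable' = 'true',
    -- 'hoodie.enable.data.skipping' = 'true',
    -- 'hoodie.metadata.compact.max.delta.commits' = '2',
    -- 'hoodie.metadata.index.column.stats.file.group.count' = '8'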

ad1happy2go commented 1 year ago

@MorningGlow Did you get a chance to try what @codope suggested?

nsivabalan commented 5 months ago

Hey @MorningGlow, any follow-ups on this?