carbonlake / forum

Big Data + AI discussion group

Creating the datamap first, then loading data throws the following exception #8

Open a394023466 opened 4 years ago

a394023466 commented 4 years ago
   at java.lang.Thread.run(Thread.java:748)

Caused by: java.util.concurrent.ExecutionException: org.apache.carbondata.processing.datamap.DataMapWriterException: java.io.IOException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to CREATE_FILE /carbon/spark2.3.2/default/eve_not_partition_tbl_830f7cfd-16b6-4e30-b237-bbc74ea9e1d2/eve_not_partition_tbl_bloomfilter_dm/2/0_batchno0-0-2-1576672548810/src_ip.bloomindex for DFSClient_NONMAPREDUCE_193894307_1 on 10.10.151.15 because DFSClient_NONMAPREDUCE_193894307_1 is already the current lease holder.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3140)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2813)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2702)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2586)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:736)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:409)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.execute(CarbonRowDataWriterProcessorStepImpl.java:143)
    ... 12 more

Caused by: org.apache.carbondata.processing.datamap.DataMapWriterException: java.io.IOException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to CREATE_FILE /carbon/spark2.3.2/default/eve_not_partition_tbl_830f7cfd-16b6-4e30-b237-bbc74ea9e1d2/eve_not_partition_tbl_bloomfilter_dm/2/0_batchno0-0-2-1576672548810/src_ip.bloomindex for DFSClient_NONMAPREDUCE_193894307_1 on 10.10.151.15 because DFSClient_NONMAPREDUCE_193894307_1 is already the current lease holder. (NameNode-side frames identical to those above)

    at org.apache.carbondata.processing.datamap.DataMapWriterListener.register(DataMapWriterListener.java:107)
    at org.apache.carbondata.processing.datamap.DataMapWriterListener.registerAllWriter(DataMapWriterListener.java:82)
    at org.apache.carbondata.processing.store.CarbonFactDataHandlerModel.createCarbonFactDataHandlerModel(CarbonFactDataHandlerModel.java:301)
    at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.doExecute(CarbonRowDataWriterProcessorStepImpl.java:163)
    at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.access$000(CarbonRowDataWriterProcessorStepImpl.java:57)
    at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl$DataWriterRunnable.run(CarbonRowDataWriterProcessorStepImpl.java:331)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more

Caused by: java.io.IOException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to CREATE_FILE /carbon/spark2.3.2/default/eve_not_partition_tbl_830f7cfd-16b6-4e30-b237-bbc74ea9e1d2/eve_not_partition_tbl_bloomfilter_dm/2/0_batchno0-0-2-1576672548810/src_ip.bloomindex for DFSClient_NONMAPREDUCE_193894307_1 on 10.10.151.15 because DFSClient_NONMAPREDUCE_193894307_1 is already the current lease holder. (NameNode-side frames identical to those above)

    at org.apache.carbondata.datamap.bloom.AbstractBloomDataMapWriter.initDataMapFile(AbstractBloomDataMapWriter.java:172)
    at org.apache.carbondata.datamap.bloom.AbstractBloomDataMapWriter.<init>(AbstractBloomDataMapWriter.java:63)
    at org.apache.carbondata.datamap.bloom.BloomDataMapWriter.<init>(BloomDataMapWriter.java:55)
    at org.apache.carbondata.datamap.bloom.BloomCoarseGrainDataMapFactory.createWriter(BloomCoarseGrainDataMapFactory.java:214)
    at org.apache.carbondata.processing.datamap.DataMapWriterListener.register(DataMapWriterListener.java:104)
    ... 10 more

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to CREATE_FILE /carbon/spark2.3.2/default/eve_not_partition_tbl_830f7cfd-16b6-4e30-b237-bbc74ea9e1d2/eve_not_partition_tbl_bloomfilter_dm/2/0_batchno0-0-2-1576672548810/src_ip.bloomindex for DFSClient_NONMAPREDUCE_193894307_1 on 10.10.151.15 because DFSClient_NONMAPREDUCE_193894307_1 is already the current lease holder. (NameNode-side frames identical to those above)

    at org.apache.hadoop.ipc.Client.call(Client.java:1475)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy28.create(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:296)
    at sun.reflect.GeneratedMethodAccessor119.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy29.create(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1648)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1689)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1624)
    at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448)
    at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387)
    at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.createNewFile(AbstractDFSCarbonFile.java:448)
    at org.apache.carbondata.core.datastore.impl.FileFactory.createNewFile(FileFactory.java:292)
    at org.apache.carbondata.core.datastore.impl.FileFactory.createNewFile(FileFactory.java:285)
    at org.apache.carbondata.datamap.bloom.AbstractBloomDataMapWriter.initDataMapFile(AbstractBloomDataMapWriter.java:167)
    ... 14 more

Driver stacktrace:

2019-12-18 20:35:53 INFO DAGScheduler:54 - Job 14 failed: collect at CarbonDataRDDFactory.scala:1250, took 0.181068 s
2019-12-18 20:35:53 INFO CarbonDataRDDFactory$:437 - DataLoad failure: org.apache.carbondata.processing.datamap.DataMapWriterException: java.io.IOException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to CREATE_FILE /carbon/spark2.3.2/default/eve_not_partition_tbl_830f7cfd-16b6-4e30-b237-bbc74ea9e1d2/eve_not_partition_tbl_bloomfilter_dm/2/0_batchno0-0-2-1576672548810/src_ip.bloomindex for DFSClient_NONMAPREDUCE_193894307_1 on 10.10.151.15 because DFSClient_NONMAPREDUCE_193894307_1 is already the current lease holder. (NameNode-side frames identical to those above)
2019-12-18 20:35:53 ERROR CarbonDataRDDFactory$:438 - org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 8.0 failed 1 times, most recent failure: Lost task 0.0 in stage 8.0 (TID 35, localhost, executor driver): org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException: (same DataMapWriterException / IOException / AlreadyBeingCreatedException chain as above)

    at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.execute(CarbonRowDataWriterProcessorStepImpl.java:155)
    at org.apache.carbondata.processing.loading.DataLoadExecutor.execute(DataLoadExecutor.java:52)
    at org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD$$anon$1.<init>(NewCarbonDataLoadRDD.scala:150)
    at org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD.internalCompute(NewCarbonDataLoadRDD.scala:124)
    at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:84)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: (the same java.util.concurrent.ExecutionException chain and stack frames already shown in full above)

Driver stacktrace:

2019-12-18 20:35:53 INFO HdfsFileLock:52 - HDFS lock path: hdfs://test01:8020/carbon/lock/830f7cfd-16b6-4e30-b237-bbc74ea9e1d2/LockFiles/tablestatus.lock
2019-12-18 20:35:53 INFO CarbonLoaderUtil:252 - Acquired lock for table default.eve_not_partition_tbl for table status updation
2019-12-18 20:35:53 INFO CarbonLoaderUtil:380 - Table unlocked successfully after table status updation default.eve_not_partition_tbl
2019-12-18 20:35:53 INFO CarbonDataRDDFactory$:493 - ****starting clean up****
2019-12-18 20:35:53 INFO CarbonDataRDDFactory$:499 - ****clean up done****
2019-12-18 20:35:53 WARN CarbonDataRDDFactory$:500 - Cannot write load metadata file as data load failed
2019-12-18 20:35:53 ERROR CarbonLoadDataCommand:390 - java.lang.Exception: DataLoad failure: (same AlreadyBeingCreatedException on src_ip.bloomindex as above)
2019-12-18 20:35:53 INFO HdfsFileLock:52 - HDFS lock path: hdfs://test01:8020/carbon/lock/830f7cfd-16b6-4e30-b237-bbc74ea9e1d2/LockFiles/tablestatus.lock
2019-12-18 20:35:53 INFO CarbonLoaderUtil:252 - Acquired lock for table default.eve_not_partition_tbl for table status updation
2019-12-18 20:35:53 INFO CarbonLoaderUtil:380 - Table unlocked successfully after table status updation default.eve_not_partition_tbl
2019-12-18 20:35:53 INFO CarbonLoadDataCommand:440 - concurrent_load lock for table hdfs://test01:8020/carbon/spark2.3.2/default/eve_not_partition_tbl_830f7cfd-16b6-4e30-b237-bbc74ea9e1d2 has been released successfully
2019-12-18 20:35:53 ERROR CarbonLoadDataCommand:166 - Got exception java.lang.Exception: DataLoad failure: (same AlreadyBeingCreatedException as above) when processing data. But this command does not support undo yet, skipping the undo part.
2019-12-18 20:35:53 AUDIT audit:93 - {"time":"December 18, 2019 8:35:53 PM CST","username":"root","opName":"LOAD DATA","opId":"5910521070605463","opStatus":"FAILED","opTime":"5080 ms","table":"default.eve_not_partition_tbl","extraInfo":{"Exception":"java.lang.Exception","Message":"DataLoad failure: (same AlreadyBeingCreatedException as above)"}}
2019-12-18 20:35:53 ERROR SparkExecuteStatementOperation:91 - Error executing query, currentState RUNNING, java.lang.Exception: DataLoad failure: (same AlreadyBeingCreatedException as above)

jackylk commented 4 years ago

Which kind of datamap did you create? If possible, please post the CREATE DATAMAP statement.

a394023466 commented 4 years ago

Which kind of datamap did you create? If possible, please post the CREATE DATAMAP statement.

This is the table creation statement:

create table eve_tbl_not_partition (
  startTime BIGINT,
  flow_id BIGINT,
  pcap_cnt BIGINT,
  pcap_len BIGINT,
  src_ip STRING,
  src_port INT,
  dest_ip STRING,
  dest_port INT,
  proto STRING
)
stored as carbondata
TBLPROPERTIES ('TABLE_BLOCKSIZE'='512','TABLE_BLOCKLET_SIZE'='128','DICTIONARY_INCLUDE'='src_port,dest_port,proto');

And this is the datamap statement:

CREATE DATAMAP eve_tbl_not_partition_bloomfilter_dm ON TABLE eve_tbl_not_partition
USING "bloomfilter"
DMPROPERTIES ('index_columns'='src_ip','BLOOM_SIZE'='640000','BLOOM_FPP'='0.00001');
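For reference, CarbonData's datamap management documentation also describes a deferred mode that skips writing the bloom index during data load and builds it afterwards with an explicit rebuild command. Whether it sidesteps this lease conflict has not been verified here, and the exact WITH DEFERRED REBUILD / REBUILD DATAMAP syntax below is an assumption taken from those docs rather than something tested on this 1.6.0 build:

-- assumed syntax from the CarbonData datamap management docs, untested here
CREATE DATAMAP eve_tbl_not_partition_bloomfilter_dm ON TABLE eve_tbl_not_partition
USING "bloomfilter"
WITH DEFERRED REBUILD
DMPROPERTIES ('index_columns'='src_ip','BLOOM_SIZE'='640000','BLOOM_FPP'='0.00001');

-- then, after each load completes:
REBUILD DATAMAP eve_tbl_not_partition_bloomfilter_dm ON TABLE eve_tbl_not_partition;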

a394023466 commented 4 years ago

Chapter 1: Runtime environment

1.1 Hardware environment

Host model / configuration:
Huawei 2288 V5: 256 GB RAM, 56 cores
Huawei 2288 V5: 256 GB RAM, 56 cores
Huawei 2288 V5: 256 GB RAM, 56 cores
Huawei 2288 V5: 256 GB RAM, 56 cores
Huawei 2288 V5: 512 GB RAM, 56 cores

1.2 Software environment

Component / version:
CDH: CDH-5.9.1-1.cdh5.9.1
HDFS: 2.6.0-cdh5.9.1
YARN: 2.6.0-cdh5.9.1
Spark: 2.2.1-bin-2.6.0-cdh5.9.1
CarbonData: 1.6.0-bin-spark2.2.1-hadoop2.6.0-cdh5.9.1

Chapter 2: Runtime configuration

2.1 CarbonData configuration

carbon.storelocation=hdfs://xxx:8020/carbon/spark2.2.1/carbon.store
carbon.lock.type=HDFSLOCK
carbon.lock.path=hdfs://xxx:8020/carbon/lock
carbon.lock.retry.timeout.sec=10
carbon.lock.retries=6
carbon.concurrent.lock.retries=200
carbon.concurrent.lock.retry.timeout.sec=2
carbon.enable.auto.load.merge=true
carbon.number.of.cores=8

2.2 Driver launch configuration

spark-submit --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
  --num-executors 50 \
  --driver-memory 200G \
  --executor-memory 10G \
  --executor-cores 4 \
  --conf spark.default.parallelism=600 \
  --conf spark.sql.crossJoin.enabled=true \
  --conf spark.sql.broadcastTimeout=1200 \
  --conf spark.sql.shuffle.partitions=35 \
  --conf spark.executor.memoryOverhead=4g \
  /opt/bigdata/spark-2.2.1/spark-2.2.1-bin-2.6.0-cdh5.9.1/carbonlib/apache-carbondata-1.6.0-bin-spark2.2.1-hadoop2.6.0-cdh5.9.1.jar \
  hdfs://xxxx:8020/carbon/spark2.2.1

2.3 Data volume

Rows: 32.4 billion; size: 3 TB

Chapter 3: Problem summary

3.1 Partitioned table

3.1.1 Create the partitioned table

create table xxxxx (
  startTime BIGINT,
  flow_id BIGINT,
  pcap_cnt BIGINT,
  pcap_len BIGINT,
  src_ip STRING,
  src_port INT,
  dest_ip STRING
)
PARTITIONED BY (dest_port INT, proto STRING)
stored as carbondata
TBLPROPERTIES ('TABLE_BLOCKSIZE'='512','TABLE_BLOCKLET_SIZE'='128','DICTIONARY_INCLUDE'='src_port');

3.1.2 Load data into the partitioned table

load data inpath 'hdfs://xxx:8020/xxx/' into table xxxx
OPTIONS ('DELIMITER'=',','HEADER'='false','BAD_RECORDS_LOGGER_ENABLE'='true','BAD_RECORD_PATH'='hdfs://xxx:8020/carbon/spark2.2.1/badrecord','BAD_RECORDS_ACTION'='REDIRECT','IS_EMPTY_DATA_BAD_RECORD'='false');

3.1.3 Create the bloomfilter on the partitioned table

CREATE DATAMAP xxxx ON TABLE xxxx
USING "bloomfilter"
DMPROPERTIES ('index_columns'='src_ip','BLOOM_SIZE'='640000','BLOOM_FPP'='0.00001');

3.1.4 Problems encountered

1) Problem 1: With the DDL in 3.1.1, partitioned by dest_port INT and proto STRING, the load generates too many small files under segments/0 in the table's Metadata directory, and the data load fails with: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): The directory item limit of /XXX/XXX/FF is exceeded: limit=1048576 items=1048576.

2) Problem 2: When the table is partitioned by proto STRING only, the load succeeds, but the bloomfilter index per column is very large: for 32.4 billion rows (3 TB of data) the index on a single column is about 84 GB, versus about 2.7 GB for the same index on a non-partitioned table in our tests. With several indexed columns the driver runs out of memory and throws java.lang.OutOfMemoryError: Java heap space, so queries fail.

3) Problem 3: Too many small files are produced under each partition: in a test with 32.4 billion rows and two partition directories, each .carbondata file under a partition is only about 25 MB.

4) Problem 4: Creating the bloomfilter first and then loading data throws the exception reported at the top of this issue.

3.2 Non-partitioned table

3.2.1 Create the table

create table eve_tbl_not_partition (
  startTime BIGINT,
  flow_id BIGINT,
  pcap_cnt BIGINT,
  pcap_len BIGINT,
  src_ip STRING,
  src_port INT,
  dest_ip STRING,
  dest_port INT,
  proto STRING
)
stored as carbondata
TBLPROPERTIES ('TABLE_BLOCKSIZE'='512','TABLE_BLOCKLET_SIZE'='128','DICTIONARY_INCLUDE'='src_port,dest_port,proto');

3.2.2 Create the bloomfilter

CREATE DATAMAP eve_tbl_not_partition_bloomfilter_dm ON TABLE eve_tbl_not_partition
USING "bloomfilter"
DMPROPERTIES ('index_columns'='src_ip','BLOOM_SIZE'='640000','BLOOM_FPP'='0.00001');

3.2.3 Load data

load data inpath 'hdfs://xxx:8020/xxx/' into table eve_tbl_not_partition
OPTIONS ('DELIMITER'=',','HEADER'='false','BAD_RECORDS_LOGGER_ENABLE'='true','BAD_RECORD_PATH'='hdfs://xxx:8020/carbon/spark2.2.1/badrecord','BAD_RECORDS_ACTION'='REDIRECT','IS_EMPTY_DATA_BAD_RECORD'='false');

3.2.4 Problems encountered

1) Problem 1: When the bloomfilter is created first and data is loaded afterwards (the 3.2.2 then 3.2.3 order), the load fails with: Caused by: java.util.concurrent.ExecutionException: org.apache.carbondata.processing.datamap.DataMapWriterException: java.io.IOException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): failed to create file /carbon/test/xxx/customer_address/customer_address_floom_dm/0/0_batchno0-0-0-1575453105660/ca_address_sk.bloomindex for DFSClient_NONMAPREDUCE_837956501_174 for client 11.11.11.11 because current leaseholder is trying to recreate file. The index is not created and the data load fails. (A sketch of the reversed order that avoided this load failure is shown right below.)
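As problem 2 below notes, loading the data first and creating the datamap afterwards avoided this load-time lease conflict in these tests (though it surfaced a separate query-time issue). The working order, reusing the exact statements from 3.2.1-3.2.3:

-- load first ...
load data inpath 'hdfs://xxx:8020/xxx/' into table eve_tbl_not_partition
OPTIONS ('DELIMITER'=',','HEADER'='false','BAD_RECORDS_LOGGER_ENABLE'='true','BAD_RECORD_PATH'='hdfs://xxx:8020/carbon/spark2.2.1/badrecord','BAD_RECORDS_ACTION'='REDIRECT','IS_EMPTY_DATA_BAD_RECORD'='false');

-- ... then build the bloomfilter on the loaded table
CREATE DATAMAP eve_tbl_not_partition_bloomfilter_dm ON TABLE eve_tbl_not_partition
USING "bloomfilter"
DMPROPERTIES ('index_columns'='src_ip','BLOOM_SIZE'='640000','BLOOM_FPP'='0.00001');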

2) Problem 2: If the data is loaded first and the bloomfilter is created afterwards, the index is built successfully (for 1 TB of data the index is about 2.7 GB), but when a query condition touches the indexed column the following exception is thrown:
