DTStack / chunjun

A data integration framework
https://dtstack.github.io/chunjun/
Apache License 2.0
4.01k stars 1.69k forks

Writing from Oracle to Hive 1 in standalone mode fails when using the ORC storage format #1178

Open LeonYoah opened 2 years ago

LeonYoah commented 2 years ago

Description

org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:118)
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:80)
    at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:233)
    at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:224)
    at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:215)
    at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:666)
    at org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:89)
    at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:446)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:305)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:212)
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
    at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
    at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
    at akka.actor.Actor.aroundReceive(Actor.scala:517)
    at akka.actor.Actor.aroundReceive$(Actor.scala:515)
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
    at akka.actor.ActorCell.invoke(ActorCell.scala:561)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
    at akka.dispatch.Mailbox.run(Mailbox.scala:225)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.conf.HiveConf$ConfVars
    at org.apache.hadoop.hive.ql.io.orc.MemoryManager.<init>(MemoryManager.java:83)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile.getMemoryManager(OrcFile.java:482)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile.access$000(OrcFile.java:34)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile$WriterOptions.<init>(OrcFile.java:262)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile.writerOptions(OrcFile.java:418)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getOptions(OrcOutputFormat.java:134)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getRecordWriter(OrcOutputFormat.java:180)
    at com.dtstack.chunjun.connector.hdfs.sink.HdfsOrcOutputFormat.nextBlock(HdfsOrcOutputFormat.java:168)
    at com.dtstack.chunjun.connector.hdfs.sink.HdfsOrcOutputFormat.writeSingleRecordToFile(HdfsOrcOutputFormat.java:203)
    at com.dtstack.chunjun.sink.format.BaseFileOutputFormat.writeSingleRecordInternal(BaseFileOutputFormat.java:126)
    at com.dtstack.chunjun.sink.format.BaseRichOutputFormat.writeSingleRecord(BaseRichOutputFormat.java:466)
    at com.dtstack.chunjun.sink.format.BaseRichOutputFormat.writeRecord(BaseRichOutputFormat.java:272)
    at com.dtstack.chunjun.connector.hive.sink.HiveOutputFormat.writeRecord(HiveOutputFormat.java:183)
    at com.dtstack.chunjun.connector.hive.sink.HiveOutputFormat.writeRecord(HiveOutputFormat.java:67)
    at com.dtstack.chunjun.sink.DtOutputFormatSinkFunction.invoke(DtOutputFormatSinkFunction.java:117)
    at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:54)
    at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:191)
    at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:204)
    at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:174)
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:398)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:191)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:619)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:583)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:758)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:573)
    at java.lang.Thread.run(Thread.java:748)

The ChunJun version is built from master, running on Flink 1.12.7.
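
A note on the error itself: in Java, "NoClassDefFoundError: Could not initialize class X" generally means the class was found, but an earlier attempt to run its static initializer failed, so the JVM refuses to use it afterwards. The first failure (an ExceptionInInitializerError) usually carries the real cause. A minimal sketch of a probe that forces initialization and prints that root cause is below. ClassInitProbe is a hypothetical helper, not part of ChunJun, Flink, or Hive, and it assumes it is run with the same classpath the job uses (for example, the same Hive and Hadoop jars).

```java
import java.security.CodeSource;

/** Hypothetical diagnostic helper; not part of ChunJun, Flink, or Hive. */
public class ClassInitProbe {

    /** Tries to load and initialize the named class and returns a one-line report. */
    public static String probe(String className) {
        try {
            // initialize=true runs the static initializers, which is where
            // "Could not initialize class" failures originate.
            Class<?> cls = Class.forName(
                    className, true, ClassInitProbe.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            String location = (src == null || src.getLocation() == null)
                    ? "bootstrap/unknown"
                    : src.getLocation().toString();
            return "OK " + className + " loaded from " + location;
        } catch (ClassNotFoundException e) {
            return "MISSING " + className + " is not on the classpath";
        } catch (LinkageError e) {
            // Covers ExceptionInInitializerError and NoClassDefFoundError.
            Throwable root = e;
            while (root.getCause() != null) {
                root = root.getCause();
            }
            return "BROKEN " + className + ": " + root;
        }
    }

    public static void main(String[] args) {
        // Default target is the class named in the "Caused by" line of the trace.
        String target = args.length > 0
                ? args[0]
                : "org.apache.hadoop.hive.conf.HiveConf$ConfVars";
        System.out.println(probe(target));
    }
}
```

If the report starts with OK, the printed jar location shows which copy of the class was picked up, which can help spot two conflicting Hive versions on the classpath. If it starts with BROKEN, the text after the colon is the underlying exception from the static initializer.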

LeonYoah commented 2 years ago

The text and parquet formats work fine, but parquet is much slower than text!

FlechazoW commented 2 years ago

The text and parquet formats work fine, but parquet is much slower than text!

How much slower, exactly? Do you have concrete numbers for comparison?

FlechazoW commented 2 years ago

Which Hive version are you submitting against, and which Hadoop version?

LeonYoah commented 2 years ago

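Which Hive version are you submitting against, and which Hadoop version?

It is an Ambari installation with HDP 2.6; Hadoop is 2.7 and Hive is 1.x.

The data set is 128 MB in total, 1 million rows. The text format takes 2 minutes, the ORC job fails, and the parquet format takes 5 minutes. I remember ORC working with a ChunJun installation on another node before, but it no longer works now.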
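
For a like-for-like comparison, the figures above can be turned into throughput. The sketch below only does the arithmetic on the numbers reported in this comment (128 MB, 1 million rows, 2 minutes for text, 5 minutes for parquet). FormatThroughput and its method names are hypothetical and exist only for this illustration.

```java
/** Hypothetical helper that converts the reported timings into throughput. */
public class FormatThroughput {

    /** Rows written per second. */
    public static double rowsPerSecond(long rows, double seconds) {
        return rows / seconds;
    }

    /** Megabytes written per second. */
    public static double mbPerSecond(double megabytes, double seconds) {
        return megabytes / seconds;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;        // reported row count
        double sizeMb = 128.0;         // reported data size
        double textSeconds = 2 * 60;   // text format: 2 minutes
        double parquetSeconds = 5 * 60; // parquet format: 5 minutes

        System.out.printf("text:    %.0f rows/s, %.2f MB/s%n",
                rowsPerSecond(rows, textSeconds), mbPerSecond(sizeMb, textSeconds));
        System.out.printf("parquet: %.0f rows/s, %.2f MB/s%n",
                rowsPerSecond(rows, parquetSeconds), mbPerSecond(sizeMb, parquetSeconds));
        System.out.printf("parquet takes %.1fx as long as text%n",
                parquetSeconds / textSeconds);
    }
}
```

On these numbers text runs at roughly 8,333 rows/s and parquet at roughly 3,333 rows/s, so parquet takes 2.5 times as long for the same data.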