Nuttymoon / nifi-hive3streaming-fixed

A NiFi bundle containing a stable implementation of the PutHive3Streaming processor
Apache License 2.0

java.lang.NullPointerException when trying to stream data into Hive 3.1.0 #2

Closed: brobeckero closed this issue 4 years ago

brobeckero commented 4 years ago

Dear Nuttymoon,

I'm using NiFi 1.9.2 and I'm trying to stream data to a kerberized Hive on an HDP 3.1 cluster.

Compiling, unpacking and configuring the processor went fine, but when it tries to write data into the table (tested with partitioned, non-partitioned and non-transactional tables), the flow files are redirected to the failure queue.

Looking at the logs, we can see that everything went fine until the org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater was closed:

2020-09-04 15:45:48,797 ERROR [Timer-Driven Process Thread-8] o.a.h.s.AbstractRecordWriterFixed Unable to close org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater[hdfs://hdfs-hadoop/warehouse/tablespace/managed/hive/cdrs.db/blablabla/delta_0000001_0000001/bucket_00000] due to: null

Here are the complete logs, starting from when the processor was started:

2020-09-04 15:04:14,599 INFO [NiFi Web Server-17] o.a.n.c.s.StandardProcessScheduler Starting PutHive3StreamingFixed[id=5821ed37-0174-1000-2b4e-a70ca05c3550]
2020-09-04 15:04:14,700 INFO [Timer-Driven Process Thread-9] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled PutHive3StreamingFixed[id=5821ed37-0174-1000-2b4e-a70ca05c3550] to run with 1 threads
2020-09-04 15:04:14,701 INFO [Timer-Driven Process Thread-9] o.a.h.s.HiveStreamingConnectionFixed Creating metastore client for streaming-connection
2020-09-04 15:04:14,713 INFO [Timer-Driven Process Thread-9] o.a.h.s.HiveStreamingConnectionFixed Creating metastore client for streaming-connection-heartbeat
2020-09-04 15:04:14,910 INFO [Timer-Driven Process Thread-9] o.a.h.s.HiveStreamingConnectionFixed STREAMING CONNECTION INFO: { metastore-uri: thrift://opdahma02.si.reunion.ftm.francetelecom.fr:9083, database: cdrs, table: tst_cra_technique, partitioned-table: true, dynamic-partitioning: true, username: svc-dah-cras_tech, secure-mode: true, record-writer: HiveRecordWriterFixed, agent-info: NiFi PutHive3StreamingFixed [5821ed37-0174-1000-2b4e-a70ca05c3550] thread 78[Timer-Driven Process Thread-9] }
2020-09-04 15:04:14,910 INFO [Timer-Driven Process Thread-9] o.a.h.s.HiveStreamingConnectionFixed Starting heartbeat thread with interval: 150000 ms initialDelay: 7652 ms for agentInfo: NiFi PutHive3StreamingFixed [5821ed37-0174-1000-2b4e-a70ca05c3550] thread 78[Timer-Driven Process Thread-9]
2020-09-04 15:04:14,937 INFO [Timer-Driven Process Thread-9] o.a.h.s.AbstractRecordWriterFixed Created new filesystem instance: 733719686
2020-09-04 15:04:14,937 INFO [Timer-Driven Process Thread-9] o.a.h.s.AbstractRecordWriterFixed Memory monitorings settings - autoFlush: true memoryUsageThreshold: 0.7 ingestSizeThreshold: 0
2020-09-04 15:04:14,938 INFO [Timer-Driven Process Thread-9] o.a.h.s.HiveStreamingConnectionFixed Opened new transaction batch TxnId/WriteIds=[814825/18...814825/18] on connection = { metaStoreUri: thrift://opdahma02.si.reunion.ftm.francetelecom.fr:9083, database: cdrs, table: tst_cra_technique }; TxnStatus[O] LastUsed txnid:0
2020-09-04 15:04:14,999 INFO [Timer-Driven Process Thread-9] o.a.h.s.AbstractRecordWriterFixed Partition event_day=20200904 already exists for table cdrs.tst_cra_technique
2020-09-04 15:04:15,085 INFO [Timer-Driven Process Thread-9] o.a.h.s.AbstractRecordWriterFixed Stats before close: [record-updaters: 1, partitions: 1, buffered-records: 0 total-records: 0 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 700,00MB/14,00GB]
2020-09-04 15:04:15,085 INFO [Timer-Driven Process Thread-9] o.a.h.s.AbstractRecordWriterFixed Closing updater for partitions: [20200904]
2020-09-04 15:04:15,085 ERROR [Timer-Driven Process Thread-9] o.a.h.s.AbstractRecordWriterFixed Unable to close org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater[hdfs://hdfs-hadoop/warehouse/tablespace/managed/hive/cdrs.db/tst_cra_technique/event_day=20200904/delta_0000018_0000018/bucket_00000] due to: null
    at java.lang.System.arraycopy(Native Method)
    at org.apache.hadoop.io.Text.set(Text.java:225)
    at org.apache.orc.impl.StringRedBlackTree.add(StringRedBlackTree.java:59)
    at org.apache.orc.impl.writer.StringTreeWriter.writeBatch(StringTreeWriter.java:70)
    at org.apache.orc.impl.writer.StructTreeWriter.writeFields(StructTreeWriter.java:64)
    at org.apache.orc.impl.writer.StructTreeWriter.writeBatch(StructTreeWriter.java:78)
    at org.apache.orc.impl.writer.StructTreeWriter.writeRootBatch(StructTreeWriter.java:56)
    at org.apache.orc.impl.WriterImpl.addRowBatch(WriterImpl.java:556)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushInternalBatch(WriterImpl.java:297)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:334)
    at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.close(OrcRecordUpdater.java:557)
    at org.apache.hive.streaming.AbstractRecordWriterFixed.close(AbstractRecordWriterFixed.java:368)
    at org.apache.hive.streaming.HiveStreamingConnectionFixed$TransactionBatch.closeImpl(HiveStreamingConnectionFixed.java:1006)
    at org.apache.hive.streaming.HiveStreamingConnectionFixed$TransactionBatch.close(HiveStreamingConnectionFixed.java:997)
    at org.apache.hive.streaming.HiveStreamingConnectionFixed$TransactionBatch.markDead(HiveStreamingConnectionFixed.java:860)
    at org.apache.hive.streaming.HiveStreamingConnectionFixed$TransactionBatch.write(HiveStreamingConnectionFixed.java:841)
    at org.apache.hive.streaming.HiveStreamingConnectionFixed.write(HiveStreamingConnectionFixed.java:551)
    at org.apache.nifi.processors.hive.PutHive3StreamingFixed.onTrigger(PutHive3StreamingFixed.java:432)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1162)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:209)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
    at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

The data file updated by the processor can end up corrupted.
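For reference, here is a minimal sketch of the Hive Streaming v2 API that the processor drives under the hood (the metastore URI, agent name and record writer below are simplified placeholders, not the processor's actual code); the stack trace above shows the failure surfacing on this write path, when the record updater is closed and the buffered ORC batch is flushed:

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hive.streaming.HiveStreamingConnection;
import org.apache.hive.streaming.StreamingConnection;
import org.apache.hive.streaming.StrictDelimitedInputWriter;

public class HiveStreamingSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder metastore URI; Kerberos configuration omitted for brevity.
        HiveConf conf = new HiveConf();
        conf.set("hive.metastore.uris", "thrift://metastore-host:9083");

        // The processor uses its own RecordReader-backed writer; a delimited
        // writer is the simplest stand-in for a sketch.
        StrictDelimitedInputWriter writer = StrictDelimitedInputWriter.newBuilder()
                .withFieldDelimiter(',')
                .build();

        // No static partition values are given, so the connection uses dynamic
        // partitioning, matching the "STREAMING CONNECTION INFO" log above.
        StreamingConnection connection = HiveStreamingConnection.newBuilder()
                .withDatabase("cdrs")
                .withTable("tst_cra_technique")
                .withAgentInfo("sketch-agent")
                .withRecordWriter(writer)
                .withHiveConf(conf)
                .connect();

        try {
            connection.beginTransaction();
            // write() buffers the row; when a write fails, the transaction
            // batch is marked dead and the OrcRecordUpdater is closed, which
            // is where the NPE in the stack trace above was thrown.
            connection.write("some,row,data".getBytes(StandardCharsets.UTF_8));
            connection.commitTransaction();
        } finally {
            connection.close();
        }
    }
}
```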

Thanks in advance for your kind help.

BR

brobeckero commented 4 years ago

Found it!

My fault: I had swapped the Hadoop version with the Hive version.
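In case it helps anyone else: the Hive and Hadoop client versions bundled into the NAR are typically fixed as Maven properties at build time. The snippet below is illustrative only (the property names and values are assumptions, not copied from this repository's pom.xml, so check the real one); the point is that each property must match the corresponding cluster component, and the two must not be swapped:

```xml
<!-- Illustrative pom.xml properties; names and values are assumptions,
     not taken from this repository. Each version must match the cluster
     component it names. -->
<properties>
    <hive.version>3.1.0</hive.version>     <!-- Hive version on the cluster -->
    <hadoop.version>3.1.1</hadoop.version> <!-- Hadoop version on the cluster -->
</properties>
```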

TheSeptembre commented 3 years ago

@brobeckero, how did you solve the version problem?

> My fault: I had swapped the Hadoop version with the Hive version.

Do you mean the Hadoop and Hive versions were not the same? I have the same issue with Hive 3.1.0, Hadoop 3.1.0 and NiFi 1.9.0.