Open TheSeptembre opened 3 years ago
We facing exactly the same issue with Hive 3.1.0 and NiFi 1.9.0.3.4
We validated with load tests that the PutHive3StreamingFixed processor fixes the memory leak but we failed to use it when the hive table contains timestamp or date colunms.
We discovered that the deltas files are created on the hdfs folders but SQL queries over the table fails, the partition seems corrupted
2020-10-09 15:03:13,986 INFO [Timer-Driven Process Thread-3] o.a.h.s.AbstractRecordWriterFixed Closing updater for partitions: [2020, 10, 9]
2020-10-09 15:03:13,986 ERROR [Timer-Driven Process Thread-3] o.a.h.s.AbstractRecordWriterFixed Unable to close org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater[hdfs://xxx/yyy/zzz/data/hive/www.db/ttt/year=2020/month=10/day=9/delta_0000002_0000002/bucket_00000] due to: null
java.lang.NullPointerException: null
at java.lang.System.arraycopy(Native Method)
at org.apache.hadoop.io.Text.set(Text.java:225)
at org.apache.orc.impl.StringRedBlackTree.add(StringRedBlackTree.java:59)
at org.apache.orc.impl.writer.StringTreeWriter.writeBatch(StringTreeWriter.java:70)
at org.apache.orc.impl.writer.StructTreeWriter.writeFields(StructTreeWriter.java:64)
at org.apache.orc.impl.writer.StructTreeWriter.writeBatch(StructTreeWriter.java:78)
at org.apache.orc.impl.writer.StructTreeWriter.writeRootBatch(StructTreeWriter.java:56)
at org.apache.orc.impl.WriterImpl.addRowBatch(WriterImpl.java:556)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushInternalBatch(WriterImpl.java:297)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:334)
at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.close(OrcRecordUpdater.java:557)
at org.apache.hive.streaming.AbstractRecordWriterFixed.close(AbstractRecordWriterFixed.java:368)
at org.apache.hive.streaming.HiveStreamingConnectionFixed$TransactionBatch.closeImpl(HiveStreamingConnectionFixed.java:1006)
at org.apache.hive.streaming.HiveStreamingConnectionFixed$TransactionBatch.close(HiveStreamingConnectionFixed.java:997)
at org.apache.hive.streaming.HiveStreamingConnectionFixed$TransactionBatch.markDead(HiveStreamingConnectionFixed.java:860)
at org.apache.hive.streaming.HiveStreamingConnectionFixed$TransactionBatch.write(HiveStreamingConnectionFixed.java:841)
at org.apache.hive.streaming.HiveStreamingConnectionFixed.write(HiveStreamingConnectionFixed.java:551)
at org.apache.nifi.processors.hive.PutHive3StreamingFixed.onTrigger(PutHive3StreamingFixed.java:432)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1162)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:205)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-10-09 15:03:14,007 ERROR [Timer-Driven Process Thread-3] o.a.h.s.HiveStreamingConnectionFixed Fatal error on TxnId/WriteIds=[1670528/2...1670528/2] on connection = { metaStoreUri: thrift://blabla, database: www, table: ttt }; TxnStatus[A] LastUsed txnid:1670528; cause Encountered errors while closing (see logs) [2020, 10, 9] writeIds[2,2]
org.apache.hive.streaming.StreamingIOFailure: Encountered errors while closing (see logs) [2020, 10, 9] writeIds[2,2]
at org.apache.hive.streaming.AbstractRecordWriterFixed.close(AbstractRecordWriterFixed.java:391)
at org.apache.hive.streaming.HiveStreamingConnectionFixed$TransactionBatch.closeImpl(HiveStreamingConnectionFixed.java:1006)
at org.apache.hive.streaming.HiveStreamingConnectionFixed$TransactionBatch.close(HiveStreamingConnectionFixed.java:997)
at org.apache.hive.streaming.HiveStreamingConnectionFixed$TransactionBatch.markDead(HiveStreamingConnectionFixed.java:860)
at org.apache.hive.streaming.HiveStreamingConnectionFixed$TransactionBatch.write(HiveStreamingConnectionFixed.java:841)
at org.apache.hive.streaming.HiveStreamingConnectionFixed.write(HiveStreamingConnectionFixed.java:551)
at org.apache.nifi.processors.hive.PutHive3StreamingFixed.onTrigger(PutHive3StreamingFixed.java:432)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1162)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:205)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-10-09 15:03:14,007 INFO [Timer-Driven Process Thread-3] o.a.h.hive.metastore.HiveMetaStoreClient Closed a connection to metastore, current connections: 1
2020-10-09 15:03:14,007 INFO [Timer-Driven Process Thread-3] o.a.h.hive.metastore.HiveMetaStoreClient Closed a connection to metastore, current connections: 0
2020-10-09 15:03:14,007 INFO [Timer-Driven Process Thread-3] o.a.h.s.HiveStreamingConnectionFixed Closed streaming connection. Agent: NiFi PutHive3StreamingFixed [0d4a737f-0175-1000-ffff-ffffb6b38fee] thread 100[Timer-Driven Process Thread-3] Stats: {records-written: 0, records-size: 0, committed-transactions: 0, aborted-transactions: 0, auto-flushes: 0, metastore-calls: 8 }
2020-10-09 15:03:15,971 INFO [NiFi Web Server-17785] o.a.n.c.s.StandardProcessScheduler Stopping PutHive3StreamingFixed[id=0d4a737f-0175-1000-ffff-ffffb6b38fee]
2020-10-09 15:03:15,971 INFO [NiFi Web Server-17785] o.a.n.controller.StandardProcessorNode Stopping processor: class org.apache.nifi.processors.hive.PutHive3StreamingFixed
2020-10-09 15:03:15,971 INFO [Timer-Driven Process Thread-6] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling PutHive3StreamingFixed[id=0d4a737f-0175-1000-ffff-ffffb6b38fee] to run
Hello, We're encountering some issues with the Hive3Streaming processor on Hive 3.0 where NiFi goes into OutOfMemory due to the known memory leaks. However, the Hive3StreamingFixed solved memory leaks (yay) but doesn't work with dates and timestamps format. Our tables have the following format :
The issue corrupts the whole table (
not a valid ORC file
when querying with SQL client) and it says in the logs:We know the issue comes from timestamp and dates because if we remove them from the incoming Flowfile, it works without error and solves the memory leaks too.
Thank you for reading.