Intel-bigdata / SSM

Smart Storage Management for Big Data, a comprehensive hot/cold data optimized solution
Apache License 2.0
133 stars 67 forks source link

Hive on MR query with COMPACT file , throw EOFE "Cannot seek to negative offset" #2259

Open MyqueWooMiddo opened 2 years ago

MyqueWooMiddo commented 2 years ago

I generate a small table 'small_table' with several INSERTs , so there're many small files in HDFS

I create a SSM compact rule on 'small_table' than run action , then the _container_file was produced on the same HDFS path.

I run 'hadoop fs -cat /small_table/xx.txt ' , I can view its content . In the mean time , I can view xx.txt and _container_file open & getXAttrs operation in hdfs-audit.log .

In Hive , I can query the small_table with filter (without MR) , but I can't run aggregate such an count(*) on it (with MR) . It will throw the stack :

org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:271) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.(HadoopShimsSecure.java:217) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:345) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:719) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:176) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:445) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:350) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:257) ... 11 more Caused by: java.io.EOFException: Cannot seek to negative offset at org.apache.hadoop.hdfs.CompactInputStream.seek(CompactInputStream.java:135) at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:71) at org.apache.hadoop.mapred.LineRecordReader.(LineRecordReader.java:140) at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.(CombineHiveRecordReader.java:99) ... 16 more

My Hive version is 3.1.x and Hadoop version is 3.3.x