apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

Filtering an all-null page with the column index throws java.lang.ArrayIndexOutOfBoundsException: -1 #2809

Closed asfimport closed 1 year ago

asfimport commented 1 year ago

Given a column index that contains only a null page, like the following:

Boundary order: ASCENDING
                      null count  min                           max                                     
page-0                         2  <none>                      <none>                                   

 

My Spark SQL query looks like this:

spark.sql("select * from tbl where empty_page_column < 2 or empty_page_column is null").collect

 

 

When "empty_page_column < 2" and "empty_page_column is null" are used together and combined with "or", an ArrayIndexOutOfBoundsException is thrown.

 

The following error is thrown:

Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.parquet.internal.column.columnindex.IntColumnIndexBuilder$IntColumnIndex$1.compareValueToMin(IntColumnIndexBuilder.java:74)
    at org.apache.parquet.internal.column.columnindex.BoundaryOrder$2.lt(BoundaryOrder.java:123)
    at org.apache.parquet.internal.column.columnindex.ColumnIndexBuilder$ColumnIndexBase.visit(ColumnIndexBuilder.java:262)
    at org.apache.parquet.internal.column.columnindex.ColumnIndexBuilder$ColumnIndexBase.visit(ColumnIndexBuilder.java:64)
    at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.lambda$visit$2(ColumnIndexFilter.java:131)
    at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.applyPredicate(ColumnIndexFilter.java:176)
    at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:131)
    at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:56)
    at org.apache.parquet.filter2.predicate.Operators$Lt.accept(Operators.java:209)
    at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:191)
    at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:56)
    at org.apache.parquet.filter2.predicate.Operators$Or.accept(Operators.java:321)
    at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:186)
    at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:56)
    at org.apache.parquet.filter2.predicate.Operators$And.accept(Operators.java:309)
    at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter$1.visit(ColumnIndexFilter.java:86)
    at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter$1.visit(ColumnIndexFilter.java:81)
    at org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:137)
    at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.calculateRowRanges(ColumnIndexFilter.java:81)
    at org.apache.parquet.hadoop.ParquetFileReader.getRowRanges(ParquetFileReader.java:1128)
    at org.apache.parquet.hadoop.ParquetFileReader.getFilteredRecordCount(ParquetFileReader.java:943)
    at org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase.initializeParquetReader(SpecificParquetRecordReaderBase.java:137)
    at org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase.initialize(SpecificParquetRecordReaderBase.java:107)
    at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:214)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$1(ParquetFileFormat.scala:413)
    ... 25 more 
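
The trace shows compareValueToMin being reached from the ASCENDING boundary-order "lt" path with an index of -1. One plausible reading, not verified against the parquet-java source, is that min/max values are kept only for non-null pages, so a column index whose only page is all-null has zero stored values and a "last element" probe computes 0 - 1 = -1. The sketch below reproduces that failure mode in isolation. Every name in it (NullPageIndexSketch, storedMins, candidatePagesUnguarded, candidatePagesGuarded) is hypothetical and is not part of the parquet-java API; the guard is only an illustration of the kind of check that avoids the exception, not the actual fix from PARQUET-1744.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical illustration only; not parquet-java code.
public class NullPageIndexSketch {

    // Assumption: min values are stored only for pages that contain
    // at least one non-null value. A null entry marks an all-null page.
    static int[] storedMins(Integer[] pageMins) {
        return Arrays.stream(pageMins)
                .filter(Objects::nonNull)
                .mapToInt(Integer::intValue)
                .toArray();
    }

    // Counts pages that may contain a value < `value`, assuming the
    // stored mins are in ascending order. It probes the last stored
    // min first, so it fails when no mins are stored at all.
    static int candidatePagesUnguarded(int[] mins, int value) {
        int last = mins.length - 1; // -1 when every page is all-null
        if (mins[last] < value) {   // throws ArrayIndexOutOfBoundsException for -1
            return mins.length;     // largest min is below value: all pages qualify
        }
        int count = 0;
        for (int min : mins) {
            if (min < value) {
                count++;
            }
        }
        return count;
    }

    // Same logic, with an explicit check for "null pages only".
    static int candidatePagesGuarded(int[] mins, int value) {
        if (mins.length == 0) {
            return 0; // no non-null page can satisfy "< value"
        }
        return candidatePagesUnguarded(mins, value);
    }

    public static void main(String[] args) {
        // One page holding only nulls, as in the column index above.
        int[] mins = storedMins(new Integer[] {null});
        System.out.println("guarded: " + candidatePagesGuarded(mins, 2));
        try {
            candidatePagesUnguarded(mins, 2);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded threw: " + e.getMessage());
        }
    }
}
```

In this sketch the "is null" half of the predicate is left out on purpose: the exception comes from evaluating the "<" half against an index with no stored min values, which matches the Operators$Lt frame in the trace.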

 

 

Reporter: GANHONGNAN

Note: This issue was originally created as PARQUET-2341. Please see the migration documentation for further details.

asfimport commented 1 year ago

GANHONGNAN: This issue has been solved by PARQUET-1744: Some filters throws ArrayIndexOutOfBoundsException (#732)