asfimport opened this issue 10 years ago (status: Open)
Kristoffer Sjögren / @krisskross: I should add that the data is written using AvroParquetFileTarget with SNAPPY compression, and read using AvroParquetFileSource with an UnboundRecordFilter and includeField.
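For context, a minimal sketch of such a write pipeline, assuming Crunch's From.avroFile and Avros.generics helpers, AvroParquetFileTarget's support for Target.outputConf, and Parquet's parquet.compression output key; the one-field schema and paths are placeholders, not the reporter's actual job:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.io.From;
import org.apache.crunch.io.parquet.AvroParquetFileTarget;
import org.apache.crunch.types.avro.Avros;
import org.apache.hadoop.fs.Path;

public class WriteParquetJob {
    // Minimal stand-in schema; the real one presumably has more fields.
    private static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
      + "{\"name\":\"action\",\"type\":[\"null\",\"string\"],\"default\":null}]}");

    public static void main(String[] args) {
        Pipeline pipeline = new MRPipeline(WriteParquetJob.class);
        // Read Avro input (the input path is hypothetical).
        PCollection<GenericData.Record> events =
            pipeline.read(From.avroFile(new Path(args[0]), Avros.generics(SCHEMA)));
        // Write as Parquet; "parquet.compression" is ParquetOutputFormat's codec setting.
        pipeline.write(events,
            new AvroParquetFileTarget(new Path(args[1]))
                .outputConf("parquet.compression", "SNAPPY"));
        pipeline.done();
    }
}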
Kristoffer Sjögren / @krisskross: This seems unrelated to compression and field inclusion. However, if I remove the UnboundRecordFilter, the job finishes successfully.
Kristoffer Sjögren / @krisskross:
// Keeps only records whose "action" column equals "bid".
public static class ActionFilter implements UnboundRecordFilter {
    private final UnboundRecordFilter filter;

    public ActionFilter() {
        filter = ColumnRecordFilter.column("action", ColumnPredicates.equalTo("bid"));
    }

    @Override
    public RecordFilter bind(Iterable<ColumnReader> readers) {
        return filter.bind(readers);
    }
}
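For context, roughly how such a filter would be attached on the read side, assuming Crunch's AvroParquetFileSource.Builder with its includeField and filterClass options (the Event record class and input path are hypothetical):

import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.io.parquet.AvroParquetFileSource;
import org.apache.hadoop.fs.Path;

public class ReadBidsJob {
    public static void main(String[] args) {
        Pipeline pipeline = new MRPipeline(ReadBidsJob.class);
        // "Event" stands in for an Avro-generated specific record class (hypothetical).
        PCollection<Event> bids = pipeline.read(
            AvroParquetFileSource.builder(Event.class)
                .includeField("action")           // the column projection mentioned above
                .filterClass(ActionFilter.class)  // the UnboundRecordFilter defined above
                .build(new Path(args[0])));
        pipeline.done();
    }
}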
Jan Morlock: Any news here? We are sometimes facing the same problem with Parquet 1.5.0.
Francisco Guerrero: Facing the same issue with Parquet. The error arises when a filtered column contains null fields.
Tristan Davolt: I am facing the same issue with Parquet 1.10.0. Data is written using AvroParquetWriter with Snappy compression. Occasionally, and seemingly at random, one of the many files we write using the same method throws an error like the one above when read by any Parquet reader. I have not yet found a workaround. The exception is thrown for the final value of a random column, and it does not occur only with null fields. Our schema defines every field as optional.
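For reference, a minimal sketch of that writer setup using the Parquet 1.10.0 builder API; the one-field all-optional schema and the output path are placeholders, not the reporter's actual code:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class SnappyWriterSketch {
    public static void main(String[] args) throws Exception {
        // All-optional schema, mirroring the report above.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Row\",\"fields\":["
          + "{\"name\":\"action\",\"type\":[\"null\",\"string\"],\"default\":null}]}");
        try (ParquetWriter<GenericRecord> writer =
                 AvroParquetWriter.<GenericRecord>builder(new Path(args[0]))
                     .withSchema(schema)
                     .withCompressionCodec(CompressionCodecName.SNAPPY)
                     .build()) {
            GenericRecord row = new GenericData.Record(schema);
            row.put("action", "bid"); // optional field; may also be left null
            writer.write(row);
        }
    }
}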
java.lang.IllegalArgumentException: Reading past RLE/BitPacking stream.
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:53)
    at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:80)
    at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:62)
    at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridValuesReader.readInteger(RunLengthBitPackingHybridValuesReader.java:53)
    at org.apache.parquet.column.impl.ColumnReaderBase$ValuesReaderIntIterator.nextInt(ColumnReaderBase.java:733)
    at org.apache.parquet.column.impl.ColumnReaderBase.checkRead(ColumnReaderBase.java:568)
    at org.apache.parquet.column.impl.ColumnReaderBase.consume(ColumnReaderBase.java:705)
    at org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:30)
    at org.apache.parquet.tools.command.DumpCommand.dump(DumpCommand.java:358)
    at org.apache.parquet.tools.command.DumpCommand.dump(DumpCommand.java:231)
    at org.apache.parquet.tools.command.DumpCommand.execute(DumpCommand.java:148)
    at org.apache.parquet.tools.Main.main(Main.java:223)
Description: I am using Avro and Crunch 0.11 to write data into Hadoop CDH 4.6 in Parquet format. This works fine for a few gigabytes but blows up in the RunLengthBitPackingHybridDecoder when reading a few thousand gigabytes.
Environment: Java 1.7, Linux Debian
Reporter: Kristoffer Sjögren / @krisskross
Assignee: Reuben Kuhnert / @sircodesalotOfTheRound
Note: This issue was originally created as PARQUET-112. Please see the migration documentation for further details.