apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

Cannot read parquet file that was generated from nanoparquet #3043

Status: Open (opened by RealTYPICAL 2 weeks ago)

RealTYPICAL commented 2 weeks ago

Describe the bug, including details regarding any error messages, version, and platform.

When I use MessageColumnIO.getRecordReader to get a record reader for the file, the following exception is thrown:

Exception in thread "main" java.lang.UnsupportedOperationException
    at org.apache.parquet.column.values.ValuesReader.readInteger(ValuesReader.java:178)
    at org.apache.parquet.column.impl.ColumnReaderBase$ValuesReaderIntIterator.nextInt(ColumnReaderBase.java:830)
    at org.apache.parquet.column.impl.ColumnReaderBase.checkRead(ColumnReaderBase.java:663)
    at org.apache.parquet.column.impl.ColumnReaderBase.consume(ColumnReaderBase.java:801)
    at org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:30)
    at org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:43)
    at org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:80)
    at org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:282)
    at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:141)
    at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:105)
    at org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:180)
    at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:105)
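
A note on reading the trace: the top frame is ValuesReader.readInteger itself rather than a subclass. That would be consistent with the reader selected for the column's levels not overriding readInteger, so the call falls through to a base method that throws. This is an inference from the trace alone, not a confirmed diagnosis. A minimal, stdlib-only sketch of that pattern follows; every class name in it is hypothetical and it does not reproduce parquet-java's actual source.

```java
// Hypothetical names throughout: this mirrors the shape of the stack trace,
// it is not parquet-java code.

/** Base reader: typed accessors throw unless a subclass overrides them. */
abstract class SketchValuesReader {
    public int readInteger() {
        throw new UnsupportedOperationException();
    }

    public boolean readBoolean() {
        throw new UnsupportedOperationException();
    }
}

/** Overrides readInteger, so it can serve integer values. */
class SketchIntReader extends SketchValuesReader {
    private final int[] values;
    private int pos = 0;

    SketchIntReader(int... values) {
        this.values = values;
    }

    @Override
    public int readInteger() {
        return values[pos++];
    }
}

/** Only overrides readBoolean; readInteger falls through to the base class. */
class SketchBooleanReader extends SketchValuesReader {
    @Override
    public boolean readBoolean() {
        return true;
    }
}

class LevelReaderDemo {
    /** Returns the value read, or a short description of where the call failed. */
    static String tryReadInteger(SketchValuesReader reader) {
        try {
            return String.valueOf(reader.readInteger());
        } catch (UnsupportedOperationException e) {
            // The top frame names the base class, not the concrete reader.
            StackTraceElement top = e.getStackTrace()[0];
            return "unsupported at " + top.getClassName() + "." + top.getMethodName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryReadInteger(new SketchIntReader(7)));
        System.out.println(tryReadInteger(new SketchBooleanReader()));
    }
}
```

In the sketch, the second call fails inside the base class even though the object is a SketchBooleanReader. The report shows the same shape, where ValuesReader appears at the top of the trace instead of a concrete reader.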

I expected this to work. Pyarrow, for example, can read the file.

Sample file can be found here:

mtcars_np.zip

Nanoparquet can be found here:

https://github.com/r-lib/nanoparquet

Version: 1.14.3
Platform: Linux

Component(s)

Core

wgtmac commented 6 days ago

Thanks for reporting this issue! I can confirm that it has been reproduced on my side. Will take a look later.