databrickslabs / delta-sharing-java-connector

A Java connector for delta.io/sharing/ that allows you to easily ingest data on any JVM.
https://databrickslabs.github.io/delta-sharing-java-connector/
Apache License 2.0
12 stars 5 forks

Unable to read date format columns (int96 type) from avro-parquet schema #22

Open jeremihas-caruso opened 3 months ago

jeremihas-caruso commented 3 months ago

I get the following exception when reading a Parquet file that has a date column:

java.lang.IllegalArgumentException: INT96 is deprecated. As interim enable READ_INT96_AS_FIXED flag to read as byte array.

at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:331)
at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:313)
at org.apache.parquet.schema.PrimitiveType$PrimitiveTypeName$7.convert(PrimitiveType.java:341)
at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:312)
at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:290)
at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:279)
at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:134)
at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
at com.databricks.labs.delta.sharing.java.format.parquet.TableReader.read(TableReader.java:57)

jeremihas-caruso commented 3 months ago

INT96 is deprecated, as discussed on Stack Overflow: https://stackoverflow.com/questions/55829202/unable-to-read-date-format-columns-int96-type-from-avro-parquet-schema-in-apac The solution is to set the "parquet.avro.readInt96AsFixed" configuration to "true" when building the reader.
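
With that flag enabled, the column should come back as a 12-byte fixed value rather than a timestamp, so the caller still has to decode it. Below is a minimal sketch of that decoding step, assuming the usual INT96 timestamp layout (8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian) and assuming the writer stored the values in UTC. The class and method names (Int96Timestamps, toInstant) are hypothetical and not part of the connector.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

// Hypothetical helper, not part of delta-sharing-java-connector.
final class Int96Timestamps {
    // Julian day number of 1970-01-01, the Unix epoch.
    private static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    private static final long SECONDS_PER_DAY = 86_400L;

    private Int96Timestamps() {}

    /**
     * Decodes a 12-byte INT96 value into an Instant.
     * Assumed layout: bytes 0-7 are nanoseconds within the day,
     * bytes 8-11 are the Julian day number, both little-endian.
     * The result is interpreted as UTC (an assumption about the writer).
     */
    public static Instant toInstant(byte[] int96) {
        if (int96 == null || int96.length != 12) {
            throw new IllegalArgumentException("INT96 value must be exactly 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong(); // bytes 0-7
        long julianDay = buf.getInt();   // bytes 8-11
        long epochSeconds = (julianDay - JULIAN_DAY_OF_EPOCH) * SECONDS_PER_DAY;
        // ofEpochSecond normalizes the nanosecond adjustment into seconds + nanos.
        return Instant.ofEpochSecond(epochSeconds, nanosOfDay);
    }

    public static void main(String[] args) {
        // Build a sample value: Julian day of the epoch, zero nanoseconds.
        byte[] sample = ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(0L)
                .putInt(2_440_588)
                .array();
        System.out.println(toInstant(sample));
    }
}
```

In practice the 12 bytes would be taken from the fixed field of each record returned by the reader. How that field is exposed depends on the Avro record type in use, so that part is left out here.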