apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

Interpret Parquet INT96 type as FIXED[12] AVRO Schema #2537

Closed by asfimport 3 years ago

asfimport commented 3 years ago

Reading Parquet files in Apache Beam with ParquetIO goes through AvroParquetReader, which throws IllegalArgumentException("INT96 not implemented and is deprecated") when a file contains an INT96 column.

Customers have large datasets that cannot be reprocessed to convert INT96 into a supported type. An easier approach is to expose the value as a fixed-length array of 12 bytes (Avro FIXED[12]), which the developer can then interpret however they need.
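To illustrate what "interpret" could look like on the consumer side, here is a minimal sketch that decodes such a 12-byte value as a timestamp. It assumes the common Impala/Hive convention for INT96 timestamps (bytes 0-7 hold nanoseconds within the day, bytes 8-11 hold the Julian day number, both little-endian). The class name `Int96Timestamps` and the method `toInstant` are hypothetical and not part of Parquet or Avro; only JDK classes are used.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

// Hypothetical helper, not part of parquet-java or Avro.
public final class Int96Timestamps {
    // Julian day number that corresponds to 1970-01-01 (the Unix epoch).
    private static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    private static final long SECONDS_PER_DAY = 86_400L;

    private Int96Timestamps() {}

    /**
     * Decodes a 12-byte INT96 value into an Instant.
     * Assumption: the value uses the Impala/Hive timestamp layout.
     */
    public static Instant toInstant(byte[] int96) {
        if (int96 == null || int96.length != 12) {
            throw new IllegalArgumentException("INT96 value must be exactly 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong(); // bytes 0-7: nanoseconds since midnight
        long julianDay = buf.getInt();   // bytes 8-11: Julian day number
        long epochSeconds = (julianDay - JULIAN_DAY_OF_EPOCH) * SECONDS_PER_DAY;
        // ofEpochSecond normalizes the nanosecond adjustment into seconds as needed.
        return Instant.ofEpochSecond(epochSeconds, nanosOfDay);
    }

    public static void main(String[] args) {
        // One day after the epoch plus one second, encoded by hand for the demo.
        byte[] raw = ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(1_000_000_000L)
                .putInt(2_440_589)
                .array();
        System.out.println(toInstant(raw)); // prints 1970-01-02T00:00:01Z
    }
}
```

In a pipeline, the `byte[]` would come from the FIXED[12] field of the Avro record. Files whose INT96 columns hold something other than timestamps would need a different decoder.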

Reporter: Anant Damle / @anantdamle Assignee: Anant Damle / @anantdamle

Note: This issue was originally created as PARQUET-1928. Please see the migration documentation for further details.

asfimport commented 3 years ago

Anant Damle / @anantdamle: https://github.com/apache/parquet-mr/pull/831

asfimport commented 2 years ago

Nitish G: It looks like this bug has resurfaced in the 3.0.0 release. Can we please have it investigated?

Using IntelliJ: 2021.3

Avro and Parquet Viewer: 3.0.0 (6th March 2022)

asfimport commented 2 years ago

Timothy Miller / @theosib-amazon: Is there a reason why patches such as this are not merged? Do the maintainers want more evaluation of the consequences of the change? I'd be happy to help.

asfimport commented 2 years ago

Timothy Miller / @theosib-amazon: It looks like the change was already merged. [~iamnitish] if you're running into a problem similar to this, it may be a different bug. Can you provide a minimal test case and a parquet file so that we can more easily reproduce and investigate this problem? Thanks.