apache / parquet-format

Apache Parquet Format
https://parquet.apache.org/
Apache License 2.0

Support Int8 and Int16 as basic type #397

Open asfimport opened 2 years ago

asfimport commented 2 years ago

 Int8 and Int16 are not supported as basic (physical) types in previous versions. Using 4 bytes to store an int8 value does not seem like a good idea: it requires more storage and makes reading and writing slower. It is also not a good fit for common computing formats such as Velox, Arrow, vector, and so on.
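To make the storage argument concrete, the sketch below compares raw PLAIN-encoded sizes. Today an int8 column is written as the INT32 physical type (annotated with the INT(8, true) logical type), and PLAIN encoding stores each INT32 as 4 little-endian bytes. The 1-byte layout is only what this issue proposes and is not part of the current spec; the sample values are illustrative. The comparison ignores dictionary encoding and page compression, which can narrow the gap in practice.

```python
import struct

# Illustrative int8 values (range -128..127).
values = [-128, -1, 0, 1, 42, 127]

# Current approach: int8 data stored on the INT32 physical type,
# PLAIN-encoded as 4 little-endian bytes per value.
plain_int32 = struct.pack(f"<{len(values)}i", *values)

# Hypothetical 1-byte physical type (the proposal in this issue,
# NOT part of the current Parquet spec): 1 byte per value.
plain_int8 = struct.pack(f"<{len(values)}b", *values)

print(len(plain_int32), len(plain_int8))  # 24 6
```

For raw PLAIN data the ratio is exactly 4:1 for int8 and 2:1 for int16.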

With Int8 and Int16 supported, we can reduce storage and improve read and write performance. For forward compatibility, readers can use the version field in FileMetaData to decide how to read the Parquet data.
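The version-based idea could look roughly like a reader that only accepts the proposed types when the file's format version is new enough. This is a minimal sketch under stated assumptions, not the spec: FileMetaDataStub, can_read, and the placeholder version number 3 are hypothetical, and INT8/INT16 physical types do not exist in the current format.

```python
from dataclasses import dataclass

# Physical types defined by the current Parquet format.
CURRENT_PHYSICAL_TYPES = {
    "BOOLEAN", "INT32", "INT64", "INT96",
    "FLOAT", "DOUBLE", "BYTE_ARRAY", "FIXED_LEN_BYTE_ARRAY",
}

# Hypothetical additions proposed in this issue.
PROPOSED_PHYSICAL_TYPES = {"INT8", "INT16"}

# Hypothetical placeholder: first file version allowed to use the new types.
HYPOTHETICAL_INT8_INT16_VERSION = 3


@dataclass
class FileMetaDataStub:
    """Hypothetical stand-in carrying only the FileMetaData version field."""
    version: int


def can_read(meta: FileMetaDataStub, physical_type: str) -> bool:
    """Return True if a column of this physical type is valid for the file's version."""
    allowed = set(CURRENT_PHYSICAL_TYPES)
    if meta.version >= HYPOTHETICAL_INT8_INT16_VERSION:
        allowed |= PROPOSED_PHYSICAL_TYPES
    return physical_type in allowed


print(can_read(FileMetaDataStub(version=1), "INT8"))  # False
print(can_read(FileMetaDataStub(version=3), "INT8"))  # True
```

This gate only helps readers that implement it; how existing readers would react to an unknown physical type is part of what a specification change would need to settle.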

Reporter: Jackey Lee / @jackylee-ch

Note: This issue was originally created as PARQUET-2133. Please see the migration documentation for further details.

asfimport commented 2 years ago

Timothy Miller / @theosib-amazon: Have you started working on implementing this? What is your progress? I'd be happy to work with you on it.

asfimport commented 2 years ago

Micah Kornfield / @emkornfield: Before we start working on it, it should probably be discussed on the dev@ mailing list to make sure people are OK with the specification change.