apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

Quoted identifiers in column names #1987

Open asfimport opened 8 years ago

asfimport commented 8 years ago

Add the ability to quote identifiers for columns in a table. This would allow column names to contain arbitrary characters such as spaces. Hive supports these types of identifiers using backquotes. For example,

create table parquet_table (`Session Token` string) stored as parquetfile;

However, attempting to insert a new row into this table results in an error.

insert into parquet_table values ('1234-45')

org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: field ended by ';': expected ';' but got 'token' at line 1: optional string Session Token

I would suggest using backquotes in Parquet as well.
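
To illustrate the suggestion, here is a minimal sketch of how a schema tokenizer could treat a backquoted run as a single identifier. The class `QuotedFieldTokenizer` and its `tokenize` method are hypothetical names invented for this example. They are not part of parquet-java, and the real schema parser may be structured quite differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a parquet-java class): splits one schema field
// declaration into tokens, keeping a backquoted run together as one name.
final class QuotedFieldTokenizer {

    static List<String> tokenize(String declaration) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < declaration.length()) {
            char c = declaration.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace outside quotes only separates tokens.
                i++;
            } else if (c == '`') {
                // Everything up to the closing backquote is one identifier.
                int close = declaration.indexOf('`', i + 1);
                if (close < 0) {
                    throw new IllegalArgumentException(
                        "unterminated quoted identifier at index " + i);
                }
                tokens.add(declaration.substring(i + 1, close));
                i = close + 1;
            } else if (c == ';') {
                // The field terminator is a token of its own.
                tokens.add(";");
                i++;
            } else {
                // Unquoted word: read until whitespace, ';' or a backquote.
                int start = i;
                while (i < declaration.length()) {
                    char d = declaration.charAt(i);
                    if (Character.isWhitespace(d) || d == ';' || d == '`') {
                        break;
                    }
                    i++;
                }
                tokens.add(declaration.substring(start, i));
            }
        }
        return tokens;
    }
}
```

With this sketch, an unquoted `optional string Session Token;` splits into five tokens, so a parser expecting `;` after the name sees a second word instead, which matches the shape of the error above. The quoted form yields four tokens, with `Session Token` kept as one name.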

Reporter: Michael Styles

Note: This issue was originally created as PARQUET-677. Please see the migration documentation for further details.

asfimport commented 8 years ago

Michael Styles: @julienledem I am in the process of putting together a PR for this issue. The master branch (1.8.2-SNAPSHOT) has a dependency on Hive 0.12. Some of the hive-related tests I'm trying to write would require Hive 0.13 which supports quoted identifiers. Is there any plan on moving up to Hive 0.13?

asfimport commented 8 years ago

Julien Le Dem / @julienledem: Parquet support for Hive has moved to the hive repo itself for more recent versions. You should find the same tests there. The serde in the parquet repo is for older versions of Hive, so it won't be moving up. This was to better support compatibility across Hive versions since Hive's API kept changing.

asfimport commented 8 years ago

Michael Styles: PR: https://github.com/apache/parquet-mr/pull/361