apache / parquet-format

Apache Parquet Format
https://parquet.apache.org/
Apache License 2.0
PARQUET-2485: Be more consistent with BYTE_ARRAY types #251

Closed: etseidl closed this 2 weeks ago

etseidl commented 1 month ago

Changes instances of 'binary' to BYTE_ARRAY where appropriate. Also fixes some uses of FIXED_LEN_BYTE_ARRAY.

etseidl commented 1 month ago

Note: I've left 'binary' in the schema examples for now since I'm not sure if the current parquet-cli still uses 'binary' when printing file schemas.

wgtmac commented 1 month ago

> Note: I've left 'binary' in the schema examples for now since I'm not sure if the current parquet-cli still uses 'binary' when printing file schemas.

Yes, I think we need to fix this as well.

etseidl commented 1 month ago

I just noticed I left in the converted type UTF8 rather than using the proper logical type name STRING. I'll fix that up tomorrow.

etseidl commented 1 month ago

> > Note: I've left 'binary' in the schema examples for now since I'm not sure if the current parquet-cli still uses 'binary' when printing file schemas.
>
> Yes, I think we need to fix this as well.

I verified that parquet-cli 1.14.0 still uses 'binary' for BYTE_ARRAY. I welcome suggestions for the schema examples in LogicalTypes.md.
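
To illustrate the naming mismatch under discussion, here is a minimal Python sketch that rewrites a printed schema line to use the format's physical type names. It is not part of this PR or of parquet-cli. The helper `normalize_schema_line`, both lookup tables, and the sample schema line are hypothetical. Only the 'binary' to BYTE_ARRAY and UTF8 to STRING pairs come from this thread; the other lowercase spellings are assumed.

```python
import re

# Lowercase names assumed to appear in printed schemas, mapped to the
# physical type names in the Parquet format's Type enum. Only the
# 'binary' entry is confirmed by this thread; the rest are assumptions.
PRINTED_TO_PHYSICAL = {
    "boolean": "BOOLEAN",
    "int32": "INT32",
    "int64": "INT64",
    "int96": "INT96",
    "float": "FLOAT",
    "double": "DOUBLE",
    "binary": "BYTE_ARRAY",
    "fixed_len_byte_array": "FIXED_LEN_BYTE_ARRAY",
}

# Converted type -> logical type name (only the pair mentioned in this thread).
CONVERTED_TO_LOGICAL = {"UTF8": "STRING"}

# Matches "<repetition> <type> <rest>", e.g. "required binary name (UTF8);".
_FIELD = re.compile(r"^(\s*(?:required|optional|repeated)\s+)(\w+)(.*)$")


def normalize_schema_line(line: str) -> str:
    """Hypothetical helper: rewrite one schema line to format-spec names."""
    match = _FIELD.match(line)
    if match is None:
        # Message headers, closing braces, and similar lines pass through.
        return line
    prefix, type_name, rest = match.groups()
    # Only the token in the type position is replaced, so a field that
    # happens to be named "binary" keeps its name.
    type_name = PRINTED_TO_PHYSICAL.get(type_name, type_name)
    for old, new in CONVERTED_TO_LOGICAL.items():
        rest = rest.replace(f"({old})", f"({new})")
    return prefix + type_name + rest


print(normalize_schema_line("  required binary name (UTF8);"))
```

Group lines such as `optional group my_list (LIST) {` are left unchanged because `group` is not in the lookup table.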

wgtmac commented 2 weeks ago

LGTM. cc @pitrou @alamb @tustvold