[Bug] file.compression does not work for parquet format

apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.

https://paimon.apache.org/

Apache License 2.0

[Bug] file.compression does not work for parquet format #1494

Closed Aitozi closed 1 year ago

Aitozi commented 1 year ago

Search before asking

[X] I searched in the issues and found nothing similar.

Paimon version

master

Compute Engine

parquet format

Minimal reproduce step

With the parquet format, the writer reads the codec from the compress key in the options, and that key takes precedence over file.compression. So the key that actually takes effect is compress, and setting file.compression changes nothing.
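
To make the lookup order concrete, here is a minimal sketch of the behavior described above. It is not Paimon's code: the function name, the two dictionaries, and the placeholder value SOME_DEFAULT are all hypothetical, and it only assumes that the format layer already carries a value for the compress key.

```python
# Hypothetical sketch of the lookup order described in this report.
# Names and dict layout are illustrative only, not Paimon's implementation.

def resolve_codec(format_options, table_options):
    """Return the codec name, preferring the format-level key."""
    # The format-level key named in this report is consulted first...
    if "compress" in format_options:
        return format_options["compress"]
    # ...so the table-level option is only ever a fallback.
    return table_options.get("file.compression")


table_options = {"file.format": "parquet", "file.compression": "ZSTD"}

# Assumption: the format layer holds its own value for the key
# ("SOME_DEFAULT" is a placeholder, not a real codec name).
format_options = {"compress": "SOME_DEFAULT"}

# The format-level value wins, so file.compression is ignored.
print(resolve_codec(format_options, table_options))

# Only when the format-level key is absent does file.compression apply.
print(resolve_codec({}, table_options))
```

With this order, a user who sets only file.compression never sees it applied as long as the format-level key has a value.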

What doesn't meet your expectations?

The compression algorithm actually used is not the one set in file.compression.

Anything else?

No response

Are you willing to submit a PR?

[X] I'm willing to submit a PR!

zhangjun0x01 commented 1 year ago

I tested it on my machine and it works. Could you provide more information?

Flink SQL> create table t6(id int , name string) with ('file.format'='parquet','file.compression'='ZSTD');
[INFO] Execute statement succeed.

Flink SQL> insert into t6 select 6,'bbb';
[INFO] Submitting SQL update statement to the cluster...
[INFO] SQL update statement has been successfully submitted to the cluster:
Job ID: 4a4712de0ecf72630ba801aad35702cf

Flink SQL> select * from t6;
+----+------+
| id | name |
+----+------+
|  6 |  bbb |
+----+------+
1 row in set

Flink SQL>

Aitozi commented 1 year ago

Hi @zhangjun0x01, have you verified that 'file.compression'='ZSTD' actually took effect?

Aitozi commented 1 year ago

@zhangjun0x01 I just pushed a fix for this. Could you help take a look?

JingsongLi commented 1 year ago

@Aitozi Shouldn't this issue be "parquet.compression does not work"?

Aitozi commented 1 year ago

parquet.compression works; file.compression does not. The option listed in the configuration is file.compression. I think parquet.compression is no longer the recommended way?