Open madeirak opened 3 weeks ago
The table has two partition keys from two partition transforms, one of which is a bucket transform.
Are these two partition transforms, name_bucket_10 and id_bucket_10, equivalent?
Are both of them based on hashing?
Sorry, I missed the name_bucket_10 part. How did you create your table, and with which catalog?
Similar to the following process:
create table dbxx.tbxx (id INT COMMENT '11', name STRING COMMENT '') USING iceberg PARTITIONED BY (name, bucket(10, name), bucket(10, id));
insert into dbxx.tbxx values (1, '1');
show create table dbxx.tbxx;
select * from dbxx.tbxx.partitions;
With HiveCatalog
I am quite puzzled why name is used both as an identity partition and as a bucket source. In this case, all the data under a given name partition falls into the same name bucket, so bucketing on name has no effect.
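To illustrate the point, here is a minimal Python sketch (not Iceberg's actual code) of how I understand the bucket transform from the Iceberg spec: a 32-bit Murmur3 hash of the value, masked to be non-negative, modulo the bucket count. The helper names `murmur3_32`, `bucket`, and `partition_tuple` are made up for this example, and the byte encodings (ints hashed as 8-byte little-endian longs, strings as UTF-8) are my reading of the spec. Under those assumptions, both name_bucket_10 and id_bucket_10 are hash-based, and every row with the same name gets the same name bucket.

```python
import struct


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """32-bit x86 Murmur3 hash, returned as an unsigned 32-bit integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    n = len(data)
    full = n & ~3  # length of the part made of whole 4-byte blocks

    for i in range(0, full, 4):
        k = struct.unpack_from("<I", data, i)[0]
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xFFFFFFFF
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF

    # Mix in the 1 to 3 trailing bytes, if any.
    tail = data[full:]
    if tail:
        k = 0
        for shift, byte in enumerate(tail):
            k |= byte << (8 * shift)
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k

    # Final avalanche.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def bucket(num_buckets: int, value) -> int:
    """Hypothetical stand-in for bucket(N, col): hash, mask the sign bit, take the modulus."""
    if isinstance(value, int):
        raw = struct.pack("<q", value)  # assumption: ints are hashed as 8-byte longs
    else:
        raw = str(value).encode("utf-8")  # assumption: strings are hashed as UTF-8
    return (murmur3_32(raw) & 0x7FFFFFFF) % num_buckets


def partition_tuple(row_id: int, name: str):
    """Partition values for PARTITIONED BY (name, bucket(10, name), bucket(10, id))."""
    return (name, bucket(10, name), bucket(10, row_id))


rows = [(i, name) for name in ("a", "b") for i in range(1, 6)]
for row_id, name in rows:
    print(row_id, name, partition_tuple(row_id, name))
```

In the printed tuples, the second field never changes within a given name, while the third field depends only on id. That is why bucketing on a column that is already an identity partition adds nothing.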
This is just an example, not a real table. The main issue is that when a table has multiple bucket fields, "show create table xxx" displays only one of them.
The show create table result follows Spark SQL syntax, which only supports one bucket field.
OK, fine. It would be better if the output matched the syntax shown in the Iceberg documentation: https://iceberg.apache.org/docs/latest/spark-ddl/#partitioned-by
Apache Iceberg version
1.4.3
Query engine
Spark
Please describe the bug 🐞
The output of "select * from xx.xx.partitions" shows that this table has two bucket keys, but "show create table xx.xx" displays only one bucket key.