Open borderlayout opened 2 weeks ago
In general, you shouldn't use path information for this; instead, use the Files or Partitions metadata tables. This matters because the storage layer holds the full history of the table, not just its current state. For example, just because there are 10 files in a directory doesn't mean all 10 are live in the current table state.
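As a rough sketch of that suggestion, the helper below builds a Spark SQL query against the `files` metadata table and aggregates file count and total size per partition. The function name and the table identifier `my_catalog.db.iceberg_test1` are made up for illustration; `partition` and `file_size_in_bytes` are columns of the `files` metadata table.

```python
def partition_file_stats_sql(table: str) -> str:
    """Build a Spark SQL query over the Iceberg `files` metadata table.

    `table` is a fully qualified table identifier (hypothetical here).
    The query reports, per partition, how many files are live in the
    current table state and how many bytes they occupy.
    """
    return (
        "SELECT partition, "
        "COUNT(*) AS file_count, "
        "SUM(file_size_in_bytes) AS total_bytes "
        f"FROM {table}.files "
        "GROUP BY partition"
    )


if __name__ == "__main__":
    # Hypothetical catalog and database names; adjust to your setup.
    print(partition_file_stats_sql("my_catalog.db.iceberg_test1"))
```

With an active Spark session this could be run as `spark.sql(partition_file_stats_sql("my_catalog.db.iceberg_test1")).show()`. The `partitions` metadata table also exposes per-partition `record_count` and `file_count` directly, which may be enough if file sizes are not needed.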
Feature Request / Improvement
Hi all: When using Amazon S3 object storage with Iceberg, requests to the same path can be throttled. With write.object-storage.enabled=true, files that would otherwise share the same path are written under different hashed paths, which avoids the S3 throttling issue. (see: https://iceberg.apache.org/docs/nightly/docs/configuration/?h=write.object+storage.enabled#write-properties)
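For reference, a minimal sketch of how this property might be set on an existing table from Spark. The helper name and the table identifier are assumptions for illustration; only the property name `write.object-storage.enabled` comes from the text above.

```python
def enable_object_storage_sql(table: str) -> str:
    """Build an ALTER TABLE statement that turns on hashed object-storage paths.

    `table` is a fully qualified table identifier (hypothetical here).
    """
    return (
        f"ALTER TABLE {table} "
        "SET TBLPROPERTIES ('write.object-storage.enabled'='true')"
    )


if __name__ == "__main__":
    # Hypothetical catalog and database names; adjust to your setup.
    print(enable_object_storage_sql("my_catalog.db.iceberg_test1"))
```

With an active Spark session the statement could be executed via `spark.sql(enable_object_storage_sql("my_catalog.db.iceberg_test1"))`.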
However, I encountered a problem: for partitioned tables, the hash value is inserted into the path before the partition name, which makes it difficult to gather information for an individual partition, such as its number of files or their total size.
Is there a reason for designing it this way? Would putting the hash value after the partition fields be a better approach?
bucket/iceberg_test1/data/_44Xmw/parCol=2024-01-10/00295-2798-63356e4e-b4ec-4a80-ae3f-6888f2f7eac9-0-00003.parquet
bucket/iceberg_test1/data/_5l5dQ/parCol=2024-01-09/00063-2566-63356e4e-b4ec-4a80-ae3f-6888f2f7eac9-0-00006.parquet
==changed==>
bucket/iceberg_test1/data/parCol=2024-01-10/_44Xmw/00295-2798-63356e4e-b4ec-4a80-ae3f-6888f2f7eac9-0-00003.parquet
bucket/iceberg_test1/data/parCol=2024-01-09/_5l5dQ/00063-2566-63356e4e-b4ec-4a80-ae3f-6888f2f7eac9-0-00006.parquet

bucket/iceberg_test3/data/APigWw/parCol=2024-01-01/gender=male/00001-7234-7e44c302-a716-4da8-9ea0-0c44caf9a249-0-00003.parquet
bucket/iceberg_test3/data/4Z-_sw/parCol=2024-01-01/gender=male/00001-7234-7e44c302-a716-4da8-9ea0-0c44caf9a249-0-00001.parquet
==changed==>
bucket/iceberg_test3/data/parCol=2024-01-01/gender=male/APigWw/00001-7234-7e44c302-a716-4da8-9ea0-0c44caf9a249-0-00003.parquet
bucket/iceberg_test3/data/parCol=2024-01-01/gender=male/4Z-_sw/00001-7234-7e44c302-a716-4da8-9ea0-0c44caf9a249-0-00001.parquet
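To make the difference between the two layouts explicit, here is a small sketch that assembles a path both ways from the same components. This only illustrates the ordering shown in the examples above; the function names are made up and this is not Iceberg's actual location-provider code.

```python
def hash_first(data_root: str, hash_dir: str, partition_path: str, file_name: str) -> str:
    """Current layout from the examples: hash directory before the partition path."""
    return "/".join([data_root, hash_dir, partition_path, file_name])


def partition_first(data_root: str, hash_dir: str, partition_path: str, file_name: str) -> str:
    """Proposed layout: partition path first, hash directory after it."""
    return "/".join([data_root, partition_path, hash_dir, file_name])


if __name__ == "__main__":
    # Components taken from the first example path above.
    parts = (
        "bucket/iceberg_test1/data",
        "_44Xmw",
        "parCol=2024-01-10",
        "00295-2798-63356e4e-b4ec-4a80-ae3f-6888f2f7eac9-0-00003.parquet",
    )
    print(hash_first(*parts))
    print(partition_first(*parts))
```

In the proposed layout, all files of one partition share a common prefix (here `bucket/iceberg_test1/data/parCol=2024-01-10/`), which is what would make per-partition listing by path straightforward.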
Query engine
Spark
Willingness to contribute