Open Tonylin1998 opened 6 months ago
I can confirm the same behaviour with MinIO instead of GCS: when dropping an Iceberg table in Spark SQL using `DROP TABLE ... PURGE`, the data files are removed but the directory structure is not cleaned up.
I believe Iceberg does this on purpose. With Iceberg it is possible for multiple tables to share the same location, so when you drop a table (or a partition) it is not safe to delete the entire folder: another table might have files in it (or might want to put files there later on).
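To make the point above concrete, here is a small, hypothetical sketch (made-up keys and table names, not Iceberg internals) of why a purge must delete only the files referenced by the dropped table's metadata rather than the whole prefix:

```python
# Object-store keys for two tables that share the same location
# (all names below are invented for illustration).
all_keys = [
    "warehouse/shared/date=20240220/a.parquet",   # belongs to table_a
    "warehouse/shared/date=20240220/b.parquet",   # belongs to table_b
    "warehouse/shared/metadata/table_a.json",
    "warehouse/shared/metadata/table_b.json",
]

# Files that table_a's metadata actually references.
table_a_files = {
    "warehouse/shared/date=20240220/a.parquet",
    "warehouse/shared/metadata/table_a.json",
}

def purge(keys, referenced):
    """Delete only the files the dropped table references; never the
    whole prefix, because another table may own keys under it."""
    return [k for k in keys if k not in referenced]

remaining = purge(all_keys, table_a_files)
# table_b's files survive; deleting the whole "warehouse/shared/"
# prefix would have destroyed them as well.
```

Deleting by prefix would have removed `table_b`'s data too, which is exactly the situation Iceberg avoids by leaving the folder structure alone.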
Query engine
Spark
Question
I was using Iceberg with PySpark and a JDBC catalog, with the warehouse set to GCS. I created a table using `date` as the partition key. I wrote some data into the table and then decided to delete the partition `date=20240220`. I found that the Parquet files under `date=20240220` were deleted, but the folder `date=20240220` still remains. The same happens when I drop the table: the data is deleted, but all the partition folders remain. This behaviour leaves many empty folders in my GCS bucket. Is there anything I can do in Iceberg to prevent this from happening?
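Since Iceberg deliberately leaves the folder structure in place, one workaround (not an Iceberg feature) is to sweep the bucket yourself. On GCS, "folders" are usually zero-byte placeholder objects whose names end in `/`; a minimal sketch of finding deletable placeholders from a key listing is below (a real cleanup would fetch the key list with the `google-cloud-storage` client or `gsutil`, which I am assuming here rather than showing):

```python
def empty_folder_placeholders(keys):
    """Return 'folder' placeholder keys (names ending in '/') that have
    no other objects underneath them and can therefore be deleted."""
    placeholders = [k for k in keys if k.endswith("/")]
    return [
        p for p in placeholders
        if not any(k != p and k.startswith(p) for k in keys)
    ]

# After DROP TABLE ... PURGE the data objects are gone but the
# placeholder objects remain (made-up keys for illustration):
keys = [
    "warehouse/db/tbl/date=20240220/",            # empty, deletable
    "warehouse/db/tbl/date=20240221/",            # still has a file
    "warehouse/db/tbl/date=20240221/x.parquet",
]
print(empty_folder_placeholders(keys))
```

Running this after a purge, and deleting the returned keys, would remove the leftover empty folders without touching anything another table might still reference.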