apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

Hive catalog: DROP TABLE xx PURGE does not delete the HDFS path #9869

Open mengshangxun opened 8 months ago

mengshangxun commented 8 months ago

Apache Iceberg version

1.4.3 (latest release)

Query engine

Spark

Please describe the bug 🐞

Environment: HDFS + Spark 3.3 + Iceberg 1.4.3 + Hive catalog.

1. create table test(id string, name string) using iceberg;
2. insert into test select 'a', 'a';
3. drop table test purge;

The purge statement only deletes the Iceberg data files and metadata files, but the HDFS path still exists. This is confusing when the path is reused.

manuzhang commented 8 months ago

This is expected behavior as Iceberg only purges its referenced metadata and data, and nothing more.

nqvuong1998 commented 8 months ago

In my case, when using PURGE, Iceberg only deletes metadata files. Do I need any conf to be able to delete data files?

RussellSpitzer commented 8 months ago

No, it also removes data files; the only exception is files that are not referenced by the Iceberg table.
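To make that distinction concrete, here is a toy model in Python of the behavior described above, run against a local temporary directory rather than HDFS. It is only a sketch, not Iceberg's implementation: `purge_referenced` and the file names are hypothetical, and the list passed in stands in for the files the table metadata tracks.

```python
import os
import tempfile

def purge_referenced(referenced_files):
    """Delete only the files the table references.

    Directories and unreferenced files are deliberately left alone.
    """
    for path in referenced_files:
        if os.path.isfile(path):
            os.remove(path)

# Build a fake table location holding one referenced and one stray file.
table_dir = tempfile.mkdtemp()
data_dir = os.path.join(table_dir, "data")
os.makedirs(data_dir)
referenced = os.path.join(data_dir, "referenced.parquet")
stray = os.path.join(data_dir, "stray.parquet")
for p in (referenced, stray):
    open(p, "w").close()

purge_referenced([referenced])

print(os.path.exists(referenced))  # False: the referenced file is gone
print(os.path.exists(stray))       # True: the unreferenced file survives
print(os.path.isdir(table_dir))    # True: the table directory itself remains
```

In this model the directory survives for the same reason the HDFS path does in the report: nothing in the purge step targets directories, only referenced files.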

mengshangxun commented 8 months ago

When a Hive catalog table is created, an HDFS path is created for it in the Spark warehouse, but currently neither drop table nor drop table purge deletes this path. If I use drop table and then create a table at the same path, the path will contain the old metadata files and data files, and these old files will never be deleted. Would deleting the path after deleting the files be a better choice?
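One possible shape for the "delete the path after deleting the files" idea is sketched below in Python against the local filesystem. This is an assumption-laden illustration, not something Iceberg does: `remove_empty_dirs` is a hypothetical helper, and it only removes directories that are already empty, so any file the table did not reference keeps its directory alive. On HDFS the same logic would have to go through the Hadoop FileSystem API instead of `os`, which is not shown here.

```python
import os
import tempfile

def remove_empty_dirs(root):
    """Remove root and its subdirectories bottom-up, but only the empty ones.

    Returns True if root itself was removed.
    """
    # topdown=False visits children before their parents, so a parent is
    # checked only after its empty children have already been removed.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if not os.listdir(dirpath):
            os.rmdir(dirpath)
    return not os.path.exists(root)

# Case 1: a table location where the purge removed every file.
purged = tempfile.mkdtemp()
os.makedirs(os.path.join(purged, "data"))
os.makedirs(os.path.join(purged, "metadata"))
removed_clean = remove_empty_dirs(purged)

# Case 2: a table location that still holds an unreferenced file.
leftover = tempfile.mkdtemp()
os.makedirs(os.path.join(leftover, "data"))
stray = os.path.join(leftover, "data", "stray.parquet")
open(stray, "w").close()
removed_leftover = remove_empty_dirs(leftover)

print(removed_clean, removed_leftover)  # True False
```

Refusing to delete non-empty directories is a deliberate choice in this sketch: a recursive delete of the whole path would also remove files that were never part of the table.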

wanghualei commented 8 months ago

It is a big problem!

vinhnemo commented 1 month ago

So what is the best way to clean up the data when tables are dropped?