Spark with HiveCatalog: DROP TABLE XX PURGE does not use the purge code in HiveCatalog.dropTable()

Hey there!

Spark with HiveCatalog doesn't reuse the existing purge code in HiveCatalog#dropTable for its purge operation, primarily for performance and storage reasons.

When you use the PURGE option in Hive, the underlying data files are deleted immediately instead of being moved to a temporary holding area such as the HDFS trash. This can matter for performance, storage, and security reasons, especially when dealing with large datasets or sensitive information.

However, when Spark SQL performs a DROP TABLE operation with the PURGE clause, it doesn't pass this clause along to the Hive statement that performs the drop behind the scenes. As a result, the data files are not purged as you would expect.

To ensure the purge operation is performed correctly, it's recommended to execute the DROP TABLE operation directly in Hive, for example, through the Hive CLI (command-line interface), rather than through Spark SQL.
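As a rough illustration of running the drop directly against Hive, here is a minimal Python sketch that builds the HiveQL statement and a beeline command line for it. The table name `db.sample` and the JDBC URL are hypothetical placeholders, and the helper functions are made up for this example; only the `DROP TABLE ... PURGE` syntax and beeline's `-u` and `-e` options are taken as given.

```python
import shlex
from typing import List


def drop_table_purge_sql(table: str, if_exists: bool = True) -> str:
    """Build a HiveQL DROP TABLE ... PURGE statement for `table`."""
    guard = "IF EXISTS " if if_exists else ""
    return f"DROP TABLE {guard}{table} PURGE"


def beeline_command(jdbc_url: str, sql: str) -> List[str]:
    """Build the argument list that runs a single statement through beeline."""
    # -u: JDBC URL of HiveServer2, -e: statement to execute
    return ["beeline", "-u", jdbc_url, "-e", sql]


if __name__ == "__main__":
    # Hypothetical table and endpoint; replace with your own.
    sql = drop_table_purge_sql("db.sample")
    cmd = beeline_command("jdbc:hive2://localhost:10000/default", sql)
    # Print the shell-quoted command instead of executing it.
    print(shlex.join(cmd))
```

The script only prints the command. To actually run it you could pass `cmd` to `subprocess.run(cmd, check=True)` on a machine where beeline is installed and can reach your HiveServer2.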

Here is the reference: https://docs.cloudera.com/runtime/latest/developing-spark-applications/topics/spark-sql-drop-table-purge-considerations.html

apache / iceberg

Spark-hive catalog drop table XX purge not use purge code in HiveCatalog.dropTable() #11484
