delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0
7.49k stars 1.68k forks

[BUG][Spark] Delta 3.0.0 UniForm - VACUUM seems to remove Iceberg Metadata #2273

Open KaPrathaban opened 11 months ago

KaPrathaban commented 11 months ago

Bug

Which Delta project/connector is this regarding?

Describe the problem

VACUUM removes the contents of the Iceberg metadata folder created by the UniForm feature.

Steps to reproduce

Step 1: Create a Delta Table with the Iceberg Integration

CREATE OR REPLACE table testtimestamp 
USING delta 
TBLPROPERTIES ('delta.columnMapping.mode' = 'name', 'delta.universalFormat.enabledFormats' = 'iceberg', 'delta.feature.timestampNtz' = 'enabled')
COMMENT 'Test Timestamp Columns' 
AS SELECT CAST('2020-09-20 10:00:00.123345' AS TIMESTAMP_NTZ) as test_col;

Step 2: Insert New Records

INSERT INTO testtimestamp VALUES ('2023-02-01 12:00:00.123456'),('2023-02-02 12:00:00.123456'),('2023-02-03 12:00:00.123456')

Step 3: Vacuum with Retention Set to Zero

VACUUM testtimestamp RETAIN 0 HOURS DRY RUN

Observed results

The Iceberg version files (JSON) and the snapshot and manifest Avro files are also deleted.

Expected results

Only Parquet files, partition folders, and the appropriate Delta metadata files should be removed.

Further details (Potential Issue)

The Iceberg metadata folder should be treated as a hidden directory for Delta file operations such as VACUUM and FSCK.
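
To illustrate the suggestion, here is a rough Python sketch of a "hidden path" check that a vacuum-style cleanup could apply before picking files to delete. It is not Delta's actual implementation. It assumes that UniForm writes Iceberg metadata to a top-level `metadata` directory under the table root, and it simplifies the underscore/dot convention for hidden names, ignoring any exceptions the real code makes. The function name, the constant, and the file names are hypothetical.

```python
from pathlib import PurePosixPath

# Assumption: UniForm writes Iceberg metadata under <table_root>/metadata.
ICEBERG_METADATA_DIR = "metadata"


def is_hidden_from_vacuum(relative_path: str) -> bool:
    """Return True if a path (relative to the table root) should be skipped."""
    parts = PurePosixPath(relative_path).parts
    if not parts:
        return False
    # Simplified convention: any segment starting with "_" or "." is hidden,
    # for example _delta_log. Real-world exceptions are not modelled here.
    if any(part.startswith(("_", ".")) for part in parts):
        return True
    # Suggested addition: also hide the top-level Iceberg metadata directory.
    return parts[0] == ICEBERG_METADATA_DIR


# Hypothetical file listing, relative to the table root.
candidates = [
    "part-00000.snappy.parquet",
    "metadata/v1.metadata.json",
    "metadata/snap-1.avro",
    "_delta_log/00000000000000000000.json",
]
to_delete = [p for p in candidates if not is_hidden_from_vacuum(p)]
```

With a check like this, only the data file in the listing would remain a deletion candidate. The Iceberg JSON and Avro files would be skipped in the same way as the Delta log.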

Environment information

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

lzlfred commented 10 months ago

Thanks @KaPrathaban for reporting. This is admittedly a problem and we should fix it. I can have a fix up by EOD.

lzlfred commented 10 months ago

The fix is up: https://github.com/delta-io/delta/pull/2301/files