delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[BUG][Spark] Logs are not compacted #3245

Closed by Minashraf 2 weeks ago

Minashraf commented 2 weeks ago

Bug

Which Delta project/connector is this regarding?

Describe the problem

Delta log files don't seem to be deleted after the retention period has passed or after a checkpoint is written.

Steps to reproduce

I am using a Zeppelin notebook and loading my data from HDFS to do an update. I ran this command many times and now have a lot of checkpoints on HDFS, but none of the logs are deleted.

[image]

Here is my table description:

[image: table description]

Observed results

None of the logs are deleted; more than 100 log files remain.

Expected results

Older logs should be deleted, especially the ones written before the checkpoint.
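To make "the ones before the checkpoint" concrete, here is a minimal sketch that takes a listing of `_delta_log` file names and reports which commit versions are older than the latest checkpoint. It assumes the usual Delta naming (a 20-digit zero-padded version followed by `.json` for commits or `.checkpoint.parquet` for checkpoints). The function name `commits_before_latest_checkpoint` is hypothetical. The result is only a list of candidates, since a commit file also has to be older than the retention period before it is cleaned up.

```python
import re

# Commit files look like 00000000000000000012.json.
COMMIT_RE = re.compile(r"^(\d{20})\.json$")
# Checkpoints look like 00000000000000000010.checkpoint.parquet,
# optionally with a multi-part suffix such as .0000000001.0000000003.
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint(?:\.\d+\.\d+)?\.parquet$")


def commits_before_latest_checkpoint(file_names):
    """Return sorted commit versions that are older than the latest checkpoint.

    Hypothetical diagnostic helper: it only inspects file names and
    does not apply any retention-period check.
    """
    commits = []
    checkpoints = []
    for name in file_names:
        match = COMMIT_RE.match(name)
        if match:
            commits.append(int(match.group(1)))
            continue
        match = CHECKPOINT_RE.match(name)
        if match:
            checkpoints.append(int(match.group(1)))
    if not checkpoints:
        # Without a checkpoint, no commit file can be a cleanup candidate.
        return []
    latest = max(checkpoints)
    return sorted(version for version in commits if version < latest)
```

Feeding it the output of a directory listing of the table's `_delta_log` folder gives a quick count of how many old commit files are still around.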

Further details

Environment information

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

Minashraf commented 2 weeks ago

Found the solution: the minimum retention is 1 day, and it can't be set in hours or minutes.
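For anyone landing here, a minimal sketch of setting the retention in whole days. It assumes the retention in question is the `delta.logRetentionDuration` table property and that a `spark` session with Delta is available; the helper `log_retention_sql` and the table name `my_table` are hypothetical. The 1-day floor in the check reflects the behaviour reported in this thread, not something verified against the Delta source.

```python
def log_retention_sql(table_name: str, days: int) -> str:
    """Build an ALTER TABLE statement that sets the log retention in whole days.

    Hypothetical helper: it rejects anything below 1 day, following the
    observation in this thread that shorter values did not work.
    """
    if not isinstance(days, int) or days < 1:
        raise ValueError("log retention must be a whole number of days, at least 1")
    unit = "day" if days == 1 else "days"
    return (
        f"ALTER TABLE {table_name} SET TBLPROPERTIES "
        f"('delta.logRetentionDuration' = 'interval {days} {unit}')"
    )


# Usage in a notebook, assuming an active Delta-enabled session named `spark`:
# spark.sql(log_retention_sql("my_table", 1))
```

Old log files would then be expected to go away only when a later checkpoint is written, not immediately after the property is changed.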