delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0
2.19k stars, 392 forks

Regression: delta.logRetentionDuration doesn't seem to be respected #2447

Closed: djouallah closed this issue 5 months ago

djouallah commented 5 months ago

Environment

Delta-rs version: 17.1

Binding:

Environment: Python


Bug

What happened:

from deltalake import DeltaTable, write_deltalake

# delta_path, df and storage_options are defined elsewhere in the reporter's script.
write_deltalake(delta_path, df, configuration={"delta.logRetentionDuration": "interval 1 day"},
                mode="append", storage_options=storage_options)
dt = DeltaTable(delta_path, storage_options=storage_options)
dt.vacuum(retention_hours=0, dry_run=False, enforce_retention_duration=False)
dt.create_checkpoint()
dt.cleanup_metadata()

This doesn't seem to be working.

echai58 commented 5 months ago

I think the configuration key should be delta.logRetentionDuration.

djouallah commented 5 months ago

same issue

ion-elgreco commented 5 months ago

> I think the configuration key should be delta.logRetentionDuration.

Correct

djouallah commented 5 months ago

@ion-elgreco delta.logRetentionDuration doesn't work either?

ion-elgreco commented 5 months ago

@djouallah It does work. You set interval 1 day, so you can't expect the logs to be deleted immediately :P Change it to interval 1 seconds and you'll see them get removed:

import polars as pl
from deltalake import DeltaTable, write_deltalake

df = pl.DataFrame({"foo": [1]})
delta_path = "test_Table"

write_deltalake(
    delta_path,
    df.to_arrow(),
    configuration={"delta.logRetentionDuration": "interval 1 seconds"},
    mode="overwrite",
)
dt = DeltaTable(delta_path)
dt.vacuum(retention_hours=0, dry_run=False, enforce_retention_duration=False)
dt.create_checkpoint()
dt.cleanup_metadata()
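
For reference, delta.logRetentionDuration is an interval string, and a log entry only becomes eligible for cleanup once it is older than that window, which is why interval 1 day leaves a freshly written log in place. Below is a minimal sketch of that age check, assuming the `interval <n> <unit>` format used in this thread. `parse_interval` and `is_log_expired` are hypothetical helpers, not the delta-rs parser or cleanup code, and the sketch ignores anything else the library checks before deleting a file.

```python
from datetime import datetime, timedelta, timezone

# Units this sketch understands; an illustrative subset, not delta-rs's parser.
_UNITS = {
    "second": timedelta(seconds=1),
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}


def parse_interval(value: str) -> timedelta:
    """Parse 'interval <n> <unit>', e.g. 'interval 1 day' (hypothetical helper)."""
    parts = value.strip().lower().split()
    if len(parts) != 3 or parts[0] != "interval":
        raise ValueError(f"expected 'interval <n> <unit>', got {value!r}")
    # Accept both singular and plural units: 'days' -> 'day', 'seconds' -> 'second'.
    unit = parts[2][:-1] if parts[2].endswith("s") else parts[2]
    if unit not in _UNITS:
        raise ValueError(f"unsupported unit {parts[2]!r}")
    return int(parts[1]) * _UNITS[unit]


def is_log_expired(commit_time: datetime, now: datetime, retention: timedelta) -> bool:
    """Age check only: an entry is a cleanup candidate once it is older than `retention`."""
    return commit_time < now - retention


# Arbitrary example timestamps: a commit, then cleanup_metadata() a few seconds later.
written = datetime(2024, 1, 1, tzinfo=timezone.utc)
checked = written + timedelta(seconds=5)
print(is_log_expired(written, checked, parse_interval("interval 1 day")))      # False
print(is_log_expired(written, checked, parse_interval("interval 1 seconds")))  # True
```

With a one-day window, the entry written seconds earlier is kept; with a one-second window it is already past the cutoff, which matches the behavior described in the comment above.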

djouallah commented 5 months ago

No luck. I'm writing to GCP, FWIW.

ion-elgreco commented 5 months ago

@djouallah please share the table configuration in the delta log

djouallah commented 5 months ago

Metadata(id: 62adbf63-1e61-479e-8187-8fd7ef308b5c, name: None, description: None, partition_columns: [], created_time: 1713594980946, configuration: {})

ion-elgreco commented 5 months ago

> Metadata(id: 62adbf63-1e61-479e-8187-8fd7ef308b5c, name: None, description: None, partition_columns: [], created_time: 1713594980946, configuration: {})

Yeah, you didn't pass a configuration when the table was created, so it's using the default of 30 days.
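
The `configuration: {}` in the metadata above is why the setting never took effect: with the key absent, the 30-day default applies. Below is a minimal sketch of that fallback, assuming the same `interval <n> <unit>` format as the thread. `effective_log_retention` and `DEFAULT_LOG_RETENTION` are hypothetical names for illustration, not part of the deltalake API.

```python
from datetime import timedelta

# Default retention stated in this thread when delta.logRetentionDuration is unset.
DEFAULT_LOG_RETENTION = timedelta(days=30)

# Seconds per unit for this sketch (hypothetical, illustrative subset).
_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400, "week": 604800}


def effective_log_retention(configuration: dict) -> timedelta:
    """Return the log retention implied by a table's metadata configuration (hypothetical helper)."""
    raw = configuration.get("delta.logRetentionDuration")
    if raw is None:
        # Key absent, as in `configuration: {}`: fall back to the default.
        return DEFAULT_LOG_RETENTION
    keyword, count, unit = raw.strip().lower().split()
    if keyword != "interval":
        raise ValueError(f"unexpected interval string {raw!r}")
    unit = unit[:-1] if unit.endswith("s") else unit  # 'days' -> 'day'
    return timedelta(seconds=int(count) * _UNIT_SECONDS[unit])


print(effective_log_retention({}))  # 30 days, 0:00:00
print(effective_log_retention({"delta.logRetentionDuration": "interval 1 day"}))  # 1 day, 0:00:00
```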

djouallah commented 5 months ago

Ah, I see: the setting has to be there when the table is first created. Adding the option later through an append or overwrite doesn't work. Thanks!!!