Open jorritsandbrink opened 3 weeks ago
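Feature description
Two flaws exist with pipeline `refresh` for the `delta` table format on the `filesystem` destination. With `drop_sources`, an empty `_delta_log` folder is left behind for a dropped table. With `drop_data`, the table is dropped instead of truncated.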
Repro (1):
```python
import dlt
from dlt.destinations import filesystem
from tests.pipeline.utils import airtable_emojis

source = airtable_emojis().with_resources("📆 Schedule", "🦚Peacock")
for resource in source.selected_resources.values():
    resource.apply_hints(table_format="delta")

pipe = dlt.pipeline(
    pipeline_name="refresh_repro",
    pipelines_dir="_storage",
    destination=filesystem("_storage")
)
pipe.run(source)
pipe.run(source.with_resources("🦚Peacock"), refresh="drop_sources")

# actual: empty folder `/_schedule/_delta_log` remains
# expected: `/_schedule/_delta_log` no longer exists
```
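To check the outcome of Repro (1) without browsing the folder by hand, something like the following could be used. This is only a sketch built on the standard library: the helper name `empty_delta_log_dirs` is hypothetical (it is not part of dlt), and it assumes the destination is a local folder such as `_storage` in the repro.

```python
from pathlib import Path
from typing import List


def empty_delta_log_dirs(root: str) -> List[str]:
    """Return every `_delta_log` folder under `root` that contains no files.

    Hypothetical helper for local folders only. A fully dropped table
    should leave no such folder behind.
    """
    leftovers = []
    for log_dir in Path(root).rglob("_delta_log"):
        # an empty log folder is residue of a table that was only partly removed
        if log_dir.is_dir() and not any(log_dir.iterdir()):
            leftovers.append(str(log_dir))
    return sorted(leftovers)
```

After running Repro (1), `empty_delta_log_dirs("_storage")` would be expected to return an empty list once the flaw is fixed. With the behavior reported here it would include the `_schedule/_delta_log` folder.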
Repro (2):
```python
import dlt
from dlt.destinations import filesystem
from tests.pipeline.utils import airtable_emojis

source = airtable_emojis().with_resources("📆 Schedule", "🦚Peacock")
for resource in source.selected_resources.values():
    resource.apply_hints(table_format="delta")

pipe = dlt.pipeline(
    pipeline_name="refresh_repro",
    pipelines_dir="_storage",
    destination=filesystem("_storage")
)
pipe.run(source)
pipe.run(source.with_resources("📆 Schedule"), refresh="drop_data")

# actual: _schedule table has single commit (/_schedule/_delta_log/00000000000000000000.json) (in SQL terms: table got DROPped)
# expected: _schedule table has two commits (in SQL terms: table got TRUNCATEd)
```
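The commit count mentioned in the comments of Repro (2) can be checked by counting commit files in the table's `_delta_log` folder. The sketch below assumes a local folder and relies on the Delta log naming scheme of zero-padded 20-digit versions (as in `00000000000000000000.json` above). The function name `count_delta_commits` is hypothetical.

```python
import re
from pathlib import Path

# commit files look like 00000000000000000000.json (20 digits, zero-padded);
# checkpoints and other log files are deliberately not counted
COMMIT_FILE = re.compile(r"^\d{20}\.json$")


def count_delta_commits(table_dir: str) -> int:
    """Count commit files in `<table_dir>/_delta_log` (0 if the folder is missing)."""
    log_dir = Path(table_dir) / "_delta_log"
    if not log_dir.is_dir():
        return 0
    return sum(1 for f in log_dir.iterdir() if COMMIT_FILE.match(f.name))
```

For the `_schedule` table after Repro (2), the issue reports the equivalent of a count of 1, where a truncate would be expected to give 2.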
Are you a dlt user?
Yes, I'm already a dlt user.
Use case
No response
Proposed solution
Custom implementations for `drop_tables` and `truncate_tables` for `delta`. Currently, generic `filesystem` implementations are applied.
Related issues
https://github.com/dlt-hub/dlt/pull/1742#issuecomment-2310481736
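To illustrate the difference the proposed solution is after, here is a rough sketch of drop versus truncate semantics for a Delta table in a local folder. It is not dlt's implementation, and both function names are hypothetical. The truncate variant assumes the `deltalake` package is installed and uses `DeltaTable.delete()` without a predicate, which removes all rows through a new commit. Remote buckets would need storage options and a filesystem abstraction that are left out here.

```python
import shutil
from pathlib import Path


def drop_delta_table(table_dir: str) -> None:
    """Drop: remove the whole table folder, including `_delta_log`."""
    if Path(table_dir).exists():
        shutil.rmtree(table_dir)


def truncate_delta_table(table_dir: str) -> None:
    """Truncate: delete all rows via a new commit, keeping the table and its log."""
    # imported lazily so the drop helper works without deltalake installed
    from deltalake import DeltaTable

    # no predicate means every row is deleted; this is recorded as a new commit
    DeltaTable(table_dir).delete()
```

With helpers along these lines, Repro (1) would leave no `/_schedule/_delta_log` folder, and Repro (2) would leave the `_schedule` table with two commits.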