polivbr closed this issue 5 days ago.
@polivbr, please create a reproducible example.
Here you go:
import polars as pl
import deltalake as dl

# Create a table with columns a, b and c
df = pl.DataFrame({'a': [1, 2, 3, 4], 'b': [1, 1, 2, 2], 'c': [10, 11, 12, 13]})
df.write_delta("test_table")

# Replace the rows where b = 1, using a DataFrame that has no column c
df2 = pl.DataFrame({'a': [100, 200, 300], 'b': [1, 1, 1]})
df2.write_delta(
    "test_table",
    mode="overwrite",
    delta_write_options={
        "predicate": "b = 1",
        "schema_mode": "merge",
        "engine": "rust",
    },
)

# Inspect the resulting table schema
table = dl.DeltaTable("test_table")
schema = table.schema()
print(schema)
# OUTPUT:
# Schema([Field(a, PrimitiveType("long"), nullable=True), Field(b, PrimitiveType("long"), nullable=True)])
#
# Note that field c is absent.
Environment
Delta-rs version: 0.17.4
Binding: Python
Bug
What happened:
I attempted to update a table from a Polars DataFrame using mode="overwrite" together with a predicate that selects the rows to replace (and schema_mode="merge"). The DataFrame contained only a subset of the table's columns. The rows matching the predicate are replaced with the new data as expected. However, the table's schema is replaced by the DataFrame's schema instead of being merged with the existing one.
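The row-level part of this operation works as intended. The following toy model in plain Python illustrates an overwrite restricted to the rows matching b = 1, using the same data as the example. It is only a sketch: lists of dicts stand in for the table, the name `overwrite_where` is hypothetical, and this is not the delta-rs implementation.

```python
# Toy model of an overwrite restricted by a predicate ("replace where").
# Rows are plain dicts; this is an illustration, NOT how delta-rs works internally.
from typing import Callable

Row = dict[str, object]


def overwrite_where(
    existing: list[Row],
    incoming: list[Row],
    predicate: Callable[[Row], bool],
) -> list[Row]:
    """Drop existing rows that match the predicate, then append the new rows."""
    kept = [row for row in existing if not predicate(row)]
    return kept + incoming


# Same data as the reproducible example
existing = [
    {"a": 1, "b": 1, "c": 10},
    {"a": 2, "b": 1, "c": 11},
    {"a": 3, "b": 2, "c": 12},
    {"a": 4, "b": 2, "c": 13},
]
incoming = [{"a": 100, "b": 1}, {"a": 200, "b": 1}, {"a": 300, "b": 1}]

result = overwrite_where(existing, incoming, lambda row: row["b"] == 1)
for row in result:
    print(row)
```

The surviving rows (b = 2) keep their values for c. The three new rows have no value for c. The problem reported here is that the table schema itself loses c.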
What you expected to happen:
The original table schema is preserved.
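In other words, with schema_mode="merge" the expected schema is the union of the table's fields and the DataFrame's fields. Here is a minimal sketch of that rule in plain Python. It assumes fields are (name, type) pairs, and `merge_schemas` is a hypothetical helper. Rejecting differing types is a simplification made by this sketch, not documented delta-rs behavior.

```python
# Toy model of "merge" schema evolution: keep every existing field and append new ones.
# Field types are plain strings; this illustrates the expectation, not the delta-rs code.


def merge_schemas(
    existing: list[tuple[str, str]],
    incoming: list[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Return the existing fields in order, followed by fields only present in incoming."""
    existing_types = dict(existing)
    merged = list(existing)
    for name, dtype in incoming:
        if name not in existing_types:
            merged.append((name, dtype))
        elif existing_types[name] != dtype:
            # Simplification of this sketch: any type mismatch is an error.
            raise TypeError(
                f"type conflict for field {name!r}: {existing_types[name]} vs {dtype}"
            )
    return merged


table_schema = [("a", "long"), ("b", "long"), ("c", "long")]
frame_schema = [("a", "long"), ("b", "long")]

print(merge_schemas(table_schema, frame_schema))
```

Applied to this report, the merge should yield a, b and c. The table instead ends up with only a and b.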
How to reproduce it:
1) Create a table with a set of columns.
2) Write to that same table with: