delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Expose table alterations under `alter` namespace #1909

Open ion-elgreco opened 7 months ago

ion-elgreco commented 7 months ago

Description

Use Case

Eventually, we will have multiple alterations possible on the table, such as setting/unsetting table properties, adding and removing columns, and so forth. We can cluster these nicely together under a single namespace called `alter`. The API will look like this:

DeltaTable().alter.set_table_properties()
DeltaTable().alter.unset_table_properties()
DeltaTable().alter.add_columns()
DeltaTable().alter.change_columns()
DeltaTable().alter.replace_columns()
DeltaTable().alter.add_constraints()
DeltaTable().alter.drop_constraints()
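A minimal sketch of how such a namespace could be wired up, assuming a helper object returned from a property. The `TableAlterer` name, the stand-in `DeltaTable` class, and the in-memory `properties` dict are all hypothetical and only illustrate the call shape, not how delta-rs implements it:

```python
from dataclasses import dataclass, field


class TableAlterer:
    """Hypothetical helper that groups ALTER-style operations for one table."""

    def __init__(self, table: "DeltaTable") -> None:
        self._table = table

    def set_table_properties(self, properties: dict) -> None:
        # Merge the given key/value pairs into the table's properties.
        self._table.properties.update(properties)

    def unset_table_properties(self, keys: list) -> None:
        # Remove each key if present; missing keys are ignored.
        for key in keys:
            self._table.properties.pop(key, None)


@dataclass
class DeltaTable:
    """Stand-in for the real table class (assumption); it only holds a dict."""

    properties: dict = field(default_factory=dict)

    @property
    def alter(self) -> TableAlterer:
        # Every alteration is reached through this single namespace.
        return TableAlterer(self)


dt = DeltaTable()
dt.alter.set_table_properties({"delta.appendOnly": "true"})
dt.alter.unset_table_properties(["delta.appendOnly"])
```

Returning a small helper from a property keeps the alteration methods off the main table class, similar to how `optimize` groups its commands.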

Related Issue(s) https://github.com/delta-io/delta-rs/issues/1663

roeap commented 7 months ago

In principle, I have no strong feelings about bundling some commands under a common property, much like we do for optimize.

Given the name alter though, I would suggest restricting it to things that can be done via the ALTER TABLE command, as we may want / need to implement that operation at some point.

The way things are going with the Delta Protocol, table features seem to be front and center when it comes to configuring tables. Along with that, some configuration is becoming more complex. As such, we may consider exposing set_table_properties as a low-level (discouraged) API only and instead model this around table features. Something along the lines of

def enable_feature(name: FeatureName, config: dict)...

The advantage may be that it is easier for us to validate the configuration, as configuration for a specific feature may include multiple keys that need to be consistent and (as far as I understand) may even require setting domain metadata at some point.
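The enable_feature idea above could be sketched roughly as follows. This is only an illustration: the `FeatureName` members are a small subset, the `REQUIRED_KEYS` registry is hypothetical, and treating both column-mapping keys as required user input is an assumption made to show multi-key validation, not a statement of what the protocol or delta-rs demands:

```python
from enum import Enum


class FeatureName(Enum):
    # Illustrative subset; not a complete list of Delta table features.
    APPEND_ONLY = "appendOnly"
    COLUMN_MAPPING = "columnMapping"


# Hypothetical registry: configuration keys that must be supplied together.
REQUIRED_KEYS = {
    FeatureName.APPEND_ONLY: {"delta.appendOnly"},
    FeatureName.COLUMN_MAPPING: {
        "delta.columnMapping.mode",
        "delta.columnMapping.maxColumnId",
    },
}


def enable_feature(properties: dict, name: FeatureName, config: dict) -> dict:
    """Return a copy of `properties` with `config` applied.

    Raises ValueError if `config` is missing a key the feature needs or
    contains a key that does not belong to the feature.
    """
    required = REQUIRED_KEYS[name]
    supplied = set(config)
    missing = required - supplied
    unknown = supplied - required
    if missing or unknown:
        raise ValueError(
            f"invalid config for {name.value}: "
            f"missing={sorted(missing)}, unknown={sorted(unknown)}"
        )
    # Validation passed, so the keys can be written as one consistent unit.
    return {**properties, **config}


props = enable_feature({}, FeatureName.APPEND_ONLY, {"delta.appendOnly": "true"})
```

Because the whole feature configuration arrives in one call, the consistency check can run before anything is written, which is harder to do when keys are set one at a time through set_table_properties.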

ion-elgreco commented 7 months ago

@roeap it's mostly inspired from the SQL alter operations: https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-alter-table.html

I am not entirely following you on why set_table_properties should be a low-level API, because not every table property belongs to a certain feature, right? See: https://books.japila.pl/delta-lake-internals/DeltaConfigs/#appendOnly

roeap commented 7 months ago

yes, there is config that is unrelated to features... mainly saying that the config that is related to features should maybe be modeled as such ...

ion-elgreco commented 6 months ago

@roeap I am going to start looking into this soon, just want to clarify one thing: for configs that are related to features, should we raise when someone tries to add or remove them via set_table_properties?