delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Expose table alterations under `alter` namespace #1909

Open ion-elgreco opened 12 months ago

ion-elgreco commented 12 months ago

Description

Use Case

Eventually, multiple alterations will be possible on a table, such as setting and unsetting table properties or adding and removing columns. We can group these under a single namespace called alter. The API would look like this:

DeltaTable().alter.set_table_properties()
DeltaTable().alter.unset_table_properties()
DeltaTable().alter.add_columns()
DeltaTable().alter.change_columns()
DeltaTable().alter.replace_columns()
DeltaTable().alter.add_constraints()
DeltaTable().alter.drop_constraints()
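
A minimal sketch of how such a namespace could be wired up, assuming a property that returns a small helper object bound to the table. ToyTable and ToyAlterer are hypothetical in-memory stand-ins used only for illustration; they are not the delta-rs implementation, and only a few of the proposed methods are shown.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a Delta table (not the delta-rs API)."""

    properties: dict = field(default_factory=dict)
    constraints: dict = field(default_factory=dict)

    @property
    def alter(self) -> "ToyAlterer":
        # Return a helper bound to this table; the helper keeps no state of its own.
        return ToyAlterer(self)


class ToyAlterer:
    """Groups ALTER TABLE-style operations under a single namespace."""

    def __init__(self, table: ToyTable) -> None:
        self._table = table

    def set_table_properties(self, props: dict) -> None:
        self._table.properties.update(props)

    def unset_table_properties(self, keys: list) -> None:
        for key in keys:
            # Unsetting a property that is not set is treated as a no-op here.
            self._table.properties.pop(key, None)

    def add_constraints(self, constraints: dict) -> None:
        for name, expression in constraints.items():
            if name in self._table.constraints:
                raise ValueError(f"constraint {name!r} already exists")
            self._table.constraints[name] = expression

    def drop_constraints(self, names: list) -> None:
        for name in names:
            # Dropping an unknown constraint raises KeyError.
            del self._table.constraints[name]


table = ToyTable()
table.alter.set_table_properties({"delta.appendOnly": "true"})
table.alter.add_constraints({"id_positive": "id > 0"})
table.alter.unset_table_properties(["delta.appendOnly"])
```

With this shape, the table class itself stays small and every alteration is discoverable in one place, similar to how optimize groups its commands.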

Related Issue(s) https://github.com/delta-io/delta-rs/issues/1663

roeap commented 12 months ago

In principle I have no strong feelings about bundling some commands under a common property, much like we do for optimize.

Given the name alter though, I would suggest restricting it to things that can be done via the ALTER TABLE command, as we may want or need to implement that operation at some point.

The way the Delta Protocol is evolving, table features seem to be front and center when it comes to configuring tables. Along with that, some configuration is becoming more complex. As such, we may consider exposing set_table_properties only as a low-level (discouraged) API and instead modeling this around table features. Something along the lines of

def enable_feature(name: FeatureName, config: dict): ...

The advantage may be that it is easier for us to validate the configuration, as the configuration for a specific feature may include multiple keys that need to be consistent and (as far as I understand) may even require setting domain metadata at some point.
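
To make the validation argument concrete, here is a rough sketch of a feature-oriented entry point that checks a feature's configuration as a unit before applying it. The ALLOWED table, the extra properties argument, and the return value are assumptions made for illustration; the real set of features, keys, and accepted values would have to come from the Delta Protocol.

```python
from enum import Enum


class FeatureName(Enum):
    CHANGE_DATA_FEED = "changeDataFeed"
    COLUMN_MAPPING = "columnMapping"


# Illustrative only: the keys each feature expects and the values accepted per key.
ALLOWED = {
    FeatureName.CHANGE_DATA_FEED: {"delta.enableChangeDataFeed": {"true", "false"}},
    FeatureName.COLUMN_MAPPING: {"delta.columnMapping.mode": {"none", "id", "name"}},
}


def enable_feature(properties: dict, name: FeatureName, config: dict) -> dict:
    """Validate config for one feature as a whole, then return updated properties."""
    allowed = ALLOWED[name]

    unknown = set(config) - set(allowed)
    if unknown:
        raise ValueError(f"keys not valid for {name.value}: {sorted(unknown)}")

    missing = set(allowed) - set(config)
    if missing:
        raise ValueError(f"keys required for {name.value}: {sorted(missing)}")

    for key, value in config.items():
        if value not in allowed[key]:
            raise ValueError(f"invalid value {value!r} for {key}")

    # Return a new dict so a failed validation never leaves a half-applied config.
    return {**properties, **config}


props = enable_feature(
    {"delta.appendOnly": "false"},
    FeatureName.COLUMN_MAPPING,
    {"delta.columnMapping.mode": "name"},
)
```

Because all keys for a feature arrive in one call, consistency checks across keys have a natural home, which is harder to guarantee when properties are set one at a time.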

ion-elgreco commented 12 months ago

@roeap it's mostly inspired by the SQL alter operations: https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-alter-table.html

I am not entirely following you on why set_table_properties should be a low-level API. Not every table property belongs to a certain feature, right? For example: https://books.japila.pl/delta-lake-internals/DeltaConfigs/#appendOnly

roeap commented 12 months ago

Yes, there is config that is unrelated to features... I am mainly saying that the config that is related to features should maybe be modeled as such.

ion-elgreco commented 10 months ago

@roeap I am going to start looking into this soon, I just want to clarify one thing: for configs that are related to features, should we raise when someone tries to add or remove them via set_table_properties?

dtheodor commented 3 months ago

Does this issue cover adding support for DDL statements in general, such as CREATE TABLE and ALTER TABLE ...? This is currently only possible with Spark.

ion-elgreco commented 3 months ago

> Does this issue cover adding support for DDL statements in general, such as CREATE TABLE and ALTER TABLE ...? Currently only possible with spark.

Create table is already covered with the create operation.

Some alter operations are already available. Adding more, such as an add columns operation, is still a work in progress.

dtheodor commented 3 months ago

Complete support for alter operations would make this project useful for lightweight migrations, removing the need for a Spark cluster to perform them.