MrPowers / mack

Delta Lake helper methods in PySpark
https://mrpowers.github.io/mack/
MIT License

Brainstorm Python interface for ALTER TABLE #110

Open MrPowers opened 1 year ago

MrPowers commented 1 year ago

ALTER TABLE is currently only exposed via the SQL interface.

It'd be nice to run ALTER TABLE with Python code.

For example, take a look at the code from this blog post:

ALTER TABLE delta.`/tmp/delta-table/` ADD COLUMNS (blah string)
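
Today, running this from Python means assembling the SQL string yourself and passing it to `spark.sql`. Below is a minimal sketch of that workaround. `add_columns_sql` is a hypothetical helper, not part of mack or delta-spark, and column types are passed through verbatim as Spark SQL type strings:

```python
from typing import Mapping


def add_columns_sql(table_path: str, columns: Mapping[str, str]) -> str:
    """Hypothetical helper: build ALTER TABLE ... ADD COLUMNS for a path-based Delta table.

    `columns` maps column names to Spark SQL type strings, e.g. {"blah": "string"}.
    Column names are backquoted; types are inserted as-is (assumed trusted input).
    """
    if not columns:
        raise ValueError("at least one column is required")
    cols = ", ".join(f"`{name}` {dtype}" for name, dtype in columns.items())
    return f"ALTER TABLE delta.`{table_path}` ADD COLUMNS ({cols})"


# With a SparkSession `spark` configured for Delta Lake, this would run as:
# spark.sql(add_columns_sql("/tmp/delta-table/", {"blah": "string"}))
```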

There is already this syntax for creating a Delta table:

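from delta.tables import DeltaTable
from pyspark.sql.types import IntegerType

# sparkSession is an active SparkSession configured for Delta Lake
deltaTable = (DeltaTable.create(sparkSession)
    .tableName("testTable")
    .addColumn("c1", dataType="INT", nullable=False)
    .addColumn("c2", dataType=IntegerType(), generatedAlwaysAs="c1 + 1")
    .partitionedBy("c1")
    .execute())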

Perhaps we could use this syntax for altering a Delta table:

# Proposed API only: mack.alter does not exist yet
(mack.alter(delta_table)
    .addColumn("blah", dataType="string", nullable=False)
    .execute())

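
One way the proposed builder could work, sketched under stated assumptions: it accumulates column specs and compiles them into a single `ALTER TABLE ... ADD COLUMNS` statement when `execute()` is called. `AlterTableBuilder`, `alter`, and `toSql` are hypothetical names. Unlike the proposal, the entry point here takes a SparkSession-like object plus a SQL table identifier rather than a `DeltaTable`, and whether a given Delta Lake version accepts `NOT NULL` on a newly added column is not checked.

```python
from typing import List


class AlterTableBuilder:
    """Hypothetical builder for the proposed `mack.alter(...)` API (not part of mack)."""

    def __init__(self, spark, table: str) -> None:
        # `spark` only needs a `.sql(statement)` method, e.g. a Delta-enabled SparkSession.
        # `table` is a SQL identifier such as "testTable" or "delta.`/tmp/delta-table/`".
        self._spark = spark
        self._table = table
        self._columns: List[str] = []

    def addColumn(self, colName: str, dataType: str, nullable: bool = True) -> "AlterTableBuilder":
        # Record one column spec; nothing is sent to Spark until execute().
        spec = f"`{colName}` {dataType}"
        if not nullable:
            spec += " NOT NULL"
        self._columns.append(spec)
        return self  # returning self enables method chaining

    def toSql(self) -> str:
        # Compile every recorded column into a single ADD COLUMNS statement.
        if not self._columns:
            raise ValueError("call addColumn() at least once before execute()")
        return f"ALTER TABLE {self._table} ADD COLUMNS ({', '.join(self._columns)})"

    def execute(self):
        # Delegate to the session; a SparkSession would return a (possibly empty) DataFrame.
        return self._spark.sql(self.toSql())


def alter(spark, table: str) -> AlterTableBuilder:
    """Hypothetical entry point mirroring the proposed `mack.alter(delta_table)`."""
    return AlterTableBuilder(spark, table)
```

With a Delta-enabled session `spark`, `alter(spark, "testTable").addColumn("blah", dataType="string").execute()` would issue one statement covering every added column. Generating SQL keeps the sketch thin, since Delta Lake already implements `ADD COLUMNS` itself.
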
dennyglee commented 1 year ago

Should we overload it to handle modifying an existing column as well?
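
If the builder were extended to existing columns, Delta Lake's `ALTER TABLE ... ALTER COLUMN` clause can change a column's comment or its position (`COMMENT`, `FIRST`, `AFTER`), with one change per statement. A sketch of that translation; `alter_column_sql` is a hypothetical helper, and the string escaping assumes Spark's default parser settings:

```python
from typing import List, Optional


def alter_column_sql(table: str, col_name: str, comment: Optional[str] = None,
                     first: bool = False, after: Optional[str] = None) -> List[str]:
    """Hypothetical helper: turn column changes into ALTER COLUMN statements.

    Each ALTER COLUMN statement carries one of COMMENT, FIRST, or AFTER,
    so one statement is produced per requested change.
    """
    if first and after is not None:
        raise ValueError("pass either first=True or after=<column>, not both")
    prefix = f"ALTER TABLE {table} ALTER COLUMN `{col_name}`"
    statements = []
    if comment is not None:
        # Backslash-escape for Spark SQL string literals; assumes the default
        # spark.sql.parser.escapedStringLiterals=false.
        escaped = comment.replace("\\", "\\\\").replace("'", "\\'")
        statements.append(f"{prefix} COMMENT '{escaped}'")
    if first:
        statements.append(f"{prefix} FIRST")
    elif after is not None:
        statements.append(f"{prefix} AFTER `{after}`")
    return statements
```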

jaceklaskowski commented 1 year ago

Please don't; make a PR to Delta Lake's DeltaTable instead 🙏

https://github.com/delta-io/delta/issues/1656

MrPowers commented 1 year ago

@jaceklaskowski - thanks for commenting. I agree that this belongs in DeltaTable.

The Python API for adding constraints would probably be better as an official API as well.
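
Delta Lake CHECK constraints are currently added and dropped with SQL (`ALTER TABLE ... ADD CONSTRAINT ... CHECK (...)` and `ALTER TABLE ... DROP CONSTRAINT ...`). A Python wrapper could start as thin SQL builders like the ones below. `add_constraint_sql` and `drop_constraint_sql` are hypothetical names, and the CHECK expression is passed through unvalidated:

```python
import re

# Conservative pattern for unquoted constraint names (an assumption of this sketch).
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check_name(name: str) -> None:
    if not _IDENT.fullmatch(name):
        raise ValueError(f"invalid constraint name: {name!r}")


def add_constraint_sql(table: str, name: str, check_expr: str) -> str:
    """Hypothetical helper: build an ADD CONSTRAINT ... CHECK statement."""
    _check_name(name)
    return f"ALTER TABLE {table} ADD CONSTRAINT {name} CHECK ({check_expr})"


def drop_constraint_sql(table: str, name: str) -> str:
    """Hypothetical helper: build a DROP CONSTRAINT statement."""
    _check_name(name)
    return f"ALTER TABLE {table} DROP CONSTRAINT {name}"
```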

Perhaps we could add these as experimental APIs here in mack to allow quick iteration? We could even make the import something like `import mack.experimental.alter` to make that status extra clear. Of course, we can skip all this work and use whatever is added to Delta Lake itself if the issue you created is likely to be resolved in the short term.