delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[WIP][Spark] Python DeltaTableBuilder API for Identity Columns #3044

Open · c27kwan opened this pull request 2 weeks ago


Which Delta project/connector is this regarding?

Spark

Description

This PR is part of https://github.com/delta-io/delta/issues/1959

In this PR, we extend the addColumn method of the Python DeltaTableBuilder to support creating Identity Columns.

Resolves https://github.com/delta-io/delta/issues/1072

How was this patch tested?

New tests.

Does this PR introduce any user-facing changes?

We introduce three new parameters to the addColumn method: generatedAlwaysAsIdentity, identityStart, and identityStep. Together they specify an Identity Column that is either GENERATED ALWAYS or GENERATED BY DEFAULT, along with its start value and step.
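To make the mapping concrete, here is a minimal sketch of the SQL identity clause these arguments presumably correspond to. It assumes generatedAlwaysAsIdentity=True selects GENERATED ALWAYS and False selects GENERATED BY DEFAULT; the helper identity_clause is hypothetical and not part of this PR.

```python
from typing import Optional


def identity_clause(
    generated_always: bool,
    start: Optional[int] = None,
    step: Optional[int] = None,
) -> str:
    """Hypothetical helper: render the SQL identity clause for the builder arguments.

    Assumption: True -> GENERATED ALWAYS, False -> GENERATED BY DEFAULT.
    """
    mode = "ALWAYS" if generated_always else "BY DEFAULT"
    options = []
    if start is not None:
        options.append(f"START WITH {start}")
    if step is not None:
        options.append(f"INCREMENT BY {step}")
    clause = f"GENERATED {mode} AS IDENTITY"
    # Only emit the parenthesized option list when start or step was given.
    if options:
        clause += " (" + " ".join(options) + ")"
    return clause


print(identity_clause(True, 0, 1))
# GENERATED ALWAYS AS IDENTITY (START WITH 0 INCREMENT BY 1)
print(identity_clause(False, step=-1))
# GENERATED BY DEFAULT AS IDENTITY (INCREMENT BY -1)
```

Omitting identityStart and identityStep leaves the option list out entirely, so the table's defaults would apply.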

Interface

```python
def addColumn(
    self,
    colName: str,
    dataType: Union[str, DataType],
    nullable: bool = True,
    generatedAlwaysAs: Optional[str] = None,
    generatedAlwaysAsIdentity: Optional[bool] = None,
    identityStart: Optional[int] = None,
    identityStep: Optional[int] = None,
    comment: Optional[str] = None,
) -> "DeltaTableBuilder": ...
```
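Some combinations of these arguments cannot produce a valid column. The following is a minimal sketch of pre-flight checks using a hypothetical helper, check_identity_args, which takes the data type only as a DDL string to stay self-contained. The step, type, and generated-column checks mirror restrictions Delta places on identity columns (non-zero step, BIGINT only, no generation expression); rejecting identityStart or identityStep without generatedAlwaysAsIdentity is an assumption about this PR's behavior, not something the description states.

```python
from typing import Optional


def check_identity_args(
    data_type: str,
    generated_always_as: Optional[str] = None,
    generated_always_as_identity: Optional[bool] = None,
    identity_start: Optional[int] = None,
    identity_step: Optional[int] = None,
) -> None:
    """Hypothetical checks on addColumn's identity arguments; raises ValueError."""
    if generated_always_as_identity is None:
        # Assumption: start/step are meaningless on a non-identity column.
        if identity_start is not None or identity_step is not None:
            raise ValueError("identityStart/identityStep require generatedAlwaysAsIdentity")
        return
    # Delta does not allow a column to be both a generated column and an identity column.
    if generated_always_as is not None:
        raise ValueError("a column cannot be both a generated column and an identity column")
    # Delta identity columns only support BIGINT (LongType).
    if data_type.lower() not in ("long", "bigint"):
        raise ValueError("identity columns must be of type BIGINT (LongType)")
    # A step of 0 would never produce distinct values.
    if identity_step == 0:
        raise ValueError("identityStep must not be 0")


check_identity_args("bigint", generated_always_as_identity=True, identity_start=0, identity_step=1)
```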

Example Usage

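```python
from delta.tables import DeltaTable
from pyspark.sql.types import LongType

# Assumes delta-spark is installed and a SparkSession is active; parentheses let the chain span lines.
(DeltaTable.create()
    .tableName("tableName")
    .addColumn("id", dataType=LongType(), generatedAlwaysAsIdentity=True, identityStart=0, identityStep=1)
    .execute())
```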