apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[spark] Add rollback ddl syntax #4462

Closed: askwang closed this 2 weeks ago

askwang commented 2 weeks ago

Purpose

Add DDL syntax for rolling back to a snapshot, tag, or timestamp:

alter table t rollback to snapshot `2`;
alter table t rollback to tag `test-tag`;
alter table t rollback to timestamp `1730876906134`;
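The discussion below notes that these statements do the same thing as the existing rollback procedure call. The following is a minimal Python sketch of that equivalence: it parses the three proposed forms and rewrites each as a CALL statement. The procedure names `sys.rollback` (with a `version` argument taking a snapshot id or tag name) and `sys.rollback_to_timestamp` are assumptions based on Paimon's Spark procedures and may differ in your Paimon version. The function `to_call` is hypothetical and not part of this PR.

```python
import re

# Matches the three forms proposed in this PR, e.g.
#   alter table t rollback to snapshot `2`;
# Keywords match case-insensitively; the target must be backtick-quoted.
_ROLLBACK_RE = re.compile(
    r"^\s*alter\s+table\s+(?P<table>[\w.]+)\s+rollback\s+to\s+"
    r"(?P<kind>snapshot|tag|timestamp)\s+`(?P<target>[^`]+)`\s*;?\s*$",
    re.IGNORECASE,
)


def to_call(statement: str) -> str:
    """Rewrite a proposed ALTER TABLE ... ROLLBACK TO statement as a CALL.

    Hypothetical helper. Assumes the procedures are sys.rollback
    (version = snapshot id or tag name) and sys.rollback_to_timestamp;
    check the Paimon docs for your version.
    """
    m = _ROLLBACK_RE.match(statement)
    if m is None:
        raise ValueError(f"not a rollback statement: {statement!r}")
    table = m.group("table")
    kind = m.group("kind").lower()
    target = m.group("target")
    if kind == "timestamp":
        # Timestamps are epoch milliseconds, passed unquoted.
        if not target.isdigit():
            raise ValueError("timestamp must be epoch milliseconds")
        return f"CALL sys.rollback_to_timestamp(table => '{table}', timestamp => {target})"
    if kind == "snapshot" and not target.isdigit():
        raise ValueError("snapshot id must be a non-negative integer")
    # Snapshot ids and tag names both go through the assumed `version` argument.
    return f"CALL sys.rollback(table => '{table}', version => '{target}')"
```

For example, `to_call("alter table t rollback to tag `test-tag`;")` yields `CALL sys.rollback(table => 't', version => 'test-tag')`. Routing snapshots and tags through one `version` argument is a simplification; a tag literally named `2` would be ambiguous, which is one reason a dedicated syntax or separate procedure arguments can be clearer.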

Tests

API and Format

Documentation

LinMingQiang commented 2 weeks ago

Why do we need a new ALTER DDL syntax? Is it the same as the CALL procedure?

askwang commented 2 weeks ago

> Why do we need a new ALTER DDL syntax? Is it the same as the CALL procedure?

SQL syntax support will be added later. It behaves the same as the procedure call.

LinMingQiang commented 2 weeks ago

Why do we use ALTER TABLE instead of CALL if they are the same? Also, ALTER TABLE should be used to modify table metadata, as in ALTER TABLE ... ADD COLUMN.

askwang commented 2 weeks ago

> Why do we use ALTER TABLE instead of CALL if they are the same? Also, ALTER TABLE should be used to modify table metadata, as in ALTER TABLE ... ADD COLUMN.

I think tags and snapshots are also part of the table metadata, so maybe we can use ALTER TABLE to modify them.

JingsongLi commented 2 weeks ago

> Why do we use ALTER TABLE instead of CALL if they are the same? Also, ALTER TABLE should be used to modify table metadata, as in ALTER TABLE ... ADD COLUMN.

I agree with your idea. I checked, and currently no other lake format has implemented custom syntax for this. We could consider pausing this for now; using CALL is not a bad idea.

@askwang What do you think?

Zouxxyy commented 2 weeks ago

In my view, a rollback should actually generate a new snapshot containing the changes; from this perspective, ALTER TABLE should not be used (although the current implementation directly deletes the existing snapshots).
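The two rollback semantics contrasted in the comment above can be sketched with a toy model: either delete every snapshot newer than the target (the current behaviour described above), or keep the history and commit a new snapshot that restores the target's contents. This is a minimal illustration only; the `Snapshot` class, its `files` field, and both functions are hypothetical and do not reflect Paimon's actual snapshot structures.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    id: int
    files: frozenset  # toy model: the data files visible in this snapshot


def rollback_by_delete(history: list[Snapshot], target_id: int) -> list[Snapshot]:
    """Destructive rollback: drop every snapshot newer than the target."""
    if not any(s.id == target_id for s in history):
        raise ValueError(f"snapshot {target_id} not found")
    return [s for s in history if s.id <= target_id]


def rollback_by_new_snapshot(history: list[Snapshot], target_id: int) -> list[Snapshot]:
    """Append-only rollback: keep all history and commit a new snapshot
    whose visible files equal the target's."""
    target = next((s for s in history if s.id == target_id), None)
    if target is None:
        raise ValueError(f"snapshot {target_id} not found")
    new_id = history[-1].id + 1
    return history + [Snapshot(new_id, target.files)]


history = [
    Snapshot(1, frozenset({"a"})),
    Snapshot(2, frozenset({"a", "b"})),
    Snapshot(3, frozenset({"a", "b", "c"})),
]
print([s.id for s in rollback_by_delete(history, 2)])        # [1, 2]
print([s.id for s in rollback_by_new_snapshot(history, 2)])  # [1, 2, 3, 4]
```

In the append-only variant the rollback is itself a commit, so snapshot 3 stays available for time travel and auditing. That makes it behave more like a data change than a metadata edit, which is the basis of the argument against ALTER TABLE here.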

Besides, the Spark documentation describes it as: "ALTER TABLE statement changes the schema or properties of a table."

askwang commented 2 weeks ago

> > Why do we use ALTER TABLE instead of CALL if they are the same? Also, ALTER TABLE should be used to modify table metadata, as in ALTER TABLE ... ADD COLUMN.
>
> I agree with your idea. I checked, and currently no other lake format has implemented custom syntax for this. We could consider pausing this for now; using CALL is not a bad idea.
>
> @askwang What do you think?

ok