apache / amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
https://amoro.apache.org/
Apache License 2.0

[Feature][Flink] Introducing the INSERT OVERWRITE statement for mixed-streaming format tables. #4

Status: Open. YesOrNo828 opened this issue 2 years ago.

YesOrNo828 commented 2 years ago

Search before asking

What would you like to be improved?

Currently, the INSERT OVERWRITE statement is supported only for mixed-streaming format tables without a primary key. To bring the Flink engine's batch processing capability to keyed tables as well, INSERT OVERWRITE should also be supported for tables with a primary key.

Here, mixed-streaming format tables include both mixed-iceberg and mixed-hive format tables.

INSERT OVERWRITE [catalog_name.][db_name.]table_name [column_list] select_statement

column_list:
  (col_name1 [, col_name2, ...])

OVERWRITE

INSERT OVERWRITE overwrites any existing data in the table or partition. Otherwise (with INSERT INTO), new data is appended.
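To make the overwrite-versus-append distinction concrete, here is a toy in-memory model of a partitioned table. It is only an illustration of the semantics described above, not Amoro's or Flink's write path; the class and method names (ToyPartitionedTable, insertInto, insertOverwritePartition, insertOverwriteTable) are hypothetical. Whole-table overwrite is modeled as the non-partitioned case; how a sink handles a partitioned table without a static PARTITION clause (static versus dynamic overwrite) is sink-specific and deliberately not modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model: partition value -> rows. Illustration only.
class ToyPartitionedTable {
    private final Map<String, List<String>> partitions = new LinkedHashMap<>();

    // INSERT INTO: append rows to the target partition.
    void insertInto(String partition, List<String> rows) {
        partitions.computeIfAbsent(partition, p -> new ArrayList<>()).addAll(rows);
    }

    // INSERT OVERWRITE ... PARTITION (...): replace only that partition's rows;
    // every other partition is left untouched.
    void insertOverwritePartition(String partition, List<String> rows) {
        partitions.put(partition, new ArrayList<>(rows));
    }

    // INSERT OVERWRITE of the whole table: drop all existing data, then write.
    void insertOverwriteTable(Map<String, List<String>> newData) {
        partitions.clear();
        newData.forEach(this::insertInto);
    }

    List<String> rows(String partition) {
        return partitions.getOrDefault(partition, List.of());
    }

    int partitionCount() {
        return partitions.size();
    }
}
```

Appending twice to the same partition accumulates rows, while overwriting that partition replaces them and leaves sibling partitions intact.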

COLUMN LIST

Given a table T(a INT, b INT, c INT), Flink supports INSERT INTO T(c, b) SELECT x, y FROM S. The expectation is that ‘x’ is written to column ‘c’ and ‘y’ is written to column ‘b’ and ‘a’ is set to NULL (assuming column ‘a’ is nullable).
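The column-list rule can be sketched as a mapping from the SELECT output positions onto the table's full schema. This is a hypothetical helper (ColumnListMapper.mapToSchema is not a Flink or Amoro API) that assumes columns are matched by name and that unlisted columns are nullable and filled with null.

```java
import java.util.List;

// Hypothetical helper: places the values produced by SELECT into the
// target table's full schema according to an explicit column list.
class ColumnListMapper {
    static Object[] mapToSchema(List<String> tableColumns,
                                List<String> columnList,
                                Object[] selectedValues) {
        if (columnList.size() != selectedValues.length) {
            throw new IllegalArgumentException("column list has " + columnList.size()
                + " entries but SELECT produced " + selectedValues.length + " values");
        }
        // A new Object[] starts as all nulls, so unlisted columns stay NULL.
        Object[] row = new Object[tableColumns.size()];
        for (int i = 0; i < columnList.size(); i++) {
            int target = tableColumns.indexOf(columnList.get(i));
            if (target < 0) {
                throw new IllegalArgumentException("unknown column: " + columnList.get(i));
            }
            // The i-th selected value goes to the position of the i-th listed column.
            row[target] = selectedValues[i];
        }
        return row;
    }
}
```

For T(a, b, c) and INSERT INTO T(c, b) SELECT x, y FROM S with x = 1 and y = 2, the resulting row is [null, 2, 1].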

How should we improve?

The Flink table sink for these tables should implement the SupportsOverwrite interface.

This feature only works in Flink batch runtime mode.

Affected Flink versions: Flink 1.12, 1.14, and 1.15.
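A minimal sketch of what implementing SupportsOverwrite could look like, under stated assumptions. Flink's real interface is org.apache.flink.table.connector.sink.abilities.SupportsOverwrite with a single method, applyOverwrite(boolean); to keep the example compilable without Flink on the classpath, it declares a local stand-in with the same shape. The sink class, its batchMode flag, and commitAction() are hypothetical and only show where the overwrite flag would be recorded and how the batch-only restriction from this issue could be enforced.

```java
// Local stand-in mirroring Flink's
// org.apache.flink.table.connector.sink.abilities.SupportsOverwrite,
// declared here only so the sketch compiles without Flink.
interface SupportsOverwrite {
    void applyOverwrite(boolean overwrite);
}

// Hypothetical sink for mixed-format tables; all names are illustrative.
class MixedFormatTableSinkSketch implements SupportsOverwrite {
    private final boolean batchMode; // assumption: runtime mode known at construction
    private boolean overwrite = false;

    MixedFormatTableSinkSketch(boolean batchMode) {
        this.batchMode = batchMode;
    }

    // In Flink, the planner passes true here for an INSERT OVERWRITE statement.
    @Override
    public void applyOverwrite(boolean overwrite) {
        if (overwrite && !batchMode) {
            // The issue scopes INSERT OVERWRITE to batch runtime mode only.
            throw new UnsupportedOperationException(
                "INSERT OVERWRITE is only supported in batch runtime mode");
        }
        this.overwrite = overwrite;
    }

    // Hypothetical commit decision: replace existing data or append new data.
    String commitAction() {
        return overwrite ? "REPLACE" : "APPEND";
    }
}
```

In an Iceberg-based format, one plausible mapping (an assumption, not confirmed by this issue) is to commit overwrite writes through Iceberg's Table.newReplacePartitions() and plain inserts through Table.newAppend(); keyed tables would additionally need to handle their change and base stores consistently.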

Are you willing to submit PR?

Subtasks

No response

Code of Conduct

czy006 commented 8 months ago

@xujiangfeng001 I wonder if the work is still moving forward?

xujiangfeng001 commented 8 months ago


Hi @czy006, I'm very sorry, but I haven't had time to move this issue forward recently. Could you help push it forward?