datastrato / gravitino

World's most powerful data catalog service, providing a high-performance, geo-distributed and federated metadata lake.
https://datastrato.ai/docs/
Apache License 2.0

[#2543] feat(spark-connector): support row-level operations to iceberg Table #3382

Closed caican00 closed 2 weeks ago

caican00 commented 2 weeks ago

What changes were proposed in this pull request?

Support the following row-level operations on Iceberg tables through the Spark connector:

1. update tableName set c1=v1, c2=v2, ...

2. merge into targetTable t
   using sourceTable s
   on s.key=t.key
   when matched then ...
   when not matched then ...

3. delete from table where xxx
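
For illustration, the three statement shapes above could be written out in full as follows. This is only a sketch: the catalog, table, and column names (`iceberg_catalog.db.target`, `c1`, `id`, and so on) are hypothetical and do not come from this PR. The strings are built with plain Java so the example runs without a Spark installation; in a session configured with the Gravitino Spark connector, each string would presumably be passed to `SparkSession.sql(...)`.

```java
// Sketch only: builds example UPDATE / MERGE INTO / DELETE statements.
// All table and column names are hypothetical placeholders.
class RowLevelSqlSketch {

    // Row-level UPDATE with a filter on a hypothetical id column.
    static String update(String table) {
        return "UPDATE " + table + " SET c1 = 'v1', c2 = 'v2' WHERE id = 1";
    }

    // MERGE INTO following the shape in the PR description:
    // matched rows are updated, unmatched rows are inserted.
    static String mergeInto(String target, String source) {
        return String.join("\n",
            "MERGE INTO " + target + " t",
            "USING " + source + " s",
            "ON s.key = t.key",
            "WHEN MATCHED THEN UPDATE SET *",
            "WHEN NOT MATCHED THEN INSERT *");
    }

    // Row-level DELETE with a filter on a hypothetical column.
    static String deleteFrom(String table) {
        return "DELETE FROM " + table + " WHERE c1 = 'v1'";
    }

    public static void main(String[] args) {
        String target = "iceberg_catalog.db.target"; // hypothetical name
        String source = "iceberg_catalog.db.source"; // hypothetical name
        // With a real SparkSession, each statement would be run via spark.sql(stmt).
        System.out.println(update(target));
        System.out.println(mergeInto(target, source));
        System.out.println(deleteFrom(target));
    }
}
```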

Why are the changes needed?

  1. Iceberg's Spark integration explicitly checks for SparkTable to decide whether a table is an Iceberg table, so SparkIcebergTable must extend SparkTable.

  2. Support row-level operations on Iceberg tables.
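
The first point can be illustrated with a minimal, self-contained sketch. The nested classes below are hypothetical stand-ins, not the real Iceberg or Gravitino classes, and `supportsRowLevelOps` is an assumed simplification of the kind of type check described above. The sketch shows why a table class that merely wraps Iceberg metadata would be rejected, while one that extends `SparkTable` passes the check.

```java
// Sketch only: stand-in classes that mimic a type-based Iceberg table check.
class RowLevelCheckSketch {

    // Stand-in for Iceberg's SparkTable (the real class lives in Iceberg's Spark module).
    static class SparkTable {
        private final String name;
        SparkTable(String name) { this.name = name; }
        String name() { return name; }
    }

    // Hypothetical connector table that does NOT extend SparkTable.
    static class PlainWrapperTable {
        private final String name;
        PlainWrapperTable(String name) { this.name = name; }
        String name() { return name; }
    }

    // Stand-in for the connector's table once it extends SparkTable.
    static class SparkIcebergTable extends SparkTable {
        SparkIcebergTable(String name) { super(name); }
    }

    // Assumed simplification: row-level commands are only planned
    // when the resolved table is a SparkTable instance.
    static boolean supportsRowLevelOps(Object table) {
        return table instanceof SparkTable;
    }

    public static void main(String[] args) {
        System.out.println(supportsRowLevelOps(new SparkIcebergTable("t")));  // true
        System.out.println(supportsRowLevelOps(new PlainWrapperTable("t"))); // false
    }
}
```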

Fix: https://github.com/datastrato/gravitino/issues/2543

Does this PR introduce any user-facing change?

Yes. Users can now run update ..., merge into ..., and delete from ... statements against Iceberg tables through the Spark connector.

How was this patch tested?

New integration tests (ITs).

caican00 commented 2 weeks ago

cc @FANNG1

jerryshao commented 2 weeks ago

@FANNG1 please help to review.