datastrato / gravitino

The world's most powerful data catalog service, providing a high-performance, geo-distributed, and federated metadata lake.
https://datastrato.ai/docs/
Apache License 2.0
401 stars · 166 forks

[#2845] refactor(spark-connector): Refactor the table implementation in spark-connector #2846

Closed. caican00 closed this PR 1 month ago.

caican00 commented 1 month ago

What changes were proposed in this pull request?

Refactor the table implementation in spark-connector.

Why are the changes needed?

The purpose of inheriting SparkTable is to pass the isIcebergTable check in IcebergSparkSqlExtensionsParser.

For details, please refer to: https://github.com/apache/iceberg/blob/main/spark/v3.4/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSparkSqlExtensionsParser.scala#L127-L186

Without refactoring, row-level operations cannot be performed.
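For illustration, the gist of that parser check can be sketched as a plain instanceof test. This is a hypothetical sketch: the class names below are illustrative stand-ins, not the actual Gravitino or Iceberg classes. A table that merely wraps SparkTable fails such a check, while one that inherits from it passes, which is what motivated the inheritance approach:

```java
// Hypothetical sketch of why inheriting SparkTable passes the parser's check.
// All class names here are illustrative stand-ins, not real Gravitino/Iceberg code.
class SparkTable {}                                    // stands in for Iceberg's SparkTable
class InheritedTable extends SparkTable {}             // inheritance: passes the check
class WrappedTable { final SparkTable delegate = new SparkTable(); }  // composition: fails

public class IsIcebergTableSketch {
    // Mirrors the shape of a type test like IcebergSparkSqlExtensionsParser's isIcebergTable.
    static boolean isIcebergTable(Object table) {
        return table instanceof SparkTable;
    }

    public static void main(String[] args) {
        System.out.println(isIcebergTable(new InheritedTable())); // true
        System.out.println(isIcebergTable(new WrappedTable()));   // false
    }
}
```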

Fix: #2845

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing ITs.

caican00 commented 1 month ago

Hi @FANNG1 could you help review this PR? Thank you.

caican00 commented 1 month ago

There are still two unresolved review comments in the discussion.

jerryshao commented 1 month ago

From my cursory review, I feel that the current solution is a little hacky. Inheriting a specific table class like Hive's or Iceberg's seems strange: what if it cannot be inherited, or has lots of Scala-ish features that make it hard to extend from Java? Overall it is not so good IMO.

Besides, I think what @FANNG1 mentioned should also be refactored; we can use the composition pattern to handle this.
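As a hedged sketch of the composition approach (the interface and class names below are simplified stand-ins, not Spark's or Gravitino's actual API), the connector table would hold the underlying catalog table and delegate to it rather than extend it:

```java
// Hypothetical composition sketch; names are illustrative stand-ins only.
interface Table {
    String name();
    String[] columns();
}

// Stand-in for an underlying catalog-specific table (e.g. Hive or Iceberg).
class UnderlyingTable implements Table {
    public String name() { return "orders"; }
    public String[] columns() { return new String[] {"id", "amount"}; }
}

// The connector table composes the underlying table instead of inheriting it,
// so it works the same way regardless of which catalog produced the delegate.
public class ComposedTable implements Table {
    private final Table delegate;
    public ComposedTable(Table delegate) { this.delegate = delegate; }
    public String name() { return delegate.name(); }
    public String[] columns() { return delegate.columns(); }

    public static void main(String[] args) {
        Table t = new ComposedTable(new UnderlyingTable());
        System.out.println(t.name());
    }
}
```

The trade-off, as noted above, is that a composed table does not pass an instanceof check against the concrete Iceberg class.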

Anyway, I was wondering if we have a better solution to handle this Iceberg limitation, like injecting or rewriting Iceberg's extension, rather than making the change here.

FANNG1 commented 1 month ago

> Anyway, I was wondering if we have a better solution to handle this Iceberg limitation, like injecting or rewriting Iceberg's extension, rather than making the change here.

This seems more reasonable overall. @caican00 WDYT?

caican00 commented 1 month ago

> Anyway, I was wondering if we have a better solution to handle this Iceberg limitation, like injecting or rewriting Iceberg's extension, rather than making the change here.
>
> This seems more reasonable overall. @caican00 WDYT?

OK for me. Maybe we can rewrite the IcebergSparkSqlExtensionsParser. WDYT? @FANNG1 @jerryshao

FANNG1 commented 1 month ago

> Anyway, I was wondering if we have a better solution to handle this Iceberg limitation, like injecting or rewriting Iceberg's extension, rather than making the change here.
>
> This seems more reasonable overall. @caican00 WDYT?
>
> Yes, it seems more reasonable. Maybe we can rewrite the IcebergSparkSqlExtensionsParser. WDYT? @FANNG1 @jerryshao

+1

jerryshao commented 1 month ago

Let's try this way, or we can have some rule/parser/strategy extension to handle this problem.
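One way to picture the rule/parser extension idea is a delegating parser that handles the special statements itself and forwards everything else to the session parser. This is only a sketch under assumed, simplified interfaces: SqlParser below is a stand-in for Spark's ParserInterface, not the real signature, and the real hook would be Spark's SparkSessionExtensions parser injection.

```java
// Hypothetical delegating-parser sketch; SqlParser is a simplified stand-in
// for Spark's ParserInterface, not the real API.
interface SqlParser {
    String parsePlan(String sql);
}

// Stand-in for the session's default parser.
class SessionParser implements SqlParser {
    public String parsePlan(String sql) { return "default plan: " + sql; }
}

public class DelegatingExtensionsParser implements SqlParser {
    private final SqlParser delegate;
    public DelegatingExtensionsParser(SqlParser delegate) { this.delegate = delegate; }

    // Handle row-level commands here; forward everything else to the delegate.
    public String parsePlan(String sql) {
        String normalized = sql.trim().toLowerCase();
        if (normalized.startsWith("merge into") || normalized.startsWith("update")
                || normalized.startsWith("delete from")) {
            return "row-level plan: " + sql;
        }
        return delegate.parsePlan(sql);
    }

    public static void main(String[] args) {
        SqlParser parser = new DelegatingExtensionsParser(new SessionParser());
        System.out.println(parser.parsePlan("MERGE INTO t USING s ON t.id = s.id"));
        System.out.println(parser.parsePlan("SELECT * FROM t"));
    }
}
```

A parser rewritten along these lines could relax the instanceof-style table check before delegating, which is the limitation discussed above.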

caican00 commented 1 month ago

> Let's try this way, or we can have some rule/parser/strategy extension to handle this problem.

@jerryshao Should this PR be closed, then?

jerryshao commented 1 month ago

You can create another PR or update the current PR, either is OK for me.

caican00 commented 1 month ago

> You can create another PR or update the current PR, either is OK for me.

OK.

caican00 commented 1 month ago

> Anyway, I was wondering if we have a better solution to handle this Iceberg limitation, like injecting or rewriting Iceberg's extension, rather than making the change here.
>
> This seems more reasonable overall. @caican00 WDYT?
>
> Yes, it seems more reasonable. Maybe we can rewrite the IcebergSparkSqlExtensionsParser. WDYT? @FANNG1 @jerryshao
>
> +1

I will add this to the row-level operation PR. WDYT? @FANNG1

FANNG1 commented 1 month ago

> I will add this to the row-level operation PR. WDYT? @FANNG1

OK.

jerryshao commented 1 month ago

So this is not needed, right? I'm going to close this.