apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0
2.16k stars 855 forks

[Feature] Paimon Spark Extensions conflict with Iceberg #3212

Open wForget opened 3 months ago

wForget commented 3 months ago

Motivation

The Call syntax is defined in both Iceberg and Paimon, which may cause conflicts when I introduce their SparkSessionExtensions at the same time.

Reproduce:

spark.sql.extensions=org.apache.paimon.spark.extensions.PaimonSparkSessionExtensions,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

-- create iceberg table
CREATE TABLE iceberg_catalog.sample.iceberg_t1 (
    user_id BIGINT,
    item_id BIGINT,
    behavior STRING,
    dt STRING,
    hh STRING
) using iceberg;

-- create paimon table
CREATE TABLE paimon_catalog.sample.paimon_t1 (
    user_id BIGINT,
    item_id BIGINT,
    behavior STRING,
    dt STRING,
    hh STRING
) TBLPROPERTIES (
    'primary-key' = 'dt,hh,user_id'
);

-- Succeeds
CALL iceberg_catalog.system.remove_orphan_files(table => "sample.iceberg_t1");

-- Fails: the statement is resolved by Iceberg's ResolveProcedures
CALL paimon_catalog.sys.remove_orphan_files(table => "sample.paimon_t1");
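Why the Paimon CALL ends up in Iceberg's resolver: both extensions inject a SQL parser that recognizes CALL, and Spark chains injected parsers so that the later-registered extension's parser wraps the earlier one's. A minimal Python sketch of that chaining (class names are hypothetical stand-ins, not Spark's real API):

```python
# Illustrative model of chained session-extension parsers.
# Class names are hypothetical; the real parsers are Scala classes.

class SparkParser:
    """Stand-in for Spark's built-in SQL parser."""
    def parse(self, sql):
        return f"SparkPlan({sql})"

class PaimonParser:
    """Stand-in for Paimon's extension parser: claims CALL statements."""
    def __init__(self, delegate):
        self.delegate = delegate
    def parse(self, sql):
        if sql.lstrip().upper().startswith("CALL"):
            return f"PaimonCallPlan({sql})"
        return self.delegate.parse(sql)

class IcebergParser:
    """Stand-in for Iceberg's extension parser: also claims CALL."""
    def __init__(self, delegate):
        self.delegate = delegate
    def parse(self, sql):
        if sql.lstrip().upper().startswith("CALL"):
            return f"IcebergCallPlan({sql})"
        return self.delegate.parse(sql)

# spark.sql.extensions lists Paimon first, then Iceberg, so Iceberg's
# parser wraps Paimon's and sees every statement first.
parser = IcebergParser(PaimonParser(SparkParser()))

# A CALL aimed at a Paimon catalog never reaches Paimon's parser:
plan = parser.parse("CALL paimon_catalog.sys.remove_orphan_files(table => 'sample.paimon_t1')")
```

With the extension order reversed, the symptom would simply flip: Paimon's parser would intercept Iceberg's CALL statements instead.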

duplicate: https://github.com/apache/iceberg/issues/10143

Solution

If the current catalog is neither the Spark session catalog nor a Paimon Spark catalog, first use the delegate parser to parse the sqlText in PaimonSparkSqlExtensionsParser#parsePlan.
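The proposed check can be sketched as follows in Python (function names and the catalog lookup are illustrative; the actual change would live in the Scala parser):

```python
# Sketch of the proposed fix: before Paimon's extension parser claims a
# statement, inspect the session's current catalog and delegate to the
# next parser in the chain (e.g. Iceberg's) when the catalog is neither
# the session catalog nor backed by Paimon's SparkCatalog.
# All names here are illustrative, not the actual Paimon implementation.

def is_paimon_catalog(catalog_name, catalog_impls):
    """catalog_impls maps catalog name -> implementation class name."""
    return catalog_impls.get(catalog_name) == "org.apache.paimon.spark.SparkCatalog"

def parse_plan(sql, current_catalog, catalog_impls, paimon_parse, delegate_parse):
    session_catalog = "spark_catalog"
    if current_catalog != session_catalog and not is_paimon_catalog(current_catalog, catalog_impls):
        # Not a Paimon catalog: let the delegate parser handle it first,
        # so Iceberg's CALL syntax still resolves through Iceberg.
        return delegate_parse(sql)
    return paimon_parse(sql)

catalog_impls = {
    "paimon_catalog": "org.apache.paimon.spark.SparkCatalog",
    "iceberg_catalog": "org.apache.iceberg.spark.SparkCatalog",
}

plan = parse_plan(
    "CALL iceberg_catalog.system.remove_orphan_files(table => 'sample.iceberg_t1')",
    current_catalog="iceberg_catalog",
    catalog_impls=catalog_impls,
    paimon_parse=lambda s: "PaimonPlan",
    delegate_parse=lambda s: "DelegatePlan",
)
```

The design choice here is routing by catalog rather than by syntax: since both extensions accept the same CALL grammar, only the target catalog disambiguates which engine should resolve the procedure.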

Anything else?

No response

Are you willing to submit a PR?

zhongyujiang commented 1 month ago

Hey @wForget, I came across the same issue, have you got this fixed?

wForget commented 1 month ago

Hey @wForget, I came across the same issue, have you got this fixed?

I determine which spark extensions to use in the front-end sql gateway service.

zhongyujiang commented 1 month ago

I determine which spark extensions to use in the front-end sql gateway service.

@wForget So, you didn't combine these two extension functionalities, but used them separately, and routed the queries between them, is my understanding correct?

wForget commented 1 month ago

I determine which spark extensions to use in the front-end sql gateway service.

@wForget So, you didn't combine these two extension functionalities, but used them separately, and routed the queries between them, is my understanding correct?

Yes, we currently have no need to mix them.

zhongyujiang commented 1 month ago

@wForget Thanks for replying