apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] The keygenerator.class value set when using SparkSQL to create a table does not finally take effect in hoodie.properties #7351

Open JoshuaZhuCN opened 1 year ago

JoshuaZhuCN commented 1 year ago

The keygenerator.class value set when creating a table with SparkSQL does not take effect in hoodie.properties.

For example, we set the value to 'ComplexKeyGenerator', but hoodie.properties ends up with 'SimpleKeyGenerator'.

To Reproduce

Steps to reproduce the behavior:

  1. Execute the create table DDL, like:
    spark.sql("drop table if exists `default`.`spark_0_12_1_test`")
    spark.sql(
    s"""|
        |CREATE TABLE IF NOT EXISTS `default`.`spark_0_12_1_test` (
        |     `id` INT
        |    ,`name` STRING
        |    ,`age` INT
        |    ,`sync_time` TIMESTAMP
        |) USING HUDI
        |TBLPROPERTIES (
        |     `type` = 'mor'
        |    ,`primaryKey` = 'id'
        |    ,`preCombineField` = 'sync_time'
        |    ,`hoodie.index.bucket.engine` = 'CONSISTENT_HASHING'
        |    ,`hoodie.bucket.index.hash.field` = 'id'
        |    ,`hoodie.index.type` = 'BUCKET'
        |    ,`hoodie.datasource.write.hive_style_partitioning` = 'false'
        |    ,`hoodie.storage.layout.type` = 'BUCKET'
        |    ,`hoodie.storage.layout.partitioner.class` = 'org.apache.hudi.table.action.commit.SparkBucketIndexPartitioner'
        |    ,`hoodie.datasource.write.keygenerator.class` = 'org.apache.hudi.keygen.ComplexKeyGenerator'
        |    ,`hoodie.compaction.payload.class` = 'org.apache.hudi.common.model.OverwriteWithLatestAvroPayload'
        |)
        |COMMENT 'test_0.12.1' 
        |PARTITIONED BY (id) 
        |""".stripMargin
    )
  2. View the hoodie.properties file and find that its value is not the value specified in the DDL.
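
Step 2 can also be done programmatically. The following is a minimal sketch, assuming the table lives on a local filesystem and that hoodie.properties sits under the `.hoodie` directory of the table base path. The object name `HoodiePropertiesCheck` and its methods are hypothetical helpers, not part of Hudi, and the property key is the one named later in this thread.

```scala
import java.io.FileInputStream
import java.nio.file.Paths
import java.util.Properties

// Hypothetical helper (not a Hudi API): inspects hoodie.properties directly.
object HoodiePropertiesCheck {
  // Table-level key generator property, as named in this thread.
  val KeyGenProp = "hoodie.table.keygenerator.class"

  // Load <basePath>/.hoodie/hoodie.properties from a local path (assumption:
  // the table is not on HDFS/S3, which would need a Hadoop FileSystem instead).
  def load(basePath: String): Properties = {
    val file = Paths.get(basePath, ".hoodie", "hoodie.properties").toFile
    val props = new Properties()
    val in = new FileInputStream(file)
    try props.load(in) finally in.close()
    props
  }

  // Returns the persisted key generator class, or None if the key is absent.
  def keyGenerator(basePath: String): Option[String] =
    Option(load(basePath).getProperty(KeyGenProp))
}
```

Calling `HoodiePropertiesCheck.keyGenerator("/path/to/table")` after running the DDL shows which key generator was actually persisted, so it can be compared with the one requested in TBLPROPERTIES.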

Environment Description

danny0405 commented 1 year ago

Yeah, HoodieCatalogTable#initHoodieTable does not copy the option hoodie.table.keygenerator.class the way the SQL writer does in HoodieSparkSqlWriter#mergeParamsAndGetHoodieConfig. I guess that is by design: on the writer path, the write config can override the table config when initializing the table, but that is not the case for catalog table creation/modification.

@nsivabalan Can you help double check this ?

JoshuaZhuCN commented 1 year ago

If we set a value different from the one in hoodie.properties when writing data, an error is reported.

JoshuaZhuCN commented 1 year ago

I get it: this option no longer needs to be set in datasource write mode if the DDL has already set it.

xushiyan commented 1 year ago

@jonvex can you look into this pls?

jonvex commented 1 year ago

https://issues.apache.org/jira/browse/HUDI-5262 I reported this a few weeks ago. You need to use hoodie.table.keygenerator.class to set the keygenerator when creating a table in spark-sql
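
As a rough illustration of that suggestion, the sketch below rebuilds a trimmed version of the reporter's DDL with `hoodie.table.keygenerator.class` in place of `hoodie.datasource.write.keygenerator.class`. The object `CreateTableDdl` and its `build` method are hypothetical names introduced here, the bucket index options from the original DDL are omitted for brevity, and whether the property is honored depends on the Hudi version in use, which this thread does not confirm.

```scala
// Hypothetical helper: builds the CREATE TABLE statement as a string so it
// can be inspected before being passed to spark.sql.
object CreateTableDdl {
  def build(table: String, keyGenClass: String): String =
    s"""|CREATE TABLE IF NOT EXISTS $table (
        |     `id` INT
        |    ,`name` STRING
        |    ,`age` INT
        |    ,`sync_time` TIMESTAMP
        |) USING HUDI
        |TBLPROPERTIES (
        |     `type` = 'mor'
        |    ,`primaryKey` = 'id'
        |    ,`preCombineField` = 'sync_time'
        |    ,`hoodie.table.keygenerator.class` = '$keyGenClass'
        |)
        |PARTITIONED BY (id)
        |""".stripMargin
}

// Usage, assuming an active SparkSession named `spark`:
// spark.sql(CreateTableDdl.build(
//   "`default`.`spark_0_12_1_test`",
//   "org.apache.hudi.keygen.ComplexKeyGenerator"))
```

After the table is created this way, hoodie.properties can be checked again to confirm that the requested key generator was persisted.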

jonvex commented 1 year ago

@xushiyan https://github.com/apache/hudi/pull/7394 here is an example pr. Instead of failing, another option is to set the correct config

nsivabalan commented 1 year ago

@jonvex : can you fix our quick start guide around this please. do create a jira as well.

jonvex commented 1 year ago

PRs are ready for review, and then we can close this out.