nikhilindikuzha opened 1 year ago
Looks like the behavior changed in https://github.com/apache/hudi/commit/6fee77b76fa25be677b324804eae9801ca6b9f4c
I think we should keep the complex keygen while overriding timestamp-type handling.
cc @alexeykudinkin @xiarixiaoyao Any idea why we removed ComplexKeyGenerator as the base class?
While we are at it, can you also explore how this change behaves for existing tables? For example, with 0.11.0, if a user did not explicitly set the key generator, ComplexKeyGenerator would be picked. If that user then upgrades to 0.12.1, wouldn't the default now resolve to SimpleKeyGenerator? Or is that already taken care of?
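One thing worth checking on upgrade is what the table itself has already persisted. A minimal sketch of inspecting the persisted key generator class (the properties text here is a hypothetical sample; for a real table you would load `<basePath>/.hoodie/hoodie.properties` instead):

```scala
// Sketch only: confirm which key generator an existing table has persisted.
// The sample text below is hypothetical, standing in for a real
// <basePath>/.hoodie/hoodie.properties file.
import java.io.StringReader
import java.util.Properties

val sample =
  """hoodie.table.recordkey.fields=id,name
    |hoodie.table.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
    |""".stripMargin

val props = new Properties()
props.load(new StringReader(sample))

// On upgrade, a writer should honor this persisted class rather than
// falling back to the new SimpleKeyGenerator default.
println(props.getProperty("hoodie.table.keygenerator.class"))
```

If the table config already carries the keygenerator class, the question reduces to whether the 0.12.x write path reads it back instead of recomputing a default.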
Attempted to reproduce with the script below (note that I created the table with 0.11.1 and did the insert with 0.12.1). The issue described by the OP is reproducible, but the upgrade doesn't override the keygen prop in hoodie.properties.
import org.apache.spark.sql.SaveMode._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceReadOptions
import org.apache.hudi.QuickstartUtils._
import org.apache.spark.sql.DataFrame
spark.sql("""create table f_test111(
id string,
name string,
age string,
salary string,
upd_ts string,
job string
)
using hudi
partitioned by (job)
location 'file:///tmp/f_test111/'
options (
type = 'cow',
primaryKey = 'id,name',
preCombineField = 'upd_ts'
)""")
spark.sql("""insert into f_test111 values('a1', 'sagar', '32', '1000', '100', 'se')""")
I put this test in TestCreateTable.scala and it succeeds without error on both the 0.12.1 release branch and the current master:
test("Test Multiple Primary Key Default Keygen") {
withTempDir { tmp =>
val tableName = generateTableName
spark.sql(
s"""
|create table $tableName (
| id int,
| name string,
| price double,
| ts long
|) using hudi
| partitioned by (name)
| tblproperties (
| primaryKey = 'id,price',
| preCombineField = 'ts',
| type = 'cow'
| )
| location '${tmp.getCanonicalPath}'
""".stripMargin)
val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier(tableName))
val tablePath = table.storage.properties("path")
val metaClient = HoodieTableMetaClient.builder()
.setBasePath(tablePath)
.setConf(spark.sessionState.newHadoopConf())
.build()
val tableConfig = metaClient.getTableConfig.getProps.asScala.toMap
assertResult("org.apache.hudi.keygen.ComplexKeyGenerator")(tableConfig(HoodieTableConfig.KEY_GENERATOR_CLASS_NAME.key()))
val source = scala.io.Source.fromFile(tmp.getCanonicalPath + "/.hoodie/hoodie.properties")
val lines = try source.mkString finally source.close()
assertResult(true)(lines.contains("hoodie.table.recordkey.fields=id,price"))
assertResult(true)(lines.contains("hoodie.table.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator"))
}
}
I just tried my test with options instead of tblproperties and it still passed, so I'm not sure what else to try.
@nikhilindikuzha: looks like we could not reproduce this. Could you give us a reproducible script/runbook? Feel free to close if you cannot reproduce it either.
@nikhilindikuzha: any updates here, please?
Hi Team, whenever I try to create a Hudi table with multiple primary keys in Hudi 0.12.1, it generates hoodie.table.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator, but the same create statement in Hudi 0.11.1 persists the complex key generator class. Any idea?
Sample code:
spark-shell \
  --packages org.apache.hudi:hudi-spark-bundle_2.12:0.12.1 \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'

import org.apache.spark.sql.SaveMode._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceReadOptions
import org.apache.hudi.QuickstartUtils._
import org.apache.spark.sql.DataFrame

spark.sql("""create table f_test5(
id string,
name string,
age string,
salary string,
upd_ts string,
job string
)
using hudi
partitioned by (job)
location 'gs:///HUDI/f_test5/'
options (
type = 'cow',
primaryKey = 'id,name',
preCombineField = 'upd_ts'
)""")
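For context, the practical difference between the two generators is the record-key format they produce for a row. A plain-Scala illustration of that format (this is not Hudi's actual implementation; the field names are taken from the DDL above):

```scala
// Illustration only, not Hudi's implementation.
// SimpleKeyGenerator uses a single field's value as the record key, while
// ComplexKeyGenerator joins field:value pairs for every configured key field.
val row = Map("id" -> "a1", "name" -> "sagar")

def simpleKey(field: String): String = row(field)

def complexKey(fields: Seq[String]): String =
  fields.map(f => s"$f:${row(f)}").mkString(",")

println(simpleKey("id"))               // a1
println(complexKey(Seq("id", "name"))) // id:a1,name:sagar
```

So a table defined with primaryKey = 'id,name' but written with SimpleKeyGenerator would key records on id alone, which is why the persisted keygenerator class matters here.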
Discussion in Hudi Slack channel: https://apache-hudi.slack.com/archives/C4D716NPQ/p1668596344997409