apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] OverwriteWithLatestAvroPayload could not combine record by precombineKey #10813

Closed · xuzifu666 closed 4 months ago

xuzifu666 commented 4 months ago

Describe the problem you faced

With the default payload (OverwriteWithLatestAvroPayload), a later insert with a smaller preCombineField value (ts) overwrites the stored record with the larger value, so records are not combined by the preCombine key.

To Reproduce

Steps to reproduce the behavior:

1. Run the following unit test:

test("Test table type name first merge test") {
  withRecordType()(withTempDir { tmp =>
    val targetTable = generateTableName
    val tablePath = s"${tmp.getCanonicalPath}/$targetTable".replaceAll("\\\\", "/")
    spark.sql(
      s"""
         |create table ${targetTable} (
         |  id string,
         |  name string,
         |  ts bigint,
         |  day STRING,
         |  hour INT
         |) using hudi
         |tblproperties (
         |  'primaryKey' = 'id',
         |  'type' = 'mor',
         |  'preCombineField' = 'ts',
         |  'hoodie.index.type' = 'BUCKET',
         |  'hoodie.bucket.index.hash.field' = 'id',
         |  'hoodie.bucket.index.num.buckets' = 512
         |) partitioned by (day, hour) location '${tablePath}'
         |""".stripMargin)

spark.sql(
  s"""
     |insert into ${targetTable}
     |select '1' as id, 'aa' as name, 123 as ts, '2024-02-19' as `day`, 10 as `hour`
     |""".stripMargin)

spark.sql(
  s"""
     |insert into ${targetTable}
     |select '1' as id, 'bb' as name, 12 as ts, '2024-02-19' as `day`, 10 as `hour`
     |""".stripMargin)

checkAnswer(s"select id, name, ts, day, hour from $targetTable limit 10")(
  Seq("1", "aa", 123, "2024-02-19", 10),
)

}) }

  2. The unit test fails on the checkAnswer assertion.
  3. From debugging, my guess is that the parquet base record is not converted into a payload before being combined with the log records.

(screenshot attached: 1709565381819.png)

Expected behavior

The record with the larger preCombineField value (name 'aa', ts = 123) should win the merge; the later insert with the smaller ts should not overwrite it.


xuzifu666 commented 4 months ago

@danny0405

danny0405 commented 4 months ago

Does DefaultHoodieRecordPayload work here?

xuzifu666 commented 4 months ago

DefaultHoodieRecordPayload works well, but Hudi's default payload is OverwriteWithLatestAvroPayload; from the discussion, master has already changed the default. Thanks @danny0405
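Until the default changes, one workaround is to pin the precombine-aware payload at table creation. The sketch below is untested here and assumes the Spark SQL layer passes `hoodie.*` tblproperties through to the writer config (`hoodie.datasource.write.payload.class` is the standard write option):

spark.sql(
  s"""
     |create table ${targetTable} (
     |  id string, name string, ts bigint, day string, hour int
     |) using hudi
     |tblproperties (
     |  'primaryKey' = 'id',
     |  'type' = 'mor',
     |  'preCombineField' = 'ts',
     |  -- explicitly select the precombine-aware payload
     |  'hoodie.datasource.write.payload.class' =
     |    'org.apache.hudi.common.model.DefaultHoodieRecordPayload'
     |) partitioned by (day, hour) location '${tablePath}'
     |""".stripMargin)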

ad1happy2go commented 4 months ago

@xuzifu666 OverwriteWithLatestAvroPayload is supposed to work this way.

DefaultHoodieRecordPayload honours the preCombine value when merging an incoming record with the one in storage, while OverwriteWithLatestAvroPayload blindly chooses the incoming record over whatever is in storage.
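The difference can be sketched in plain Scala. This is a simplified, hypothetical model for illustration only; the real payload classes live in org.apache.hudi.common.model and operate on Avro records:

```scala
// Simplified model of a Hudi record keyed by id, with ts as the
// preCombine field (mirrors the schema in the reproduction above).
case class Record(id: String, name: String, ts: Long)

// OverwriteWithLatestAvroPayload semantics: the incoming record always
// wins, regardless of the preCombine field.
def overwriteWithLatest(stored: Record, incoming: Record): Record = incoming

// DefaultHoodieRecordPayload semantics: the record with the larger
// preCombine value (ts) wins; ties go to the incoming record.
def defaultPayload(stored: Record, incoming: Record): Record =
  if (incoming.ts >= stored.ts) incoming else stored

val stored   = Record("1", "aa", 123L) // first insert
val incoming = Record("1", "bb", 12L)  // second insert, smaller ts

println(overwriteWithLatest(stored, incoming)) // Record(1,bb,12)
println(defaultPayload(stored, incoming))      // Record(1,aa,123)
```

With the default OverwriteWithLatestAvroPayload, the second insert replaces the stored row even though its ts is smaller, which is exactly the behavior observed in the failing test.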

Refer - https://medium.com/@simpsons/curious-case-of-defaulthoodierecordpayload-vs-default-payload-class-in-hudi-efbfa423c48e

xuzifu666 commented 4 months ago

@ad1happy2go @danny0405 All resolved, thanks. Closing the issue.