apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[HUDI-8504] Fix missing database config when building Hudi configs in Spark #12238

Closed fhan688 closed 6 days ago

fhan688 commented 1 week ago

Change Logs

The current implementation omits the database name when building the Hudi config, so the wrong table can be resolved even though the intended table exists. One example is the Spark write path: https://github.com/apache/hudi/blob/cea81e82fdeecec4e1d7eb53ae1f8e9eaeede11c/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala#L97

https://github.com/apache/hudi/blob/cea81e82fdeecec4e1d7eb53ae1f8e9eaeede11c/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L251-L254 Because the value of `DATABASE_NAME.key` is dropped in `buildHoodieInsertConfig`, `databaseName` is empty in `tableIdentifier`. https://github.com/apache/hudi/blob/cea81e82fdeecec4e1d7eb53ae1f8e9eaeede11c/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L667-L670

https://github.com/apache/hudi/blob/cea81e82fdeecec4e1d7eb53ae1f8e9eaeede11c/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L674-L680 As a result, either the table under `databaseName` cannot be found, or the table with the same name under the default database is returned. Neither outcome matches what the caller expects.

This PR fixes the bug.
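
The failure mode described above can be illustrated with a minimal, self-contained sketch. It does not use Hudi or Spark classes: `TableId`, `catalog`, `buildConfig`, `toIdentifier`, and `resolve` are hypothetical stand-ins, the key strings are assumptions written for illustration, and falling back to a `default` database is assumed from the description rather than taken from the actual code.

```scala
// Hypothetical stand-in for a table identifier; not Spark's TableIdentifier.
final case class TableId(table: String, database: Option[String])

object DatabaseConfigSketch {
  // Assumed config keys, used here for illustration only.
  val TableNameKey = "hoodie.table.name"
  val DatabaseNameKey = "hoodie.database.name"

  // Toy catalog: the same table name exists in two databases.
  val catalog: Map[(String, String), String] = Map(
    ("default", "orders") -> "/warehouse/default/orders",
    ("sales", "orders") -> "/warehouse/sales/orders"
  )

  // Builds the write config; includeDatabase = false mimics the reported bug.
  def buildConfig(db: String, table: String, includeDatabase: Boolean): Map[String, String] = {
    val base = Map(TableNameKey -> table)
    if (includeDatabase) base + (DatabaseNameKey -> db) else base
  }

  // Derives the identifier from the config; a missing or empty database becomes None.
  def toIdentifier(config: Map[String, String]): TableId =
    TableId(config(TableNameKey), config.get(DatabaseNameKey).filter(_.nonEmpty))

  // Looks the table up, falling back to the "default" database when none is set.
  def resolve(id: TableId): Option[String] =
    catalog.get((id.database.getOrElse("default"), id.table))
}
```

In this sketch, resolving `sales.orders` with `includeDatabase = false` silently returns the path of `default.orders`, while `includeDatabase = true` returns the intended table. If `default.orders` did not exist, the lookup would return `None` instead, which corresponds to the "table cannot be found" case.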

Impact

hudi-spark-common

Risk level (write none, low, medium or high below)

Low

Documentation Update

None

Contributor's checklist

danny0405 commented 1 week ago

@fhan688 Thanks for the contribution. Could you check the test failures?

hudi-bot commented 1 week ago

CI report:

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build