apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Hive] Fix Hive DDL and Paimon schema mismatch bug #4561

Status: Open. Opened by GangYang-HX 19 hours ago.


Purpose

Linked issue: issue-4556

Adding a column through HiveCatalog happens in two stages: first the Paimon schema is modified, then the Hive Metastore (HMS) is updated. Between the two stages there is a window in which the two schemas differ. If a job reads the table through HiveCatalog during this window, the check in getDataFieldsJsonStr that compares the Paimon schema with the Hive schema fails.
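The race can be illustrated with a minimal sketch. It is a hypothetical simplification, not Paimon code: schemas are reduced to lists of column names, and `strictMatch` stands in for an exact-equality comparison between the Hive DDL and the Paimon schema. It shows why a reader that arrives between the two stages sees a mismatch.

```java
import java.util.ArrayList;
import java.util.List;

public class StrictSchemaCheckWindow {

    // Hypothetical strict check (not Paimon's implementation): the Hive DDL
    // columns must equal the Paimon fields exactly, in the same order.
    static boolean strictMatch(List<String> hiveColumns, List<String> paimonFields) {
        return hiveColumns.equals(paimonFields);
    }

    public static void main(String[] args) {
        List<String> paimon = new ArrayList<>(List.of("id", "name"));
        List<String> hms = new ArrayList<>(List.of("id", "name"));

        // Stage 1: ADD COLUMN updates the Paimon schema first.
        paimon.add("age");
        // A reader arriving now sees the new Paimon schema but the old HMS schema.
        System.out.println(strictMatch(hms, paimon)); // false: the read fails

        // Stage 2: HMS is updated and the two schemas agree again.
        hms.add("age");
        System.out.println(strictMatch(hms, paimon)); // true
    }
}
```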

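Solution: relax the check so that the Hive DDL only needs to be a subset of the Paimon schema, rather than an exact match. This tolerates the intermediate state in which the Paimon schema already contains the new column but HMS does not yet.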
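A minimal sketch of the relaxed rule follows. It is an illustration under stated assumptions, not the code changed in this PR: `Column` and `isHiveSubsetOfPaimon` are hypothetical names, types are compared as lowercase strings, names are matched case-insensitively (HMS stores column names in lowercase), and column order is ignored. Every Hive DDL column must exist in the Paimon schema with the same type, while extra Paimon columns (such as one just added in stage 1) are allowed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SchemaSubsetCheck {

    // Hypothetical simplified column (name + type string), not Paimon's DataField.
    record Column(String name, String type) {}

    // Returns true if every Hive DDL column also exists in the Paimon schema
    // with the same type. Paimon columns missing from the Hive DDL are allowed.
    static boolean isHiveSubsetOfPaimon(List<Column> hiveColumns, List<Column> paimonFields) {
        // Index Paimon fields by lowercase name (assumption: case-insensitive names).
        Map<String, String> paimonTypes = new HashMap<>();
        for (Column f : paimonFields) {
            paimonTypes.put(f.name().toLowerCase(), f.type().toLowerCase());
        }
        for (Column c : hiveColumns) {
            String paimonType = paimonTypes.get(c.name().toLowerCase());
            // Fail if the Hive column is unknown to Paimon or its type differs.
            if (paimonType == null || !paimonType.equals(c.type().toLowerCase())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Paimon already has "age"; HMS has not been updated yet.
        List<Column> paimon = List.of(
                new Column("id", "int"), new Column("name", "string"), new Column("age", "int"));
        List<Column> hive = List.of(new Column("id", "int"), new Column("name", "string"));

        System.out.println(isHiveSubsetOfPaimon(hive, paimon)); // true: mid-ALTER read succeeds
        System.out.println(isHiveSubsetOfPaimon(paimon, hive)); // false: "age" unknown to Hive side
    }
}
```

With this rule, the intermediate state of an ADD COLUMN passes the check, while a Hive DDL that declares a column Paimon does not have, or declares it with a different type, is still rejected.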

Tests

org.apache.paimon.hive.HiveTableSchemaTest#testSupersetColumnNameAndType
org.apache.paimon.hive.HiveTableSchemaTest#testSubsetColumnNameAndType

API and Format

org.apache.paimon.hive.HiveSchema#checkPartitionMatched

Documentation

The Hive DDL only needs to be a subset of the Paimon schema.