njalan opened 11 months ago
@njalan Does your partition column contain NULL values in the data? When do you face this error? It looks like you are trying to add a null partition. It may not be a Hudi issue but rather a Hive one.
You may try: `ALTER TABLE ods_xxx.xx ADD IF NOT EXISTS PARTITION (xx=null) LOCATION 'xxxx/HIVE_DEFAULT_PARTITION'`
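Before running the workaround above, it can help to confirm that the source data really does contain null partition values. A minimal sketch, reusing the placeholder table and column names from this thread (`ods_xxx.xx`, partition column `xx`):

```sql
-- Hypothetical check using the placeholder names from this thread;
-- counts rows whose partition column is NULL.
SELECT COUNT(*) AS null_partition_rows
FROM ods_xxx.xx
WHERE xx IS NULL;
```

A non-zero count would indicate that the null-partition path is actually being exercised during sync.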
Hi @danny0405 @ad1happy2go @xushiyan. When we upgraded from 0.11.1 to 0.14.0, the default partition value was changed from `default` to `HIVE_DEFAULT_PARTITION`, which caused two problems:
We hope that a Hudi upgrade should not affect the business, and that configuration options should be provided for users to choose from. As it stands, this is very unfriendly to users 😅
Thanks for the feedback @CaesarWangX. Did you try HMS as the sync mode? The 1st issue is unexpected and should be a bug; the motive was to keep the default partition name in sync with Hive, but now it causes problems reported by Hive.
For the 2nd, there might be no easy way to stay compatible with historical data sets, because the partition path is a hotspot code path and we might not be able to consider the ramifications for historical values of each record. If you use Flink for ingestion, there is a config option named `partition.default_name` to switch to another default value as needed.
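For Flink users, the `partition.default_name` option mentioned above is set in the table's WITH clause. A minimal sketch, where only that option comes from this thread and the table name, schema, and path are placeholders:

```sql
-- Hypothetical Flink SQL DDL; 'partition.default_name' is the option
-- discussed above, everything else is a placeholder.
CREATE TABLE hudi_sink (
  id INT,
  xx STRING
) PARTITIONED BY (xx) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs:///tmp/hudi_sink',
  'partition.default_name' = 'default'  -- keep the pre-0.14 default value
);
```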
@ad1happy2go, would you have time to check whether the 1st issue is limited to JDBC sync, which is already deprecated anyway?
Thanks @danny0405.
1. Our configuration does not explicitly set `hoodie.datasource.hive_sync.mode`; after enabling hive sync, we only set `hoodie.datasource.hive_sync.jdbcurl`.
2. Unfortunately, we are using Spark 😅, and upon checking the code, I found that this part is hard-coded and a value for the default partition cannot be specified. If I'm wrong, please correct me.
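For reference, when `hoodie.datasource.hive_sync.mode` is left unset but a JDBC URL is supplied, sync goes through the (deprecated) JDBC path discussed above. A hedged sketch of the writer options that switch to HMS mode instead; the metastore URI is a placeholder, and exact option names should be verified against your Hudi version:

```
hoodie.datasource.hive_sync.enable=true
hoodie.datasource.hive_sync.mode=hms
hoodie.datasource.hive_sync.metastore.uris=thrift://metastore-host:9083
```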
Actually, the first issue is not the main one; we are more concerned with the default partition value issue.
If possible, maybe you can file a JIRA issue and contribute code to make the Spark default partition value configurable, and I will be glad to review it.
@danny0405 sure, I can do that.
I got the error message below:

Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL ALTER TABLE `ods_xxx`.`xx` ADD IF NOT EXISTS PARTITION (`xx`='HIVE_DEFAULT_PARTITION') LOCATION 'xxxx/HIVE_DEFAULT_PARTITION'
    at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:70)
    at org.apache.hudi.hive.ddl.QueryBasedDDLExecutor.lambda$addPartitionsToTable$0(QueryBasedDDLExecutor.java:124)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
    at org.apache.hudi.hive.ddl.QueryBasedDDLExecutor.addPartitionsToTable(QueryBasedDDLExecutor.java:124)
    at org.apache.hudi.hive.HoodieHiveSyncClient.addPartitionsToTable(HoodieHiveSyncClient.java:109)
    at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:445)
    at org.apache.hudi.hive.HiveSyncTool.syncAllPartitions(HiveSyncTool.java:399)
    ... 69 more
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10111]: Partition value contains a reserved substring (User value: HIVE_DEFAULT_PARTITION Reserved substring: HIVE_DEFAULT_PARTITION)

Environment Description
Hudi version : 0.13.1
Spark version : 3.0.1
Hive version : 3.1
Hadoop version : 3.2.2
Storage (HDFS/S3/GCS..) :
Running on Docker? : no