apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Unable to Set Database Name for Hive Metadata Sync using Flink SQL #10583

Closed vkhoroshko closed 8 months ago

vkhoroshko commented 8 months ago

Describe the problem you faced

Using Flink SQL, it is not possible to change the database name for Hive sync.

To Reproduce

Steps to reproduce the behavior:

  1. Use Flink SQL to sink data to Hudi and sync metadata to Hive (note: as originally posted, the DDL was missing a comma after the `hive_sync.metastore.uris` line and had a trailing comma before the closing parenthesis; both are fixed here):

    CREATE TABLE IF NOT EXISTS test_tbl
    (
      `eventDate`     TIMESTAMP(3),
      `operationType` STRING
    ) WITH (
      'connector' = 'hudi',
      'path' = 'file:///opt/flink/hudi',
      'table.type' = 'COPY_ON_WRITE',
      'write.operation' = 'insert',
      'write.parquet.block.size' = '1',
      'write.parquet.max.file.size' = '5',
      'hoodie.database.name' = 'testdb',
      'hoodie.table.name' = 'test_tbl',
      'hive_sync.enable' = 'true',
      'hive_sync.mode' = 'hms',
      'hive_sync.metastore.uris' = 'thrift://172.17.0.1:9083',
      'hive_sync.database' = 'testdb'
    );

    Although both hoodie.database.name and hive_sync.database are set to 'testdb', Hudi still attempts to create default_database in Hive. The generated .hoodie/hoodie.properties also still contains hoodie.database.name=default_database.

Expected behavior

Hudi metadata is synced to the testdb database in the Hive Metastore.


Environment Description

Using hudi-flink1.17-bundle-0.14.1.jar

Additional context

Whenever Hive sync is configured in Flink SQL, it tries to create default_database, and there is no way to change it.

Stacktrace


2024-01-29 15:36:25,611 ERROR org.apache.hudi.sink.StreamWriteOperatorCoordinator          [] - Executor executes action [sync hive metadata for instant 20240129153624064] error
org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing test_tbl
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:171) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    at org.apache.hudi.sink.StreamWriteOperatorCoordinator.doSyncHive(StreamWriteOperatorCoordinator.java:342) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
    at java.lang.Thread.run(Unknown Source) [?:?]

Caused by: org.apache.hudi.hive.HoodieHiveSyncException: failed to create table test_tbl
    at org.apache.hudi.hive.ddl.HMSDDLExecutor.createTable(HMSDDLExecutor.java:140) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    at org.apache.hudi.hive.HoodieHiveSyncClient.createTable(HoodieHiveSyncClient.java:235) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    at org.apache.hudi.hive.HiveSyncTool.syncFirstTime(HiveSyncTool.java:332) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:254) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:180) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:168) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    ... 5 more
Caused by: org.apache.hadoop.hive.metastore.api.InvalidObjectException: default_database
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:54899) ~[hive-exec-3.1.3.jar:3.1.3]
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:54876) ~[hive-exec-3.1.3.jar:3.1.3]
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result.read(ThriftHiveMetastore.java:54802) ~[hive-exec-3.1.3.jar:3.1.3]
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[hive-exec-3.1.3.jar:3.1.3]
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:1556) ~[hive-exec-3.1.3.jar:3.1.3]
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_with_environment_context(ThriftHiveMetastore.java:1542) ~[hive-exec-3.1.3.jar:3.1.3]
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2867) ~[hive-exec-3.1.3.jar:3.1.3]
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:121) ~[hive-exec-3.1.3.jar:3.1.3]
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:837) ~[hive-exec-3.1.3.jar:3.1.3]
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:822) ~[hive-exec-3.1.3.jar:3.1.3]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) ~[?:?]
    at java.lang.reflect.Method.invoke(Unknown Source) ~[?:?]
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212) ~[hive-exec-3.1.3.jar:3.1.3]
    at com.sun.proxy.$Proxy50.createTable(Unknown Source) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) ~[?:?]
    at java.lang.reflect.Method.invoke(Unknown Source) ~[?:?]
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2773) ~[hive-exec-3.1.3.jar:3.1.3]
    at com.sun.proxy.$Proxy50.createTable(Unknown Source) ~[?:?]
    at org.apache.hudi.hive.ddl.HMSDDLExecutor.createTable(HMSDDLExecutor.java:137) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    at org.apache.hudi.hive.HoodieHiveSyncClient.createTable(HoodieHiveSyncClient.java:235) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    at org.apache.hudi.hive.HiveSyncTool.syncFirstTime(HiveSyncTool.java:332) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:254) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:180) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:168) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    ... 5 more
ad1happy2go commented 8 months ago

@vkhoroshko This is a table property, and none of the table properties can be changed.

vkhoroshko commented 8 months ago

> @vkhoroshko This is a table property and we can't change any of the table properties.

However, the docs state that this should be possible: https://hudi.apache.org/docs/configurations/#FLINK_SQL. Also, running HiveSyncTool directly accepts a --database option, and that works.
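For comparison, a standalone HiveSyncTool invocation can be sketched as below. This is an illustrative sketch only: the classpath, paths, and URI are taken from this report, and the exact flag set varies by Hudi version, so check `--help` for the bundle you are running.

```shell
# Sketch: sync an existing Hudi table to the Hive Metastore directly (hms mode).
# Classpath and flag availability depend on the Hudi version in use.
java -cp hudi-hive-sync-bundle-0.14.1.jar:${HIVE_HOME}/lib/* \
  org.apache.hudi.hive.HiveSyncTool \
  --sync-mode hms \
  --metastore-uris thrift://172.17.0.1:9083 \
  --base-path file:///opt/flink/hudi \
  --database testdb \
  --table test_tbl
```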

vkhoroshko commented 8 months ago

The correct option for Flink SQL is "hive_sync.db", not "hive_sync.database".
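For reference, a working version of the WITH clause from the reproduction above, using hive_sync.db (and hive_sync.table for symmetry; all values are the ones from this report):

```sql
CREATE TABLE IF NOT EXISTS test_tbl
(
  `eventDate`     TIMESTAMP(3),
  `operationType` STRING
) WITH (
  'connector' = 'hudi',
  'path' = 'file:///opt/flink/hudi',
  'table.type' = 'COPY_ON_WRITE',
  'write.operation' = 'insert',
  'hoodie.database.name' = 'testdb',
  'hoodie.table.name' = 'test_tbl',
  'hive_sync.enable' = 'true',
  'hive_sync.mode' = 'hms',
  'hive_sync.metastore.uris' = 'thrift://172.17.0.1:9083',
  'hive_sync.db' = 'testdb',
  'hive_sync.table' = 'test_tbl'
);
```

With this configuration, the table is created under testdb in the Hive Metastore instead of default_database.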

Closing the ticket.