apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0
6.17k stars 2.15k forks source link

insert to hive table with icberg table format is failing #7840

Open infa-ibannatt opened 1 year ago

infa-ibannatt commented 1 year ago

Hi folks, I have standalone hadoop 3.3.5 cluster, and hive 3.1.3 installed and copied iceberg-hive-runtime.1.3.0.jar in the hive lib folder. launched hive session and added the hive-runtime-jar in hive session /bin/hive add jar hdfs:///tmp/jars/iceberg-hive-runtime-1.3.0.jar; added below env in hive session SET iceberg.catalog.another_hive.type=hive; SET iceberg.catalog.another_hive.uri=thrift://localhost:9083; SET iceberg.catalog.another_hive.clients=10; SET iceberg.catalog.another_hive.warehouse=hdfs://localhost:9000/user/hive/warehouse; SET iceberg.engine.hive.enabled=true;

CREATE TABLE test_ic (id INT, name STRING) STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'; OK Time taken: 1.979 seconds

hive> insert into test_ic values(1,'test'); Query ID = hadoop_20230614224133_3b760621-ac21-4d9f-9b0d-046fd85196a7 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1686759816002_0006, Tracking URL = http://localhost:8088/proxy/application_1686759816002_0006/ Kill Command = /export/home/hadoop/hadoop/bin/mapred job -kill job_1686759816002_0006 Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0 2023-06-14 22:41:42,395 Stage-2 map = 0%, reduce = 0% 2023-06-14 22:41:48,681 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 7.82 sec MapReduce Total cumulative CPU time: 7 seconds 820 msec Ended Job = job_1686759816002_0006 with errors Error during job, obtaining debugging information... Examining task ID: task_1686759816002_0006_m_000000 (and more) from job job_1686759816002_0006 FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask MapReduce Jobs Launched: Stage-Stage-2: Map: 1 Cumulative CPU: 7.82 sec HDFS Read: 188220 HDFS Write: 2971 FAIL Total MapReduce CPU Time Spent: 7 seconds 820 msec

error in application log Job commit failed: org.apache.iceberg.hive.RuntimeMetaException: Failed to connect to Hive Metastore at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:84) at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:34) at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:125) at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:56) at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:51) at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:122) at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:158) at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:97) at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:80) at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:47) at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:124) at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:111) at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitTable(HiveIcebergOutputCommitter.java:320) at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.lambda$commitJob$2(HiveIcebergOutputCommitter.java:214) at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413) at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219) at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203) at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196) at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitJob(HiveIcebergOutputCommitter.java:207) at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:291) at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:286) at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:238) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) Caused by: java.lang.NoClassDefFoundError: com/facebook/fb303/FacebookService$Iface at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:756) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:473) at java.net.URLClassLoader.access$100(URLClassLoader.java:74) at java.net.URLClassLoader$1.run(URLClassLoader.java:369) at java.net.URLClassLoader$1.run(URLClassLoader.java:363) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:362) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.hadoop.hive.metastore.utils.JavaUtils.getClass(JavaUtils.java:52) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:146) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:119) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.iceberg.common.DynMethods$UnboundMethod.invokeChecked(DynMethods.java:60) at org.apache.iceberg.common.DynMethods$UnboundMethod.invoke(DynMethods.java:72) at org.apache.iceberg.common.DynMethods$StaticMethod.invoke(DynMethods.java:185) at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:63) ... 24 more Caused by: java.lang.ClassNotFoundException: com.facebook.fb303.FacebookService$Iface at java.net.URLClassLoader.findClass(URLClassLoader.java:387) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ... 50 more

pvary commented 1 year ago

Are you using HiveCli, or BeeLine?

infa-ibannatt commented 1 year ago

i am using hiveCli

lurnagao-dahua commented 1 year ago

Hi, I have the same problem and sloved in https://issues.apache.org/jira/browse/HIVE-24498

sickdatascientist commented 10 months ago

@infa-ibannatt I am experiencing the same issue with pseudo-distributed hadoop 3.3.6 cluster, and hive 3.1.3 installed and iceberg-hive-runtime.1.4.1.jar. What I noticed just as you, is for this insert, reducers is 0. For none iceberg managed tables this has value at least 1. Did you find any fix for this?

image

whymed commented 10 months ago

My stack: Hadoop 3.3.6, Hive 3.1.3 and Iceberg hive runtime 1.4.2 I want to use the same default catalog from hive, so I have not configured any of iceberg.catalog mentioned on the documentation.

I think I may be experiencing the same error, I can create a table whitout issues, like so: CREATE TABLE ice2 (id INT, name STRING) STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';

But on the insert is when I get this error: hive> INSERT INTO ice2 VALUES (1, 'uno'); Query ID = ml_20231111001329_1d3bf493-9a52-466a-96e0-5832d8662276 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1699652020892_0002, Tracking URL = http://master:8088/proxy/application_1699652020892_0002/ Kill Command = /home/ml/hadoop//bin/mapred job -kill job_1699652020892_0002 Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0 2023-11-11 00:13:45,712 Stage-2 map = 0%, reduce = 0% 2023-11-11 00:13:54,113 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 4.01 sec MapReduce Total cumulative CPU time: 4 seconds 10 msec Ended Job = job_1699652020892_0002 with errors Error during job, obtaining debugging information... Examining task ID: task_1699652020892_0002_m_000000 (and more) from job job_1699652020892_0002 FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask MapReduce Jobs Launched: Stage-Stage-2: Map: 1 Cumulative CPU: 4.01 sec HDFS Read: 187919 HDFS Write: 2952 FAIL Total MapReduce CPU Time Spent: 4 seconds 10 msec

On YARN the logs for the app show this: Job commit failed: org.apache.iceberg.hive.RuntimeMetaException: Failed to connect to Hive Metastore at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:84) at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:34) at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:125) at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:56) at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:51) at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:122) at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:158) at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:97) at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:80) at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:47) at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:124) at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:111) at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitTable(HiveIcebergOutputCommitter.java:320) at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.lambda$commitJob$2(HiveIcebergOutputCommitter.java:214) at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413) at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219) at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203) at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196) at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitJob(HiveIcebergOutputCommitter.java:207) at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:291) at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:286) at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:238) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) Caused by: java.lang.NoClassDefFoundError: com/facebook/fb303/FacebookService$Iface at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:756) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:473) at java.net.URLClassLoader.access$100(URLClassLoader.java:74) at java.net.URLClassLoader$1.run(URLClassLoader.java:369) at java.net.URLClassLoader$1.run(URLClassLoader.java:363) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:362) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.hadoop.hive.metastore.utils.JavaUtils.getClass(JavaUtils.java:52) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:146) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:119) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.iceberg.common.DynMethods$UnboundMethod.invokeChecked(DynMethods.java:60) at org.apache.iceberg.common.DynMethods$UnboundMethod.invoke(DynMethods.java:72) at org.apache.iceberg.common.DynMethods$StaticMethod.invoke(DynMethods.java:185) at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:63) ... 24 more Caused by: java.lang.ClassNotFoundException: com.facebook.fb303.FacebookService$Iface at java.net.URLClassLoader.findClass(URLClassLoader.java:387) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ... 50 more

g10ck commented 2 months ago

You can make such hook in hive/beeline. For example, hive 3.1.3: add jar hive_home_directory/lib/libfb303-0.9.3.jar; insert into ... ;

It will wotk. But the right solution was mentioned by @lurnagao-dahua