Closed: louisliu318 closed this issue 5 years ago.
Removing the HoodieSparkSqlWriter.syncHive() call is one workaround; alternatively, the relevant Hive settings can be supplied through the com.uber.hoodie datasource options or in hive-site.xml (see the sketch below).
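For reference, a minimal sketch of a datasource write with hive sync enabled, assuming the hoodie.datasource.hive_sync.* option keys from DataSourceWriteOptions in this com.uber.hoodie era; the table name, key fields, JDBC URL, and base path are placeholders:

import org.apache.spark.sql.{DataFrame, SaveMode}

// Minimal sketch: write a DataFrame as a hoodie table and sync it to Hive.
def writeWithHiveSync(df: DataFrame): Unit = {
  df.write.format("com.uber.hoodie")
    .option("hoodie.table.name", "hoodie_test")                  // placeholder table name
    .option("hoodie.datasource.write.recordkey.field", "id")     // placeholder record key field
    .option("hoodie.datasource.write.precombine.field", "ts")    // placeholder precombine field
    .option("hoodie.datasource.hive_sync.enable", "true")        // triggers HoodieSparkSqlWriter.syncHive()
    .option("hoodie.datasource.hive_sync.database", "default")   // placeholder Hive database
    .option("hoodie.datasource.hive_sync.table", "hoodie_test")  // placeholder Hive table
    .option("hoodie.datasource.hive_sync.jdbcurl", "jdbc:hive2://localhost:10000") // placeholder HiveServer2 URL
    .mode(SaveMode.Append)
    .save("/tmp/hoodie_test")                                    // placeholder base path
}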
Hive 1.x... are you using the correct spark bundle?
From the quickstart:
To work with older versions of Hive (pre Hive-1.2.1), use
$ mvn clean install -DskipTests -DskipITs -Dhive11
@bvaradar for context
@vinothchandar I'm using Hive-1.2.1, not Hive-1.1.1. In packaging/hoodie-spark-bundle/pom.xml, -Dhive12 points to hive version 1.2.1.
@louisliu318 : Thanks for filing this ticket. Yes, with Hive-1.2.1 the maven profile is hive12 (the default). When I tested a similar setup, I did not encounter this issue. This is caused by shading some of the hive jars and including them in the bundle (a bit of magic by trial and error).
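For reference, the default-profile build would presumably just be the quickstart command without the hive11 flag, something like:
$ mvn clean install -DskipTests -DskipITs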
It is not clear from your comment whether you solved this by disabling shading?
Can you also try shading the hive-metastore jar? Add this relocation in the shading section of the pom.xml of hoodie-spark (sketched below):
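The exact snippet was not preserved here; a plausible reconstruction, assuming standard maven-shade-plugin syntax and matching the metastore relocation quoted later in this thread, is:
<relocation>
  <pattern>org.apache.hadoop.hive.metastore.</pattern>
  <shadedPattern>com.uber.hoodie.org.apache.hadoop_hive.metastore.</shadedPattern>
</relocation>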
Let me know if this solves the problem.
I solved the problem by commenting out the relocations in the shading section of the pom.xml of hoodie-spark.
Can you throw your changes into a PR? Balaji and I discussed this more. He seems to have tested on Hive 1.2 as part of the dockerized setup, and things worked for him. Ideally we need to test this across all Hive versions before making a call; the Hive jar versioning is very sensitive, and changes made for one version often end up causing side effects for others.
@louisliu318 ping again, to see if you can share the changes with us.
@vinothchandar In my environment, I commented out the following Hive relocations in packaging\hoodie-spark-bundle\hoodie-spark-bundle.iml:
<relocation>
  <pattern>org.apache.hive.jdbc.</pattern>
  <shadedPattern>com.uber.hoodie.org.apache.hive.jdbc.</shadedPattern>
</relocation>
<relocation>
  <pattern>org.apache.hadoop.hive.metastore.</pattern>
  <shadedPattern>com.uber.hoodie.org.apache.hadoop_hive.metastore.</shadedPattern>
</relocation>
<relocation>
  <pattern>org.apache.hive.common.</pattern>
  <shadedPattern>com.uber.hoodie.org.apache.hive.common.</shadedPattern>
</relocation>
<relocation>
  <pattern>org.apache.hadoop.hive.common.</pattern>
  <shadedPattern>com.uber.hoodie.org.apache.hadoop_hive.common.</shadedPattern>
</relocation>
<relocation>
  <pattern>org.apache.hadoop.hive.conf.</pattern>
  <shadedPattern>com.uber.hoodie.org.apache.hadoop_hive.conf.</shadedPattern>
</relocation>
<relocation>
  <pattern>org.apache.hive.service.</pattern>
  <shadedPattern>com.uber.hoodie.org.apache.hive.service.</shadedPattern>
</relocation>
<relocation>
  <pattern>org.apache.hadoop.hive.service.</pattern>
  <shadedPattern>com.uber.hoodie.org.apache.hadoop_hive.service.</shadedPattern>
</relocation>
This should be fixed by https://github.com/apache/incubator-hudi/pull/633
Closing this ticket in favor of #633 fixing the underlying issue.
Environment: spark-2.3.2, hadoop-2.7.3, hive-1.2.1
Error: I am using the Spark datasource API to insert data into a hoodie table and sync it to Hive.
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: class org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl not com.uber.hoodie.org.apache.hadoop_hive.metastore.MetaStoreFilterHook
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2227)
	at com.uber.hoodie.org.apache.hadoop_hive.metastore.HiveMetaStoreClient.loadFilterHooks(HiveMetaStoreClient.java:240)
	at com.uber.hoodie.org.apache.hadoop_hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:192)
	at com.uber.hoodie.org.apache.hadoop_hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:181)
	at com.uber.hoodie.hive.HoodieHiveClient.<init>(HoodieHiveClient.java:102)
	at com.uber.hoodie.hive.HiveSyncTool.<init>(HiveSyncTool.java:61)
	at com.uber.hoodie.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:246)
	at com.uber.hoodie.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:179)
	at com.uber.hoodie.DefaultSource.createRelation(DefaultSource.scala:106)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:656)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
	at com.lianjia.dtarch.databus.hudi.HudiBatchSync.execute(HudiBatchSync.java:85)
	at com.lianjia.dtarch.databus.hudi.HudiBatchSync.main(HudiBatchSync.java:63)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: class org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl not com.uber.hoodie.org.apache.hadoop_hive.metastore.MetaStoreFilterHook
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2221)
	... 39 more