apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0
5.46k stars · 2.43k forks

Hive sync error: "class org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl not com.uber.hoodie.org.apache.hadoop_hive.metastore.MetaStoreFilterHook" #533

Closed: louisliu318 closed this issue 5 years ago

louisliu318 commented 5 years ago

Environment: spark-2.3.2 hadoop-2.7.3 hive-1.2.1

Error: I am using the Spark datasource API to insert data into a Hoodie table and sync it to Hive.

    Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: class org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl not com.uber.hoodie.org.apache.hadoop_hive.metastore.MetaStoreFilterHook
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2227)
        at com.uber.hoodie.org.apache.hadoop_hive.metastore.HiveMetaStoreClient.loadFilterHooks(HiveMetaStoreClient.java:240)
        at com.uber.hoodie.org.apache.hadoop_hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:192)
        at com.uber.hoodie.org.apache.hadoop_hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:181)
        at com.uber.hoodie.hive.HoodieHiveClient.<init>(HoodieHiveClient.java:102)
        at com.uber.hoodie.hive.HiveSyncTool.<init>(HiveSyncTool.java:61)
        at com.uber.hoodie.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:246)
        at com.uber.hoodie.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:179)
        at com.uber.hoodie.DefaultSource.createRelation(DefaultSource.scala:106)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:656)
        at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
        at com.lianjia.dtarch.databus.hudi.HudiBatchSync.execute(HudiBatchSync.java:85)
        at com.lianjia.dtarch.databus.hudi.HudiBatchSync.main(HudiBatchSync.java:63)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.RuntimeException: class org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl not com.uber.hoodie.org.apache.hadoop_hive.metastore.MetaStoreFilterHook
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2221)
        ... 39 more
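For context on what the message means: Hadoop's Configuration.getClass checks that the configured class is assignable to the expected interface. After shading, the metastore client expects the relocated com.uber.hoodie.org.apache.hadoop_hive.metastore.MetaStoreFilterHook interface, which the unshaded DefaultMetaStoreFilterHookImpl from the cluster's Hive does not implement. A minimal stand-alone illustration of that assignability check (the types below are stand-ins, not Hudi or Hive code):

```java
// Stand-alone illustration of the check that fails inside Hadoop's
// Configuration.getClass. The two nested types are stand-ins: after shading,
// the client expects the relocated MetaStoreFilterHook interface, which the
// unshaded DefaultMetaStoreFilterHookImpl does not implement.
public class ShadingCheck {

    interface MetaStoreFilterHook {}                 // stand-in for the shaded interface
    static class DefaultMetaStoreFilterHookImpl {}   // stand-in for the unshaded impl

    // Mirrors the "class X not Y" message raised when the configured class
    // is not assignable to the expected (possibly relocated) interface.
    static String describeMismatch(Class<?> configured, Class<?> expected) {
        if (!expected.isAssignableFrom(configured)) {
            return "class " + configured.getSimpleName()
                 + " not " + expected.getSimpleName();
        }
        return "ok";
    }

    public static void main(String[] args) {
        System.out.println(describeMismatch(
            DefaultMetaStoreFilterHookImpl.class, MetaStoreFilterHook.class));
    }
}
```

Running it prints the same "class X not Y" shape as the exception above, because the stand-in implementation does not implement the stand-in interface.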

louisliu318 commented 5 years ago

After removing the <relocation>s in the maven-shade-plugin configuration, the error is gone. Maybe we need some workaround in HoodieSparkSqlWriter.syncHive(), where we could set the relevant Hive configurations with the com.uber.hoodie prefix, or set the configurations in hive-site.xml.
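As a sketch of the hive-site.xml idea mentioned above (untested; the config key is Hive 1.2's METASTORE_FILTER_HOOK, and the shaded class name is an assumption derived from the bundle's relocation prefix, not a verified class):

```xml
<!-- hive-site.xml sketch (untested): point the filter hook at an implementation
     of the *relocated* interface, so the shaded HiveMetaStoreClient's
     assignability check passes. The value below assumes the default impl was
     relocated under the bundle's shading prefix. -->
<property>
  <name>metastore.filter.hook</name>
  <value>com.uber.hoodie.org.apache.hadoop_hive.metastore.DefaultMetaStoreFilterHookImpl</value>
</property>
```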

vinothchandar commented 5 years ago

Hive 1.x.. Are you using the correct spark bundle?

From the quickstart:

To work with an older version of Hive (pre Hive-1.2.1), use

    $ mvn clean install -DskipTests -DskipITs -Dhive11

@bvaradar for context

louisliu318 commented 5 years ago

@vinothchandar I'm using Hive-1.2.1, not Hive-1.1.1. In packaging/hoodie-spark-bundle/pom.xml, -Dhive12 points to hive version 1.2.1.

bvaradar commented 5 years ago

@louisliu318: Thanks for filing this ticket. Yes, with Hive-1.2.1 the maven profile is hive12 (the default). When I tested a similar setup, I did not encounter this issue. This is caused by shading some of the Hive jars and including them in the bundle (a bit of magic arrived at by trial and error).

It is not clear from your comment whether you solved this by disabling shading?

Can you also try shading the hive-metastore jar? Add this relocation in the shading section of the hoodie-spark pom.xml:

    <relocation>
      <pattern>org.apache.hadoop.hive.metastore.</pattern>
      <shadedPattern>com.uber.hoodie.org.apache.hadoop_hive.metastore.</shadedPattern>
    </relocation>

Let me know if this solves the problem.

louisliu318 commented 5 years ago

I solved the problem by commenting out the relocations in the shading section of the hoodie-spark pom.xml.

vinothchandar commented 5 years ago

Can you throw your changes into a PR? Balaji and I discussed this more. He seems to have tested Hive 1.2 as part of the dockerized setup, and things worked for him. Ideally we need to test this across all Hive versions before making a call. The Hive jar versioning is very sensitive, and changes made for one version often end up causing side effects for others.

vinothchandar commented 5 years ago

@louisliu318 Pinging again, to see if you can share the changes with us.

louisliu318 commented 5 years ago

@vinothchandar In my environment, I commented out the following Hive-related relocations in packaging\hoodie-spark-bundle\hoodie-spark-bundle.iml:

            <relocation>
              <pattern>org.apache.hive.jdbc.</pattern>
              <shadedPattern>com.uber.hoodie.org.apache.hive.jdbc.</shadedPattern>
            </relocation>
            <relocation>
              <pattern>org.apache.hadoop.hive.metastore.</pattern>
              <shadedPattern>com.uber.hoodie.org.apache.hadoop_hive.metastore.</shadedPattern>
            </relocation>
            <relocation>
              <pattern>org.apache.hive.common.</pattern>
              <shadedPattern>com.uber.hoodie.org.apache.hive.common.</shadedPattern>
            </relocation>
            <relocation>
              <pattern>org.apache.hadoop.hive.common.</pattern>
              <shadedPattern>com.uber.hoodie.org.apache.hadoop_hive.common.</shadedPattern>
            </relocation>
            <relocation>
              <pattern>org.apache.hadoop.hive.conf.</pattern>
              <shadedPattern>com.uber.hoodie.org.apache.hadoop_hive.conf.</shadedPattern>
            </relocation>
            <relocation>
              <pattern>org.apache.hive.service.</pattern>
              <shadedPattern>com.uber.hoodie.org.apache.hive.service.</shadedPattern>
            </relocation>
            <relocation>
              <pattern>org.apache.hadoop.hive.service.</pattern>
              <shadedPattern>com.uber.hoodie.org.apache.hadoop_hive.service.</shadedPattern>
            </relocation>

bvaradar commented 5 years ago

This should be fixed by https://github.com/apache/incubator-hudi/pull/633

n3nash commented 5 years ago

Closing this ticket in favor of #633, which fixes the underlying issue.