itamar-otonomo opened this issue 4 years ago
I am also getting this error, except on EMR.
I am seeing this on EMR 6.2.0
I am seeing this on EMR 5.32 when I try the following on the master node:
sudo pyspark --packages org.apache.hudi:hudi-spark-bundle_2.11:0.8.0,org.apache.spark:spark-avro_2.11:2.4.7 --jars /usr/lib/hive/auxlib/aws-glue-datacatalog-hive2-client.jar --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' --conf 'spark.sql.hive.convertMetastoreParquet=false' --driver-class-path /etc/hive/conf
then any command like:
spark.sql("SELECT count(*) FROM mydb.mytable").show()
... results in the error
We have a similar issue with EMR 6.3.0 when running Spark jobs through an Oozie Spark action (spark-submit works fine). It happens when trying to SELECT from a Hive table with Spark SQL. Only the signature of createMetaStoreClient differs:
2021-08-26 15:17:06,257 [main] ERROR org.apache.spark.deploy.yarn.Client - Application diagnostics message: User class threw exception: java.lang.AbstractMethodError: com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory.createMetaStoreClient(Lorg/apache/hadoop/hive/conf/HiveConf;Lorg/apache/hadoop/hive/metastore/HiveMetaHookLoader;ZLjava/util/concurrent/ConcurrentHashMap;)Lorg/apache/hadoop/hive/metastore/IMetaStoreClient;
at org.apache.hadoop.hive.ql.metadata.HiveUtils.createMetaStoreClient(HiveUtils.java:481)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:4479)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:4459)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:4715)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:295)
at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:278)
at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:452)
at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:379)
at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:359)
at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:335)
at org.apache.spark.sql.hive.client.HiveClientImpl.client(HiveClientImpl.scala:257)
at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:283)
at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:224)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:223)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:273)
at org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:384)
at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:249)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:105)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:249)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:135)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:125)
at org.apache.spark.sql.internal.SharedState.isDatabaseExistent$1(SharedState.scala:169)
at org.apache.spark.sql.internal.SharedState.globalTempViewManager$lzycompute(SharedState.scala:201)
at org.apache.spark.sql.internal.SharedState.globalTempViewManager(SharedState.scala:153)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$2(HiveSessionStateBuilder.scala:52)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager$lzycompute(SessionCatalog.scala:99)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager(SessionCatalog.scala:99)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupGlobalTempView(SessionCatalog.scala:870)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTempViews$.lookupTempView(Analyzer.scala:915)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTempViews$$anonfun$apply$7.applyOrElse(Analyzer.scala:875)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTempViews$$anonfun$apply$7.applyOrElse(Analyzer.scala:873)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$3(AnalysisHelper.scala:90)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
...
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:73)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:71)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:63)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:615)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:610)
Hi, I'm trying to get the client running, and I must admit it has been an uphill journey so far. I finally got the client to compile, placed the JARs in the spark/jars directory, and updated the hive-site.xml file.
Whenever I try to read a table I get this error message:
I'm running Spark via a Kubernetes spark-operator setup on a Docker image I've built. By the way, the only way I could get the client to compile was to follow #16: compile both versions of Hive first, and only then compile the two clients together from the root of the repo.