TOSIT-IO / tdp-collection

Ansible collection to deploy the components of TDP
Apache License 2.0

Spark 2: Connection to Hive Metastore doesn't work when Hive Server 2 is running on the edge #394

Closed Nuttymoon closed 2 years ago

Nuttymoon commented 2 years ago

The connection to the Hive Metastore from Spark does not work. We can see this by running a simple SHOW DATABASES in the pyspark shell:

spark.sql('SHOW DATABASES')

With Spark 2, we have the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling o37.sql.
...
Caused by: java.lang.NoClassDefFoundError: org/apache/tez/dag/api/TezConfiguration
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:661)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:586)
    at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:180)
    at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:114)
    ... 41 more
Caused by: java.lang.ClassNotFoundException: org.apache.tez.dag.api.TezConfiguration
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.doLoadClass(IsolatedClientLoader.scala:221)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.loadClass(IsolatedClientLoader.scala:210)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    ... 45 more

Based on the warnings printed at instantiation of the pyspark shell for Spark 2, it seems that the Hive properties in hive-site.xml are not compatible with Hive 1.x (which is used by Spark 2):

2022-08-02 11:44:14,100 WARN conf.HiveConf: HiveConf of name hive.metastore.uri.selection does not exist
2022-08-02 11:44:14,101 WARN conf.HiveConf: HiveConf of name hive.metastore.truststore.path does not exist
2022-08-02 11:44:14,101 WARN conf.HiveConf: HiveConf of name hive.metastore.truststore.password does not exist
2022-08-02 11:44:14,101 WARN conf.HiveConf: HiveConf of name hive.metastore.use.SSL does not exist
2022-08-02 11:45:27,448 WARN conf.HiveConf: HiveConf of name hive.metastore.hmshandler.retry.attempts does not exist
2022-08-02 11:45:27,449 WARN conf.HiveConf: HiveConf of name hive.metastore.authentication does not exist

I think the error is coming both from incompatible hive-site.xml properties and missing Tez jars.
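To get a clean list of the properties that the Hive client rejects, something like the following could be used. This is only a sketch: it assumes the warning format is exactly the one shown in the logs above, and the function name is made up for illustration.

```python
import re

# Matches the "HiveConf of name ... does not exist" warnings shown above.
_UNKNOWN_PROP = re.compile(
    r"WARN conf\.HiveConf: HiveConf of name (\S+) does not exist"
)


def unknown_hive_properties(log_text):
    """Return the sorted, de-duplicated property names flagged as unknown."""
    return sorted(set(_UNKNOWN_PROP.findall(log_text)))


if __name__ == "__main__":
    sample = (
        "2022-08-02 11:44:14,100 WARN conf.HiveConf: HiveConf of name "
        "hive.metastore.uri.selection does not exist\n"
        "2022-08-02 11:44:14,101 WARN conf.HiveConf: HiveConf of name "
        "hive.metastore.use.SSL does not exist\n"
    )
    for name in unknown_hive_properties(sample):
        print(name)
```

Comparing that list with the properties present in each hive-site.xml on the node can hint at which file was actually loaded.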

mehdibn commented 2 years ago

Same issue: the pyspark command is not properly configured via a template. Please check the spark-shell template, for example, which is correctly configured to use the specific hive-site.xml that we generated for Spark.

Nuttymoon commented 2 years ago

pyspark uses the same template as spark-shell and the issue is also present with spark-shell.

mehdibn commented 2 years ago

same thing with spark2:

[usera@mehdi-edge-01 ~]$ spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2022-08-02 15:19:08,158 WARN conf.HiveConf: HiveConf of name hive.metastore.uri.selection does not exist
2022-08-02 15:19:08,159 WARN conf.HiveConf: HiveConf of name hive.metastore.truststore.path does not exist
2022-08-02 15:19:08,159 WARN conf.HiveConf: HiveConf of name hive.metastore.truststore.password does not exist
2022-08-02 15:19:08,159 WARN conf.HiveConf: HiveConf of name hive.metastore.use.SSL does not exist
2022-08-02 15:19:08,186 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Spark context Web UI available at http://mehdi-edge-01.novalocal:4040
Spark context available as 'sc' (master = yarn, app id = application_1659344321529_0025).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.5-TDP-0.1.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_332)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.sql("show databases").show()
Hive Session ID = 2ffa958a-89e5-4413-8dd9-1317cc13f160
+------------+
|databaseName|
+------------+
|     default|
+------------+

scala>

mehdibn commented 2 years ago

This message (Caused by: java.lang.NoClassDefFoundError: org/apache/tez/dag/api/TezConfiguration) means that you are using the hive-site.xml of the Hive components and not the Spark one.

mehdibn commented 2 years ago

Can you please share the output of ls /etc/spark/conf and cat /etc/spark/conf/hive-site.xml?

Nuttymoon commented 2 years ago

Yes I don't know why in my setup Spark reads a different hive-site.xml than the one in /etc/spark/conf. I will try to figure it out.
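As a starting point, listing every hive-site.xml that could be picked up on the node may help. A minimal sketch, assuming the directories below are the relevant candidates: the environment variable names are the usual ones, the extra directories are the ones mentioned in this thread, and the helper name is hypothetical. It does not reproduce how Spark builds its classpath, so it only shows which files exist, not which one wins.

```python
import os


def find_hive_sites(extra_dirs=()):
    """List existing hive-site.xml files in a few candidate config directories."""
    # Directories taken from common environment variables (may be unset).
    candidates = [
        os.environ.get(var)
        for var in ("SPARK_CONF_DIR", "HIVE_CONF_DIR", "HADOOP_CONF_DIR")
    ]
    candidates += list(extra_dirs)

    found = []
    for directory in candidates:
        if not directory:
            continue
        path = os.path.join(directory, "hive-site.xml")
        # Keep each existing file once, in the order it was encountered.
        if os.path.isfile(path) and path not in found:
            found.append(path)
    return found


if __name__ == "__main__":
    # Directories seen in this thread; adjust for the node being checked.
    for path in find_hive_sites(
        ["/etc/spark/conf", "/etc/hive/conf.s2", "/etc/hive/conf.ms"]
    ):
        print(path)
```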

mehdibn commented 2 years ago

Good luck @Nuttymoon :) Don't hesitate to ask. But we should open an issue about pyspark and the other commands that are not configured via a template.

Nuttymoon commented 2 years ago

Thanks! Pyspark is not the problem, it has the same template:

cat /usr/bin/pyspark
#!/usr/bin/env bash

export SPARK_CONF_DIR=/etc/spark/conf

/opt/tdp/spark/bin/pyspark "$@"

mehdibn commented 2 years ago

yes indeed :+1:

Nuttymoon commented 2 years ago

Ok I have pinpointed the issue. It happens when a Hive Server 2 is present on the edge. Changing the issue name.

nschung commented 2 years ago

@Nuttymoon Do you have any idea why the issue happens only when HS2 is on the edge? Having HS2 on the edge is a normal setup.

mehdibn commented 2 years ago

Indeed, when I tried to execute this command from the hive-s2 node, I had the same issue. A workaround, for debugging only and not to be applied in production, consists of moving /etc/hive to another directory, which avoids this issue.

Nuttymoon commented 2 years ago

Unfortunately, this does not fix the issue when a Hive Server 2 is installed on the edge.

Caused by: java.lang.IllegalArgumentException: java.io.IOException: Configuration problem with provider path.
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:459)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:224)
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:94)
    ... 80 more
Caused by: java.io.IOException: Configuration problem with provider path.
    at org.apache.hadoop.conf.Configuration.getPasswordFromCredentialProviders(Configuration.java:2363)
    at org.apache.hadoop.conf.Configuration.getPassword(Configuration.java:2282)
    at org.apache.hadoop.hive.metastore.conf.MetastoreConf.getPassword(MetastoreConf.java:1537)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:452)
    ... 82 more
Caused by: java.io.FileNotFoundException: /etc/hive/conf.s2/hive.jceks (Permission denied)

Nuttymoon commented 2 years ago

I have found a solution anyways. Opening PR in a bit.

mehdibn commented 2 years ago

From a master node with hive_s2 and hive_metastore:

[root@mehdi-master-02 cloudadm]# ls /etc/hive/
conf.ms  conf.ms.1659018142  conf.ms.1659344346  conf.s2  conf.s2.1659018130  conf.s2.1659344338
[root@mehdi-master-02 cloudadm]# systemctl status hive*
● hive-metastore.service - HiveMetastore
   Loaded: loaded (/usr/lib/systemd/system/hive-metastore.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2022-08-01 10:59:16 CEST; 1 day 23h ago
 Main PID: 3780 (java)
   CGroup: /system.slice/hive-metastore.service
           └─3780 /usr/lib/jvm/jre-1.8.0-openjdk/bin/java -Dproc_jar -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=true -Dcom...

Aug 01 10:59:16 mehdi-master-02.novalocal systemd[1]: Started HiveMetastore.

● hive-server2.service - Hiveserver2
   Loaded: loaded (/usr/lib/systemd/system/hive-server2.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2022-08-01 10:59:30 CEST; 1 day 23h ago
 Main PID: 4518 (java)
   CGroup: /system.slice/hive-server2.service
           └─4518 /usr/lib/jvm/jre-1.8.0-openjdk/bin/java -Dproc_jar -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=true -Dcom...

Aug 01 10:59:30 mehdi-master-02.novalocal systemd[1]: Started Hiveserver2.

With hive.execution.engine:

[root@mehdi-master-02 cloudadm]# cat /etc/spark/conf/hive-site.xml 
<configuration>
    <property>
    <name>hive.metastore.uris</name>
    <value>thrift://mehdi-master-01.novalocal:9083,thrift://mehdi-master-02.novalocal:9083</value>
  </property>
    <property>
    <name>hive.metastore.uri.selection</name>
    <value>RANDOM</value>
  </property>
    <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/spark</value>
  </property>
    <property>
    <name>hive.metastore.client.connect.retry.delay</name>
    <value>5</value>
  </property>
    <property>
    <name>hive.metastore.client.socket.timeout</name>
    <value>1800</value>
  </property>
    <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
    <property>
    <name>hive.server2.thrift.port</name>
    <value>10016</value>
  </property>
    <property>
    <name>hive.server2.transport.mode</name>
    <value>http</value>
  </property>
    <property>
    <name>hive.metastore.sasl.enabled</name>
    <value>true</value>
  </property>
    <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
    <property>
    <name>hive.metastore.use.SSL</name>
    <value>true</value>
  </property>
    <property>
    <name>hive.metastore.truststore.path</name>
    <value>/etc/ssl/certs/truststore.jks</value>
  </property>
    <property>
    <name>hive.metastore.truststore.password</name>
    <value>Truststore123!</value>
  </property>
    <property>
    <name>hive.metastore.kerberos.principal</name>
    <value>hive_ms/_HOST@TDP.LOCAL</value>
  </property>
    <property>
    <name>hive.metastore.execute.setugi</name>
    <value>true</value>
  </property>
    <property>
    <name>hadoop.rpc.protection</name>
    <value>AUTHENTICATION</value>
  </property>
    <property>
    <name>hive.execution.engine</name>
    <value>spark</value>
  </property>
  </configuration>
[root@mehdi-master-02 cloudadm]# spark-shell 
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2022-08-03 10:56:48,945 WARN conf.HiveConf: HiveConf of name hive.metastore.uri.selection does not exist
2022-08-03 10:56:48,945 WARN conf.HiveConf: HiveConf of name hive.metastore.truststore.path does not exist
2022-08-03 10:56:48,945 WARN conf.HiveConf: HiveConf of name hive.metastore.truststore.password does not exist
2022-08-03 10:56:48,945 WARN conf.HiveConf: HiveConf of name hive.metastore.use.SSL does not exist
2022-08-03 10:56:48,978 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Spark context Web UI available at http://mehdi-master-02.novalocal:4040
Spark context available as 'sc' (master = yarn, app id = application_1659344321529_0060).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.5-TDP-0.1.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_332)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.sql("show databases").show()
2022-08-03 10:57:16,992 WARN conf.HiveConf: HiveConf of name hive.metastore.hmshandler.retry.attempts does not exist
2022-08-03 10:57:16,992 WARN conf.HiveConf: HiveConf of name hive.metastore.authentication does not exist
Hive Session ID = bd528c92-46a5-49e5-baa5-75704cfb6336
+------------+
|databaseName|
+------------+
|     default|
+------------+

Without hive.execution.engine:

[root@mehdi-master-02 cloudadm]# vi /etc/spark/conf/hive-site.xml 
[root@mehdi-master-02 cloudadm]# cat /etc/spark/conf/hive-site.xml 
<configuration>
    <property>
    <name>hive.metastore.uris</name>
    <value>thrift://mehdi-master-01.novalocal:9083,thrift://mehdi-master-02.novalocal:9083</value>
  </property>
    <property>
    <name>hive.metastore.uri.selection</name>
    <value>RANDOM</value>
  </property>
    <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/spark</value>
  </property>
    <property>
    <name>hive.metastore.client.connect.retry.delay</name>
    <value>5</value>
  </property>
    <property>
    <name>hive.metastore.client.socket.timeout</name>
    <value>1800</value>
  </property>
    <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
    <property>
    <name>hive.server2.thrift.port</name>
    <value>10016</value>
  </property>
    <property>
    <name>hive.server2.transport.mode</name>
    <value>http</value>
  </property>
    <property>
    <name>hive.metastore.sasl.enabled</name>
    <value>true</value>
  </property>
    <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
    <property>
    <name>hive.metastore.use.SSL</name>
    <value>true</value>
  </property>
    <property>
    <name>hive.metastore.truststore.path</name>
    <value>/etc/ssl/certs/truststore.jks</value>
  </property>
    <property>
    <name>hive.metastore.truststore.password</name>
    <value>Truststore123!</value>
  </property>
    <property>
    <name>hive.metastore.kerberos.principal</name>
    <value>hive_ms/_HOST@TDP.LOCAL</value>
  </property>
    <property>
    <name>hive.metastore.execute.setugi</name>
    <value>true</value>
  </property>
    <property>
    <name>hadoop.rpc.protection</name>
    <value>AUTHENTICATION</value>
  </property>
  </configuration>
[root@mehdi-master-02 cloudadm]# 
[root@mehdi-master-02 cloudadm]# spark-shell 
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2022-08-03 10:59:37,428 WARN conf.HiveConf: HiveConf of name hive.metastore.uri.selection does not exist
2022-08-03 10:59:37,429 WARN conf.HiveConf: HiveConf of name hive.metastore.truststore.path does not exist
2022-08-03 10:59:37,429 WARN conf.HiveConf: HiveConf of name hive.metastore.truststore.password does not exist
2022-08-03 10:59:37,429 WARN conf.HiveConf: HiveConf of name hive.metastore.use.SSL does not exist
2022-08-03 10:59:37,454 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Spark context Web UI available at http://mehdi-master-02.novalocal:4040
Spark context available as 'sc' (master = yarn, app id = application_1659344321529_0062).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.5-TDP-0.1.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_332)
Type in expressions to have them evaluated.
Type :help for more information.

scala> 

scala> spark.sql("show databases").show()
2022-08-03 10:59:58,417 WARN conf.HiveConf: HiveConf of name hive.metastore.hmshandler.retry.attempts does not exist
2022-08-03 10:59:58,417 WARN conf.HiveConf: HiveConf of name hive.metastore.authentication does not exist
Hive Session ID = 9465bc59-6c4f-4fbc-8202-9b92b7e355d0
java.lang.ClassNotFoundException: java.lang.NoClassDefFoundError: org/apache/tez/dag/api/TezConfiguration when creating Hive client using classpath: file:/opt/tdp/hive/lib/hive-common-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/hive-classification-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/hive-upgrade-acid-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/hive-shims-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/hive-shims-common-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/log4j-slf4j-impl-2.10.0.jar, file:/opt/tdp/hive/lib/log4j-api-2.10.0.jar, file:/opt/tdp/hive/lib/log4j-core-2.10.0.jar, file:/opt/tdp/hive/lib/guava-19.0.jar, file:/opt/tdp/hive/lib/commons-lang-2.6.jar, file:/opt/tdp/hive/lib/libthrift-0.9.3.jar, file:/opt/tdp/hive/lib/httpclient-4.5.2.jar, file:/opt/tdp/hive/lib/httpcore-4.4.4.jar, file:/opt/tdp/hive/lib/commons-logging-1.0.4.jar, file:/opt/tdp/hive/lib/commons-codec-1.7.jar, file:/opt/tdp/hive/lib/curator-framework-2.12.0.jar, file:/opt/tdp/hive/lib/curator-client-2.12.0.jar, file:/opt/tdp/hive/lib/zookeeper-3.4.6.jar, file:/opt/tdp/hive/lib/jline-2.12.jar, file:/opt/tdp/hive/lib/netty-3.10.5.Final.jar, file:/opt/tdp/hive/lib/hive-shims-0.23-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/javax.servlet-api-3.1.0.jar, file:/opt/tdp/hive/lib/protobuf-java-2.5.0.jar, file:/opt/tdp/hive/lib/commons-io-2.4.jar, file:/opt/tdp/hive/lib/jettison-1.1.jar, file:/opt/tdp/hive/lib/jaxb-api-2.2.11.jar, file:/opt/tdp/hive/lib/jackson-core-asl-1.9.13.jar, file:/opt/tdp/hive/lib/jackson-mapper-asl-1.9.13.jar, file:/opt/tdp/hive/lib/jackson-annotations-2.10.0.jar, file:/opt/tdp/hive/lib/asm-5.0.1.jar, file:/opt/tdp/hive/lib/commons-compress-1.9.jar, file:/opt/tdp/hive/lib/jetty-util-9.3.20.v20170531.jar, file:/opt/tdp/hive/lib/commons-cli-1.2.jar, file:/opt/tdp/hive/lib/jackson-core-2.10.0.jar, file:/opt/tdp/hive/lib/jackson-databind-2.10.0.jar, file:/opt/tdp/hive/lib/commons-math3-3.6.1.jar, 
file:/opt/tdp/hive/lib/jetty-server-9.3.20.v20170531.jar, file:/opt/tdp/hive/lib/jetty-http-9.3.20.v20170531.jar, file:/opt/tdp/hive/lib/jetty-io-9.3.20.v20170531.jar, file:/opt/tdp/hive/lib/jetty-servlet-9.3.20.v20170531.jar, file:/opt/tdp/hive/lib/jetty-security-9.3.20.v20170531.jar, file:/opt/tdp/hive/lib/jetty-webapp-9.3.20.v20170531.jar, file:/opt/tdp/hive/lib/jetty-xml-9.3.20.v20170531.jar, file:/opt/tdp/hive/lib/commons-lang3-3.2.jar, file:/opt/tdp/hive/lib/avro-1.8.2.jar, file:/opt/tdp/hive/lib/paranamer-2.7.jar, file:/opt/tdp/hive/lib/snappy-java-1.1.4.jar, file:/opt/tdp/hive/lib/xz-1.5.jar, file:/opt/tdp/hive/lib/gson-2.2.4.jar, file:/opt/tdp/hive/lib/curator-recipes-2.12.0.jar, file:/opt/tdp/hive/lib/jsr305-3.0.0.jar, file:/opt/tdp/hive/lib/hive-shims-scheduler-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/hive-storage-api-2.7.0.jar, file:/opt/tdp/hive/lib/orc-core-1.5.8.jar, file:/opt/tdp/hive/lib/orc-shims-1.5.8.jar, file:/opt/tdp/hive/lib/aircompressor-0.10.jar, file:/opt/tdp/hive/lib/jetty-rewrite-9.3.20.v20170531.jar, file:/opt/tdp/hive/lib/jetty-client-9.3.20.v20170531.jar, file:/opt/tdp/hive/lib/joda-time-2.9.9.jar, file:/opt/tdp/hive/lib/log4j-1.2-api-2.10.0.jar, file:/opt/tdp/hive/lib/log4j-web-2.10.0.jar, file:/opt/tdp/hive/lib/ant-1.9.1.jar, file:/opt/tdp/hive/lib/ant-launcher-1.9.1.jar, file:/opt/tdp/hive/lib/jpam-1.1.jar, file:/opt/tdp/hive/lib/json-1.8.jar, file:/opt/tdp/hive/lib/metrics-core-3.1.0.jar, file:/opt/tdp/hive/lib/metrics-jvm-3.1.0.jar, file:/opt/tdp/hive/lib/metrics-json-3.1.0.jar, file:/opt/tdp/hive/lib/dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar, file:/opt/tdp/hive/lib/javolution-5.5.1.jar, file:/opt/tdp/hive/lib/hive-serde-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/hive-service-rpc-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/libfb303-0.9.3.jar, file:/opt/tdp/hive/lib/arrow-vector-0.8.0.jar, file:/opt/tdp/hive/lib/arrow-format-0.8.0.jar, 
file:/opt/tdp/hive/lib/flatbuffers-1.2.0-3f79e055.jar, file:/opt/tdp/hive/lib/arrow-memory-0.8.0.jar, file:/opt/tdp/hive/lib/netty-buffer-4.1.17.Final.jar, file:/opt/tdp/hive/lib/netty-common-4.1.17.Final.jar, file:/opt/tdp/hive/lib/hppc-0.7.2.jar, file:/opt/tdp/hive/lib/opencsv-2.3.jar, file:/opt/tdp/hive/lib/parquet-hadoop-bundle-1.10.0.jar, file:/opt/tdp/hive/lib/hive-metastore-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/hive-standalone-metastore-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/bonecp-0.8.0.RELEASE.jar, file:/opt/tdp/hive/lib/HikariCP-2.6.1.jar, file:/opt/tdp/hive/lib/commons-dbcp-1.4.jar, file:/opt/tdp/hive/lib/commons-pool-1.5.4.jar, file:/opt/tdp/hive/lib/antlr-runtime-3.5.2.jar, file:/opt/tdp/hive/lib/derby-10.14.1.0.jar, file:/opt/tdp/hive/lib/datanucleus-api-jdo-4.2.4.jar, file:/opt/tdp/hive/lib/datanucleus-core-4.1.17.jar, file:/opt/tdp/hive/lib/datanucleus-rdbms-4.1.19.jar, file:/opt/tdp/hive/lib/javax.jdo-3.2.0-m3.jar, file:/opt/tdp/hive/lib/transaction-api-1.1.jar, file:/opt/tdp/hive/lib/sqlline-1.3.0.jar, file:/opt/tdp/hive/lib/hbase-client-2.0.0-alpha4.jar, file:/opt/tdp/hive/lib/hbase-shaded-protobuf-1.0.1.jar, file:/opt/tdp/hive/lib/hbase-common-2.0.0-alpha4.jar, file:/opt/tdp/hive/lib/hbase-shaded-miscellaneous-1.0.1.jar, file:/opt/tdp/hive/lib/commons-collections4-4.1.jar, file:/opt/tdp/hive/lib/htrace-core-3.2.0-incubating.jar, file:/opt/tdp/hive/lib/commons-crypto-1.0.0.jar, file:/opt/tdp/hive/lib/findbugs-annotations-1.3.9-1.jar, file:/opt/tdp/hive/lib/audience-annotations-0.5.0.jar, file:/opt/tdp/hive/lib/hbase-hadoop-compat-2.0.0-alpha4.jar, file:/opt/tdp/hive/lib/hbase-metrics-api-2.0.0-alpha4.jar, file:/opt/tdp/hive/lib/hbase-hadoop2-compat-2.0.0-alpha4.jar, file:/opt/tdp/hive/lib/hbase-metrics-2.0.0-alpha4.jar, file:/opt/tdp/hive/lib/hbase-protocol-shaded-2.0.0-alpha4.jar, file:/opt/tdp/hive/lib/hbase-protocol-2.0.0-alpha4.jar, file:/opt/tdp/hive/lib/hbase-shaded-netty-1.0.1.jar, 
file:/opt/tdp/hive/lib/jcodings-1.0.18.jar, file:/opt/tdp/hive/lib/joni-2.1.11.jar, file:/opt/tdp/hive/lib/jdo-api-3.0.1.jar, file:/opt/tdp/hive/lib/jta-1.1.jar, file:/opt/tdp/hive/lib/hive-testutils-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/tempus-fugit-1.1.jar, file:/opt/tdp/hive/lib/hive-exec-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/hive-vector-code-gen-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/velocity-1.5.jar, file:/opt/tdp/hive/lib/hive-llap-tez-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/hive-llap-client-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/hive-llap-common-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/ST4-4.0.4.jar, file:/opt/tdp/hive/lib/ivy-2.4.0.jar, file:/opt/tdp/hive/lib/groovy-all-2.4.11.jar, file:/opt/tdp/hive/lib/calcite-core-1.16.0.jar, file:/opt/tdp/hive/lib/calcite-linq4j-1.16.0.jar, file:/opt/tdp/hive/lib/esri-geometry-api-2.0.0.jar, file:/opt/tdp/hive/lib/sketches-core-0.9.0.jar, file:/opt/tdp/hive/lib/memory-0.9.0.jar, file:/opt/tdp/hive/lib/janino-2.7.6.jar, file:/opt/tdp/hive/lib/commons-compiler-2.7.6.jar, file:/opt/tdp/hive/lib/calcite-druid-1.16.0.jar, file:/opt/tdp/hive/lib/avatica-1.11.0.jar, file:/opt/tdp/hive/lib/stax-api-1.0.1.jar, file:/opt/tdp/hive/lib/hive-service-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/hive-llap-server-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/netty-all-4.1.17.Final.jar, file:/opt/tdp/hive/lib/hive-llap-common-3.1.3-TDP-0.1.0-SNAPSHOT-tests.jar, file:/opt/tdp/hive/lib/hbase-server-2.0.0-alpha4.jar, file:/opt/tdp/hive/lib/hbase-http-2.0.0-alpha4.jar, file:/opt/tdp/hive/lib/jersey-server-2.25.1.jar, file:/opt/tdp/hive/lib/jersey-common-2.25.1.jar, file:/opt/tdp/hive/lib/javax.ws.rs-api-2.0.1.jar, file:/opt/tdp/hive/lib/javax.annotation-api-1.2.jar, file:/opt/tdp/hive/lib/jersey-guava-2.25.1.jar, file:/opt/tdp/hive/lib/hk2-api-2.5.0-b32.jar, file:/opt/tdp/hive/lib/hk2-utils-2.5.0-b32.jar, 
file:/opt/tdp/hive/lib/aopalliance-repackaged-2.5.0-b32.jar, file:/opt/tdp/hive/lib/javax.inject-2.5.0-b32.jar, file:/opt/tdp/hive/lib/hk2-locator-2.5.0-b32.jar, file:/opt/tdp/hive/lib/javassist-3.20.0-GA.jar, file:/opt/tdp/hive/lib/osgi-resource-locator-1.0.1.jar, file:/opt/tdp/hive/lib/jersey-client-2.25.1.jar, file:/opt/tdp/hive/lib/jersey-media-jaxb-2.25.1.jar, file:/opt/tdp/hive/lib/validation-api-1.1.0.Final.jar, file:/opt/tdp/hive/lib/jersey-container-servlet-core-2.25.1.jar, file:/opt/tdp/hive/lib/hbase-procedure-2.0.0-alpha4.jar, file:/opt/tdp/hive/lib/hbase-common-2.0.0-alpha4-tests.jar, file:/opt/tdp/hive/lib/hbase-replication-2.0.0-alpha4.jar, file:/opt/tdp/hive/lib/hbase-prefix-tree-2.0.0-alpha4.jar, file:/opt/tdp/hive/lib/javax.servlet.jsp-2.3.2.jar, file:/opt/tdp/hive/lib/javax.servlet.jsp-api-2.3.1.jar, file:/opt/tdp/hive/lib/jamon-runtime-2.3.1.jar, file:/opt/tdp/hive/lib/disruptor-3.3.6.jar, file:/opt/tdp/hive/lib/hbase-mapreduce-2.0.0-alpha4.jar, file:/opt/tdp/hive/lib/jetty-runner-9.3.20.v20170531.jar, file:/opt/tdp/hive/lib/jetty-plus-9.3.20.v20170531.jar, file:/opt/tdp/hive/lib/jetty-jndi-9.3.20.v20170531.jar, file:/opt/tdp/hive/lib/jetty-annotations-9.3.20.v20170531.jar, file:/opt/tdp/hive/lib/asm-commons-5.0.1.jar, file:/opt/tdp/hive/lib/asm-tree-5.0.1.jar, file:/opt/tdp/hive/lib/jetty-jaas-9.3.20.v20170531.jar, file:/opt/tdp/hive/lib/websocket-server-9.3.20.v20170531.jar, file:/opt/tdp/hive/lib/websocket-common-9.3.20.v20170531.jar, file:/opt/tdp/hive/lib/websocket-api-9.3.20.v20170531.jar, file:/opt/tdp/hive/lib/websocket-client-9.3.20.v20170531.jar, file:/opt/tdp/hive/lib/websocket-servlet-9.3.20.v20170531.jar, file:/opt/tdp/hive/lib/apache-jsp-9.3.20.v20170531.jar, file:/opt/tdp/hive/lib/jetty-schemas-3.1.jar, file:/opt/tdp/hive/lib/ecj-4.4.2.jar, file:/opt/tdp/hive/lib/apache-jstl-9.3.20.v20170531.jar, file:/opt/tdp/hive/lib/taglibs-standard-spec-1.2.5.jar, file:/opt/tdp/hive/lib/taglibs-standard-impl-1.2.5.jar, 
file:/opt/tdp/hive/lib/hive-jdbc-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/hive-beeline-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/super-csv-2.2.0.jar, file:/opt/tdp/hive/lib/hive-cli-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/hive-contrib-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/hive-hbase-handler-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/hbase-hadoop2-compat-2.0.0-alpha4-tests.jar, file:/opt/tdp/hive/lib/hive-druid-handler-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/jackson-dataformat-smile-2.10.0.jar, file:/opt/tdp/hive/lib/druid-hdfs-storage-0.12.0.jar, file:/opt/tdp/hive/lib/mysql-metadata-storage-0.12.0.jar, file:/opt/tdp/hive/lib/postgresql-metadata-storage-0.12.0.jar, file:/opt/tdp/hive/lib/postgresql-9.4.1208.jre7.jar, file:/opt/tdp/hive/lib/hive-jdbc-handler-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/hive-accumulo-handler-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/accumulo-core-1.7.3.jar, file:/opt/tdp/hive/lib/jcommander-1.32.jar, file:/opt/tdp/hive/lib/accumulo-fate-1.7.3.jar, file:/opt/tdp/hive/lib/accumulo-start-1.7.3.jar, file:/opt/tdp/hive/lib/commons-vfs2-2.1.jar, file:/opt/tdp/hive/lib/commons-math-2.1.jar, file:/opt/tdp/hive/lib/accumulo-trace-1.7.3.jar, file:/opt/tdp/hive/lib/hive-llap-ext-client-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/hive-hplsql-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/antlr4-runtime-4.5.jar, file:/opt/tdp/hive/lib/org.abego.treelayout.core-1.0.1.jar, file:/opt/tdp/hive/lib/hive-streaming-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/hive-kryo-registrator-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/hive-hcatalog-core-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/hive-hcatalog-server-extensions-3.1.3-TDP-0.1.0-SNAPSHOT.jar, file:/opt/tdp/hive/lib/postgresql-jdbc.jar, file:/opt/tdp/hive/lib/ranger-hive-plugin-shim-2.0.1-TDP-0.1.0-SNAPSHOT.jar, 
file:/opt/tdp/hive/lib/ranger-plugin-classloader-2.0.1-TDP-0.1.0-SNAPSHOT.jar
Please make sure that jars for your version of hive and hadoop are included in the paths passed to spark.sql.hive.metastore.jars.
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:270)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:385)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:287)
  at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
  at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
  at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:195)
  at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
  at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
  at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)
  at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
  at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
  at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)
  at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54)
  at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
  at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anon$1.<init>(HiveSessionStateBuilder.scala:69)
  at org.apache.spark.sql.hive.HiveSessionStateBuilder.analyzer(HiveSessionStateBuilder.scala:69)
  at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
  at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
  at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:79)
  at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:79)
  at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
  at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
  ... 49 elided
Caused by: java.lang.reflect.InvocationTargetException: java.lang.NoClassDefFoundError: org/apache/tez/dag/api/TezConfiguration
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
  ... 74 more
Caused by: java.lang.NoClassDefFoundError: org/apache/tez/dag/api/TezConfiguration
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:661)
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:586)
  at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:180)
  at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:114)
  ... 79 more
Caused by: java.lang.ClassNotFoundException: org.apache.tez.dag.api.TezConfiguration
  at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.doLoadClass(IsolatedClientLoader.scala:221)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.loadClass(IsolatedClientLoader.scala:210)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
  ... 83 more

scala>

mehdibn commented 2 years ago

@Nuttymoon as you can see in my logs, setting hive.execution.engine fixed the issue on a node with spark_client and all the Hive components. I don't understand why you still have it.
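To compare setups quickly, the value of that property can be read straight from the file. This is a rough sketch that assumes the standard Hadoop-style layout shown above (property elements with name and value children directly under configuration); the function name is just for illustration.

```python
import os
import xml.etree.ElementTree as ET


def hive_site_properties(xml_text):
    """Parse Hadoop-style configuration XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            # A missing or empty <value> is stored as an empty string.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


if __name__ == "__main__":
    path = "/etc/spark/conf/hive-site.xml"  # file shown in this thread
    if os.path.isfile(path):
        with open(path, encoding="utf-8") as handle:
            props = hive_site_properties(handle.read())
        print(props.get("hive.execution.engine", "<not set>"))
    else:
        print(f"{path} not found")
```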

Nuttymoon commented 2 years ago

I think it is working because you are running as root and therefore you have the right to read /etc/hive/conf.s2/hive.jceks. If you try running the command as an unprivileged user, you should end up with my error as well.
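A quick way to confirm this is to check, as the same user who launches spark-shell, whether the keystore can be opened. A minimal sketch (the helper name is hypothetical; the path is the one from the stack trace above):

```python
def can_read(path):
    """Return True if the current user can open `path` for reading."""
    try:
        with open(path, "rb"):
            return True
    except OSError:
        # Covers permission denied, a missing file, or a directory.
        return False


if __name__ == "__main__":
    jceks = "/etc/hive/conf.s2/hive.jceks"
    print(f"{jceks}: readable={can_read(jceks)}")
```

Running it once as root and once as a regular user should show the difference, if the file permissions are indeed the cause.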

mehdibn commented 2 years ago

I see it :)