clemlabprojects / ambari

Fork of Apache Ambari maintained by Clemlab Company
https://www.clemlab.com
Apache License 2.0

Spark2 Client install failed #62

Closed SGITLOGIN closed 4 months ago

SGITLOGIN commented 5 months ago

Installed versions: ODP 1.2.2.0-50, ODP-UTILS 1.2.2.0, Ambari 2.7.9.0.0-16

First issue: the /etc/spark2/conf directory is not generated during the Spark2 Client installation.

SGITLOGIN commented 5 months ago

@lucasbak An error is always printed in the hive.out log: java.lang.Exception: null. Can you also try to reproduce this problem on your side?

lucasbak commented 5 months ago

@SGITLOGIN can you show me the content of /etc/hive/conf/logback.xml? On our cluster we don't see any error.

SGITLOGIN commented 5 months ago

@lucasbak I tried it and changed the log level to INFO. The error appears to be a DEBUG-level log entry; after changing the level to INFO, the error no longer appears on my side.

@SGITLOGIN ,

As a workaround, try creating the /etc/hive/conf/logback.xml file with the content from https://github.com/clemlabprojects/ambari/blob/5598b04ff598d115af16c987a1ed3978cd46b7a8/ambari-server/src/main/resources/stacks/ODP/1.0/services/HIVE/package/templates/zookeeper-logback.xml.j2

replace zookeeper_log_level with INFO,

then restart the Hive service.
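A minimal sketch of what the resulting logback.xml could look like, assuming the template's main job is setting the root logger level; the appender name and pattern below are illustrative, not the exact template contents:

```xml
<configuration>
  <!-- Illustrative console appender; the real template at the link above may differ. -->
  <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{ISO8601} [%t] %-5p %c{2} - %m%n</pattern>
    </encoder>
  </appender>
  <!-- The key point: the zookeeper_log_level placeholder replaced by INFO. -->
  <root level="INFO">
    <appender-ref ref="CONSOLE"/>
  </root>
</configuration>
```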

lucasbak commented 5 months ago

@SGITLOGIN ,

In the fix, we sync the log level with Hive's LOG LEVEL parameter, so it will be set automatically. It will be part of the next Ambari build too. We will release Ambari + ODP before the end of the week.

SGITLOGIN commented 5 months ago

@lucasbak Ok. The problem I just mentioned still needs to be reproduced on your side.
Question: HIVE is extremely slow when executing msck repair table or drop table (EXTERNAL TABLE; data is stored on OSS; only 31 partitions).

lucasbak commented 5 months ago

@SGITLOGIN

Try again; generally this kind of slowness is linked to the Hive DEBUG log level. Hive is the same version as in 1.2.1.0.

SGITLOGIN commented 5 months ago

@lucasbak On the ODP 1.2.1.0 version, I have not tested repairing partitions here.


lucasbak commented 5 months ago

Try again without debug and keep us updated

SGITLOGIN commented 5 months ago

@lucasbak Could you try repairing the table partitions on your side and see how long it takes? If the execution time is fine for you, it may be a problem with my cluster installation. I have reinstalled components many times so far, so it may be a problem caused by a misstep on my part. I'd like you to try first to see whether it is an ODP version issue; if it is not, I will reinstall the cluster.


lucasbak commented 5 months ago

How many partitions do you have?

SGITLOGIN commented 5 months ago

30 partitions

lucasbak commented 5 months ago

@SGITLOGIN ,

We have just tested with 36 partitions. The INTO TABLE took less than 20 seconds, and the msck repair took 0.5 sec.

Your long response time may be due to DEBUG; try putting the logback.xml file in all configuration folders. That's the solution we will provide in the next build of Ambari, coming before the end of the week.

Best regards

SGITLOGIN commented 5 months ago

OK

SGITLOGIN commented 5 months ago

@lucasbak Hello. I understand this problem will be solved in the next version; it affects the use of the Spark components. Thank you.

@lucasbak With ODP 1.2.2.0-50, the hadoop-aliyun version is hadoop-aliyun-3.3.6.1.2.2.0-50.jar; with ODP 1.2.1.0-134, it is hadoop-aliyun-3.3.4.1.2.1.0-134.jar.

The question is as follows: with ODP 1.2.1.0-134, putting hadoop-aliyun-3.3.4.1.2.1.0-134.jar in the /usr/odp/current/spark3-client/jars directory causes no problem when Spark accesses OSS. With ODP 1.2.2.0-50, putting hadoop-aliyun-3.3.6.1.2.2.0-50.jar in the same directory makes Spark report an error when accessing OSS.

SGITLOGIN commented 5 months ago

@lucasbak There is another problem. When installing the Zeppelin component, the two parameters zeppelin.interpreter.exclude and zeppelin.interpreter.include must both be configured before the wizard lets you proceed with the subsequent installation steps. However, after both configuration items are set, the background log reports an error saying they cannot be used at the same time; the service does not start normally and the page cannot be accessed.
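To illustrate the conflict the log is complaining about: per the error, only one of the two properties should carry a value in zeppelin-site. A sketch (the interpreter list below is hypothetical):

```xml
<!-- zeppelin-site.xml sketch: set only ONE of include/exclude. -->
<property>
  <name>zeppelin.interpreter.include</name>
  <!-- hypothetical interpreter list -->
  <value>spark,jdbc,md</value>
</property>
<property>
  <name>zeppelin.interpreter.exclude</name>
  <!-- left empty so it does not conflict with the include list -->
  <value></value>
</property>
```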

SGITLOGIN commented 5 months ago

@lucasbak Hello. Can Spark2 and Spark3 be installed at the same time? Can Hive's engine be set to Spark? Can Kerberos be enabled for only certain components of a cluster?

lucasbak commented 5 months ago

Hi @SGITLOGIN ,

SGITLOGIN commented 5 months ago

@lucasbak

  1. After Kerberos authentication is turned on, authentication is required to access YARN's port 8088, but we do not need Kerberos authentication for this part and want to access it directly.
  2. For Zeppelin, you should also encounter this problem in this version. The solution you mentioned later is to log in to the server and modify the configuration file, right?
lucasbak commented 5 months ago

@SGITLOGIN

  1. Try disabling Kerberos only for HTTP for YARN, or you can use Knox, which can proxy the web UI for you without Kerberos.
  2. I meant using the Apache Ambari REST API.
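For point 1, a possible core-site change that relaxes SPNEGO on the web UIs while leaving RPC-level Kerberos in place; the property names are standard Hadoop, but whether this loosening is acceptable for your security requirements is an assumption:

```xml
<!-- core-site.xml sketch: serve web UIs without Kerberos (SPNEGO). -->
<property>
  <name>hadoop.http.authentication.type</name>
  <value>simple</value>
</property>
<property>
  <!-- allow requests without a user.name query parameter -->
  <name>hadoop.http.authentication.simple.anonymous.allowed</name>
  <value>true</value>
</property>
```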
SGITLOGIN commented 5 months ago

OK, I will try it.

lucasbak commented 5 months ago

Hi @SGITLOGIN

new versions are available :) they contain all the fixes from this issue and disable DEBUG mode by default in the ZooKeeper client for HDFS, YARN, and HIVE.

ODP:1.2.2.0-58 ODP-UTILS:1.2.2.0 AMBARI:2.7.9.0.0-23

Keep us up to date

SGITLOGIN commented 5 months ago

OK

B0byM0uth commented 4 months ago

Hello,

I tried the latest version with: ODP:1.2.2.0-58 ODP-UTILS:1.2.2.0 AMBARI:2.7.9.0.0-23

My POC server is running with Centos 7.9. I use public repositories.

I saw some bugs. The first one concerns ZooKeeper: just after the install, ZooKeeper can't start because the parameter admin.serverPort is not defined and defaults to 8080, Ambari's port. After changing it, all is fine.
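The change described can be sketched in zoo.cfg (the alternative port is an arbitrary choice; on ZooKeeper 3.5+ the AdminServer can also be disabled outright):

```properties
# zoo.cfg sketch: move the AdminServer off 8080 to avoid clashing with Ambari
admin.serverPort=9090
# ...or disable the AdminServer altogether:
# admin.enableServer=false
```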

The second bug concerns Spark 2: the install and start are OK, except for the Thrift server.

Last 4096 bytes of prelaunch.err : /hadoop/yarn/local/usercache/spark/appcache/application_1707514685210_0003/container_e07_1707514685210_0003_01_000001/launch_container.sh: line 37: $PWD:$PWD/spark_conf:$PWD/spark_libs/:$HADOOP_CONF_DIR:/usr/odp/1.2.2.0-58/hadoop/:/usr/odp/1.2.2.0-58/hadoop/lib/:/usr/odp/current/hadoop-hdfs-client/:/usr/odp/current/hadoop-hdfs-client/lib/:/usr/odp/current/hadoop-yarn-client/:/usr/odp/current/hadoop-yarn-client/lib/:/usr/odp/current/hadoop-client/share/hadoop/common/lib/:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/:$PWD/mr-framework/hadoop/share/hadoop/common/:$PWD/mr-framework/hadoop/share/hadoop/common/lib/:$PWD/mr-framework/hadoop/share/hadoop/yarn/:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/:$PWD/mr-framework/hadoop/share/hadoop/hdfs/:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/odp/${odp.version}/hadoop/lib/hadoop-lzo-0.6.0.${odp.version}.jar:/etc/hadoop/conf/secure:$PWD/spark_conf/hadoop_conf: bad substitution

I noticed that the variable "${odp.version}" isn't populated correctly. I looked for the variable in all configurations and changed it to "1.2.2.0-58". I also saw some paths with "/usr/hdp"; I don't know if that can be an issue?
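A hedged sketch of the manual substitution described above; the config directories are assumptions for an ODP layout, and files should be backed up before editing:

```shell
# Replace the unexpanded ${odp.version} placeholder with the actual build
# version in any config file that still contains it (paths are assumptions;
# filenames with spaces are not handled by this simple loop).
ODP_VERSION="1.2.2.0-58"
for f in $(grep -rlF '${odp.version}' /etc/hadoop/conf /etc/spark2/conf 2>/dev/null); do
  sed -i "s/\${odp\.version}/${ODP_VERSION}/g" "$f"
done
```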

I restarted the Thrift server, but after a while, the Thrift server stops.

Exception in thread "main" java.lang.ClassNotFoundException: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/metadata/HiveException when creating Hive client using classpath: file:/usr/odp/current/spark2-client/standalone-metastore/hive-metastore-1.2.1.1.2.2.0-58.jar Please make sure that jars for your version of hive and hadoop are included in the paths passed to spark.sql.hive.metastore.jars.

Does the Spark2 Thrift server service work?

Another problem is with Spark3; I had this error:

Exception in thread "main" org.apache.spark.SparkException: Master must either be yarn or start with spark, mesos, k8s, or local

It appears that the property "spark.master" in "Advanced spark3-thrift-sparkconf" is not set correctly: by default it is "{{spark_thrift_master}}". I changed it to "yarn" and restarted the Thrift server.
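The corrected entry, as a minimal sketch (whether other keys in the same config need similar fixes is an assumption):

```properties
# spark-thrift-sparkconf sketch: run the Thrift server on YARN instead of
# the unexpanded {{spark_thrift_master}} template value
spark.master yarn
```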

New error:

24/02/10 08:35:07 ERROR FairSchedulableBuilder: Error while building the fair scheduler pools java.io.FileNotFoundException: File does not exist: /usr/odp/current/spark3-thriftserver/conf/spark-thrift-fairscheduler.xml

I checked the file and it is there!

Last bug: I tried to install NiFi, but from the start it can't find the package. Is NiFi available? I see only this for NiFi:

nifi.noarch : nifi_1_2_2_0_58 Stack Link Virtual Package
nifi-registry.noarch : nifi_1_2_2_0_58-registry Stack Link Virtual Package
nifi-registry_1_2_2_0_58.noarch : Apache NiFi Registry is a complementary
nifi-toolkit.noarch : nifi_1_2_2_0_58-toolkit Stack Link Virtual Package
nifi_1_2_2_0_58-registry.noarch : Apache NiFi Registry is a complementary
nifi_1_2_2_0_58-toolkit.noarch : Apache NiFi is an easy to use, powerful, and

Thank you.

lucasbak commented 4 months ago

@SGITLOGIN

Did you succeed?