clemlabprojects / ambari

Fork of Apache Ambari maintained by Clemlab Company
https://www.clemlab.com
Apache License 2.0

Spark2 Client install failed #62

Closed. SGITLOGIN closed this issue 4 months ago.

SGITLOGIN commented 5 months ago

Installed versions: odp 1.2.2.0-50, odp-utils 1.2.2.0, ambari 2.7.9.0.0-16

First problem: the /etc/spark2/conf directory was not created during the Spark2 Client installation!

[screenshots]
SGITLOGIN commented 5 months ago

Second problem:

The ranger-kms install failed. ODP should have copied mysql-connector-java.jar into the /usr/odp/current/ranger-kms/ews/webapp/lib/ directory, but in reality mysql-connector-java.jar was copied to the path /usr/odp/current/ranger-kms/ews/webapp/lib itself, so lib became a file instead of a directory.

[screenshots]
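For reference, a quick way to confirm this broken state on an affected node (a minimal check; the path is the one from the report above):

```sh
# If the bug hit, 'lib' is a regular file (the misplaced jar), not a directory.
ls -ld /usr/odp/current/ranger-kms/ews/webapp/lib
file /usr/odp/current/ranger-kms/ews/webapp/lib
```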
SGITLOGIN commented 5 months ago

@lucasbak Could you please take a look at the problem? Thank you.

lucasbak commented 5 months ago

Hi @SGITLOGIN,

Thanks for the report. We also identified a ranger-tagsync problem with the PostgreSQL connector. Indeed, the connector's directory tree should not contain webapp. We will reproduce this internally.

Best regards

SGITLOGIN commented 5 months ago

@lucasbak Workarounds for the two problems (a guarded version of both is sketched after this list):

  1. Workaround for the first problem:

     mkdir -p /etc/spark2/conf

  2. Workaround for the second problem:

     mv /usr/odp/current/ranger-kms/ews/webapp/lib /usr/odp/current/ranger-kms/ews/webapp/lib_bak
     mkdir /usr/odp/current/ranger-kms/ews/webapp/lib

Will you incorporate fixes for these two issues into subsequent versions?
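Put together, a guarded version of both workarounds (a sketch based only on the commands above; run as root on an affected node):

```sh
#!/bin/sh
# Workaround 1: recreate the missing Spark2 client config directory.
mkdir -p /etc/spark2/conf

# Workaround 2: if 'lib' was created as a file (the misplaced jar),
# move it aside and recreate 'lib' as a real directory.
LIB=/usr/odp/current/ranger-kms/ews/webapp/lib
if [ -f "$LIB" ]; then
    mv "$LIB" "${LIB}_bak"
    mkdir "$LIB"
fi
```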

lucasbak commented 5 months ago

@SGITLOGIN,

Yes, and we will also add the fix for ranger-tagsync.

lucasbak commented 5 months ago

@SGITLOGIN

On which operating system are you installing the packages?

SGITLOGIN commented 5 months ago

@lucasbak CentOS 7.9

SGITLOGIN commented 5 months ago
[screenshot]
lucasbak commented 5 months ago

@SGITLOGIN

We will try to reproduce on CentOS 7.9.

SGITLOGIN commented 5 months ago

Okay, thank you very much

SGITLOGIN commented 5 months ago

@lucasbak Could you please also install the Ranger KMS component when you install the cluster? My installation of the Ranger KMS component has also failed.

SGITLOGIN commented 5 months ago

@lucasbak Here are all the components I installed. When you install the cluster, please install all of the following components.

Choose File System: HDFS

Choose Services: YARN + MapReduce2, Hive, Tez, Atlas, Kafka, HBase, Ranger, Infra Solr, Ranger KMS, ZooKeeper, Ambari Metrics, Spark2, Zeppelin Notebook, Flink

SGITLOGIN commented 5 months ago

@lucasbak Phoenix Query Server startup failed! It reports a missing-file error for /usr/odp/current/phoenix-server/bin/queryserver.py.

[screenshots]
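A quick way to narrow down the missing-file error (a sketch; under the usual ODP/HDP layout the /usr/odp/current entries are symlinks into a versioned stack directory, which is an assumption worth verifying here):

```sh
# Does the script exist at the reported path?
ls -l /usr/odp/current/phoenix-server/bin/queryserver.py

# Where does the 'current' symlink actually point?
readlink -f /usr/odp/current/phoenix-server

# Is the script present under any versioned directory?
ls /usr/odp/*/phoenix*/bin/ 2>/dev/null
```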
lucasbak commented 5 months ago

> @lucasbak Could you please also install the Ranger KMS component when you install the cluster? My installation of the Ranger KMS component has also failed.

Yes, will do.

SGITLOGIN commented 5 months ago

@lucasbak Atlas startup failed. Please analyze this problem too.

[screenshot]
lucasbak commented 5 months ago

@SGITLOGIN Which version of Ambari do you use?

SGITLOGIN commented 5 months ago

@lucasbak odp 1.2.2.0-50, odp-utils 1.2.2.0, ambari 2.7.9.0.0-16

lucasbak commented 5 months ago

@SGITLOGIN

Alright, we will reproduce your installation and keep you up to date.

Best regards

SGITLOGIN commented 5 months ago

@lucasbak There is another question: the installed ODP version is 1.2.2.0-50, but there is also a 1.2.2.0-53 directory under /usr/odp/. Is this normal?

[screenshot]
lucasbak commented 5 months ago

@SGITLOGIN,

This is not normal. Did you mix up your repository files? Check the files in /etc/yum.repos.d/ (a sketch of the check follows).
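A sketch of that check, plus a way to see whether any installed packages come from the unexpected 1.2.2.0-53 build (assuming an RPM-based install on CentOS; ODP package names may encode the version with either dots and dashes or underscores):

```sh
# Which ODP repositories are configured, and which baseurls do they use?
grep -H baseurl /etc/yum.repos.d/*.repo

# Any installed packages from the 1.2.2.0-53 build? Try both separator styles.
rpm -qa | grep -e '1\.2\.2\.0-53' -e '1_2_2_0_53'
```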

SGITLOGIN commented 5 months ago

@lucasbak No, I suspect it was added accidentally when the installation package was built.

SGITLOGIN commented 5 months ago

@lucasbak I installed it using a local repository.

[screenshot]
lucasbak commented 5 months ago

We are currently deploying a new cluster, and we will reproduce your install from the Web UI. No worries ;-)

SGITLOGIN commented 5 months ago

OK

SGITLOGIN commented 5 months ago

@lucasbak The Spark2 Thrift Server service started successfully, but after a while it was shown as being in a failed state.

[screenshots]
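When a Thrift Server dies shortly after a successful start, the reason is usually in its log. A sketch, with the caveat that the log directory and file pattern below are assumptions based on the conventional Spark2 layout:

```sh
# Adjust the directory if the Spark log dir was customized in Ambari.
ls -lt /var/log/spark2/
tail -n 200 /var/log/spark2/spark-*HiveThriftServer2*.out
```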
lucasbak commented 5 months ago

Do you require Spark2 instead of Spark3?

SGITLOGIN commented 5 months ago

Yes, the Spark version currently used by our Spark application code is 2.x.

SGITLOGIN commented 5 months ago

@lucasbak

  1. I tested Spark3 and there was no problem.
  2. There is a problem with the Spark2 Thrift Server service.
  3. I have Tez and Hive installed.

To recap, here are all the components I installed:

> Choose File System: HDFS
>
> Choose Services: YARN + MapReduce2, Hive, Tez, Atlas, Kafka, HBase, Ranger, Infra Solr, Ranger KMS, ZooKeeper, Ambari Metrics, Spark2, Zeppelin Notebook, Flink

lucasbak commented 5 months ago

@SGITLOGIN

OK. Do you use MySQL/MariaDB as the backend for all services?

SGITLOGIN commented 5 months ago

@lucasbak Yes, MySQL.

lucasbak commented 5 months ago

OK. We are currently reproducing every error internally and fixing them. It may take time, as we may need to rebuild the RPMs.

SGITLOGIN commented 5 months ago

Ok

lucasbak commented 5 months ago

@SGITLOGIN

The new versions of both the Ambari and ODP stacks should be ready in the next few days.

Thanks for your support :)

SGITLOGIN commented 4 months ago

@lucasbak OK, I have one more request.

  1. After solving the above problems, please try enabling Kerberos in the cluster to see whether there are any problems, because our company needs Kerberos authentication enabled. Thank you.

lucasbak commented 4 months ago

@SGITLOGIN

  • We have successfully reproduced and found a solution for ranger-tagsync not installing/starting
  • We have successfully reproduced and found a solution for ranger-kms not starting
  • We have successfully reproduced and found a solution for the spark2-client install failure
  • We have successfully reproduced and found a solution for the Atlas Metadata Server not starting
  • We have successfully reproduced and found a solution for the Phoenix Query Server not starting

The new versions of both the Ambari and ODP stacks should be ready in the next few days.

Thanks for your support :)

SGITLOGIN commented 4 months ago

@lucasbak

SGITLOGIN commented 4 months ago

@lucasbak The YARN application log cannot be viewed; the error is: "Logs are unavailable because Application Timeline Service seems unhealthy and could not connect to the JobHistory server." Please analyze this issue too.

[screenshots]
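Until the Timeline Service problem is fixed, aggregated logs for finished applications can usually still be fetched from the command line (<application_id> and <timeline-host> are placeholders):

```sh
# Fetch aggregated container logs for an application.
yarn logs -applicationId <application_id>

# Check whether the Timeline Server (ATS v1) answers at all;
# 8188 is its default HTTP port, adjust to your configuration.
curl -s "http://<timeline-host>:8188/ws/v1/timeline" | head
```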
SGITLOGIN commented 4 months ago

@lucasbak Hi, spark3-shell execution fails with "Error: Missing application resource." You said, "It's a identified bug fixed in later version of ODP 1.2. Spark-shell is not rightly rendered". Has this issue not been fixed yet?

[screenshot]
SGITLOGIN commented 4 months ago

@lucasbak With ODP version 1.2.2.0-50, the hadoop-aliyun version is hadoop-aliyun-3.3.6.1.2.2.0-50.jar. With ODP version 1.2.1.0-134, the hadoop-aliyun version is hadoop-aliyun-3.3.4.1.2.1.0-134.jar.

The question is as follows: with ODP 1.2.1.0-134, putting hadoop-aliyun-3.3.4.1.2.1.0-134.jar in the /usr/odp/current/spark3-client/jars directory causes no problem when Spark accesses OSS. With ODP 1.2.2.0-50, when hadoop-aliyun-3.3.6.1.2.2.0-50.jar is placed in the /usr/odp/current/spark3-client/jars directory, Spark reports an error when accessing OSS.

[screenshot]
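For comparison, a quick way to see which hadoop-aliyun build each client actually ships (a sketch using only the paths from the report above; the hadoop-client path assumes the usual ODP/HDP layout):

```sh
# Which hadoop-aliyun jar ships with the Hadoop client (if any)?
ls /usr/odp/current/hadoop-client/hadoop-aliyun-*.jar 2>/dev/null

# Which Hadoop-related jars does the Spark3 client see?
ls /usr/odp/current/spark3-client/jars/ | grep -i hadoop
```

The connector jar generally has to match the Hadoop version Spark itself was built against, which is consistent with the behavior described above.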
lucasbak commented 4 months ago

@SGITLOGIN,

All issues have been taken into account and the fixes will be shipped in the next build.

For the YARN logs issue, we still need to reproduce it.

However, regarding hadoop-aliyun: as it is specific to your cluster, we need to discuss it internally first. Such builds are normally reserved for support customers.

We will keep you up to date when the build is released :).

Best regards

SGITLOGIN commented 4 months ago

@lucasbak OK. But regarding the hadoop-aliyun package, I think you should also consider the case where Spark accesses Hive tables whose underlying data is stored on OSS, so the hadoop-aliyun package you provide should also match the Spark version.

SGITLOGIN commented 4 months ago

@lucasbak Will the next version of ODP fix all the problems I mentioned in this issue? When will the next version of ODP be released?

lucasbak commented 4 months ago

@SGITLOGIN

Alright. The next version of ODP 1.2.2.0, with fixes for all of these issues, will be available before the end of the week. We will keep you up to date.

lucasbak commented 4 months ago

@SGITLOGIN,

For logs, you can use the YARN UI v1; it will work.

SGITLOGIN commented 4 months ago

> @SGITLOGIN,
>
> For logs, you can use the YARN UI v1; it will work.

@lucasbak Sorry, I didn't understand what "YARN UI v1" is. Can you give an example or take a screenshot?

lucasbak commented 4 months ago
[screenshot: YARN UI v1, 2024-02-07]
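For readers without the screenshot: YARN UI v1 refers to the classic ResourceManager web UI (as opposed to the newer UI2), reachable on the RM HTTP port, 8088 by default; <resourcemanager-host> is a placeholder:

```sh
# The classic (v1) ResourceManager UI; per-application pages there
# link to the container logs. Adjust host and port to your cluster.
curl -s "http://<resourcemanager-host>:8088/cluster" | head
```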
SGITLOGIN commented 4 months ago

@lucasbak OK, thank you

SGITLOGIN commented 4 months ago

@lucasbak There are two more questions about Hive that need to be reproduced here.

  1. The Hive log level is set to INFO, but when beeline is started from the command line, the logs are displayed at DEBUG level (new cluster; see the sketch below).
  2. Hive is extremely slow when executing MSCK REPAIR TABLE or DROP TABLE (an external table with only 31 partitions).
[screenshots]
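For the first question, a possible interim check while the root cause is investigated: HiveServer2's hive.server2.logging.operation.level setting controls the operation logs that beeline echoes, so forcing it per connection shows whether the noise comes from there (host and port below are placeholders, 10000 being the default HiveServer2 port):

```sh
# Ask HiveServer2 to send only EXECUTION-level operation logs to this session.
beeline -u "jdbc:hive2://<hiveserver2-host>:10000/default" \
        --hiveconf hive.server2.logging.operation.level=EXECUTION
```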
lucasbak commented 4 months ago

@SGITLOGIN,

We have identified the issue with the DEBUG level. Can you check whether the HiveServer2 and Hive Metastore logs are also at DEBUG?

SGITLOGIN commented 4 months ago

@lucasbak The HiveServer2 and Hive Metastore log levels are also DEBUG.

[screenshots]
lucasbak commented 4 months ago

@SGITLOGIN,

As a workaround, try creating the file /etc/hive/conf/logback.xml with the content from https://github.com/clemlabprojects/ambari/blob/5598b04ff598d115af16c987a1ed3978cd46b7a8/ambari-server/src/main/resources/stacks/ODP/1.0/services/HIVE/package/templates/zookeeper-logback.xml.j2

Replace zookeeper_log_level with INFO, then restart the Hive service. A command-line sketch follows.
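That workaround as commands, assuming the template uses the Jinja placeholder {{zookeeper_log_level}} (the URL below is the raw-content form of the link above):

```sh
# Fetch the logback template shipped in the ODP stack definition.
curl -fsSL -o /etc/hive/conf/logback.xml \
  "https://raw.githubusercontent.com/clemlabprojects/ambari/5598b04ff598d115af16c987a1ed3978cd46b7a8/ambari-server/src/main/resources/stacks/ODP/1.0/services/HIVE/package/templates/zookeeper-logback.xml.j2"

# Replace the Jinja placeholder with a fixed INFO level.
sed -i 's/{{zookeeper_log_level}}/INFO/g' /etc/hive/conf/logback.xml

# Finally, restart the Hive service from Ambari.
```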