Closed: thirdparty-core closed this issue 3 years ago
Hello, thank you for posting your issue! Could you do a few preliminary things for me while I get an environment set up myself to reproduce your error?
On the host machine where you run spark-submit, could you run the following?
$ nslookup node01
$ nslookup node02
$ nslookup node03
If you don't have nslookup, then ping is fine too; I just want to see what IPs those hostnames resolve to.
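Purely optional, but if it's easier you can also check what the JVM itself resolves (it goes through the system resolver, so /etc/hosts entries count too). A quick throwaway sketch, hypothetical and not from any Alluxio/Spark code, with node01/node02/node03 swapped for your real hostnames:
import java.net.InetAddress;
import java.net.UnknownHostException;

// Prints every address the JVM resolver returns for each hostname.
public class ResolveCheck {
  public static void main(String[] args) {
    String[] hosts = {"node01", "node02", "node03"};
    for (String host : hosts) {
      try {
        for (InetAddress addr : InetAddress.getAllByName(host)) {
          System.out.println(host + " -> " + addr.getHostAddress());
        }
      } catch (UnknownHostException e) {
        System.out.println(host + " -> not resolvable: " + e.getMessage());
      }
    }
  }
}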
If possible, could you share a screenshot (or the contents) of your [SparkUI Environment tab](https://spark.apache.org/docs/3.0.0-preview/web-ui.html#environment-tab) when you submit that job? I'm looking primarily to confirm that the spark.driver.extraJavaOptions and spark.executor.extraJavaOptions contents show up as you expect.
Could you describe a little bit about the infrastructure you are deploying Spark-Hadoop-YARN in (e.g., number of namenodes and datanodes)? Are you able to share the hdfs-site.xml or core-site.xml?
Thanks so much for your help!
Let me apologize first: I used fake hostnames in my report, but the problem remains. Here is the ping output from the host where I run spark-submit:
[henghe@henghe66 ~]$ ping henghe66
PING henghe66 (192.168.100.66) 56(84) bytes of data.
64 bytes from henghe66 (192.168.100.66): icmp_seq=1 ttl=64 time=0.026 ms
64 bytes from henghe66 (192.168.100.66): icmp_seq=2 ttl=64 time=0.036 ms
^C
--- henghe66 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.026/0.031/0.036/0.005 ms
[henghe@henghe66 ~]$
[henghe@henghe66 ~]$ ping henghe67
PING henghe67 (192.168.100.67) 56(84) bytes of data.
64 bytes from henghe67 (192.168.100.67): icmp_seq=1 ttl=64 time=0.162 ms
64 bytes from henghe67 (192.168.100.67): icmp_seq=2 ttl=64 time=0.196 ms
^C
--- henghe67 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.162/0.179/0.196/0.017 ms
[henghe@henghe66 ~]$
[henghe@henghe66 ~]$ ping henghe68
PING henghe68 (192.168.100.68) 56(84) bytes of data.
64 bytes from henghe68 (192.168.100.68): icmp_seq=1 ttl=64 time=0.148 ms
64 bytes from henghe68 (192.168.100.68): icmp_seq=2 ttl=64 time=0.153 ms
64 bytes from henghe68 (192.168.100.68): icmp_seq=3 ttl=64 time=0.179 ms
^C
--- henghe68 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.148/0.160/0.179/0.013 ms
[henghe@henghe66 ~]$
My spark-submit command:
spark-submit \
--principal henghe@HENGHE.COM \
--keytab /home/henghe/keytabs/henghe.keytab \
--master yarn \
--deploy-mode cluster \
--driver-memory 1g \
--executor-memory 1g \
--driver-cores 1 \
--executor-cores 1 \
--num-executors 3 \
--conf spark.yarn.maxAppAttempts=1 \
--conf spark.driver.extraClassPath=/opt/alluxio-2.6.1/client/alluxio-2.6.1-client.jar \
--conf spark.executor.extraClassPath=/opt/alluxio-2.6.1/client/alluxio-2.6.1-client.jar \
--conf 'spark.driver.extraJavaOptions=-Dalluxio.zookeeper.address=henghe66:2181,henghe67:2181,henghe68:2181 -Dalluxio.zookeeper.enabled=true' \
--conf 'spark.executor.extraJavaOptions=-Dalluxio.zookeeper.address=henghe66:2181,henghe67:2181,henghe68:2181 -Dalluxio.zookeeper.enabled=true' \
--class com.yss.henghe.spark.SparkReadWriteFromAlluxio \
/home/henghe/spark-jobs/spark-2.4.6-1.0.0.jar
Spark UI Environment tab (screenshot):
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.nameservices</name>
<value>henghe</value>
</property>
<property>
<name>dfs.ha.namenodes.henghe</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.henghe.nn1</name>
<value>henghe66:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.henghe.nn2</name>
<value>henghe67:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.henghe.nn1</name>
<value>henghe66:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.henghe.nn2</name>
<value>henghe67:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://henghe66:8485;henghe67:8485;henghe68:8485/henghe</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop-data/journal</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.henghe</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/henghe/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions</name>
<value>true</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.acls.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.max.transfer.threads</name>
<value>4096</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop-data/hdfs/datanode</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop-data/hdfs/namenode</value>
</property>
<!-- ======================= kerberos start ================= -->
<property>
<name>dfs.block.access.token.enable</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.data.dir.perm</name>
<value>700</value>
</property>
<property>
<name>dfs.namenode.keytab.file</name>
<value>/opt/hadoop/keytabs/hadoop.keytab</value>
</property>
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hdfs/_HOST@HENGHE.COM</value>
</property>
<property>
<name>dfs.namenode.kerberos.internal.spnego.principal</name>
<value>HTTP/_HOST@HENGHE.COM</value>
</property>
<property>
<name>dfs.datanode.keytab.file</name>
<value>/opt/hadoop/keytabs/hadoop.keytab</value>
</property>
<property>
<name>dfs.datanode.kerberos.principal</name>
<value>hdfs/_HOST@HENGHE.COM</value>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:1004</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:1006</value>
</property>
<property>
<name>dfs.datanode.https.address</name>
<value>0.0.0.0:50470</value>
</property>
<property>
<name>dfs.journalnode.keytab.file</name>
<value>/opt/hadoop/keytabs/hadoop.keytab</value>
</property>
<property>
<name>dfs.journalnode.kerberos.principal</name>
<value>hdfs/_HOST@HENGHE.COM</value>
</property>
<property>
<name>dfs.journalnode.kerberos.internal.spnego.principal</name>
<value>HTTP/_HOST@HENGHE.COM</value>
</property>
<property>
<name>dfs.journalnode.https-address</name>
<value>0.0.0.0:8481</value>
</property>
<property>
<name>dfs.web.authentication.kerberos.principal</name>
<value>HTTP/_HOST@HENGHE.COM</value>
</property>
<property>
<name>dfs.web.authentication.kerberos.keytab</name>
<value>/opt/hadoop/keytabs/hadoop.keytab</value>
</property>
<property>
<name>hadoop.http.authentication.type</name>
<value>kerberos</value>
</property>
<property>
<name>hadoop.http.authentication.kerberos.principal</name>
<value>HTTP/_HOST@HENGHE.COM</value>
</property>
<property>
<name>hadoop.http.authentication.kerberos.keytab</name>
<value>/opt/hadoop/keytabs/hadoop.keytab</value>
</property>
<property>
<name>hadoop.http.filter.initializers</name>
<value>org.apache.hadoop.security.AuthenticationFilterInitializer</value>
</property>
</configuration>
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://henghe</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>henghe66:2181,henghe67:2181,henghe68:2181</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop-data/tmp/hadoop-${user.name}</value>
</property>
<!-- ===================kerberos start ================== -->
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
</property>
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
</property>
<property>
<name>hadoop.proxyuser.hdfs.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hdfs.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hdfs.users</name>
<value>jhs,hbase,henghe,yarn,hive</value>
</property>
<property>
<name>hadoop.proxyuser.yarn.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.yarn.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.yarn.users</name>
<value>jhs,hbase,henghe,hdfs,hive</value>
</property>
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.users</name>
<value>jhs,hbase,henghe,hdfs,yarn</value>
</property>
<property>
<name>hadoop.proxyuser.henghe.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.henghe.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.henghe.users</name>
<value>jhs,hbase,alluxio,hdfs,yarn,kafka,hive</value>
</property>
</configuration>
My Alluxio mount table (screenshot):
I declared the following dependency in my Spark job project:
<dependency>
<groupId>org.alluxio</groupId>
<artifactId>alluxio-shaded-client</artifactId>
<version>2.6.1</version>
</dependency>
I can run my Spark job successfully in IDEA. However, when I package it and run it on YARN, configured according to the Alluxio documentation, the problem above occurs.
I have also tried using a Maven plugin to copy the dependency jars into a directory and upload them to an HDFS directory. The pom is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>henghe-demos</artifactId>
<groupId>com.yss.henghe</groupId>
<version>1.0.0</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>spark-2.4.6</artifactId>
<properties>
<maven.compiler.source>8</maven.compiler.source>
<maven.compiler.target>8</maven.compiler.target>
<spark.version>2.4.5</spark.version>
<scala.binary.version>2.12</scala.binary.version>
<hadoop.version>3.1.3</hadoop.version>
<scala.version>2.12.12</scala.version>
<compileSource>1.8</compileSource>
<target.jvm>-target:jvm-1.8</target.jvm>
<scope>compile</scope><!--compile--><!--provided-->
</properties>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<scope>${scope}</scope>
<exclusions>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<scope>${scope}</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
<scope>${scope}</scope>
</dependency>
<dependency>
<groupId>org.alluxio</groupId>
<artifactId>alluxio-shaded-client</artifactId>
<version>2.6.1</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.2.0</version>
<configuration>
<scalaVersion>${scala.version}</scalaVersion>
<args>
<arg>-feature</arg>
<!--The target.jvm variable gets set above by the groovy
snippet in the gmaven-plugin.-->
<arg>${target.jvm}</arg>
</args>
<source>${compileSource}</source>
<target>${compileSource}</target>
</configuration>
<executions>
<execution>
<id>scala-compile-first</id>
<phase>process-resources</phase>
<goals>
<goal>add-source</goal>
<goal>compile</goal>
</goals>
</execution>
<execution>
<id>scala-test-compile</id>
<phase>process-test-resources</phase>
<goals>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<version>3.1.2</version>
<executions>
<execution>
<id>copy-dependencies</id>
<phase>package</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<outputDirectory>${project.build.directory}/libs</outputDirectory>
<includeTypes>
jar
</includeTypes>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
jars on hdfs:
then my spark-default.conf:
spark.yarn.historyServer.address=henghe66:18080
spark.yarn.historyServer.allowTracking=true
spark.eventLog.dir=hdfs://henghe/spark/eventlogs
spark.eventLog.enabled=true
spark.history.fs.logDirectory=hdfs://henghe/spark/hisLogs
spark.yarn.jars=hdfs://henghe/spark/alluxio-deps/*.jar
My spark-submit command:
spark-submit \
--principal henghe@HENGHE.COM \
--keytab /home/henghe/keytabs/henghe.keytab \
--master yarn \
--deploy-mode cluster \
--driver-memory 1g \
--executor-memory 1g \
--driver-cores 1 \
--executor-cores 1 \
--num-executors 3 \
--conf spark.yarn.maxAppAttempts=1 \
--class com.yss.henghe.spark.SparkReadWriteFromAlluxio \
/home/henghe/spark-jobs/spark-2.4.6-1.0.0.jar
Then I ran spark-submit again. Sometimes it seems to recognize zk@henghe68:2181;henghe66:2181;henghe68:2181, but new problems arise, as follows:
At other times, with this same approach, it cannot recognize 'zk@henghe68:2181;henghe66:2181;henghe68:2181' at all.
That is how I tried to solve the problem, but it did not succeed in the end.
What should I do in a situation like mine? Looking forward to your reply, thank you very much.
Thanks so much for the detailed information! I will post an update once I or someone else has investigated this issue further.
Hello, I found the reason. The details are as follows: alluxio.hadoop.FileSystem does not override org.apache.hadoop.fs.FileSystem#addDelegationTokens(String renewer, Credentials credentials). When Kerberos authentication is enabled in my Hadoop cluster and Spark runs on YARN, Spark calls org.apache.hadoop.fs.FileSystem#addDelegationTokens to obtain tokens, and that method cannot recognize the zk@henghe66:2181 address format, which leads to the error. Since I urgently needed Spark on YARN to read data from Alluxio, I crudely did the following:
package alluxio.hadoop;

import java.io.IOException;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

public abstract class AbstractFileSystem extends org.apache.hadoop.fs.FileSystem {
  // ...

  /** Override this method and return null so no delegation token is requested for this filesystem. */
  @Override
  public Token<?>[] addDelegationTokens(String renewer, Credentials credentials) throws IOException {
    return null;
  }
}
Would Alluxio consider providing a proper implementation for this?
Looking forward to Alluxio overriding it. Good luck!
Hello @thirdparty-core, regarding the use of Kerberos/Delegation token as you have mentioned, that is a feature which is only available in Alluxio enterprise edition. You can request a trial for the Alluxio enterprise edition, in which case the relevant documentation for setting up Alluxio with delegation tokens would be here.
As for your modification to manually override addDelegationTokens(), all I can do is point you to the Apache source code for the DelegationTokenIssuer interface, which is where addDelegationTokens() comes from:
/**
 * Given a renewer, add delegation tokens for issuer and it's child issuers
 * to the Credentials object if it is not already present.
 *
 * Note: This method is not intended to be overridden. Issuers should
 * implement getCanonicalService and getDelegationToken to ensure
 * consistent token acquisition behavior.
 *
 * @param renewer the user allowed to renew the delegation tokens
 * @param credentials cache in which to add new delegation tokens
 * @return list of new delegation tokens
 * @throws IOException thrown if IOException if an IO error occurs.
 */
default Token<?>[] addDelegationTokens(
    final String renewer, Credentials credentials) throws IOException {
  if (credentials == null) {
    credentials = new Credentials();
  }
  final List<Token<?>> tokens = new ArrayList<>();
  collectDelegationTokens(this, renewer, credentials, tokens);
  return tokens.toArray(new Token[tokens.size()]);
}
Specifically this comment:
Note: This method is not intended to be overridden. Issuers should implement getCanonicalService and getDelegationToken to ensure consistent token acquisition behavior.
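Following that guidance, if the Alluxio client isn't meant to issue Hadoop delegation tokens at all, a less invasive change than overriding addDelegationTokens() would be to have getCanonicalServiceName() return null, since collectDelegationTokens() skips any issuer that has no canonical service name. A rough, untested sketch, not an official Alluxio patch (it assumes alluxio.hadoop.AbstractFileSystem extends org.apache.hadoop.fs.FileSystem):
package alluxio.hadoop;

import java.io.IOException;
import org.apache.hadoop.security.token.Token;

public abstract class AbstractFileSystem extends org.apache.hadoop.fs.FileSystem {
  // ...

  // With no canonical service name, DelegationTokenIssuer#collectDelegationTokens
  // skips this filesystem, so the zk@host:port,... authority is never parsed as a
  // single host:port service address.
  @Override
  public String getCanonicalServiceName() {
    return null;
  }

  // Only consulted when a canonical service name is present; returning null keeps
  // the "no Hadoop delegation tokens from this filesystem" intent explicit.
  @Override
  public Token<?> getDelegationToken(String renewer) throws IOException {
    return null;
  }
}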
Hello @ZhuTopher, thank you very much for your reply. I will try to request a trial. Good luck!
Alluxio Version: alluxio-2.6.1
Describe the bug: Spark on YARN. Spark version: 2.4.6, Hadoop version: 3.1.3.
Spark with Alluxio: I configured my spark-default.conf as follows, according to the documentation:
My spark-submit command:
My Spark code looks like this:
Runtime logs:
What should I do in a situation like mine? Looking forward to your reply, thank you very much.