Closed: sumansomasundar closed this issue 6 years ago
I ran into this problem too and didn't find a solution.
@sumansomasundar did you find a solution to this issue?
Can anyone provide a solution?
I have this script and it works:

spark_command=( \
  --conf "spark.hadoop.fs.defaultFS=hdfs://${HDFS_CLUSTER_NAME}" \
  --conf "spark.hadoop.dfs.nameservices=${HDFS_CLUSTER_NAME}" \
  --conf "spark.hadoop.dfs.client.failover.proxy.provider.${HDFS_CLUSTER_NAME}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider" \
  --conf "spark.hadoop.dfs.namenode.rpc-address.${HDFS_CLUSTER_NAME}.nn0=hdfs-namenode-0.hdfs-namenode.${HDFS_NAMESPACE}:${HDFS_PORT}" \
)

if [ "${HDFS_HA_ENABLED}" == "true" ]; then
  spark_command+=( \
    --conf "spark.hadoop.dfs.ha.namenodes.${HDFS_CLUSTER_NAME}=nn0,nn1" \
    --conf "spark.hadoop.dfs.namenode.rpc-address.${HDFS_CLUSTER_NAME}.nn1=hdfs-namenode-1.hdfs-namenode.${HDFS_NAMESPACE}:${HDFS_PORT}" \
  )
else
  spark_command+=(--conf "spark.hadoop.dfs.ha.namenodes.${HDFS_CLUSTER_NAME}=nn0")
fi
@sumansomasundar
can you please let me know what the values should be, in the case of two NameNodes, for
spark.hadoop.dfs.namenode.rpc-address.$HDFS_CLUSTER_NAME.nn1 and
spark.hadoop.dfs.namenode.rpc-address.$HDFS_CLUSTER_NAME.nn2?
Should the values be http://$HDFS_CLUSTER_NAME-0:8020?
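Note that dfs.namenode.rpc-address expects a bare host:port pair (the NameNode RPC port, commonly 8020), not an http:// URL. A hedged sketch of what the two values might look like, assuming the chart's default hdfs-namenode service names:

# Assumption: NameNode pods are exposed as hdfs-namenode-0/-1 through the
# hdfs-namenode headless service; adjust names and port to your deployment.
spark_command+=( \
  --conf "spark.hadoop.dfs.namenode.rpc-address.${HDFS_CLUSTER_NAME}.nn1=hdfs-namenode-0.hdfs-namenode.${HDFS_NAMESPACE}:8020" \
  --conf "spark.hadoop.dfs.namenode.rpc-address.${HDFS_CLUSTER_NAME}.nn2=hdfs-namenode-1.hdfs-namenode.${HDFS_NAMESPACE}:8020" \
)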
I have the HDFS HA setup configured using the K8s charts. When I submit a Spark job with this setup, Spark is able to load the jars, but it fails to initialize the SparkContext with a "java.net.UnknownHostException: hdfs-k8s" exception.

I tried to submit the same job with different cluster managers for Spark (YARN, local, standalone, etc.), and all of them run without any issue. Below is the stacktrace from the driver pod:
2018-07-12 09:01:27,996 [main] INFO org.apache.spark.SparkContext - Added JAR hdfs://hdfs-k8s/lib/zkclient-0.9.jar at hdfs://hdfs-k8s/lib/zkclient-0.9.jar with timestamp 1531386087996
2018-07-12 09:01:27,996 [main] INFO org.apache.spark.SparkContext - Added JAR hdfs://hdfs-k8s/lib/zookeeper-3.4.8.jar at hdfs://hdfs-k8s/lib/zookeeper-3.4.8.jar with timestamp 1531386087996
2018-07-12 09:01:28,363 [main] ERROR org.apache.spark.SparkContext - Error initializing SparkContext.
java.lang.IllegalArgumentException: java.net.UnknownHostException: hdfs-k8s
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1527)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1498)
at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:461)
at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:461)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:461)
at com.oracle.miecs.enrich.kafka.KafkaEnrichApp$.main(KafkaEnrichApp.scala:238)
at com.oracle.miecs.enrich.kafka.KafkaEnrichApp.main(KafkaEnrichApp.scala)
Caused by: java.net.UnknownHostException: hdfs-k8s
... 20 more
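The UnknownHostException on the logical nameservice name ("hdfs-k8s") generally means the process resolving the hdfs:// URI has no HA mappings for that nameservice, so it falls back to treating it as a literal hostname (note createNonHAProxy in the trace). Besides passing spark.hadoop.* conf flags as in the script above, one way to supply the mappings is an hdfs-site.xml visible to the driver through HADOOP_CONF_DIR. A minimal sketch, assuming the chart's nameservice hdfs-k8s and default service names:

# Assumption: nameservice hdfs-k8s with NameNodes reachable as
# hdfs-namenode-0/-1 on port 8020; adjust to your deployment.
mkdir -p /etc/hadoop/conf
cat > /etc/hadoop/conf/hdfs-site.xml <<'EOF'
<configuration>
  <property><name>dfs.nameservices</name><value>hdfs-k8s</value></property>
  <property><name>dfs.ha.namenodes.hdfs-k8s</name><value>nn0,nn1</value></property>
  <property><name>dfs.namenode.rpc-address.hdfs-k8s.nn0</name>
            <value>hdfs-namenode-0.hdfs-namenode:8020</value></property>
  <property><name>dfs.namenode.rpc-address.hdfs-k8s.nn1</name>
            <value>hdfs-namenode-1.hdfs-namenode:8020</value></property>
  <property><name>dfs.client.failover.proxy.provider.hdfs-k8s</name>
            <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
</configuration>
EOF
# Spark reads Hadoop client configuration from HADOOP_CONF_DIR at startup.
export HADOOP_CONF_DIR=/etc/hadoop/conf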