apache-spark-on-k8s / kubernetes-HDFS

Repository holding configuration files for running an HDFS cluster in Kubernetes
Apache License 2.0

Spark on Kubernetes does not pick up HDFS HA configs #55

Closed: sumansomasundar closed this issue 6 years ago

sumansomasundar commented 6 years ago

I have an HDFS HA setup configured using the K8s charts. When I submit a Spark job against this setup, Spark is able to load the jars, but it fails to initialize the SparkContext with a "java.net.UnknownHostException: hdfs-k8s" exception.

I tried submitting the same job with other cluster managers for Spark (YARN, local, standalone, etc.), and all of them run without any issue. Below is the stack trace from the driver pod:

2018-07-12 09:01:27,996 [main] INFO org.apache.spark.SparkContext - Added JAR hdfs://hdfs-k8s/lib/zkclient-0.9.jar at hdfs://hdfs-k8s/lib/zkclient-0.9.jar with timestamp 1531386087996
2018-07-12 09:01:27,996 [main] INFO org.apache.spark.SparkContext - Added JAR hdfs://hdfs-k8s/lib/zookeeper-3.4.8.jar at hdfs://hdfs-k8s/lib/zookeeper-3.4.8.jar with timestamp 1531386087996
2018-07-12 09:01:28,363 [main] ERROR org.apache.spark.SparkContext - Error initializing SparkContext.
java.lang.IllegalArgumentException: java.net.UnknownHostException: hdfs-k8s
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1527)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1498)
    at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:461)
    at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:461)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:461)
    at com.oracle.miecs.enrich.kafka.KafkaEnrichApp$.main(KafkaEnrichApp.scala:238)
    at com.oracle.miecs.enrich.kafka.KafkaEnrichApp.main(KafkaEnrichApp.scala)
Caused by: java.net.UnknownHostException: hdfs-k8s
    ... 20 more
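Note that the trace goes through NameNodeProxies.createNonHAProxy: the driver's Hadoop configuration does not declare hdfs-k8s as an HA nameservice, so the client treats it as a plain hostname and the DNS lookup fails. As a first check, it can help to confirm whether the driver pod sees any Hadoop client configuration at all. A hedged diagnostic sketch (the pod name, namespace, and fallback conf path are assumptions, not taken from this thread):

# If HADOOP_CONF_DIR is unset and no hdfs-site.xml is mounted, the driver
# has no way to know that "hdfs-k8s" is a logical HA nameservice.
kubectl exec -n <namespace> <driver-pod> -- sh -c 'echo "HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-<unset>}"'
kubectl exec -n <namespace> <driver-pod> -- sh -c 'ls "${HADOOP_CONF_DIR:-/etc/hadoop/conf}" 2>/dev/null || echo "no Hadoop conf dir found"'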

hrsjw1 commented 5 years ago

I ran into this problem too and didn't find a solution.

sakthig commented 5 years ago

@sumansomasundar did you find a solution to this issue?

dennischin commented 5 years ago

Can anyone provide a solution?

sumansomasundar commented 5 years ago

I have this script and it works:

spark_command=(\
  --conf "spark.hadoop.fs.defaultFS=hdfs://${HDFS_CLUSTER_NAME}" \
  --conf "spark.hadoop.dfs.nameservices=${HDFS_CLUSTER_NAME}" \
  --conf "spark.hadoop.dfs.client.failover.proxy.provider.${HDFS_CLUSTER_NAME}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider" \
  --conf "spark.hadoop.dfs.namenode.rpc-address.${HDFS_CLUSTER_NAME}.nn0=hdfs-namenode-0.hdfs-namenode.${HDFS_NAMESPACE}:${HDFS_PORT}" \
)

# HDFS configs for HA / non-HA
if [ "${HDFS_HA_ENABLED}" == "true" ]; then
  spark_command+=(\
    --conf "spark.hadoop.dfs.ha.namenodes.${HDFS_CLUSTER_NAME}=nn0,nn1" \
    --conf "spark.hadoop.dfs.namenode.rpc-address.${HDFS_CLUSTER_NAME}.nn1=hdfs-namenode-1.hdfs-namenode.${HDFS_NAMESPACE}:${HDFS_PORT}" \
  )
else
  spark_command+=(--conf "spark.hadoop.dfs.ha.namenodes.${HDFS_CLUSTER_NAME}=nn0")
fi

abhishekkarigar commented 2 years ago

@sumansomasundar can you please let me know what the values should be, in the case of two NameNodes, for
spark.hadoop.dfs.namenode.rpc-address.$HDFS_CLUSTER_NAME.nn1 and spark.hadoop.dfs.namenode.rpc-address.$HDFS_CLUSTER_NAME.nn2?

Should the values be http://$HDFS_CLUSTER_NAME-0:8020 ?
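Based on the working script earlier in the thread, these values are host:port RPC addresses of the individual NameNode pods, not http:// URLs (an http:// address would be the NameNode web UI, dfs.namenode.http-address, which the HDFS client does not use for RPC). The logical IDs (nn0/nn1 versus nn1/nn2) are arbitrary as long as they match the dfs.ha.namenodes list. A sketch for two NameNodes, reusing the service/pod names and variables already assumed in that script:

--conf "spark.hadoop.dfs.ha.namenodes.${HDFS_CLUSTER_NAME}=nn0,nn1"
--conf "spark.hadoop.dfs.namenode.rpc-address.${HDFS_CLUSTER_NAME}.nn0=hdfs-namenode-0.hdfs-namenode.${HDFS_NAMESPACE}:${HDFS_PORT}"
--conf "spark.hadoop.dfs.namenode.rpc-address.${HDFS_CLUSTER_NAME}.nn1=hdfs-namenode-1.hdfs-namenode.${HDFS_NAMESPACE}:${HDFS_PORT}"

Here ${HDFS_PORT} is the NameNode RPC port (commonly 8020).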