I have Spark submitting a job through k8s. It works perfectly with the Spark-Pi example code (great job!).

I have also set up HDFS with kubernetes-HDFS. I verified that it works fine and was able to hit the namenode on port 50070. However, as you know, in order to allow Spark to use HDFS as its default fs, I have to provide `HADOOP_CONF_DIR` through `spark-env.sh` (https://spark.apache.org/docs/latest/configuration.html#inheriting-hadoop-cluster-configuration).
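For concreteness, this is a minimal sketch of what I put in `spark-env.sh`; the `/opt/hadoop/conf` path is just where I chose to place the copied config files, so treat it as an assumption:

```bash
# spark-env.sh (sourced by Spark at startup)
# Point Spark at the directory holding the copied hdfs-site.xml and
# core-site.xml so it inherits the Hadoop/HDFS cluster configuration.
export HADOOP_CONF_DIR=/opt/hadoop/conf   # assumed location inside the image
```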
I have tried running the WordCount example by copying `hdfs-site.xml` and `core-site.xml` from an HDFS datanode and updating `spark-env.sh` as shown above. Unfortunately, that was not the right way, and the job failed with `Path does not exist`.
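In case it helps, this is roughly how I was submitting the WordCount job; the API server address, image name, example jar version, and HDFS path are all placeholders for my setup:

```bash
# Submit the bundled WordCount example against the k8s cluster,
# reading input from HDFS (all names below are placeholders).
bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:6443 \
  --deploy-mode cluster \
  --name wordcount \
  --class org.apache.spark.examples.JavaWordCount \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<my-spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar \
  hdfs://<namenode-service>:8020/path/to/input.txt
```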
I looked at the Dockerfile for the Spark image, but it seems the `conf` folder is not being copied into the image.

What's the correct way of setting up Spark to maximize HDFS data locality?