juv opened this issue 6 years ago
You have to configure the client to work with HDFS in HA mode manually. You can get the parameters from the hdfs-site.xml stored in the hdfs-config configmap:
kubectl describe configmap hdfs-config
Some apps can read the params from hdfs-site.xml; just put the file in /etc/hadoop/conf.
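For example, a minimal sketch of copying the files out of the configmap (assuming the configmap stores them under keys literally named hdfs-site.xml and core-site.xml; check with kubectl describe first):
# assumption: the configmap keys are named hdfs-site.xml and core-site.xml
mkdir -p /etc/hadoop/conf
kubectl get configmap hdfs-config -o jsonpath="{.data['hdfs-site\.xml']}" > /etc/hadoop/conf/hdfs-site.xml
kubectl get configmap hdfs-config -o jsonpath="{.data['core-site\.xml']}" > /etc/hadoop/conf/core-site.xml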
Some apps can be configured using flags. I use the following syntax to launch Spark (2.3.1) jobs on k8s:
spark-submit ... \
--conf spark.hadoop.fs.defaultFS="hdfs://hdfs-k8s" \
--conf spark.hadoop.fs.default.name="hdfs://hdfs-k8s" \
--conf spark.hadoop.dfs.nameservices="hdfs-k8s" \
--conf spark.hadoop.dfs.ha.namenodes.hdfs-k8s="nn0,nn1" \
--conf spark.hadoop.dfs.namenode.rpc-address.hdfs-k8s.nn0="hdfs-namenode-0.hdfs-namenode.default:8020" \
--conf spark.hadoop.dfs.namenode.rpc-address.hdfs-k8s.nn1="hdfs-namenode-1.hdfs-namenode.default:8020" \
--conf spark.hadoop.dfs.client.failover.proxy.provider.hdfs-k8s="org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider" \
Homegrown apps can be configured via code changes: https://stackoverflow.com/a/35911455
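To sanity-check these HA parameters outside of Spark, the same properties can be passed to any Hadoop CLI client as generic -D options. A minimal sketch, assuming the same nameservice id and NameNode addresses as in the spark-submit example above:
# assumptions: nameservice hdfs-k8s, namenodes nn0/nn1 at the addresses below
hadoop fs \
  -Dfs.defaultFS=hdfs://hdfs-k8s \
  -Ddfs.nameservices=hdfs-k8s \
  -Ddfs.ha.namenodes.hdfs-k8s=nn0,nn1 \
  -Ddfs.namenode.rpc-address.hdfs-k8s.nn0=hdfs-namenode-0.hdfs-namenode.default:8020 \
  -Ddfs.namenode.rpc-address.hdfs-k8s.nn1=hdfs-namenode-1.hdfs-namenode.default:8020 \
  -Ddfs.client.failover.proxy.provider.hdfs-k8s=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider \
  -ls /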
I did the same, mounting secrets with core-site and hdfs-site into the Spark app. The options you gave didn't work.
We have a similar idea to @juv's and I am implementing it. Basically, we create a ZooKeeper watcher to get notified of events on the ZooKeeper znode /hadoop-ha/ns/ActiveStandbyElectorLock. From that we learn which NameNode is active, and we label the active NameNode so that the k8s Service only routes requests to it. This way clients don't need to retry to figure out which one is active. I would like to seek feedback on this solution from experts. Thank you!
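For reference, a rough polling approximation of that idea (not the ZooKeeper watcher itself): ask each NameNode for its HA state with hdfs haadmin and keep a label on the pods up to date so a Service selector can match only the active one. Pod names, namenode ids and the label key here are assumptions:
# sketch only: assumes namenode ids nn0/nn1 map to pods hdfs-namenode-0/1
# and that this pod has an hdfs-site.xml describing the hdfs-k8s nameservice
while true; do
  for i in 0 1; do
    state=$(hdfs haadmin -getServiceState nn$i 2>/dev/null)
    kubectl label pod hdfs-namenode-$i hdfs-ha-state="${state:-unknown}" --overwrite
  done
  sleep 10
done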
@maver1ck were you able to get this working with configmaps? I am trying with configmaps as well and failing.
My HDFS HA cluster name is not resolvable even though the core-site.xml and hdfs-site.xml files from my configmaps are properly mounted:
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: hdfs-k8s
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.
@dennischin Can you connect to the hdfs-client pod and browse files on HDFS?
kubectl get pod -l=app=hdfs-client
kubectl exec -ti <podname> bash
hadoop fs -ls /
hadoop fs -put /local/file /
You can find the configuration on the hdfs-client pod under /etc/hadoop-custom-conf/.
@vvbogdanov87, yes, I can interact with HDFS (outside of Spark) without issue.
vim spark-env.sh
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=30 -Dspark.history.fs.logDirectory=hdfs://mycluster/spark-job-log"
export YARN_CONF_DIR=/opt/hadoop-2.7.2-ha/etc/hadoop
I've set up NameNode HA. A problem is that my Service routes to either the active or the standby NameNode. For example, I have a Spark History Server running with SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://hadoop-hdfs-nn.my-namespace.svc:9000/shared/spark-logs", and the following error occurs:
Operation category READ is not supported in state standby
I guess the Spark History Server accessed the k8s Service and got routed to the standby NameNode instead of the active one. Have you thought about this problem @kimoonkim? I have an idea to solve this: label the active NameNode and create a service hadoop-hdfs-active-nn that only selects it. Then, let all clients connect to the hadoop-hdfs-active-nn service instead of hadoop-hdfs-nn. I am not sure if it is possible to set the label of a service from within a pod though...
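A minimal sketch of what such a service could look like, assuming the NameNode pods carry an app: hdfs-namenode label and that something (for example the ZooKeeper watcher described earlier in this thread) keeps an hdfs-ha-state=active label on the currently active NameNode; the label keys and the port are assumptions:
# sketch only: label keys, selector values and port are placeholders
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: hadoop-hdfs-active-nn
spec:
  selector:
    app: hdfs-namenode
    hdfs-ha-state: active
  ports:
  - port: 9000
    targetPort: 9000
EOF
Relabeling from inside a pod would then just be a kubectl label (or a PATCH against the API server) issued with a service account that is allowed to patch pods.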