apache-spark-on-k8s / kubernetes-HDFS

Repository holding configuration files for running an HDFS cluster in Kubernetes
Apache License 2.0

Datanodes would be left with stale DNS records if namenode pods restart. #48

Open kimoonkim opened 6 years ago

kimoonkim commented 6 years ago

Similar to #42. Even after a datanode has successfully registered with the namenodes, a namenode may restart. The datanode would then be left with a stale DNS entry for it in the local JVM cache.

We may have to tune the JVM DNS cache so entries expire soon enough. Alternatively, we could have the liveness probe also assert on the datanode JMX entry that maps namenode hostnames to IPs, i.e. if the mapping is stale, simply let the datanode pod crash. For the second approach, we would have to randomize the crash times so we don't lose all datanodes simultaneously.
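
A minimal sketch of the first option, shortening the JVM's positive DNS cache on the datanodes. HADOOP_DATANODE_OPTS is a standard Hadoop env var, but the pod and image names below are placeholders and whether the image honors the variable depends on its entrypoint; also note the legacy sun.net.inetaddr.ttl system property is only consulted when the networkaddress.cache.ttl security property is left unset in java.security.

apiVersion: v1
kind: Pod
metadata:
  name: hdfs-datanode-example            # placeholder pod, just to illustrate the env var
spec:
  containers:
  - name: datanode
    image: uhopper/hadoop-datanode:2.7.2 # placeholder image
    env:
    - name: HADOOP_DATANODE_OPTS
      # Cache successful lookups for 30s and failed lookups for 10s instead of the JVM
      # defaults, so a re-created namenode pod's new IP is picked up quickly.
      value: "-Dsun.net.inetaddr.ttl=30 -Dsun.net.inetaddr.negative.ttl=10"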

juv commented 6 years ago

@kimoonkim I ran into this issue. How can I fix it? All of my datanodes now spam the logs with "2018-07-05 09:57:52,800 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoop-hdfs-nn-1.hadoop-hdfs-nn.my-namespace.svc.cluster.local:9000"

juv commented 6 years ago

@kimoonkim I tried to delete the namenode pod that was currently the active NN, but the other NN could not be elected as the leader. The other namenode tried to get elected as leader several times, but all of those attempts failed with the error below. Why is it reporting local host is: (unknown);?

2018-07-05 10:06:55,700 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.setSafeMode from 172.102.3.63:54950 Call#0 Retry#0: org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby
2018-07-05 10:07:22,646 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering log roll on remote NameNode hadoop-hdfs-nn-1.hadoop-hdfs-nn.my-namespace.svc.cluster.local:9000
2018-07-05 10:07:22,646 WARN org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unable to trigger a roll of the active NN
java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "hadoop-hdfs-nn-1.hadoop-hdfs-nn.my-namespace.svc.cluster.local":9000; java.net.UnknownHostException; For more details see:  http://wiki.apache.org/hadoop/UnknownHost 
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:744)
    at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:409)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1518)
    at org.apache.hadoop.ipc.Client.call(Client.java:1451)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy16.rollEditLog(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:148)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:273)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$600(EditLogTailer.java:61)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:315)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
Caused by: java.net.UnknownHostException
... 14 more

Output of /etc/hosts:

# cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
172.101.3.207   hadoop-hdfs-nn-0.hadoop-hdfs-nn.my-namespace.svc.cluster.local    hadoop-hdfs-nn-0

Output of $HOSTNAME:

# echo $HOSTNAME
hadoop-hdfs-nn-0

jpiper commented 6 years ago

Could we use a ClusterIP service for the NameNode so that we have a non-changing IP address?

jpiper commented 6 years ago

You could utilise this StatefulSet feature released in Kubernetes 1.9:

StatefulSet controller will create a label for each Pod in a StatefulSet. The label is named statefulset.kubernetes.io/pod-name and it is equal to the name of the Pod. This allows users to create a Service per Pod to expose a connection to individual Pods.

https://github.com/kubernetes/kubernetes/pull/55329

jpiper commented 6 years ago

I can confirm the above suggestion works as expected. If you put a Service-per-pod in front of both the namenodes and the journalnodes, each selecting its corresponding pod, they get fixed IP addresses, so the DNS caching no longer causes issues (make sure to change the DNS names in the *-site.xml files to reflect the new naming scheme, obviously).

e.g.

apiVersion: v1
kind: Service
metadata:
  name: hdfs-namenode-0
  labels:
    app: hdfs-namenode
    chart: hdfs-namenode-k8s-0.1.0
    release: hdfs
spec:
  ports:
  - port: 8020
    name: fs
  - port: 50070
    name: http
  selector:
    app: hdfs-namenode
    release: hdfs
    statefulset.kubernetes.io/pod-name: hdfs-namenode-0
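
For the *-site.xml side of that, a rough sketch of how the HA RPC addresses could point at the per-pod Services instead of the StatefulSet pod DNS names, assuming the config is shipped as a ConfigMap; the property names are the standard HDFS HA ones, while the ConfigMap name, nameservice id, namespace, and second Service name (hdfs-namenode-1) are illustrative:

apiVersion: v1
kind: ConfigMap
metadata:
  name: hdfs-config                      # placeholder name
data:
  hdfs-site.xml: |
    <configuration>
      <property>
        <name>dfs.nameservices</name>
        <value>hdfs-k8s</value>          <!-- illustrative nameservice id -->
      </property>
      <property>
        <name>dfs.ha.namenodes.hdfs-k8s</name>
        <value>nn0,nn1</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-k8s.nn0</name>
        <!-- per-pod Service name instead of the StatefulSet pod DNS name -->
        <value>hdfs-namenode-0.my-namespace.svc.cluster.local:8020</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-k8s.nn1</name>
        <value>hdfs-namenode-1.my-namespace.svc.cluster.local:8020</value>
      </property>
    </configuration>

Since these are regular (non-headless) Services, each name resolves to a stable ClusterIP, so a cached DNS entry stays valid even when the underlying pod is rescheduled.
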
jpiper commented 6 years ago

(I wonder if this breaks data locality though)