ALiBaBa-Jimmy opened this issue 6 years ago
@kimoonkim
Hi @ALiBaBa-Jimmy, thanks for trying out k8s HDFS, and sorry about the trouble you went through.
This looks like a kube-dns issue. Do you know if your cluster has a healthy kube-dns? If you're not sure, you may want to try the steps in https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/#does-the-service-work-by-dns
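For a quick check, a throwaway pod can exercise cluster DNS directly. A minimal sketch, assuming the default "cluster.local" domain and the busybox:1.28 image (later busybox tags ship a broken nslookup):

# Run a one-off pod and resolve the API server's service through cluster DNS.
kubectl run dns-test --rm -it --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default.svc.cluster.local

If that lookup fails, the problem is in the cluster's DNS rather than in this chart.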
Also, can you post the exact command line you used to launch the helm chart?
Thanks.
@kimoonkim I hit exactly the same issue, and I am pretty sure my DNS is correct. My k8s version is 1.12, and the DNS is CoreDNS.
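For reference, CoreDNS health can be verified with standard kubectl commands; the selector below relies on CoreDNS reusing the kube-dns label for compatibility, which holds on most distributions:

# List the CoreDNS pods; all should be Running with no restarts piling up.
kubectl get pods -n kube-system -l k8s-app=kube-dns
# Scan their logs for resolution or upstream errors.
kubectl logs -n kube-system -l k8s-app=kube-dns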
I have the exact same issue.
I encountered this same issue. In my case it was caused by the cluster domain name not being the default/common "cluster", which appears to be the expectation in charts/hdfs-k8s/templates/_helpers.tpl. My "fix" replaced one hardcoded value with another:
--- a/charts/hdfs-k8s/templates/_helpers.tpl
+++ b/charts/hdfs-k8s/templates/_helpers.tpl
@@ -163,7 +163,7 @@ The HDFS config file should specify FQDN of services. Otherwise, Kerberos
login may fail.
*/}}
{{- define "svc-domain" -}}
-{{- printf "%s.svc.cluster.local" .Release.Namespace -}}
+{{- printf "%s.svc.aiscluster.local" .Release.Namespace -}}
{{- end -}}
The incorrect value contributes to the hdfs-config configmap and the default custom core-site.xml it delivers (which run.sh unconditionally copies over any other tweaks to /etc/hadoop/core-site.xml).
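A less brittle variant of this fix would read the domain from a chart value instead of swapping one hardcoded string for another. A sketch, assuming a hypothetical global.clusterDomain value that the chart does not currently define:

{{/* svc-domain, parameterized: .Values.global.clusterDomain is an assumed
     value, not part of the chart; it falls back to the stock default. */}}
{{- define "svc-domain" -}}
{{- printf "%s.svc.%s" .Release.Namespace (.Values.global.clusterDomain | default "cluster.local") -}}
{{- end -}}

An install on a renamed cluster domain could then pass --set global.clusterDomain=aiscluster.local instead of patching the template.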
When I use helm install to deploy the namenode according to your docs, the following errors appear in the log and the namenode restarts over and over:
java.io.IOException: Failed on local exception: java.net.SocketException: Unresolved address; Host Details : local host is: "hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local"; destination host is: (unknown):0;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
at org.apache.hadoop.ipc.Server.bind(Server.java:425)
at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:574)
at org.apache.hadoop.ipc.Server.<init>(Server.java:2215)
at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:938)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:534)
at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:509)
at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:783)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.<init>(NameNodeRpcServer.java:344)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:673)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:646)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:811)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:795)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1488)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
Caused by: java.net.SocketException: Unresolved address
at sun.nio.ch.Net.translateToSocketException(Net.java:131)
at sun.nio.ch.Net.translateException(Net.java:157)
at sun.nio.ch.Net.translateException(Net.java:163)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:76)
at org.apache.hadoop.ipc.Server.bind(Server.java:408)
... 13 more
Caused by: java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:101)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:218)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
... 14 more
hdfs-namenode-0   0/1   CrashLoopBackOff   22   9h   10.196.36.165   10.196.36.165
hdfs-namenode-1   0/1   CrashLoopBackOff   22   9h   10.196.36.162   10.196.36.162
This is my /etc/hosts on my node machine:
[wangdanfeng5@A01-R20-I36-165-0964488 ~]$ cat /etc/hosts
#127.0.0.1 A01-R20-I36-165-0964488.JD.LOCAL localhost.localdomain localhost
127.0.0.1 localhost.localdomain localhost
10.196.36.162 A01-R20-I36-162-0964483.JD.LOCAL
10.196.36.165 A01-R20-I36-165-0964488.JD.LOCAL
Could you give me some advice on this?
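For anyone hitting the same crash: the "Unresolved address" above means the namenode could not resolve the FQDN it tries to bind to. A quick test from inside the pod, using the pod and service names taken from the log (run it while the container is up between restarts, and only if nslookup is available in the image):

# Try to resolve the namenode's own FQDN through cluster DNS.
kubectl exec hdfs-namenode-0 -- nslookup hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local

If this fails while the kube-dns checks earlier in the thread pass, a non-default cluster domain (see the _helpers.tpl comment above) is the likely culprit.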