apurvann opened this issue 7 years ago
I checked the pre-requisites. I have all the permissions with the default service account. Also, kube-dns is working perfectly fine. I have used https://github.com/kubernetes-incubator/kubespray to spin up my cluster. Do I need to provide some sort of token manually to make it run?
I hit the same problem on my Kubernetes 1.7.4 environment.
I see that KUBERNETES_MASTER_INTERNAL_URL = "https://kubernetes.default.svc" is defined in https://github.com/apache-spark-on-k8s/spark/blob/branch-2.2-kubernetes/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/constants.scala and used directly when creating the k8s client in https://github.com/apache-spark-on-k8s/spark/blob/branch-2.2-kubernetes/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala. I wonder if we should allow a value to be passed using --conf to override the default.
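For illustration, a minimal sketch of what such an override might look like (the key spark.kubernetes.internal.master is hypothetical, not an existing setting):

import org.apache.spark.SparkConf

// Hypothetical sketch: fall back to the current hardcoded value unless the
// user overrides it with --conf spark.kubernetes.internal.master=...
def internalMasterUrl(sparkConf: SparkConf): String =
  sparkConf.get("spark.kubernetes.internal.master", "https://kubernetes.default.svc")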
Thanks for the bug report. Does the kubernetes service not exist in the default namespace in your clusters, or is it that kube-dns doesn't resolve it?
In my case, it is a hostname-verification error: the API server certificate's subjectAltNames contain only IP addresses, so the name kubernetes.default.svc cannot be verified against it:
Caused by: javax.net.ssl.SSLPeerUnverifiedException: Hostname kubernetes.default.svc not verified: certificate: sha256/OgiOei5ibrXB0RwLonCjihq+GuPkGj5BqNkWC0K1u2k= DN: CN=172.21.0.1 subjectAltNames: [172.21.0.1, 10.176.215.15, 169.46.7.238]
I realized that I hit this issue: https://github.com/apache-spark-on-k8s/spark/issues/543
Found this https://github.com/fabric8io/fabric8/issues/6229.
And this https://github.com/spring-cloud-incubator/spring-cloud-kubernetes/issues/137. It seems there's a workaround: setting the environment variable KUBERNETES_MASTER to kubernetes.default. Can you try setting it in the driver pod? You can use --conf spark.kubernetes.driverEnv.KUBERNETES_MASTER=kubernetes.default.
@liyinan926 Your suggestion did not work. The code I linked above shows that the Kubernetes client is created directly from the constant KUBERNETES_MASTER_INTERNAL_URL = "https://kubernetes.default.svc". Yes, if it could honor sparkConf.get, that would be much better.
val kubernetesClient = SparkKubernetesClientFactory.createKubernetesClient(
  KUBERNETES_MASTER_INTERNAL_URL,
  Some(sparkConf.get(KUBERNETES_NAMESPACE)),
  APISERVER_AUTH_DRIVER_MOUNTED_CONF_PREFIX,
  sparkConf,
  Some(new File(Config.KUBERNETES_SERVICE_ACCOUNT_TOKEN_PATH)),
  Some(new File(Config.KUBERNETES_SERVICE_ACCOUNT_CA_CRT_PATH)))
@yanglei99 Yes, @liyinan926's suggestion did not work out. I think KUBERNETES_MASTER_INTERNAL_URL is hardcoded for convenience, but it doesn't work in my environment either. As far as I'm concerned, reading it through sparkConf.get, like "spark.master", is much better. I tried that and it works well.
@fanzhen First, I assume you made it work by changing the code above and recompiling. Second, I prefer the internal URL approach, since it is used inside the pod rather than by the spark-submit client. I wonder if a simple fix is to use "kubernetes.default.svc.cluster.local" as the default, while allowing a Spark config to overwrite it; a rough sketch follows below.
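A sketch of how that default-plus-override could be declared with Spark's internal ConfigBuilder (the key name is hypothetical):

import org.apache.spark.internal.config.ConfigBuilder

// Hypothetical config entry: defaults to the fully qualified service name but
// can be overwritten with --conf at submission time.
val KUBERNETES_INTERNAL_MASTER = ConfigBuilder("spark.kubernetes.internal.master")
  .doc("URL for the driver pod to reach the Kubernetes API server.")
  .stringConf
  .createWithDefault("https://kubernetes.default.svc.cluster.local")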
As @yanglei99 said, the URL the pods use to connect to the API server is internal, while spark.master is the external URL used by the submission client.
@apurva3000 @yanglei99 @fanzhen what is the DNS name of the Kubernetes service in your clusters?
Hm, it looks like even if we don't specify the master when creating the client, the fabric8 kubernetes-client will use https://kubernetes.default.svc by default. See https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/Config.java#L113. So a solution could be to not set the hard-coded master in the Config object used when creating the client. This gives users the option to use the environment variable KUBERNETES_MASTER to set the correct value. @foxish @ash211 @mccheah @kimoonkim @erikerlandson.
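For reference, a sketch of what that autoconfiguration looks like on the fabric8 side (this mirrors the linked Config.java behavior, not Spark's current code):

import io.fabric8.kubernetes.client.{ConfigBuilder, DefaultKubernetesClient}

// If no master URL is set explicitly, fabric8 resolves one itself: it honors
// the KUBERNETES_MASTER environment variable (or the kubernetes.master system
// property) before falling back to https://kubernetes.default.svc.
val config = new ConfigBuilder().build()
val client = new DefaultKubernetesClient(config)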
@liyinan926, "kubernetes.default.svc" is hardcoded today to create Kubernetes Client. Code links are in above discussions. In my environment it is resolved to "kubernetes.default.svc.cluster.local".
@liyinan926 I agree with the solution you proposed. Giving users the option to set the environment variable KUBERNETES_MASTER is more flexible across different environments.
I believe it should be kubernetes.default.svc.cluster.local, looking at the following:
kubectl exec -it busybox -- nslookup kubernetes.default.svc.cluster.local
Server: 10.233.0.3
Address 1: 10.233.0.3 kube-dns.kube-system.svc.cluster.local
Name: kubernetes.default.svc.cluster.local
Address 1: 10.233.0.1 kubernetes.default.svc.cluster.local
And
kubectl exec -it busybox -- cat /etc/resolv.conf
nameserver 10.233.0.3
search default.svc.cluster.local svc.cluster.local cluster.local openstacklocal
options ndots:5
So I changed the code and built it again so that it gets the URL from the sparkConf. However, I am still hitting the same error:
Caused by: java.net.UnknownHostException: kubernetes.default.svc.cluster.local: Try again
I realized that I can use IP addresses as the internal master URL, both the master URL (https://ip:port) and the DNS-resolved IP (https://172.21.0.1). However, I cannot use any of the host names: besides https://kubernetes.default.svc (in the current code) and https://kubernetes.default.svc.cluster.local, I have also tried https://kubernetes, and none worked.
@yanglei99 are you able to use the Kubernetes master URL as the internal URL, i.e. <k8s-apiserver-host>:<k8s-apiserver-port>?
I tried http://localhost:8080, 127.0.0.1:8080, and also the DNS-resolved IP. Now I keep getting java.net.ConnectException: Connection refused (Connection refused) errors.
@liyinan926 @mccheah @foxish
@apurva3000 I am not sure how you use the "master URL". First, with spark-submit, --master can take either the proxy URL or the master URL; however, since the driver code runs in a pod, it can only use the master URL, because the pod does not have the submission environment's proxy setup. Second, when spark-submit uses the master URL, in my environment I also need to set spark.kubernetes.authenticate.submission.oauthToken. Third, is the "http://localhost:8080" you mentioned what "kubectl cluster-info" returns (e.g. "Kubernetes master is running at http://localhost:8080")? Hope this helps.
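For illustration, the submission-side settings look roughly like this (host, port, and token are placeholders):

import org.apache.spark.SparkConf

// Submission-side sketch: --master points at the API server itself (not a
// kubectl proxy), and the oauth token authenticates the submission client.
val conf = new SparkConf()
  .setMaster("k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>")
  .set("spark.kubernetes.authenticate.submission.oauthToken", "<token>")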
@yanglei99 Yes, kubectl cluster-info gives me http://localhost:8080 and I use it as my --master, as instructed in the official docs. However, using this or 127.0.0.1 with the port leads to the connection refused error.
@apurva3000 Can you share the new failure stack trace? One issue I see is that, since the IP is localhost or 127.0.0.1, I am not sure the driver pod will resolve it to the API server. One thing you can try is to use "http://10.233.0.1" for the driver pod's KUBERNETES_MASTER_INTERNAL_URL, since it is listed as the resolved IP address for your "kubernetes.default.svc.cluster.local". I assume it is using the http protocol, since your cluster-info returns http; if that does not work, try https://10.233.0.1. You will still need to use "--master k8s://http://localhost:8080" as-is on the submission side.
@liyinan926 Still doesn't work and here is what the stacktrace looks like:
2017-12-07 08:38:48 ERROR KubernetesClusterSchedulerBackend:91 - Executor cannot find driver pod.
io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [spark-pi-1512635912762-driver] in namespace: [default] failed.
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:62)
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:71)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:228)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:184)
at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.liftedTree1$1(KubernetesClusterSchedulerBackend.scala:74)
at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.<init>(KubernetesClusterSchedulerBackend.scala:72)
at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:142)
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2764)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:236)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at okhttp3.internal.platform.Platform.connectSocket(Platform.java:124)
at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:223)
at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:149)
at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:195)
at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
at okhttp3.RealCall.execute(RealCall.java:69)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:377)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:343)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:312)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:295)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:783)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:217)
... 18 more
2017-12-07 08:38:48 ERROR SparkContext:91 - Error initializing SparkContext.
org.apache.spark.SparkException: Executor cannot find driver pod
at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.liftedTree1$1(KubernetesClusterSchedulerBackend.scala:78)
at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.<init>(KubernetesClusterSchedulerBackend.scala:72)
at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:142)
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2764)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:236)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [spark-pi-1512635912762-driver] in namespace: [default] failed.
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:62)
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:71)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:228)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:184)
at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.liftedTree1$1(KubernetesClusterSchedulerBackend.scala:74)
... 16 more
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at okhttp3.internal.platform.Platform.connectSocket(Platform.java:124)
at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:223)
at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:149)
at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:195)
at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
at okhttp3.RealCall.execute(RealCall.java:69)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:377)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:343)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:312)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:295)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:783)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:217)
... 18 more
2017-12-07 08:38:48 INFO AbstractConnector:310 - Stopped Spark@482e5524{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2017-12-07 08:38:48 INFO SparkUI:54 - Stopped Spark web UI at http://spark-pi-1512635912762-driver-svc.default.svc.cluster.local:4040
2017-12-07 08:38:48 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2017-12-07 08:38:48 INFO MemoryStore:54 - MemoryStore cleared
2017-12-07 08:38:48 INFO BlockManager:54 - BlockManager stopped
2017-12-07 08:38:48 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
2017-12-07 08:38:48 WARN MetricsSystem:66 - Stopping a MetricsSystem that is not running
2017-12-07 08:38:48 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2017-12-07 08:38:48 INFO SparkContext:54 - Successfully stopped SparkContext
Traceback (most recent call last):
File "/opt/spark/examples/src/main/python/pi.py", line 32, in <module>
.appName("PythonPi")\
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 169, in getOrCreate
File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 334, in getOrCreate
File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 118, in __init__
File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 180, in _do_init
File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 273, in _initialize_context
File "/opt/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1401, in __call__
File "/opt/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: Executor cannot find driver pod
at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.liftedTree1$1(KubernetesClusterSchedulerBackend.scala:78)
at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.<init>(KubernetesClusterSchedulerBackend.scala:72)
at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:142)
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2764)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:236)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [spark-pi-1512635912762-driver] in namespace: [default] failed.
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:62)
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:71)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:228)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:184)
at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.liftedTree1$1(KubernetesClusterSchedulerBackend.scala:74)
... 16 more
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at okhttp3.internal.platform.Platform.connectSocket(Platform.java:124)
at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:223)
at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:149)
at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:195)
at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
at okhttp3.RealCall.execute(RealCall.java:69)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:377)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:343)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:312)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:295)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:783)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:217)
... 18 more
I'm starting to wonder a few things now. When I do kubectl cluster-info, I get: KubeDNS is running at http://localhost:8080/api/v1/namespaces/kube-system/services/kube-dns/proxy. And when I try to curl that address, I get this:
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "no endpoints available for service \"kube-dns\"",
"reason": "ServiceUnavailable",
"code": 503
}
Do you know if this is normal, or do you also see a similar message when you try to curl?
@apurva3000 it seems there's some problem with the kube-dns service in your cluster. Can you run kubectl get ep kube-dns --namespace=kube-system and see what the output looks like?
I was able to get the endpoints as well. I have now destroyed that cluster and created a new one; there must have been some problem with kube-dns there, because now everything works.
Cool. The kube-dns add-on is required. I think we need to add a check to the submission process that fails the submission if the add-on does not exist or is not healthy.
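A rough sketch of such a check using the fabric8 client (this is a proposal, not existing code):

import io.fabric8.kubernetes.client.KubernetesClient
import scala.collection.JavaConverters._

// Hypothetical pre-submission check: fail fast when the kube-dns service has
// no endpoints, since the driver pod relies on cluster DNS to reach the
// API server by name.
def requireKubeDns(client: KubernetesClient): Unit = {
  val ep = client.endpoints().inNamespace("kube-system").withName("kube-dns").get()
  val healthy = Option(ep).exists(_.getSubsets.asScala.nonEmpty)
  require(healthy, "kube-dns has no endpoints; cluster DNS appears unhealthy")
}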
I am trying to run the SparkPi example on my Kubernetes 1.8 cluster, following the official documentation. The command for running it looks like this:
However, I am running into this error whenever I try to run the SparkPi example.
Has anyone run into this error before?