apache-spark-on-k8s / spark

Apache Spark enhanced with a native Kubernetes scheduler back-end. NOTE: this repository is being ARCHIVED, as all new development for the Kubernetes scheduler back-end now happens at https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0

Resource staging server: java.net.SocketTimeoutException: connect timed out #615

Closed: gridcellcoder closed this issue 6 years ago

gridcellcoder commented 6 years ago

I am following the guide here:

https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html

on Kubernetes on GCE, with kubectl:

Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.6", GitCommit:"6260bb08c46c31eea6cb538b34a9ceb3e406689c", GitTreeState:"clean", BuildDate:"2017-12-21T06:34:11Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9+", GitVersion:"v1.9.2-gke.1", GitCommit:"4ce7af72d8d343ea2f7680348852db641ff573af", GitTreeState:"clean", BuildDate:"2018-01-31T22:30:55Z", GoVersion:"go1.9.2b4", Compiler:"gc", Platform:"linux/amd64"}

Master version
1.9.2-gke.1

I have the resource staging server (RSS) up and running:

kubectl get services
NAME                                TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
kubernetes                          ClusterIP   10.3.240.1     <none>        443/TCP             1h
spark-pi-1518520538054-driver-svc   ClusterIP   None           <none>        7078/TCP,7079/TCP   1h
spark-pi-1518520958130-driver-svc   ClusterIP   None           <none>        7078/TCP,7079/TCP   54m
spark-resource-staging-service      NodePort    10.3.250.165   <none>        10000:31000/TCP     42m
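
Note that a NodePort service is exposed on port 31000 of every node in the cluster, so any node's address should work in principle. One way to list candidate node addresses (a minimal sketch; the jsonpath expression assumes GKE-style node objects with an ExternalIP entry):

# Show nodes with their internal and external IPs
kubectl get nodes -o wide

# Print only the external IPs
kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}'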

I can get the SparkPi application to work, but it fails when I try to use the resource staging server with local dependencies:

bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://https://<ip-of-my-cluster>:443 \
  --kubernetes-namespace default \
  --conf spark.kubernetes.submission.waitAppCompletion=true \
  --conf spark.executor.instances=1 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.resourceStagingServer.uri=http://<address-of-any-cluster-node>:31000 \
  ./examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar

For <address-of-any-cluster-node> I tried: <ip-of-my-cluster> (which is the Kubernetes API endpoint), the NodePort service's cluster IP 10.3.250.165, the pod IP 10.0.0.X, and both the private (primary internal) IP and the external IP of the node VM on GCE. All of these fail.
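
A quick way to test reachability of the staging server from the submitting machine, independent of spark-submit (a sketch, assuming the NodePort is 31000 as configured above; <node-external-ip> stands for whichever address is being tried):

# Probe the RSS NodePort directly; a hang followed by a timeout here
# reproduces the same connectivity failure spark-submit reports.
curl -v --connect-timeout 5 http://<node-external-ip>:31000/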

The documentation just says to use <address-of-any-cluster-node>, but what should this actually be? Or is it some other issue with discovery, setup, or the Kubernetes version? Every attempt fails with:

Exception in thread "main" java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at okhttp3.internal.platform.Platform.connectSocket(Platform.java:124)
    at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:223)
    at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:149)
    at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:195)
    at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
    at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
    at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
    at okhttp3.RealCall.execute(RealCall.java:69)
    at retrofit2.OkHttpCall.execute(OkHttpCall.java:174)
    at org.apache.spark.deploy.k8s.submit.SubmittedDependencyUploaderImpl.getTypedResponseResult(SubmittedDependencyUploaderImpl.scala:101)
    at org.apache.spark.deploy.k8s.submit.SubmittedDependencyUploaderImpl.doUpload(SubmittedDependencyUploaderImpl.scala:97)
    at org.apache.spark.deploy.k8s.submit.SubmittedDependencyUploaderImpl.uploadJars(SubmittedDependencyUploaderImpl.scala:70)
    at org.apache.spark.deploy.k8s.submit.submitsteps.initcontainer.SubmittedResourcesInitContainerConfigurationStep.configureInitContainer(SubmittedResourcesInitContainerConfigurationStep.scala:48)
    at org.apache.spark.deploy.k8s.submit.submitsteps.InitContainerBootstrapStep$$anonfun$configureDriver$1.apply(InitContainerBootstrapStep.scala:43)
    at org.apache.spark.deploy.k8s.submit.submitsteps.InitContainerBootstrapStep$$anonfun$configureDriver$1.apply(InitContainerBootstrapStep.scala:42)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.deploy.k8s.submit.submitsteps.InitContainerBootstrapStep.configureDriver(InitContainerBootstrapStep.scala:42)
    at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$1.apply(Client.scala:95)
    at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$1.apply(Client.scala:94)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.deploy.k8s.submit.Client.run(Client.scala:94)
    at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$5.apply(Client.scala:191)
    at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$5.apply(Client.scala:184)
    at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2551)
    at org.apache.spark.deploy.k8s.submit.Client$.run(Client.scala:184)
    at org.apache.spark.deploy.k8s.submit.Client$.main(Client.scala:204)
    at org.apache.spark.deploy.k8s.submit.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:786)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
gridcellcoder commented 6 years ago

Turns out it was a firewall issue.

You need to allow inbound traffic on port 31000 (or whatever NodePort is defined in the YAML) from your local machine to the public IP of the node that is running the RSS.
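
On GCE/GKE this can be done with a firewall rule, for example (a sketch; the rule name, network, and source range are placeholders to adapt to your setup):

# Allow inbound TCP on the RSS NodePort from your machine's public IP
gcloud compute firewall-rules create allow-spark-rss \
    --network default \
    --allow tcp:31000 \
    --source-ranges <your-public-ip>/32

After the rule is in place, the curl probe above should return an HTTP response instead of timing out.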