JahstreetOrg / spark-on-kubernetes-helm

Spark on Kubernetes infrastructure Helm charts repo
Apache License 2.0

Unable to create spark session #41

Closed. KamalGalrani closed this issue 3 years ago

KamalGalrani commented 3 years ago

I was able to set up Livy using the Helm chart, but creating a session fails. I am using the default configuration with minikube.

Create session payload

{
    "kind": "pyspark",
    "name": "test-session1234",
    "conf": {
      "spark.kubernetes.namespace": "livy"
    }
}

20/09/25 04:00:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create]  for kind: [Pod]  with name: [null]  in namespace: [livy]  failed.
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330)
    at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141)
    at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140)
    at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
    at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241)
    at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.SocketException: Broken pipe (Write failed)
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
    at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
    at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
    at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894)
    at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865)
    at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
    at okio.Okio$1.write(Okio.java:79)
    at okio.AsyncTimeout$1.write(AsyncTimeout.java:180)
    at okio.RealBufferedSink.flush(RealBufferedSink.java:224)
    at okhttp3.internal.http2.Http2Writer.settings(Http2Writer.java:203)
    at okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:515)
    at okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:505)
    at okhttp3.internal.connection.RealConnection.startHttp2(RealConnection.java:298)
    at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:287)
    at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:168)
    at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:257)
    at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
    at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
    at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:110)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:254)
    at okhttp3.RealCall.execute(RealCall.java:92)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:411)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:372)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:241)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:819)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:334)
    ... 17 more
20/09/25 04:01:00 INFO ShutdownHookManager: Shutdown hook called
20/09/25 04:01:00 INFO ShutdownHookManager: Deleting directory /tmp/spark-343d41df-d58c-4ed4-8a03-2eabbc21da1d

Kubernetes Diagnostics: 
Operation: [list]  for kind: [Pod]  with name: [null]  in namespace: [null]  failed.
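
For anyone reproducing this, the session request above could be sent with a short script along the following lines. This is only a sketch: it assumes Livy's REST endpoint is reachable at `http://localhost:8998` (for example through a port-forward), which is a placeholder and not a value from this thread. The function names are illustrative.

```python
import json
import urllib.request

# Assumption: placeholder address for the Livy REST endpoint, not taken from the thread.
LIVY_URL = "http://localhost:8998"


def build_session_request(livy_url, name, namespace):
    """Build (but do not send) the POST /sessions request for a PySpark session."""
    payload = {
        "kind": "pyspark",
        "name": name,
        "conf": {"spark.kubernetes.namespace": namespace},
    }
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/sessions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session(livy_url, name, namespace):
    """Send the request and return Livy's JSON response as a dict."""
    request = build_session_request(livy_url, name, namespace)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `create_session(LIVY_URL, "test-session1234", "livy")` would submit the same payload as above. Keeping the request builder separate from the network call makes it possible to inspect the body without a running Livy server.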
jahstreet commented 3 years ago

I believe this is caused by an incompatibility between the K8s API version and the K8s client. Which K8s API version are you using with Minikube?

KamalGalrani commented 3 years ago

singularity@Kamal-Omen:~$ minikube version
minikube version: v1.13.0
commit: 0c5e9de4ca6f9c55147ae7f90af97eff5befef5f-dirty
singularity@Kamal-Omen:~$ kubectl version
Client Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.7-eks-bffbac", GitCommit:"bffbacfd13a805a12d10ccc0ca26205ae1ca76e9", GitTreeState:"clean", BuildDate:"2020-07-08T18:30:00Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0", GitCommit:"e19964183377d0ec2052d1f1fa930c4d7575bd50", GitTreeState:"clean", BuildDate:"2020-08-26T14:23:04Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}

Does this give the K8s API version?

Also, in the other thread you mentioned driver logs and a networking issue. Could you please help me extract the driver log if the above doesn't solve the issue? And can you elaborate on what you mean by a networking issue?

jahstreet commented 3 years ago

First, about this one: the current Helm chart runs Spark 2.4.5 and Livy with the fabric8 Java K8s client 4.6.1, which is limited to K8s API 1.15.3. Your K8s API version on Minikube is Major:"1", Minor:"19", i.e. 1.19. To run the desired K8s API version with Minikube you can execute: minikube start --kubernetes-version=1.15.0 .... I'm currently working on Spark 3 support, which is going to ease this limitation.
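
The version check described above can be sketched in a few lines of Python. This is a rough illustration, not part of the chart: it assumes the `kubectl version` output format shown earlier in this thread (newer kubectl releases print a different format), and it takes the 1.15 limit from the comment above. The function names are made up for the example.

```python
import re

# Assumption taken from the comment above: the bundled fabric8 client
# is limited to K8s API 1.15.x.
MAX_SUPPORTED = (1, 15)


def parse_server_version(kubectl_output):
    """Extract (major, minor) from the 'Server Version' line of `kubectl version`."""
    match = re.search(
        r'Server Version:.*?Major:"(\d+)",\s*Minor:"(\d+)\+?"', kubectl_output
    )
    if match is None:
        raise ValueError("no 'Server Version' line found")
    return int(match.group(1)), int(match.group(2))


def is_supported(server_version, max_supported=MAX_SUPPORTED):
    """True if the server's (major, minor) does not exceed the client's limit."""
    return server_version <= max_supported
```

With the output posted above, `parse_server_version` yields `(1, 19)`, which `is_supported` rejects, matching the diagnosis that the cluster is newer than the client can handle.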

Second: to extract the Spark Driver logs you can execute kubectl logs <spark-driver-pod-name>. By networking issue I mean that the network may be unstable and occasionally fail requests. The network is backed by hardware and software that can malfunction, and depending on your environment such issues can be more or less frequent. I can't say for sure that this is the cause; it's just an assumption to check with the network engineers, and to trace the requests if you have APM metrics and request tracing set up in the cluster.
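
If the driver pod name is not known, the log extraction above could be scripted roughly as follows. This sketch assumes `kubectl` is on the PATH and that driver pods carry the `spark-role=driver` label that Spark on Kubernetes normally sets; check the labels in your own cluster before relying on it. The helper names are hypothetical.

```python
import subprocess


def list_driver_pods_cmd(namespace):
    """kubectl command listing pods that carry Spark's driver role label."""
    return ["kubectl", "get", "pods", "-n", namespace,
            "-l", "spark-role=driver", "-o", "name"]


def driver_logs_cmd(pod_name, namespace):
    """kubectl command printing the logs of one pod."""
    return ["kubectl", "logs", pod_name, "-n", namespace]


def fetch_driver_logs(namespace):
    """Return {pod: logs} for every driver pod found in the namespace."""
    listing = subprocess.run(list_driver_pods_cmd(namespace), check=True,
                             capture_output=True, text=True)
    logs = {}
    for pod in listing.stdout.split():
        result = subprocess.run(driver_logs_cmd(pod, namespace), check=True,
                                capture_output=True, text=True)
        logs[pod] = result.stdout
    return logs
```

An empty result from `fetch_driver_logs("livy")` is possible: if pod creation itself fails, there may be no driver pod to read logs from.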

KamalGalrani commented 3 years ago

  1. I'll downgrade K8s. Thanks.
  2. I'll include these logs if downgrading doesn't help.
  3. I don't think the network is the issue here, because everything is running on a single machine for testing, with the docker driver for minikube.

jahstreet commented 3 years ago

Ahh, I see, then let's wait for the logs... Hope you get it solved!

KamalGalrani commented 3 years ago

Thanks! Downgrading K8s worked.