apache-spark-on-k8s / spark

Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0

Use a pre-installed Minikube instance for integration tests. #521

Open mccheah opened 7 years ago

mccheah commented 7 years ago

This changes the integration test workflow to assume that Minikube is already installed on the testing machine. Previously, the integration tests downloaded Minikube during the build cycle and started and stopped the Minikube VM on every test execution. As a result, multiple integration tests could not run concurrently.

This commit allows multiple tests to share a single Minikube instance, and requires users who run integration tests to have Minikube pre-installed. If the Minikube instance has enough resources, multiple tests can run against it at the same time. Each test needs to use its own set of Docker images, so the Docker image builder now tags images uniquely on every test execution.
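
To illustrate the unique-tagging idea, here is a minimal Python sketch. It is not the code from this change: the helper names `new_test_tag` and `image_refs` and the image names are hypothetical, and the sketch only builds tag strings rather than invoking Docker.

```python
import uuid


def new_test_tag() -> str:
    """Return a tag intended to be unique to one test execution (hypothetical helper)."""
    # uuid4 is random, so concurrent runs are very unlikely to pick the same tag.
    return uuid.uuid4().hex


def image_refs(image_names, tag):
    """Map each base image name to a 'name:tag' reference for this run."""
    return {name: f"{name}:{tag}" for name in image_names}


if __name__ == "__main__":
    tag = new_test_tag()
    # Image names below are placeholders, not the names used by the test suite.
    for ref in image_refs(["spark-driver", "spark-executor"], tag).values():
        print(ref)
```

Because every run gets its own tag, two tests sharing one Minikube instance would never overwrite each other's images under this scheme.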

mccheah commented 7 years ago

This build won't pass until Minikube is pre-installed on our Jenkins node. Conversely, if we set up Minikube on the Jenkins node, all outstanding PRs will need to rebase on top of this commit once it merges in order to pass those builds. I'm not sure what the best way forward is with regard to sequencing these changes.

This change allows tests to run on the Riselab Jenkins environment as well.

@ash211 @ifilonenko for review, @ssuchter @shaneknapp for infrastructure considerations.

mccheah commented 7 years ago

We could, but I'd prefer to avoid having to maintain two code paths if possible. If we're sending this upstream, it would be best to keep the system simple.

foxish commented 6 years ago

Is this waiting for any more changes? Do we also need the Pepperdata build system to install Minikube?

mccheah commented 6 years ago

Still ironing out some errors in the Riselab system, but yes, we will need Pepperdata Jenkins to install Minikube as well. Once we merge this into branch-2.2-kubernetes, ALL open PRs need to rebase on top of this change to pass CI... and to not delete Minikube accidentally.

echarles commented 6 years ago

Great to have the option to run concurrent integration tests on an already existing Minikube instance with different tags. I haven't read the proposed changes, but how do you create the image tags: by rebuilding, or by simply tagging an initial build?

However, I'd like to second @ssuchter about the interest in keeping the previous behavior: a CI server should not have Minikube pre-installed. I think it will be harder to ask Apache infra to have this on their servers than to let Maven download it.

My vote would go for concurrent runs (with tags) on a downloaded Minikube, plus an option to connect to an existing one: a mix of the existing and proposed behavior.
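
The mixed behavior suggested here could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions, not something present in the proposed change: the function name `choose_minikube_mode`, its `prefer_existing` parameter, and the two mode strings are hypothetical.

```python
import shutil


def choose_minikube_mode(prefer_existing: bool = True, which=shutil.which) -> str:
    """Pick how the tests obtain Minikube (hypothetical helper).

    Returns 'existing' when a minikube binary is found on PATH and reuse is
    allowed; otherwise returns 'download', meaning the build fetches its own copy.
    """
    if prefer_existing and which("minikube") is not None:
        return "existing"
    return "download"


if __name__ == "__main__":
    print(choose_minikube_mode())
```

The `which` parameter exists only so the lookup can be swapped out when exercising the function; by default it uses `shutil.which` from the standard library.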

echarles commented 6 years ago

I ran the integration tests on this branch and got a failure, which I suppose is a random one, as described in #571.

- Run PySpark Job on file from SUBMITTER with --py-files
- Run PySpark Job on file from CONTAINER with spark.jar defined
- Run SparkR Job on file locally
- Run SparkR Job on file from SUBMITTER
- Simple submission test with the resource staging server.
- Enable SSL on the resource staging server
- Use container-local resources without the resource staging server
- Dynamic executor scaling basic test
- Use remote resources without the resource staging server.
- Mix remote resources with submitted ones.
- Use key and certificate PEM files for TLS.
- Use client key and client cert file when requesting executors
- Added files should be placed in the driver's working directory. *** FAILED ***
  java.net.ConnectException: Failed to connect to /192.168.99.100:30941
  at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:225)
  at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:149)
  at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:195)
  at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
  at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
  at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
  at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
  at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
  at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
  at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
  ...
  Cause: java.net.ConnectException: Connection refused
  at java.net.PlainSocketImpl.socketConnect(Native Method)
  at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
  at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
  at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
  at java.net.Socket.connect(Socket.java:589)
  at okhttp3.internal.platform.Platform.connectSocket(Platform.java:124)
  at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:223)
  at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:149)
  at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:195)
  ...
- Setting JVM options on the driver and executors with spaces.
- Submit small local files without the resource staging server.
- Use a very long application name.
...