apache-spark-on-k8s / spark

Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0
612 stars 118 forks source link

Bypass init-containers if `spark.jars` and `spark.files` is empty or … #348

Closed chenchun closed 7 years ago

chenchun commented 7 years ago

…only has local:// URIs

Fixes https://github.com/apache-spark-on-k8s/spark/issues/338

chenchun commented 7 years ago

@mccheah Would you like to take a look ?

ash211 commented 7 years ago

Thanks for the contribution @chenchun ! Definitely appreciate seeing more and more people using this project!

For this PR, at first glance it look like the scalastyle check is failing because one of the lines is too long, see http://spark-k8s-jenkins.pepperdata.org:8080/job/PR-spark-k8s-full-build/569/consoleFull#1143296497853a0453-9a85-4740-a867-694552c49a93 Would you mind please updating the PR to pass scalastyle? You can run it locally with ./dev/scalastyle

mccheah commented 7 years ago

It might be preferable to have the switch on if init-containers are needed or not to be handled inside either one of the existing modules or in a new module. See DriverInitContainerComponentsProvider and the modules it creates for some examples. We're trying to avoid the problem we had in the first implementation where Client.scala grew to be untenably large because of all the logic that was incrementally added over time.

chenchun commented 7 years ago

Thanks for the review. Haven't got chance to test it, but if this is the right direction? @mccheah

chenchun commented 7 years ago

Thanks for your comments. I think I started to get into Scala. @mccheah @ifilonenko PTAL.

chenchun commented 7 years ago

@ifilonenko Addressed your comments

ash211 commented 7 years ago

Thanks for the contribution @chenchun ! I'm looking forward to trying this out in my own clusters.

Out of curiosity, how much less time is spent in pod startup on your cluster now that the init container is bypassed?

chenchun commented 7 years ago

It's like 0.3s per container based on aufs.

ash211 commented 7 years ago

@ifilonenko so does this require a followup commit?

ifilonenko commented 7 years ago

@ash211 I refactored parts of the testing environment and accounted for pySparkFiles, in my merge in PR-351. I also took out the unnecessary line in the DriverInitComponentImpl which initialized a FileResolver that wasn't used. So this will not require a followup commit as I fixed the issues addressed above, its just for the sake of bookkeeping.