mccheah closed 7 years ago
When we tried to use this small files feature in our app, we saw this exception in the submission client:
Exception in thread "main" java.io.FileNotFoundException: file:/path/to/logback.xml (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at org.spark_project.guava.io.Files$FileByteSource.openStream(Files.java:124)
at org.spark_project.guava.io.Files$FileByteSource.openStream(Files.java:114)
at org.spark_project.guava.io.ByteSource.read(ByteSource.java:220)
at org.spark_project.guava.io.Files$FileByteSource.read(Files.java:141)
at org.spark_project.guava.io.Files.toByteArray(Files.java:355)
at org.apache.spark.deploy.kubernetes.submit.submitsteps.MountSmallLocalFilesStep$$anonfun$3.apply(MountSmallLocalFilesStep.scala:46)
at org.apache.spark.deploy.kubernetes.submit.submitsteps.MountSmallLocalFilesStep$$anonfun$3.apply(MountSmallLocalFilesStep.scala:45)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.deploy.kubernetes.submit.submitsteps.MountSmallLocalFilesStep.configureDriver(MountSmallLocalFilesStep.scala:45)
at org.apache.spark.deploy.kubernetes.submit.Client$$anonfun$run$1.apply(Client.scala:93)
at org.apache.spark.deploy.kubernetes.submit.Client$$anonfun$run$1.apply(Client.scala:92)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.deploy.kubernetes.submit.Client.run(Client.scala:92)
at org.apache.spark.deploy.kubernetes.submit.Client$$anonfun$run$5.apply(Client.scala:189)
at org.apache.spark.deploy.kubernetes.submit.Client$$anonfun$run$5.apply(Client.scala:182)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2566)
at org.apache.spark.deploy.kubernetes.submit.Client$.run(Client.scala:182)
at org.apache.spark.deploy.kubernetes.submit.Client$.main(Client.scala:202)
at org.apache.spark.deploy.kubernetes.submit.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:772)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:183)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:208)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:122)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Thanks for elaborating, @ash211. It all seems confined to that submission step, which seems like a good idea. Perhaps we should still look at making the init container terminate more quickly? While the small files fix is expedient, I think it may not be usable in the general case given its limitations.
rerun integration tests please
Qualitatively, running init containers on our newest clusters (based on Kubernetes 1.7.2, I think) feels faster, but I don't have benchmarks to quantify it.
This happens because SparkSubmit translates all paths into URIs before passing them along to the submission client implementation. We were providing plain paths, but they arrive at the small files step with the `file://` scheme.
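A minimal sketch of the kind of normalization the small files step needs, assuming it can receive either plain local paths or `file:` URIs. `SmallFilePaths` and `toLocalFile` are hypothetical names, not the actual change in this PR; the sketch uses only `java.net.URI` and `java.io.File`. A string such as `file:/path/to/logback.xml` is parsed as a URI and only its path component is kept, so the `File` that is later read (for example by Guava's `Files.toByteArray`, as in the stack trace) no longer contains the scheme prefix that caused the `FileNotFoundException`.

```scala
import java.io.File
import java.net.{URI, URISyntaxException}

// Hypothetical helper, not the real MountSmallLocalFilesStep code.
object SmallFilePaths {
  // Accepts a plain path or a file: URI (the form SparkSubmit hands over)
  // and returns a local File without the scheme prefix.
  def toLocalFile(pathOrUri: String): File = {
    val uri =
      try new URI(pathOrUri)
      catch {
        // Plain paths that are not valid URIs (e.g. containing spaces)
        // are treated as local filesystem paths.
        case _: URISyntaxException => new File(pathOrUri).toURI
      }
    Option(uri.getScheme) match {
      // No scheme: a plain path. "file": a local file URI.
      case None | Some("file") => new File(uri.getPath)
      // Anything else (hdfs://, http://, ...) cannot be read from local disk here.
      case Some(other) =>
        throw new IllegalArgumentException(
          s"Expected a local file but got scheme '$other': $pathOrUri")
    }
  }

  def main(args: Array[String]): Unit = {
    println(toLocalFile("file:/path/to/logback.xml").getPath)
    println(toLocalFile("/path/to/logback.xml").getPath)
  }
}
```

Rejecting non-local schemes explicitly is a design choice in this sketch: a remote URI would otherwise fail later with the same confusing `FileNotFoundException`.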
@mccheah, can you add a bit more description to the PR here? I'm not sure what the previous issue was or what this solves.