abtandon opened 8 years ago
Install the latest version of jupyter-scala by following its updated README. Then something along those lines should make this work (that's how I do it personally).
Thanks Alex, following the Spark section at https://github.com/alexarchambault/ammonium#spark worked until I type sc in the Jupyter Notebook, where I get the following error:
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master. (Yarn application has already ended! It might have been killed or unable to launch application master.)
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:123)
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:63)
org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
org.apache.spark.SparkContext.
Further, I tried to increase the memory by setting another parameter (spark.executor.memory) on the SparkConf, which led to another exception.
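For reference, a minimal sketch of what setting that parameter could look like; the app name and memory value are placeholders, and the real conf would come from the notebook setup described in the README:

```scala
import org.apache.spark.SparkConf

// Illustrative sketch only: a SparkConf with the executor memory raised, as described above.
// The app name and memory value are placeholders; the real conf comes from the notebook setup.
val sparkConf = new SparkConf()
  .setAppName("jupyter-scala-yarn-test") // placeholder
  .setMaster("yarn-client")
  .set("spark.executor.memory", "2g")    // example value; tune to the cluster
```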
I'd appreciate your help/suggestions.
Thanks, AT
I think I ran into similar issues when setting up my notebook install... A few points you may want to look at:
- Do sparkHome and hadoopConfDir exist and have the right content (a Spark distrib, Hadoop conf files)?
- Does sparkAssembly exist?
- Does a spark-shell launched from sparkHome work?
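A quick way to check the first two points from the notebook itself might look like the sketch below; the paths are assumptions, so substitute whatever values your setup actually passes in:

```scala
import java.nio.file.{Files, Paths}

// Sketch: check that the paths from the checklist above actually exist. The example values
// are assumptions; use whatever sparkHome / hadoopConfDir / sparkAssembly your setup uses.
val sparkHome     = sys.env.getOrElse("SPARK_HOME", "/opt/spark")
val hadoopConfDir = sys.env.getOrElse("HADOOP_CONF_DIR", "/etc/hadoop/conf")
val sparkAssembly = s"$sparkHome/lib/spark-assembly.jar"

Seq("sparkHome" -> sparkHome, "hadoopConfDir" -> hadoopConfDir, "sparkAssembly" -> sparkAssembly)
  .foreach { case (name, path) =>
    println(s"$name -> $path exists: ${Files.exists(Paths.get(path))}")
  }
```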
Thanks for the quick response, I did check all the variables as you suggested.
The spark-shell returns res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@701e0b2e in response to sc, which seems appropriate.
As for the rest, sparkHome, hadoopConfDir, and sparkAssembly (which in my case is s"$sparkHome/lib/spark-assembly-1.5.2-mapr-1603-hadoop2.7.0-mapr-1602.jar") all seem to be in place.
Would appreciate your help here. Thanks.
Regards, AT
You're using a YARN cluster, right? There are ways to access the logs of application attempts, via its web UI or the command line (the yarn command). You should be able to get the application ID of the app launched by Spark, and from it its logs or those of its attempts.
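A sketch of doing that from the notebook via scala.sys.process; it assumes the yarn CLI is on the PATH, and the application ID is a placeholder to replace with the one YARN actually reports:

```scala
import scala.sys.process._

// Sketch: list YARN applications and fetch the logs of one, straight from the notebook.
// Requires the yarn CLI on the PATH; the application ID is a placeholder to replace with
// the one YARN reports for the app the notebook launched.
"yarn application -list -appStates ALL".!
Seq("yarn", "logs", "-applicationId", "application_XXXXXXXXXX_NNNN").!
```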
I performed all the steps mentioned under the Spark section at https://github.com/alexarchambault/ammonium#spark. I tried to print sc.master before and after running sparkConf.setMaster("yarn-client"), and I got local[*] both times. It might be that just setting setMaster on sparkConf does not get reflected on sc; I think we need to set sc again from this sparkConf, but I am not sure how this can be done, as I tried to set sc using different methods with no success.
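For what it's worth, a minimal sketch of "setting sc again from this sparkConf" with the plain Spark API would be along these lines; whether this plays well with the notebook's own context handling is exactly the open question here:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// sparkConf here stands in for the one built by the notebook setup (assumption).
val sparkConf = new SparkConf()
  .setAppName("jupyter-scala-yarn-test") // placeholder
  .setMaster("yarn-client")

// Only one SparkContext may be active per JVM, so any context the notebook has already
// started would have to be stopped before creating a new one from this conf.
val sc = new SparkContext(sparkConf)
```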
Would highly appreciate your help here. Thanks.
Regards, AT
In addition to this, could you please confirm whether these steps can be done for Spark 1.5.2 and Hadoop 2.7.0?
Would highly appreciate your help here. Thanks.
Regards, AT
By default, when I follow the steps below, I get the SparkContext on Jupyter (3.2.2), but the jobs run in local mode (sc.master returns local):
1. load.ivy("com.github.alexarchambault" % "ammonite-spark_1.5_2.11.6" % "0.3.1-SNAPSHOT")
2. @transient val Spark = new ammonite.spark.Spark
3. import Spark.sc
4. Spark.start()
5. sc
The objective here is to run jobs in yarn-client mode, so if I add the following after step 2 above:
Spark.withConf(_.setMaster("yarn-client"))
I get the following error (Jupyter notebook):
org.apache.spark.SparkException: YARN mode not available ?
org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2655)
org.apache.spark.SparkContext.<init>(SparkContext.scala:506)
ammonite.spark.Spark$SparkContext.<init>(Spark.scala:229)
ammonite.spark.Spark.sc(Spark.scala:186)
ammonite.spark.Spark.start(Spark.scala:197)
cmd7$$user$$anonfun$1.apply$mcV$sp(Main.scala:58)
java.lang.ClassNotFoundException: org.apache.spark.scheduler.cluster.YarnScheduler
java.net.URLClassLoader$1.run(URLClassLoader.java:366)
java.net.URLClassLoader$1.run(URLClassLoader.java:355)
java.security.AccessController.doPrivileged(Native Method)
java.net.URLClassLoader.findClass(URLClassLoader.java:354)
ammonite.interpreter.AddURLClassLoader.findClass(Classes.scala:42)
java.lang.ClassLoader.loadClass(ClassLoader.java:425)
java.lang.ClassLoader.loadClass(ClassLoader.java:358)
java.lang.Class.forName0(Native Method)
java.lang.Class.forName(Class.java:278)
org.apache.spark.util.Utils$.classForName(Utils.scala:173)
org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2649)
org.apache.spark.SparkContext.<init>(SparkContext.scala:506)
ammonite.spark.Spark$SparkContext.<init>(Spark.scala:229)
ammonite.spark.Spark.sc(Spark.scala:186)
ammonite.spark.Spark.start(Spark.scala:197)
cmd7$$user$$anonfun$1.apply$mcV$sp(Main.scala:58)
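As a side note, given the java.lang.ClassNotFoundException above, a quick sanity check from the same notebook could be to probe whether that class is loadable at all (it is only present in a YARN-enabled Spark build); this is purely an illustrative sketch:

```scala
// Sketch: probe for the class named in the ClassNotFoundException above.
try {
  Class.forName("org.apache.spark.scheduler.cluster.YarnScheduler")
  println("YarnScheduler is on the classpath")
} catch {
  case _: ClassNotFoundException =>
    println("YarnScheduler is NOT on the classpath; the Spark build in use may lack YARN support")
}
```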
I would highly appreciate any help or pointers here. Thanks.
Regards, AT