almond-sh / almond

A Scala kernel for Jupyter
https://almond.sh

yarn-client support in jupyter-scala #78

Open abtandon opened 8 years ago

abtandon commented 8 years ago

By default, when I follow the steps below, I get a SparkContext in Jupyter (3.2.2), but the jobs run in local mode (sc.master returns local):

1. `load.ivy("com.github.alexarchambault" % "ammonite-spark_1.5_2.11.6" % "0.3.1-SNAPSHOT")`
2. `@transient val Spark = new ammonite.spark.Spark`
3. `import Spark.sc`
4. `Spark.start()`
5. `sc`

The objective here is to run jobs in yarn-client mode, so after step 2 above I add `Spark.withConf(_.setMaster("yarn-client"))`.
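
That is, the full sequence with the override in place looks roughly like this (same artifact coordinates as above; a sketch of my session rather than a verified recipe):

```scala
load.ivy("com.github.alexarchambault" % "ammonite-spark_1.5_2.11.6" % "0.3.1-SNAPSHOT")

@transient val Spark = new ammonite.spark.Spark

// Ask for yarn-client mode before the SparkContext is created.
Spark.withConf(_.setMaster("yarn-client"))

import Spark.sc
Spark.start()
sc
```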

With that change, I get the following error in the Jupyter notebook:

    org.apache.spark.SparkException: YARN mode not available ?
      org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2655)
      org.apache.spark.SparkContext.<init>(SparkContext.scala:506)
      ammonite.spark.Spark$SparkContext.<init>(Spark.scala:229)
      ammonite.spark.Spark.sc(Spark.scala:186)
      ammonite.spark.Spark.start(Spark.scala:197)
      cmd7$$user$$anonfun$1.apply$mcV$sp(Main.scala:58)
    java.lang.ClassNotFoundException: org.apache.spark.scheduler.cluster.YarnScheduler
      java.net.URLClassLoader$1.run(URLClassLoader.java:366)
      java.net.URLClassLoader$1.run(URLClassLoader.java:355)
      java.security.AccessController.doPrivileged(Native Method)
      java.net.URLClassLoader.findClass(URLClassLoader.java:354)
      ammonite.interpreter.AddURLClassLoader.findClass(Classes.scala:42)
      java.lang.ClassLoader.loadClass(ClassLoader.java:425)
      java.lang.ClassLoader.loadClass(ClassLoader.java:358)
      java.lang.Class.forName0(Native Method)
      java.lang.Class.forName(Class.java:278)
      org.apache.spark.util.Utils$.classForName(Utils.scala:173)
      org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2649)
      org.apache.spark.SparkContext.<init>(SparkContext.scala:506)
      ammonite.spark.Spark$SparkContext.<init>(Spark.scala:229)
      ammonite.spark.Spark.sc(Spark.scala:186)
      ammonite.spark.Spark.start(Spark.scala:197)
      cmd7$$user$$anonfun$1.apply$mcV$sp(Main.scala:58)

I would highly appreciate any help or pointers here. Thanks.

Regards, AT

alexarchambault commented 8 years ago

Install the latest version of jupyter-scala by following its updated README. Then something along those lines should make this work (that's how I do it personally).

abtandon commented 8 years ago

Thanks Alex, following the Spark section at https://github.com/alexarchambault/ammonium#spark worked, until I evaluate sc in the Jupyter notebook, at which point I get the following error:

    org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master. (Yarn application has already ended! It might have been killed or unable to launch application master.)
      org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:123)
      org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:63)
      org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
      org.apache.spark.SparkContext.<init>(SparkContext.scala:523)
      ammonite.spark.Spark$SparkContext.<init>(Spark.scala:240)
      ammonite.spark.Spark.sc(Spark.scala:197)
      cmd19$$user$$anonfun$1.apply(Main.scala:25)
      cmd19$$user$$anonfun$1.apply(Main.scala:24)

Further, I tried to increase the memory by setting another parameter (spark.executor.memory) on the SparkConf, which led to yet another exception.
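
For reference, the change was along these lines (sparkConf being the conf from the ammonium Spark setup; the memory value here is illustrative, not necessarily the exact one I used):

```scala
// Configure the conf before the SparkContext is created.
// "4g" is an illustrative value only.
sparkConf
  .setMaster("yarn-client")
  .set("spark.executor.memory", "4g")
```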

I would appreciate your help/suggestions.

Thanks, AT

alexarchambault commented 8 years ago

I think I ran into similar issues when setting up my notebook install... A few points you may want to look at:

- does a plain spark-shell, launched against the same YARN cluster, give you a working SparkContext?
- are sparkHome, hadoopConfDir, and sparkAssembly pointing at the right locations for your install?

abtandon commented 8 years ago

Thanks for the quick response. I did check all the variables as you suggested.

The spark-shell returns res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@701e0b2e in response to "sc", which seems appropriate.

Otherwise, sparkHome, hadoopConfDir, and sparkAssembly (which in my case is s"$sparkHome/lib/spark-assembly-1.5.2-mapr-1603-hadoop2.7.0-mapr-1602.jar") all seem to be in place.
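
A sketch of how these are set in my session (assuming they are plain vals picked up from the environment; the ammonium README may wire them differently, and the fallback paths here are hypothetical, only the assembly JAR path is the actual one from my install):

```scala
// Assumed setup: sparkHome and hadoopConfDir come from environment variables;
// the fallback paths are hypothetical placeholders.
val sparkHome     = sys.env.getOrElse("SPARK_HOME", "/opt/spark")
val hadoopConfDir = sys.env.getOrElse("HADOOP_CONF_DIR", "/etc/hadoop/conf")
val sparkAssembly = s"$sparkHome/lib/spark-assembly-1.5.2-mapr-1603-hadoop2.7.0-mapr-1602.jar"
```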

Would appreciate your help here. Thanks.

Regards, AT

alexarchambault commented 8 years ago

You're using a YARN cluster, right? There are ways to access the logs of application attempts, via its web UI or the command line (yarn command). You should be able to get the application ID of the app launched by Spark, and from it, its logs or those of its attempts.
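
From the command line, that would be something along these lines (standard YARN CLI usage; the application ID is a placeholder):

```
# List YARN applications to find the one launched from the notebook.
yarn application -list -appStates ALL

# Fetch the aggregated logs for that application (placeholder ID).
yarn logs -applicationId application_1234567890123_0001
```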

abtandon commented 8 years ago

I performed all the steps mentioned under the Spark section at https://github.com/alexarchambault/ammonium#spark. I tried to print sc.master before and after running sparkConf.setMaster("yarn-client"), and I got local[*] both times. It might be that just setting the master on sparkConf is not reflected on sc, presumably because the context has already been created. I think sc needs to be recreated using this sparkConf, but I am not sure how this can be done; I tried to reset sc using different methods with no success.
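
For concreteness, a minimal sketch of the order I am aiming for, assuming the context is only created the first time sc is referenced (my reading of the setup, not something stated in the README):

```scala
// SparkConf is only read when the SparkContext is constructed, so the master
// has to be set before sc is first referenced.
sparkConf.setMaster("yarn-client")

sc         // first reference: the context should be created with the conf above
sc.master  // expected to report the yarn-client master rather than local[*]
```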

Would highly appreciate your help here. Thanks.

Regards, AT

abtandon commented 8 years ago

In addition to this, could you please confirm whether these steps work with Spark 1.5.2 and Hadoop 2.7.0?

Would highly appreciate your help here. Thanks.

Regards, AT