elbamos / Zeppelin-With-R

Mirror of Apache Zeppelin (Incubating)

Building Zeppelin 0.6.0 with R on AWS EMR Cluster failed #17

Open zenonlpc opened 8 years ago

zenonlpc commented 8 years ago

I created an EMR cluster with Zeppelin on AWS using the instructions at the link below:

https://gist.github.com/andershammar/224e1077021d0ea376dd

After some modifications to the installZeppelin.sh script, I was able to build Zeppelin with the R interpreter successfully: I added some R packages before building Zeppelin in Maven, and changed the mvn compile command to include the R profile:

mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -Pr -DskipTests

However, when I started to write R commands in a Zeppelin notebook, I got this error:

java.lang.ClassNotFoundException: com.amazonaws.event.ProgressListener
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2595)
    at java.lang.Class.getConstructor0(Class.java:2895)
    at java.lang.Class.newInstance(Class.java:354)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2563)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2574)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)

Here is the configuration of AWS EMR Cluster:

Hadoop: Amazon 2.7.2
Applications: Spark 1.6.0, Ganglia 3.7.2
Release label: emr-4.4.0
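
For reference, a cluster with roughly this configuration can be created from the AWS CLI along the following lines; the cluster name, instance type, and instance count here are illustrative, not the ones actually used:

# illustrative sketch of creating an emr-4.4.0 cluster with Spark and Ganglia
aws emr create-cluster \
  --name "zeppelin-with-r" \
  --release-label emr-4.4.0 \
  --applications Name=Spark Name=Ganglia \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --use-default-roles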

It seems like an AWS issue, but I don't know what I did wrong.
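
The missing class, com.amazonaws.event.ProgressListener, ships in the AWS SDK jars that EMR installs on the cluster, so this error suggests those jars are not on the interpreter's classpath. A minimal, untested sketch of a workaround in conf/zeppelin-env.sh, assuming the emr-4.x jar location /usr/share/aws/aws-java-sdk:

# Hand EMR's AWS SDK jars to Spark (the jar path is an assumption for emr-4.x)
AWS_SDK_JARS=$(echo /usr/share/aws/aws-java-sdk/*.jar | tr ' ' ',')
export SPARK_SUBMIT_OPTIONS="--jars ${AWS_SDK_JARS}"

Note that SPARK_SUBMIT_OPTIONS is only honored when SPARK_HOME is set, i.e. when Zeppelin launches the interpreter through spark-submit.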

elbamos commented 8 years ago

It looks to me like Hadoop is failing to start.

Are you running Zeppelin with Spark external to the Zeppelin home, or with Spark installed under Zeppelin? The difference is whether the SPARK_HOME env variable is set.

Can you try installing Zeppelin with simply mvn clean package install -Pr -DskipTests, running with external Spark, and seeing if that fixes it?
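
A minimal sketch of that suggestion, assuming Zeppelin is built from the repository root and Spark lives at /usr/lib/spark on the EMR master:

# build Zeppelin with only the R profile
mvn clean package install -Pr -DskipTests

# point Zeppelin at the external Spark
echo 'export SPARK_HOME=/usr/lib/spark' >> conf/zeppelin-env.sh

# restart so the new environment is picked up
bin/zeppelin-daemon.sh restart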


zenonlpc commented 8 years ago

Thanks Elbamos

I will try installing Zeppelin with just -Pr.

Spark is installed on the EMR cluster by default, and it is external to Zeppelin.

How can I set SPARK_HOME to use the external Spark? In zeppelin-env.sh?


zenonlpc commented 8 years ago

Hello Elbamos

I tried installing Zeppelin with just -Pr:

mvn clean package -Pr -DskipTests

And set SPARK_HOME in zeppelin-env.sh as below:

export MASTER=
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="-Dspark.executor.instances=1 -Dspark.executor.cores=8 -Dspark.executor.memory=9193M -Dspark.default.parallelism=16"
export PYTHONPATH=:/usr/lib/spark/python

But I still could not get R working in Zeppelin; when I tried an R command it gave me this error:

org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
    at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:356)
    at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:150)
    at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:525)
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
    at org.apache.zeppelin.rinterpreter.RInterpreter.getSparkInterpreter(RInterpreter.scala:76)
    at org.apache.zeppelin.rinterpreter.RInterpreter.getSparkInterpreter(RInterpreter.scala:70)
    at org.apache.zeppelin.rinterpreter.RInterpreter.open(RInterpreter.scala:50)
    at org.apache.zeppelin.rinterpreter.RRepl.open(RRepl.java:56)
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:345)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Looks like YARN is not properly configured with Spark. Any idea what I did wrong?
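
When the application master dies like this, the YARN application logs usually say why. A quick check from the master node, assuming the Hadoop CLI tools are on the PATH (the application id below is a placeholder):

# list recent applications and their final status
yarn application -list -appStates ALL

# pull the logs for the failed Spark application
yarn logs -applicationId application_XXXXXXXXXXXXX_XXXX

One other thing worth noting: MASTER is exported empty in the configuration above, so the yarn-client master is presumably coming from EMR's /etc/spark/conf/spark-defaults.conf rather than from zeppelin-env.sh.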

The EMR cluster was created with two applications, Spark and Ganglia.

Thanks Zenon


elbamos commented 8 years ago

I'm not sure, but I do know that setting Zeppelin up to use YARN can be tricky. I would try to get it working with the regular Spark interpreter first and confirm that YARN is working.
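
One way to confirm YARN is healthy independently of Zeppelin is to run the bundled SparkPi example in yarn-client mode; the examples jar path is an assumption for EMR's Spark 1.6 layout:

# submit a trivial job straight to YARN, bypassing Zeppelin
/usr/lib/spark/bin/spark-submit \
  --master yarn-client \
  --class org.apache.spark.examples.SparkPi \
  /usr/lib/spark/lib/spark-examples*.jar 10

If this fails too, the problem is in the Spark/YARN setup rather than in the Zeppelin build.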


zenonlpc commented 8 years ago

Hello Elbamos

I already got the regular Spark interpreter working before. Here is the Maven compile command I used:

mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -DskipTests

And the configuration for Zeppelin:

export MASTER=yarn-client
export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}"
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python

But when I tried to add R to Zeppelin, I used this Maven compile command:

mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -Pr -DskipTests

And the configuration for Zeppelin:

export MASTER=yarn-client
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}"
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python

Now nothing is working; everything gives this error:

java.lang.ClassNotFoundException: com.amazonaws.event.ProgressListener
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2595)
    at java.lang.Class.getConstructor0(Class.java:2895)
    at java.lang.Class.newInstance(Class.java:354)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2563)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2574)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
    at org.apache.spark.deploy.yarn.Client.cleanupStagingDir(Client.scala:166)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:152)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
    at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:356)
    at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:150)
    at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:525)
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:345)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
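
A quick way to check whether the missing class exists on the box at all, and which jar provides it; the /usr/share/aws path is an assumption for emr-4.x images:

# print every AWS SDK jar that contains ProgressListener
for j in /usr/share/aws/aws-java-sdk/*.jar; do
  jar tf "$j" | grep -q 'com/amazonaws/event/ProgressListener.class' && echo "$j"
done

If a jar turns up, adding it to SPARK_SUBMIT_OPTIONS with --jars in zeppelin-env.sh is one candidate fix.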


zenonlpc commented 8 years ago

Hello Elbamos

I misunderstood you; I will try to make R work with Spark first, not with YARN.

Should I compile Zeppelin using this command:

mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pr -DskipTests

and set the Zeppelin configuration:

export MASTER=spark://<spark-master-node>:7077
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}"
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python

Thanks


elbamos commented 8 years ago

Before you try that, how about:

mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -Pr -DskipTests

I think you should be able to remove all the profiles except R and YARN, but let's try this first.


zenonlpc commented 8 years ago

Hello Elbamos

I tried your suggestion again with the following command and configuration:

mvn clean package -Pyarn -Pr -DskipTests

export MASTER=yarn-client
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}"
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python

Then when I ran any command in Zeppelin, I got this error:

org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
    at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:356)
    at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:150)
    at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:525)
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:345)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

After that I tried to remove YARN from Zeppelin and use the existing Spark on the EMR cluster. The Maven command I used is:

mvn clean package -Pr -DskipTests

The configuration for Zeppelin:

export MASTER=spark://sparkmasternode:7077
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}"
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python

When I ran any Zeppelin command, I got this error:

org.apache.spark.SparkException: Could not parse Spark Master URL: '' at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2735) at org.apache.spark.SparkContext.(SparkContext.scala:522) at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:356) at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:150) at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:525) at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74) at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)

Thanks

On Tue, May 10, 2016 at 4:52 PM, elbamos notifications@github.com wrote:

Before you try that, how about:

mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -Pr -DskipTests

I think you should be able to remove all the profiles except R and yarn, but let's try this first.

On May 10, 2016, at 3:22 PM, zenonlpc notifications@github.com wrote:

Hello Elbamos

I misunderstood you; I will try to make R work with Spark first, not with Yarn.

Should I compile zeppelin using this command:

mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pr -DskipTests

and set zeppelin configuration:

export MASTER=spark://<spark-master-node>
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}"
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python

Thanks

On Tue, May 10, 2016 at 3:17 PM, Pengcheng Liu zenonlpc@gmail.com wrote:

Hello Elbamos

I already got the regular Spark interpreter working before. Here is the maven compile command I used:

mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -DskipTests

And the configuration for Zeppelin:

export MASTER=yarn-client
export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}"
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python

But when I tried to add R in zeppelin, I used this maven compile command:

mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -Pr -DskipTests

And the configuration for Zeppelin:

export MASTER=yarn-client
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}"
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python

Now nothing works; everything gives this error:

java.lang.ClassNotFoundException: com.amazonaws.event.ProgressListener
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2595)
at java.lang.Class.getConstructor0(Class.java:2895)
at java.lang.Class.newInstance(Class.java:354)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2563)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2574)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
at org.apache.spark.deploy.yarn.Client.cleanupStagingDir(Client.scala:166)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:152)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.(SparkContext.scala:530)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:356)
at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:150)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:525)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:345)
at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

On Tue, May 10, 2016 at 1:23 PM, elbamos notifications@github.com wrote:

I'm not sure, but I do know that setting Zeppelin up to use yarn can be tricky. I would try to get it working with the regular spark interpreter first and confirm that yarn is working.

On May 10, 2016, at 11:46 AM, zenonlpc notifications@github.com wrote:

Hello Elbamos

I tried installing zeppelin with just -Pr:

mvn clean package -Pr -DskipTests

And set SPARK_HOME in zeppelin-env.sh as below:

export MASTER=
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="-Dspark.executor.instances=1 -Dspark.executor.cores=8 -Dspark.executor.memory=9193M -Dspark.default.parallelism=16"
export PYTHONPATH=:/usr/lib/spark/python

But I still couldn't get R working in zeppelin; when I tried an R command it gave me this error:

org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.(SparkContext.scala:530)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:356)
at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:150)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:525)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
at org.apache.zeppelin.rinterpreter.RInterpreter.getSparkInterpreter(RInterpreter.scala:76)
at org.apache.zeppelin.rinterpreter.RInterpreter.getSparkInterpreter(RInterpreter.scala:70)
at org.apache.zeppelin.rinterpreter.RInterpreter.open(RInterpreter.scala:50)
at org.apache.zeppelin.rinterpreter.RRepl.open(RRepl.java:56)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:345)
at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Looks like Yarn is not properly configured with Spark. Any idea what I did wrong?

The EMR cluster is created with two applications Spark and Ganglia.

Thanks Zenon

On Fri, May 6, 2016 at 1:57 PM, Pengcheng Liu zenonlpc@gmail.com wrote:

Thanks Elbamos

I will try installing zeppelin with just -Pr.

Spark is installed on the EMR cluster by default, and it is external to zeppelin.

How can I set SPARK_HOME to use the external Spark? In zeppelin-env.sh?


akshayprakash commented 8 years ago

I think your spark master should be set to, for example:

export MASTER=spark://ip-111-11-11-11.us-west-2.compute.internal

Anyway, it worked well for me. I am running an 8-node cluster on EMR 4.5.0 with the Zeppelin-R notebook.

zenonlpc commented 8 years ago

Thanks Akshay

I will try that.


zenonlpc commented 8 years ago

Hello Akshay

I ran a list-instances command on my EMR cluster master node; here is the result:

{ "Instances": [ { "Status": { "Timeline": { "ReadyDateTime": 1463419008.136, "CreationDateTime": 1463418705.629 }, "State": "RUNNING", "StateChangeReason": {} }, "Ec2InstanceId": "i-07cb76b585791dc13", "PublicDnsName": "ec2-52-87-213-254.compute-1.amazonaws.com", "PrivateDnsName": "ip-172-31-59-226.ec2.internal", "PublicIpAddress": "52.87.213.254", "Id": "ci-3P2QMFSOSKF2S", "PrivateIpAddress": "172.31.59.226" }, { "Status": { "Timeline": { "ReadyDateTime": 1463419008.136, "CreationDateTime": 1463418719.445 }, "State": "RUNNING", "StateChangeReason": {} }, "Ec2InstanceId": "i-0cd3184eb7788816a", "PublicDnsName": "ec2-52-90-79-148.compute-1.amazonaws.com", "PrivateDnsName": "ip-172-31-58-205.ec2.internal", "PublicIpAddress": "52.90.79.148", "Id": "ci-13EARLMDOU64L", "PrivateIpAddress": "172.31.58.205" } ] }

I only have 1 master node and 1 slave node. Based on your previous reply, I should use the private DNS name as the spark master host name; is this correct?

For this cluster the spark master should be set to spark://ip-172-31-59-226.ec2.internal:7077
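That lookup could also be scripted rather than copied by hand; here is a sketch using the same aws emr calls as the bootstrap script later in this thread (it assumes the AWS CLI is configured on the node running Zeppelin):

# Derive the Spark master URL from the running cluster instead of hard-coding it
CLUSTER_ID=$(aws emr list-clusters --active | grep -i id | awk -F '"' '{print $4}')
SPARK_MASTER_URL=$(aws emr list-instances --cluster-id $CLUSTER_ID --instance-group-types MASTER \
    | grep -i PrivateDnsName | awk -F '"' '{print $4}')
export MASTER=spark://${SPARK_MASTER_URL}:7077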

Thanks


akshayprakash commented 8 years ago

Yes... go to the Interpreter section of the Zeppelin notebook and set the master (spark) to spark://ip-172-31-59-226.ec2.internal:7077

Or, if you prefer editing the config directly on the node:

$ cd /etc/spark/conf
$ vi spark-env.sh
export SPARK_HOME=/usr/lib/spark
export MASTER=spark://ip-172-31-59-226.ec2.internal:7077
(Esc) :wq

If the above doesn't work, try it without the listening port.

zenonlpc commented 8 years ago

Hello guys

I tried Akshay's suggestion, but it still didn't work for me.

Compiled just R in zeppelin: mvn clean package -Pr -DskipTests

Configuration for zeppelin :

export MASTER=spark://ip-172-31-59-226.ec2.internal export HADOOP_HOME=/usr/lib/hadoop export SPARK_HOME=/usr/lib/spark export HADOOP_CONF_DIR=/etc/hadoop/conf export ZEPPELIN_SPARK_USEHIVECONTEXT=false export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}" export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python

Now, when running an R command in the notebook, it gave this error:

org.apache.spark.SparkException: Invalid master URL: spark://ip-172-31-59-226.ec2.internal at org.apache.spark.util.Utils$.extractHostPortFromSparkUrl(Utils.scala:2121) at org.apache.spark.rpc.RpcAddress$.fromSparkURL(RpcAddress.scala:47) at org.apache.spark.deploy.client.AppClient$$anonfun$1.apply(AppClient.scala:48) at org.apache.spark.deploy.client.AppClient$$anonfun$1.apply(AppClient.scala:48) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) at org.apache.spark.deploy.client.AppClient.(AppClient.scala:48) at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.start(SparkDeploySchedulerBackend.scala:93) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144) at org.apache.spark.SparkContext.(SparkContext.scala:530) at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:356) at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:150) at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:525)

Looks like all I need is the correct spark master URL, but I couldn't find it easily. So I googled and found this link:

http://stackoverflow.com/questions/30760792/how-to-find-spark-master-url-on-amazon-emr

From this link, my understanding is that an EMR cluster runs Spark on YARN by default, so if I want to use the external Spark distribution installed by EMR, I am stuck with YARN.
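If I am indeed stuck with YARN, then presumably the configuration has to go back to the yarn-client setup from earlier in this thread. A minimal zeppelin-env.sh sketch of that setup, assuming the stock EMR paths used above:

# Point Zeppelin at EMR's YARN-managed Spark (yarn-client mode)
export MASTER=yarn-client
export SPARK_HOME=/usr/lib/spark
export HADOOP_HOME=/usr/lib/hadoop
# HADOOP_CONF_DIR is how Spark finds the YARN ResourceManager
export HADOOP_CONF_DIR=/etc/hadoop/conf
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python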

Can anyone help me with this battle? I have been struggling with this issue for almost two weeks.

Thanks in Advance


zenonlpc commented 8 years ago

Hello Akshay

After adding the port to the spark master URL and restarting the zeppelin server, I got this error when running an R command:

org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:232) at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:216) at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:259) at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93) at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:262) at org.apache.zeppelin.scheduler.Job.run(Job.java:176) at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:328) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

This looks like one step closer to getting it working.
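The thrift exception on the notebook side usually hides the real failure, so one way to dig further would be to check the remote interpreter's own log on the Zeppelin host (the log directory assumes the /home/hadoop/zeppelin build location used in this thread):

# The remote interpreter process writes its own log file under $ZEPPELIN_HOME/logs
tail -n 100 /home/hadoop/zeppelin/logs/zeppelin-interpreter-*.log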

Thanks Zenon


zenonlpc commented 8 years ago

Hello guys

Thanks for helping me on this issue; I really appreciate your time and effort.

Instead of giving pieces of information, I want to give all the information, so you might be able to help resolve this quickly.

The EMR cluster I created has release label emr-4.4.0, with Spark (1.6.0) and Ganglia (3.7.2) as applications.

I used the following script to install zeppelin 0.6.0 as a bootstrap action when the cluster started.


#!/bin/bash -ex

if [ "$(cat /mnt/var/lib/info/instance.json | jq -r .isMaster)" == "true" ]; then

    # Install Git
    sudo yum -y install git

    # Install Maven
    wget -P /tmp http://apache.mirrors.spacedump.net/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
    sudo mkdir /opt/apache-maven
    sudo tar -xvzf /tmp/apache-maven-3.3.3-bin.tar.gz -C /opt/apache-maven

    cat <<EOF >> /home/hadoop/.bashrc
export MAVEN_HOME=/opt/apache-maven/apache-maven-3.3.3
export PATH=\$MAVEN_HOME/bin:\$PATH
EOF
    source /home/hadoop/.bashrc

    # Install Zeppelin
    git clone https://github.com/apache/incubator-zeppelin.git /home/hadoop/zeppelin
    cd /home/hadoop/zeppelin

    # Install some R packages before building Zeppelin
    sudo mkdir /tmp/rjars/
    sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/stringi_1.0-1.tar.gz
    sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/magrittr_1.5.tar.gz
    sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/stringr_1.0.0.tar.gz
    sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/evaluate_0.9.tar.gz
    sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/mime_0.4.tar.gz
    sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/digest_0.6.9.tar.gz
    sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/formatR_1.4.tar.gz
    sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/highr_0.6.tar.gz
    sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/markdown_0.7.7.tar.gz
    sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/yaml_2.1.13.tar.gz
    sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/knitr_1.13.tar.gz
    sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/Rcpp_0.12.5.tar.gz
    sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/htmltools_0.3.5.tar.gz
    sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/base64enc_0.1-3.tar.gz
    sudo R CMD INSTALL /tmp/rjars/stringi_1.0-1.tar.gz
    sudo R CMD INSTALL /tmp/rjars/magrittr_1.5.tar.gz
    sudo R CMD INSTALL /tmp/rjars/stringr_1.0.0.tar.gz
    sudo R CMD INSTALL /tmp/rjars/evaluate_0.9.tar.gz
    sudo R CMD INSTALL /tmp/rjars/mime_0.4.tar.gz
    sudo R CMD INSTALL /tmp/rjars/digest_0.6.9.tar.gz
    sudo R CMD INSTALL /tmp/rjars/formatR_1.4.tar.gz
    sudo R CMD INSTALL /tmp/rjars/highr_0.6.tar.gz
    sudo R CMD INSTALL /tmp/rjars/markdown_0.7.7.tar.gz
    sudo R CMD INSTALL /tmp/rjars/yaml_2.1.13.tar.gz
    sudo R CMD INSTALL /tmp/rjars/knitr_1.13.tar.gz
    sudo R CMD INSTALL /tmp/rjars/Rcpp_0.12.5.tar.gz
    sudo R CMD INSTALL /tmp/rjars/htmltools_0.3.5.tar.gz
    sudo R CMD INSTALL /tmp/rjars/base64enc_0.1-3.tar.gz

    # Build Zeppelin with R
    mvn clean package -Pr -DskipTests

    # Configure Zeppelin: pull executor settings out of spark-defaults.conf
    SPARK_DEFAULTS=/usr/lib/spark/conf/spark-defaults.conf
    echo ${SPARK_DEFAULTS}
    declare -a ZEPPELIN_JAVA_OPTS
    if [ -f $SPARK_DEFAULTS ]; then
        ZEPPELIN_JAVA_OPTS=("${ZEPPELIN_JAVA_OPTS[@]}" \
            $(grep spark.executor.instances $SPARK_DEFAULTS | awk '{print "-D" $1 "=" $2}'))
        ZEPPELIN_JAVA_OPTS=("${ZEPPELIN_JAVA_OPTS[@]}" \
            $(grep spark.executor.cores $SPARK_DEFAULTS | awk '{print "-D" $1 "=" $2}'))
        ZEPPELIN_JAVA_OPTS=("${ZEPPELIN_JAVA_OPTS[@]}" \
            $(grep spark.executor.memory $SPARK_DEFAULTS | awk '{print "-D" $1 "=" $2}'))
        ZEPPELIN_JAVA_OPTS=("${ZEPPELIN_JAVA_OPTS[@]}" \
            $(grep spark.default.parallelism $SPARK_DEFAULTS | awk '{print "-D" $1 "=" $2}'))
        ZEPPELIN_JAVA_OPTS=("${ZEPPELIN_JAVA_OPTS[@]}" \
            $(grep spark.yarn.executor.memoryOverhead $SPARK_DEFAULTS | awk '{print "-D" $1 "=" $2}'))
    fi
    echo ${SPARK_DEFAULTS}
    echo "${ZEPPELIN_JAVA_OPTS[@]}"

    # Get the cluster ID
    CLUSTER_ID=$(aws emr list-clusters --active | grep -i id | awk -F '"' '{print $4}')
    echo $CLUSTER_ID

    # Get the Spark master host from the aws emr list-instances command
    SPARK_MASTER_URL=$(aws emr list-instances --cluster-id $CLUSTER_ID --instance-group-types MASTER | grep -i PrivateDnsName | awk -F '"' '{print $4}')
    echo $SPARK_MASTER_URL

    # Put values in zeppelin-env.sh
    cp conf/zeppelin-env.sh.template conf/zeppelin-env.sh
    cat <<EOF >> conf/zeppelin-env.sh
export MASTER=spark://${SPARK_MASTER_URL}:7077
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}"
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python
EOF

    # Change the Zeppelin port to 7002
    cp conf/zeppelin-site.xml.template conf/zeppelin-site.xml
    sed -i -e 's/8080/7002/g' conf/zeppelin-site.xml

    # Start the Zeppelin daemon
    bin/zeppelin-daemon.sh start

fi


Previously, when I built zeppelin with this command:

mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -DskipTests

It worked fine. After I removed the -Pspark, -Phadoop, and -Pyarn profiles, I noticed that ZEPPELIN_JAVA_OPTS is empty.

When I run the script after the cluster is created, the variable ZEPPELIN_JAVA_OPTS is populated correctly; but when it runs as a bootstrap action, ZEPPELIN_JAVA_OPTS is empty.

Does this mean Spark is not yet installed when I try to install zeppelin as a bootstrap action during cluster creation?

If that is the case, why did the previous command work fine?
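If Spark really is installed only after the bootstrap action runs, one workaround sketch would be to have the bootstrap script wait for spark-defaults.conf to appear before deriving ZEPPELIN_JAVA_OPTS (the 30-second poll interval here is an arbitrary assumption):

# Block until EMR has written spark-defaults.conf, then read it as above
SPARK_DEFAULTS=/usr/lib/spark/conf/spark-defaults.conf
until [ -f "$SPARK_DEFAULTS" ]; do
    echo "waiting for $SPARK_DEFAULTS ..."
    sleep 30
done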

The original script comes from this post:

https://gist.github.com/andershammar/224e1077021d0ea376dd

Thanks Zenon


zenonlpc commented 8 years ago

Hello Everyone

We finally figured out the issue: instead of using %r, we should use %knitr to run R code in zeppelin.
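For example, a notebook paragraph like the following works (the R code itself is just an illustrative snippet):

%knitr
# any ordinary R code can go here; knitr renders the output
x <- rnorm(100)
hist(x)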

Thanks Zenon


elbamos commented 8 years ago

Both of those should work without a problem. If you are using the latest Zeppelin from master, though, there are a lot of recently introduced bugs that could cause this. You may be happier using the version from my repo.


pramitchoudhary commented 7 years ago

Hey guys, I bumped into a similar error while running the zeppelin daemon provided by the EMR instance. I followed the steps as mentioned here and was successful in launching the sparkR shell, but I am getting an 'r interpreter not found' error. The version of zeppelin running on EMR is 0.6.1. I tried following the conversation on the mailing list, and from my understanding the r interpreter should be part of the build, right?
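In case it helps with diagnosis, one quick check would be to list the interpreter directories bundled with the install and see whether an R interpreter is among them (the /usr/lib/zeppelin default below is an assumption for a stock EMR install):

# Interpreters ship as subdirectories of $ZEPPELIN_HOME/interpreter
ls ${ZEPPELIN_HOME:-/usr/lib/zeppelin}/interpreter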