amplab / training

Training materials for Strata, AMP Camp, etc

SparkStreaming in ampcamp4 is not working. #150

Open giwa opened 10 years ago

giwa commented 10 years ago

I followed the instructions in ampcamp4 and launched Amazon EC2 instances, but when I tried to run Spark Streaming it did not work. The Scala version did not match what sbt required, and even after I fixed the Scala version problem, the Spark Streaming example still did not work on the EC2 instances.

Line 3 of build.sbt:

scalaVersion := "2.10"

should be:

scalaVersion := "2.10.3"
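For reference, the relevant part of build.sbt might look like the sketch below. The project name is illustrative, and the 0.9.0-incubating dependency versions are the ones mentioned later in this thread; adjust to whatever the tutorial actually ships:

```scala
// build.sbt -- sketch; sbt needs a full Scala patch version to resolve
// the scala-library and scala-compiler artifacts.
name := "Tutorial"

scalaVersion := "2.10.3"  // "2.10" alone is not a published artifact version

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % "0.9.0-incubating",
  "org.apache.spark" %% "spark-streaming-twitter" % "0.9.0-incubating"
)
```

With only "2.10", Ivy looks for scala-library/2.10/… (as in the log below) and finds nothing, because only full versions like 2.10.3 are published.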

I pasted two errors below. The first is the Scala version error from compiling Spark Streaming; the second happened when I ran Spark Streaming.

[root@ip-172-31-43-168 scala]# sbt/sbt package
[info] Set current project to Tutorial (in build file:/root/training/streaming/scala/)
[info] Updating {file:/root/training/streaming/scala/}scala...
[info] Resolving org.scala-lang#scala-library;2.10 ...
[warn]     module not found: org.scala-lang#scala-library;2.10
[warn] ==== local: tried
[warn]   /root/.ivy2/local/org.scala-lang/scala-library/2.10/ivys/ivy.xml
[warn] ==== public: tried
[warn]   http://repo1.maven.org/maven2/org/scala-lang/scala-library/2.10/scala-library-2.10.pom
[info] Resolving org.scala-lang#scala-compiler;2.10 ...
[warn]     module not found: org.scala-lang#scala-compiler;2.10
[warn] ==== local: tried
[warn]   /root/.ivy2/local/org.scala-lang/scala-compiler/2.10/ivys/ivy.xml
[warn] ==== public: tried
[warn]   http://repo1.maven.org/maven2/org/scala-lang/scala-compiler/2.10/scala-compiler-2.10.pom
[warn]     ::::::::::::::::::::::::::::::::::::::::::::::
[warn]     ::          UNRESOLVED DEPENDENCIES         ::
[warn]     ::::::::::::::::::::::::::::::::::::::::::::::
[warn]     :: org.scala-lang#scala-library;2.10: not found
[warn]     :: org.scala-lang#scala-compiler;2.10: not found
[warn]     ::::::::::::::::::::::::::::::::::::::::::::::
sbt.ResolveException: unresolved dependency: org.scala-lang#scala-library;2.10: not found
unresolved dependency: org.scala-lang#scala-compiler;2.10: not found
    at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:213)
    at sbt.IvyActions$$anonfun$update$1.apply(IvyActions.scala:122)
    at sbt.IvyActions$$anonfun$update$1.apply(IvyActions.scala:121)
    at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:116)
    at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:116)
    at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:104)
    at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:51)
    at sbt.IvySbt$$anon$3.call(Ivy.scala:60)
    at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:98)
    at xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:81)
    at xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:102)
    at xsbt.boot.Using$.withResource(Using.scala:11)
    at xsbt.boot.Using$.apply(Using.scala:10)
    at xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:62)
    at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:52)
    at xsbt.boot.Locks$.apply0(Locks.scala:31)
    at xsbt.boot.Locks$.apply(Locks.scala:28)
    at sbt.IvySbt.withDefaultLogger(Ivy.scala:60)
    at sbt.IvySbt.withIvy(Ivy.scala:101)
    at sbt.IvySbt.withIvy(Ivy.scala:97)
    at sbt.IvySbt$Module.withModule(Ivy.scala:116)
    at sbt.IvyActions$.update(IvyActions.scala:121)
    at sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1161)
    at sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1159)
    at sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$73.apply(Defaults.scala:1182)
    at sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$73.apply(Defaults.scala:1180)
    at sbt.Tracked$$anonfun$lastOutput$1.apply(Tracked.scala:35)
    at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1184)
    at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1179)
    at sbt.Tracked$$anonfun$inputChanged$1.apply(Tracked.scala:45)
    at sbt.Classpaths$.cachedUpdate(Defaults.scala:1187)
    at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1152)
    at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1130)
    at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
    at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:42)
    at sbt.std.Transform$$anon$4.work(System.scala:64)
    at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:237)
    at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:237)
    at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:18)
    at sbt.Execute.work(Execute.scala:244)
    at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:237)
    at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:237)
    at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:160)
    at sbt.CompletionService$$anon$2.call(CompletionService.scala:30)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
[error] (*:update) sbt.ResolveException: unresolved dependency: org.scala-lang#scala-library;2.10: not found
[error] unresolved dependency: org.scala-lang#scala-compiler;2.10: not found
[error] Total time: 8 s, completed May 5, 2014 7:43:59 AM

This happened when I ran the Spark Streaming example:

14/05/05 08:08:15 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/05/05 08:08:16 INFO Remoting: Starting remoting
14/05/05 08:08:16 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@ip-172-31-43-168.ec2.internal:54899]
14/05/05 08:08:16 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@ip-172-31-43-168.ec2.internal:54899]
14/05/05 08:08:16 INFO server.Server: jetty-7.6.8.v20121106
14/05/05 08:08:16 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:50072
14/05/05 08:08:16 INFO server.Server: jetty-7.6.8.v20121106
14/05/05 08:08:16 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:46682
14/05/05 08:08:17 INFO server.Server: jetty-7.6.8.v20121106
14/05/05 08:08:17 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage/rdd,null}
14/05/05 08:08:17 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage,null}
14/05/05 08:08:17 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/stage,null}
14/05/05 08:08:17 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/pool,null}
14/05/05 08:08:17 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages,null}
14/05/05 08:08:17 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/environment,null}
14/05/05 08:08:17 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/executors,null}
14/05/05 08:08:17 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/metrics/json,null}
14/05/05 08:08:17 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/static,null}
14/05/05 08:08:17 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/,null}
14/05/05 08:08:17 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
-------------------------------------------
Time: 1399277300000 ms
-------------------------------------------

-------------------------------------------
Time: 1399277301000 ms
-------------------------------------------

-------------------------------------------
Time: 1399277302000 ms
-------------------------------------------

14/05/05 08:08:22 WARN scheduler.TaskSetManager: Lost TID 70 (task 2.0:0)
14/05/05 08:08:22 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassNotFoundException
java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:37)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:500)
    at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:72)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:145)
    at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:195)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
14/05/05 08:08:22 WARN scheduler.TaskSetManager: Lost TID 71 (task 2.0:0)
14/05/05 08:08:22 WARN scheduler.TaskSetManager: Lost TID 72 (task 2.0:0)
14/05/05 08:08:22 WARN scheduler.TaskSetManager: Lost TID 73 (task 2.0:0)
14/05/05 08:08:22 ERROR scheduler.TaskSetManager: Task 2.0:0 failed 4 times; aborting job
[error] (Thread-39) org.apache.spark.SparkException: Job aborted: Task 2.0:0 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver)
org.apache.spark.SparkException: Job aborted: Task 2.0:0 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[trace] Stack trace suppressed: run last compile:run for the full output.
jstjohn commented 10 years ago

I have the same java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver issue now that I have fixed the scalaVersion := "2.10.3" problem.

tcfuji commented 10 years ago

Same here. Just getting blank tweets.

dev24aday commented 10 years ago

Same here; not sure what is wrong. If I change the sparkUrl to "local[4]", it works perfectly.
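That local[4] works but the cluster URL doesn't suggests the Twitter streaming classes are on the driver's classpath but are never shipped to the workers, hence the ClassNotFoundException on the executors. In the Spark 0.9 API the usual fix was to list the application jar(s) when constructing the StreamingContext so they get distributed to the workers. A sketch (the jar path and app name here are illustrative, not from the tutorial):

```scala
// Sketch: ship the packaged application jar (which contains the
// spark-streaming-twitter classes) to the workers at context creation.
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(
  sparkUrl,                       // e.g. spark://<master>:7077
  "Tutorial",
  Seconds(1),
  System.getenv("SPARK_HOME"),
  Seq("target/scala-2.10/tutorial_2.10-0.1-SNAPSHOT.jar")  // jars to distribute
)
```

In local mode no distribution is needed, which would explain why "local[4]" hides the problem.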

mliesenberg commented 10 years ago

I've got the same issue.

experceptus commented 9 years ago

I got a bit farther along. I edited the build.sbt: I changed:

scalaVersion := "2.10.4"

and changed:

"org.apache.spark" %% "spark-streaming" % "0.9.0-incubating",
"org.apache.spark" %% "spark-streaming-twitter" % "0.9.0-incubating"

to:

"org.apache.spark" %% "spark-streaming" % "1.1.0",
"org.apache.spark" %% "spark-streaming-twitter" % "1.1.0"

I get a few warnings but no classpath errors. The app runs, but it's still just getting blank tweets.

experceptus commented 9 years ago

Sorry, that should have been:

scalaVersion := "2.10.3"

That's the version reported when I run ./spark-shell. If I let the app run long enough, 15 seconds or so, I get the line:

WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

Googling that brings up all kinds of stuff, some of which says it's a red herring. All of my slaves have 2 GB, which is the amp-camp default.
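For anyone else hitting that warning: one thing worth checking (a guess, not confirmed for this tutorial) is whether the memory the app requests per executor fits within what each 2 GB worker actually advertises in the master UI. In Spark of that era, executor memory was commonly set as a system property before the context was created:

```scala
// Sketch: request less executor memory than the workers advertise.
// "512m" is an illustrative value, not taken from the tutorial.
System.setProperty("spark.executor.memory", "512m")
```

If the requested memory exceeds what any worker can offer, the job sits unscheduled and produces exactly this "has not accepted any resources" warning.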