The tutorial uses Scala 2.11, while the Spark that ships with your Hadoop distribution is probably built with Scala 2.10. You'll have to recompile for 2.10.
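(If you're not sure which Scala version your cluster's Spark was built with, a quick check from the Spark shell tells you; this is a generic check, not something from the tutorial:)

$ spark-shell    # the startup banner also reports e.g. "Using Scala version 2.10.5 ..."
scala> scala.util.Properties.versionString   // prints e.g. "version 2.10.5" on a 2.10 build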
This is easy to do in one of two ways. First, at the `sbt` prompt, change to 2.10.6 temporarily, then build the code:

> ++ 2.10.6
> package
> ...

Alternatively, open `project/Build.scala` and edit line 9:

val ScalaVersion = "2.10.6"
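For context, a `Build.scala` in this style looks roughly like the sketch below. The object and project names are illustrative assumptions; only the `ScalaVersion` line is from the tutorial:

import sbt._
import sbt.Keys._

// Hypothetical sketch of an sbt 0.13-era project/Build.scala; names are assumptions.
object TutorialBuild extends Build {
  // The one line that matters: match the Scala version of your cluster's Spark.
  val ScalaVersion = "2.10.6"

  lazy val root = Project("spark-scala-tutorial", file("."))
    .settings(scalaVersion := ScalaVersion)
}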
I followed what you explained, but it still shows an error at the bottom:
[error] ls: `/user/root/output': No such file or directory
Thanks :)
[info] Loading project definition from /root/Shiva/spark-scala-tutorial/project
[info] Set current project to spark-scala-tutorial (in build file:/root/Shiva/spark-scala-tutorial/)
> ++ 2.10.6
[info] Setting version to 2.10.6
[info] Reapplying settings...
[info] Set current project to spark-scala-tutorial (in build file:/root/Shiva/spark-scala-tutorial/)
> package
[info] Updating {file:/root/Shiva/spark-scala-tutorial/}SparkWorkshop...
[info] Resolving com.sun.jersey.jersey-test-framework#jersey-test-framework-griz...
[info] Resolving com.fasterxml.jackson.module#jackson-module-scala_2.10;2.4.4 ...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] downloading https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.10/1.6.2/spark-core_2.10-1.6.2.jar ...
[info] [SUCCESSFUL ] org.apache.spark#spark-core_2.10;1.6.2!spark-core_2.10.jar (615ms)
[info] downloading https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.6.2/spark-streaming_2.10-1.6.2.jar ...
[info] [SUCCESSFUL ] org.apache.spark#spark-streaming_2.10;1.6.2!spark-streaming_2.10.jar (111ms)
[info] downloading https://repo1.maven.org/maven2/org/apache/spark/spark-sql_2.10/1.6.2/spark-sql_2.10-1.6.2.jar ...
[info] [SUCCESSFUL ] org.apache.spark#spark-sql_2.10;1.6.2!spark-sql_2.10.jar (173ms)
[info] downloading https://repo1.maven.org/maven2/org/apache/spark/spark-hive_2.10/1.6.2/spark-hive_2.10-1.6.2.jar ...
[info] [SUCCESSFUL ] org.apache.spark#spark-hive_2.10;1.6.2!spark-hive_2.10.jar (64ms)
[info] downloading https://repo1.maven.org/maven2/com/twitter/chill_2.10/0.5.0/chill_2.10-0.5.0.jar ...
[info] [SUCCESSFUL ] com.twitter#chill_2.10;0.5.0!chill_2.10.jar (35ms)
[info] downloading https://repo1.maven.org/maven2/org/apache/spark/spark-launcher_2.10/1.6.2/spark-launcher_2.10-1.6.2.jar ...
[info] [SUCCESSFUL ] org.apache.spark#spark-launcher_2.10;1.6.2!spark-launcher_2.10.jar (28ms)
[info] downloading https://repo1.maven.org/maven2/org/apache/spark/spark-network-common_2.10/1.6.2/spark-network-common_2.10-1.6.2.jar ...
[info] [SUCCESSFUL ] org.apache.spark#spark-network-common_2.10;1.6.2!spark-network-common_2.10.jar (169ms)
[info] downloading https://repo1.maven.org/maven2/org/apache/spark/spark-network-shuffle_2.10/1.6.2/spark-network-shuffle_2.10-1.6.2.jar ...
[info] [SUCCESSFUL ] org.apache.spark#spark-network-shuffle_2.10;1.6.2!spark-network-shuffle_2.10.jar (25ms)
[info] downloading https://repo1.maven.org/maven2/org/apache/spark/spark-unsafe_2.10/1.6.2/spark-unsafe_2.10-1.6.2.jar ...
[info] [SUCCESSFUL ] org.apache.spark#spark-unsafe_2.10;1.6.2!spark-unsafe_2.10.jar (26ms)
[info] downloading https://repo1.maven.org/maven2/com/typesafe/akka/akka-remote_2.10/2.3.11/akka-remote_2.10-2.3.11.jar ...
[info] [SUCCESSFUL ] com.typesafe.akka#akka-remote_2.10;2.3.11!akka-remote_2.10.jar (93ms)
[info] downloading https://repo1.maven.org/maven2/com/typesafe/akka/akka-slf4j_2.10/2.3.11/akka-slf4j_2.10-2.3.11.jar ...
[info] [SUCCESSFUL ] com.typesafe.akka#akka-slf4j_2.10;2.3.11!akka-slf4j_2.10.jar (27ms)
[info] downloading https://repo1.maven.org/maven2/org/json4s/json4s-jackson_2.10/3.2.10/json4s-jackson_2.10-3.2.10.jar ...
[info] [SUCCESSFUL ] org.json4s#json4s-jackson_2.10;3.2.10!json4s-jackson_2.10.jar (25ms)
[info] downloading https://repo1.maven.org/maven2/com/fasterxml/jackson/module/jackson-module-scala_2.10/2.4.4/jackson-module-scala_2.10-2.4.4.jar ...
[info] [SUCCESSFUL ] com.fasterxml.jackson.module#jackson-module-scala_2.10;2.4.4!jackson-module-scala_2.10.jar(bundle) (53ms)
[info] downloading https://repo1.maven.org/maven2/org/apache/avro/avro/1.7.7/avro-1.7.7.jar ...
[info] [SUCCESSFUL ] org.apache.avro#avro;1.7.7!avro.jar (70ms)
[info] downloading https://repo1.maven.org/maven2/com/typesafe/akka/akka-actor_2.10/2.3.11/akka-actor_2.10-2.3.11.jar ...
[info] [SUCCESSFUL ] com.typesafe.akka#akka-actor_2.10;2.3.11!akka-actor_2.10.jar (112ms)
[info] downloading https://repo1.maven.org/maven2/org/scala-lang/scalap/2.10.0/scalap-2.10.0.jar ...
[info] [SUCCESSFUL ] org.scala-lang#scalap;2.10.0!scalap.jar (57ms)
[info] downloading https://repo1.maven.org/maven2/org/scala-lang/scala-compiler/2.10.0/scala-compiler-2.10.0.jar ...
[info] [SUCCESSFUL ] org.scala-lang#scala-compiler;2.10.0!scala-compiler.jar (735ms)
[info] downloading https://repo1.maven.org/maven2/org/apache/spark/spark-catalyst_2.10/1.6.2/spark-catalyst_2.10-1.6.2.jar ...
[info] [SUCCESSFUL ] org.apache.spark#spark-catalyst_2.10;1.6.2!spark-catalyst_2.10.jar (284ms)
[info] downloading https://repo1.maven.org/maven2/org/scalatest/scalatest_2.10/2.2.4/scalatest_2.10-2.2.4.jar ...
[info] [SUCCESSFUL ] org.scalatest#scalatest_2.10;2.2.4!scalatest_2.10.jar(bundle) (365ms)
[info] downloading https://repo1.maven.org/maven2/org/scalacheck/scalacheck_2.10/1.12.2/scalacheck_2.10-1.12.2.jar ...
[info] [SUCCESSFUL ] org.scalacheck#scalacheck_2.10;1.12.2!scalacheck_2.10.jar (55ms)
[info] Done updating.
[info] Compiling 38 Scala sources to /root/Shiva/spark-scala-tutorial/target/scala-2.10/classes...
[info] 'compiler-interface' not yet compiled for Scala 2.10.6. Compiling...
[info] Compilation completed in 8.816 s
[warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list
[info] Packaging /root/Shiva/spark-scala-tutorial/target/scala-2.10/spark-scala-tutorial_2.10-5.0.0.jar ...
[info] Done packaging.
[success] Total time: 27 s, completed Dec 31, 2016 12:33:35 AM
> run
[warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list
Multiple main classes detected, select one to run:
[1] Crawl5a
[2] Crawl5aLocal
[3] InvertedIndex5b
[4] InvertedIndex5bSortByWordAndCounts
[5] Joins7
[6] Joins7Ordered
[7] Matrix4
[8] Matrix4StdDev
[9] NGrams6
[10] SparkSQL8
[11] SparkStreaming11
[12] SparkStreaming11Main
[13] SparkStreaming11MainSocket
[14] SparkStreaming11SQL
[15] WordCount2
[16] WordCount2GroupBy
[17] WordCount2SortByCount
[18] WordCount2SortByWord
[19] WordCount3
[20] WordCount3SortByWordLength
[21] hadoop.HCrawl5a
[22] hadoop.HInvertedIndex5b
[23] hadoop.HJoins7
[24] hadoop.HMatrix4
[25] hadoop.HNGrams6
[26] hadoop.HSparkSQL8
[27] hadoop.HSparkStreaming11
[28] hadoop.HWordCount3
[29] sparktutorial.solns.InvertedIndex5bTfIdf
[30] util.streaming.DataDirectoryServer
[31] util.streaming.DataSocketServer
Enter number: 28
[info] Running hadoop.HWordCount3
[info] running: spark-submit --class WordCount3 ./target/scala-2.10/spark-scala-tutorial_2.10-5.0.0.jar ./target/scala-2.11/spark-scala-tutorial_2.11-5.0.0.jar --out /user/root/output/kjv-wc3
[info]
[info] Unrecognized argument (or missing second argument): ./target/scala-2.11/spark-scala-tutorial_2.11-5.0.0.jar
[info]
[info] usage: java ... WordCount3$ [options]
[info] where the options are the following:
[info] -h | --help Show this message and quit.
[info] -i | --in | --inpath path The input root directory of files to crawl (default: data/kjvdat.txt)
[info] -o | --out | --outpath path The output location (default: output/kjv-wc3)
[info]
[info] -m | --master M The "master" argument passed to SparkContext, "M" is one of:
[info] "local", local[N]", "mesos://host:port", or "spark://host:port"
[info] (default: local).
[info] -q | --quiet Suppress some informational output.
[info]
[info] Contents of the output directories:
[error] ls: `/user/root/output': No such file or directory
[info]
[info] **** To see the contents, open the following URL(s):
[info]
[info]
[success] Total time: 11 s, completed Dec 31, 2016 12:33:48 AM
Two comments. First, when you run with Hadoop, you'll have to define the correct directories in HDFS, which is the default file system assumed in that context, not a local file system (which doesn't mean much in a cluster - what's local?). You'll also want to use your correct user name for the cluster. `/user/root` is the home directory of the HDFS `root` user and probably not what you want. If you actually have permissions for that directory, you could create the `output` subdirectory and it might work. Normally, you would use `/user/myname/output`.
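(As a sketch, assuming your cluster user name is `myname`, the setup would look something like this:)

# Sketch only; "myname" is a placeholder for your actual cluster user name.
hdfs dfs -mkdir -p /user/myname/output   # create the output area in HDFS
hdfs dfs -ls /user/myname                # confirm it exists and you own it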
Second, those `hadoop.H*` hooks I created aren't well tested. I should remove them, as I'm no longer interested in maintaining them. I would try running the `spark-submit` shell script itself, instead. However, I don't think it is a problem in this case.
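(For reference, a direct invocation would look roughly like the sketch below; the `--master yarn` flag is an assumption about your cluster, while the jar path and `--out` argument come from the log above:)

# Sketch only: bypass the sbt hook and submit the job directly.
spark-submit --master yarn --class WordCount3 \
  ./target/scala-2.10/spark-scala-tutorial_2.10-5.0.0.jar \
  --out /user/myname/output/kjv-wc3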
Hi Dean :)
I can see a success message with no errors, but I cannot see any contents in the output directory. As you mentioned above, I created an output folder using `hdfs dfs -mkdir /user/root/output` (`root` is the user on my cluster).
It ran with no errors, but it didn't generate any files?
[info] Running hadoop.HWordCount3
[info] running: spark-submit --class WordCount3 ./target/scala-2.10/spark-scala-tutorial_2.10-5.0.0.jar ./target/scala-2.11/spark-scala-tutorial_2.11-5.0.0.jar --out /user/root/output/kjv-wc3
[info]
[info] Unrecognized argument (or missing second argument): ./target/scala-2.11/spark-scala-tutorial_2.11-5.0.0.jar
[info]
[info] usage: java ... WordCount3$ [options]
[info] where the options are the following:
[info] -h | --help Show this message and quit.
[info] -i | --in | --inpath path The input root directory of files to crawl (default: data/kjvdat.txt)
[info] -o | --out | --outpath path The output location (default: output/kjv-wc3)
[info]
[info] -m | --master M The "master" argument passed to SparkContext, "M" is one of:
[info] "local", local[N]", "mesos://host:port", or "spark://host:port"
[info] (default: local).
[info] -q | --quiet Suppress some informational output.
[info]
[info] Contents of the output directories:
[info]
[info] **** To see the contents, open the following URL(s):
[info]
[info]
[success] Total time: 13 s, completed Jan 2, 2017 9:02:40 PM
I'm going to delete the Hadoop support. Sorry.
Hi Owner!!
My WordCount3 runs successfully locally; I can see the output folder with the output files in it. However, when I run it on the Hadoop cluster using `hadoop.HWordCount3`, it displays an error. Do I have to create a directory in my Hadoop cluster, such as `/user/root/output`? When I look at the Scala code for the Hadoop version of WordCount3, I feel like it's incomplete, but I am not sure. Please suggest!!