deanwampler / spark-scala-tutorial

A free tutorial for Apache Spark.

Tests failed for WordCount3/WordCount2/InvertedIndex5b #16

Closed. liweiz closed this issue 9 years ago.

liweiz commented 9 years ago

I followed the instructions up to [building-and-testing](https://github.com/deanwampler/spark-workshop#building-and-testing) and got test failures. I tried googling, without luck. I suspect something is wrong with my installation, since no one else seems to have the same issue, but I wanted to ask here to make sure. I'm using the local setup with the browser-based IDE, and on my Mac I set ACTIVATOR_HOME in ~/.zshrc to the Activator installation directory.

Thanks in advance,

Liwei

Here are the error messages:

WordCount3 computes the word count of the input corpus with options:

```
Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.util.regex.PatternSyntaxException: Unknown character property name {Alphabetic} near index 17
[^\p{IsAlphabetic}]+
                 ^
    at java.util.regex.Pattern.error(Pattern.java:1713)
    at java.util.regex.Pattern.charPropertyNodeFor(Pattern.java:2437)
    at java.util.regex.Pattern.family(Pattern.java:2412)
    at java.util.regex.Pattern.range(Pattern.java:2335)
    at java.util.regex.Pattern.clazz(Pattern.java:2268)
    at java.util.regex.Pattern.sequence(Pattern.java:1818)
    at java.util.regex.Pattern.expr(Pattern.java:1752)
    at java.util.regex.Pattern.compile(Pattern.java:1460)
    at java.util.regex.Pattern.<init>(Pattern.java:1133)
    at java.util.regex.Pattern.compile(Pattern.java:823)
    at java.lang.String.split(String.java:2292)
    at java.lang.String.split(String.java:2334)
    at WordCount3$$anonfun$2.apply(WordCount3.scala:60)
    at WordCount3$$anonfun$2.apply(WordCount3.scala:60)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:396)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:369)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:369)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:56)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:695)
Driver stacktrace:
```

WordCount2 computes the word count of the input corpus:

```
Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.util.regex.PatternSyntaxException: Unknown character property name {Alphabetic} near index 17
[^\p{IsAlphabetic}]+
                 ^
    at java.util.regex.Pattern.error(Pattern.java:1713)
    at java.util.regex.Pattern.charPropertyNodeFor(Pattern.java:2437)
    at java.util.regex.Pattern.family(Pattern.java:2412)
    at java.util.regex.Pattern.range(Pattern.java:2335)
    at java.util.regex.Pattern.clazz(Pattern.java:2268)
    at java.util.regex.Pattern.sequence(Pattern.java:1818)
    at java.util.regex.Pattern.expr(Pattern.java:1752)
    at java.util.regex.Pattern.compile(Pattern.java:1460)
    at java.util.regex.Pattern.<init>(Pattern.java:1133)
    at java.util.regex.Pattern.compile(Pattern.java:823)
    at java.lang.String.split(String.java:2292)
    at java.lang.String.split(String.java:2334)
    at WordCount2$$anonfun$3.apply(WordCount2.scala:57)
    at WordCount2$$anonfun$3.apply(WordCount2.scala:57)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:396)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:369)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:56)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:695)
Driver stacktrace:
```

InvertedIndex5b computes the famous 'inverted index' from web crawl data:

```
Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.util.regex.PatternSyntaxException: Unknown character property name {Alphabetic} near index 17
[^\p{IsAlphabetic}]+
                 ^
    at java.util.regex.Pattern.error(Pattern.java:1713)
    at java.util.regex.Pattern.charPropertyNodeFor(Pattern.java:2437)
    at java.util.regex.Pattern.family(Pattern.java:2412)
    at java.util.regex.Pattern.range(Pattern.java:2335)
    at java.util.regex.Pattern.clazz(Pattern.java:2268)
    at java.util.regex.Pattern.sequence(Pattern.java:1818)
    at java.util.regex.Pattern.expr(Pattern.java:1752)
    at java.util.regex.Pattern.compile(Pattern.java:1460)
    at java.util.regex.Pattern.<init>(Pattern.java:1133)
    at java.util.regex.Pattern.compile(Pattern.java:823)
    at java.lang.String.split(String.java:2292)
    at java.lang.String.split(String.java:2334)
    at InvertedIndex5b$$anonfun$main$2.apply(InvertedIndex5b.scala:56)
    at InvertedIndex5b$$anonfun$main$2.apply(InvertedIndex5b.scala:50)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:396)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:369)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:56)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:695)
Driver stacktrace:
```

deanwampler commented 9 years ago

It's complaining about the regex on line 60 of WordCount3. What version of Java are you using? I believe all recent versions support the same regex syntax, though.
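One way to answer the Java-version question is a tiny check run on the same JVM that sbt/Activator uses. The sketch below is hypothetical: `RegexSupportCheck` is a made-up name, not part of the tutorial. It prints the runtime's version, vendor, and home directory, then tries to compile the exact pattern quoted in the exception messages. For reference, the `\p{IsAlphabetic}` binary-property syntax was added to `java.util.regex` in Java 7, so an older runtime is expected to report `false`.

```scala
import java.util.regex.{Pattern, PatternSyntaxException}

// Hypothetical diagnostic, not part of the tutorial: reports which JVM is running
// and whether its regex engine accepts the pattern from the failing tests.
object RegexSupportCheck {
  // The pattern quoted in the PatternSyntaxException messages above.
  val WordSeparator = """[^\p{IsAlphabetic}]+"""

  // True if java.util.regex on this JVM can compile the pattern.
  def supportsIsAlphabetic: Boolean =
    try { Pattern.compile(WordSeparator); true }
    catch { case _: PatternSyntaxException => false }

  def main(args: Array[String]): Unit = {
    println("java.version: " + System.getProperty("java.version"))
    println("java.vendor:  " + System.getProperty("java.vendor"))
    println("java.home:    " + System.getProperty("java.home"))
    println("IsAlphabetic supported: " + supportsIsAlphabetic)
  }
}
```

Running it from inside the project (for example with sbt's `runMain RegexSupportCheck`) shows the JVM the tests actually use, which can differ from the `java` found first on the shell's PATH.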

deanwampler commented 9 years ago

Sorry for the delay in replying...

liweiz commented 9 years ago

@deanwampler Thanks for your reply. The Java I'm using was downloaded from Apple's website, since the one from Java's site didn't work. I guess it's an Apple issue. I'll try reinstalling Java from another source later, if possible.
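If reinstalling Java isn't possible right away, one workaround is to fall back to a pattern an older runtime accepts. This is a sketch under assumptions, not the tutorial's code: the object `PortableTokenizer` is hypothetical, and the fallback `[^\p{L}]+` (the Unicode Letter general category) is an illustrative substitute. The two patterns are not identical, because the Alphabetic property also covers letter numbers such as Roman numerals and some combining marks, so counts could differ slightly on unusual input. Running the tests under a Java 7 or later JDK avoids the fallback entirely.

```scala
import java.util.regex.{Pattern, PatternSyntaxException}

// Hypothetical helper, not from the tutorial: splits lines into words using the
// pattern from the failing examples when the JVM supports it, otherwise a fallback.
object PortableTokenizer {
  // Pattern from the stack traces; requires Java 7+ (binary property IsAlphabetic).
  val Preferred = """[^\p{IsAlphabetic}]+"""
  // Assumed fallback: Unicode "Letter" general category, accepted by older JDKs.
  val Fallback = """[^\p{L}]+"""

  // Compile the first pattern this JVM accepts, once.
  lazy val separator: Pattern =
    try Pattern.compile(Preferred)
    catch { case _: PatternSyntaxException => Pattern.compile(Fallback) }

  // Lower-case a line, split on runs of non-letters, and drop empty tokens
  // (a leading separator would otherwise produce an empty first element).
  def tokenize(line: String): Array[String] =
    separator.split(line.toLowerCase).filter(_.nonEmpty)

  def main(args: Array[String]): Unit =
    println(tokenize("Hello, World! 42 times").mkString(",")) // prints hello,world,times
}
```

Both patterns give the same tokens for plain English text like the example above, so the word counts in the tests should be unaffected in the common case.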