amplab / training

Training materials for Strata, AMP Camp, etc

File path for 'load-the-wikipedia-article' does not exist #198

Closed by karenyyng 8 years ago

karenyyng commented 8 years ago

In the tutorial's code example:

val articlesRDD = sc.textFile("data/indexedrdd/wiki-article-titles.txt").map {
  line =>
    val fields = line.split('\t')
    (fields(0).toLong, fields(1))
}

The file "data/indexedrdd/wiki-article-titles.txt" does not exist in ampcamp6-rc1.zip.

karenyyng commented 8 years ago

OK, Jey issued a patch: the file indexedrdd-data.zip, available at http://www.cs.berkeley.edu/~jey/ampcamp6/

If Spark is launched from PATH_TO/ampcamp6/spark/ as instructed in the tutorial, the unzipped folder should be placed in PATH_TO/ampcamp6/spark/ for the IndexedRDD example to work, i.e. the unzipped content should end up at:

PATH_TO/ampcamp6/spark/data/indexedrdd/
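
To confirm the data landed in the right place before re-running the example, a small check like the one below may help. It is only a sketch: it assumes the relative path in the tutorial snippet is resolved against the directory Spark was launched from (which is what the fix above implies for a local run), and `CheckDataPath` and `resolve` are hypothetical names, not part of the tutorial or of Spark.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical helper, not part of the AMP Camp materials.
object CheckDataPath {
  // Resolve a relative path against a base directory and report whether it exists.
  def resolve(baseDir: Path, relative: String): (Path, Boolean) = {
    val full = baseDir.resolve(relative).toAbsolutePath.normalize()
    (full, Files.exists(full))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: the current working directory is the one Spark was launched from,
    // e.g. PATH_TO/ampcamp6/spark/
    val cwd = Paths.get("").toAbsolutePath
    val (full, exists) = resolve(cwd, "data/indexedrdd/wiki-article-titles.txt")
    println(s"$full exists: $exists")
  }
}
```

Pasting the object into the Spark shell and calling `CheckDataPath.main(Array())` prints the absolute path that was checked, which makes it easier to see whether the zip was unpacked one directory too high or too low.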