Open timkulich opened 7 years ago
Hi @timkulich, yes, you are correct: the point of the exercise is to use Spark. In a real-world scenario, reading the file this way (`scala.io.Source.fromFile("data/dna.txt")`) would cause an out-of-memory error. You need to use the `sc.textFile` primitive from the `SparkContext` to read chunks of the file in parallel. Loading the file in the driver program and parallelizing it later wouldn't work either.
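As a rough sketch, the letter-counting task could look something like this with the RDD API (assuming a `SparkContext` named `sc`, as in the exercise; the file path is the one from the question):

```scala
// Read the file as an RDD of lines; Spark splits it into partitions
// that the workers process in parallel, instead of loading the whole
// file into the driver's memory.
val lines = sc.textFile("data/dna.txt")

// Explode each line into its characters, then count each letter with
// a classic map + reduceByKey.
val letterCounts = lines
  .flatMap(line => line)   // RDD[Char]
  .map(c => (c, 1L))       // RDD[(Char, Long)]
  .reduceByKey(_ + _)      // one (letter, count) pair per letter

letterCounts.collect().foreach(println)
```

Because `reduceByKey` combines partial counts on each worker before shuffling, the counting itself happens in the cluster, not in the driver.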
Hi,
I have the following solutions to Task 1 and Task 2 respectively:
```scala
// Loads dna.txt into a string and counts the letters
val source = scala.io.Source.fromFile("data/dna.txt")
val lines = try source.mkString finally source.close()
```
and
```scala
// For-loop over 1000 points: sums rectangles of width 1/1000 and
// height f(x) at each iteration.
import math.sin
import math.cos

val points = 1000
var result: Double = 0
for (i <- 1 to points) {
  val x = i.toDouble / points
  result += (1.0 / points) * (1 + sin(x)) / cos(x)
}
println(result)
```
They both seem to produce the right result. However, I'm assuming that I'm not following the condition: "Warning: all of the tasks must be solved using the Spark RDD API, in order to distribute the computations in the Spark workers."

Am I correct? Do I have to use `parallelize` and `reduce` to distribute the work to the workers?
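For what it's worth, a sketch of how the same integral approximation might be distributed with the RDD API (assuming a `SparkContext` named `sc`; the function and the number of points are taken from the snippet above):

```scala
import math.{sin, cos}

val points = 1000

// Distribute the sample indices across the workers, evaluate the
// rectangle area for each index, and sum the pieces with reduce.
val result = sc.parallelize(1 to points)
  .map { i =>
    val x = i.toDouble / points
    (1.0 / points) * (1 + sin(x)) / cos(x)
  }
  .reduce(_ + _)

println(result)
```

Here `parallelize` turns the local range into an RDD, so the per-point work runs on the workers and only the final sum comes back to the driver.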