SNICScienceCloud / LDSA-Spark

A collection of Apache Spark notebooks for the LDSA course
Apache License 2.0

Question about task 1 and 2 #7

Open siripersson opened 7 years ago

siripersson commented 7 years ago

My code for task 1 is the following:

val textFile = sc.textFile("data/dna.txt")
textFile.count(c => c == 'g')
val linesWithCG = textFile.filter(line => line.contains("cg")).count()
linesWithCG

This gives the result 164. But when I check the text file "dna.txt", "cg" seems to occur many more times than that.

My code for task 2:

val n = 1000
val count = sc.parallelize(1 to n)
  .map { _ =>
      val x = math.random
      val y = math.random
      if(x*x + y*y < 4) 1 else 0
  }.reduce(_+_)
val result =(1-0)*4.0*(count / n)
println(result)

This gives the result 4.0, but that seems too big to be correct.

mcapuccini commented 7 years ago

Hi!

In the first task you are counting the lines that contain GC, but you are supposed to count the occurrences of GC within each line and add them up over all lines. This is why you get such a small number.
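The difference between the two counts can be sketched in plain Scala. This is only an illustration, not the course's reference solution: the helper `countOccurrences` and the sample lines are hypothetical, and the pattern "cg" is taken from the code in the question.

```scala
// Hypothetical helper: counts non-overlapping occurrences of `pattern` in `line`.
def countOccurrences(line: String, pattern: String): Int = {
  require(pattern.nonEmpty, "pattern must not be empty")
  var count = 0
  var from = line.indexOf(pattern)
  while (from >= 0) {
    count += 1
    // Continue searching after the match that was just found.
    from = line.indexOf(pattern, from + pattern.length)
  }
  count
}

// Made-up sample lines standing in for the contents of data/dna.txt.
val lines = Seq("acgtcgcg", "ttaa", "cgcg")

// What the filter in the question measures: how many lines contain the pattern.
val linesWithPattern = lines.count(_.contains("cg"))

// What the task asks for: occurrences per line, summed over all lines.
val totalOccurrences = lines.map(countOccurrences(_, "cg")).sum

println(s"lines containing cg: $linesWithPattern, total occurrences: $totalOccurrences")
```

On the RDD from the question, the same helper could presumably be applied as `textFile.map(line => countOccurrences(line, "cg")).reduce(_ + _)`. Note that an occurrence split across a line break would not be counted this way.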

In the second task you are throwing darts at the circle x*x + y*y < 4, not at the function that is given in the assignment.
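Two details in the posted code explain the 4.0. Because math.random returns values in [0, 1), x*x + y*y is always below 2, so the test against 4 succeeds for every dart. In addition, count / n divides two Ints, so the ratio is truncated to a whole number. A minimal hit-or-miss sketch in plain Scala follows. The assignment's function is not shown in this thread, so `f` below is only a placeholder, and `estimateIntegral`, the bounding-box height, and the seed are assumptions made for illustration.

```scala
import scala.util.Random

// Hypothetical helper: hit-or-miss Monte Carlo estimate of the integral of f over [a, b].
// Assumes 0 <= f(x) <= height on [a, b].
def estimateIntegral(f: Double => Double, a: Double, b: Double,
                     height: Double, n: Int, seed: Long): Double = {
  val rng = new Random(seed)
  val hits = (1 to n).count { _ =>
    val x = a + (b - a) * rng.nextDouble() // dart's x, uniform in [a, b)
    val y = height * rng.nextDouble()      // dart's y, uniform in [0, height)
    y < f(x)                               // hit if the dart lands under the curve
  }
  // Area of the bounding box times the fraction of hits; toDouble avoids integer division.
  (b - a) * height * hits.toDouble / n
}

// Placeholder function, NOT the one from the assignment.
val f: Double => Double = x => x * x

val estimate = estimateIntegral(f, 0.0, 1.0, 1.0, 200000, 42L)
println(estimate) // should land close to 1/3 for this placeholder
```

In the Spark version from the question, the same hit test would go inside the `map` over `sc.parallelize(1 to n)`, with the final ratio computed as `count.toDouble / n`.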

I hope this will help. Good luck!