NICTA / scoobi

A Scala productivity framework for Hadoop.
http://nicta.github.com/scoobi/

DList ops being executed more than once #281

Closed blever closed 11 years ago

blever commented 11 years ago

DList ops should only be executed once. This is important not only for performance but also for correctness when the operation is side-effecting, e.g. making a call to Random.

An example of when this doesn't seem to be the case is exercised in the following code:

package com.nicta.scoobi
package acceptance

import Scoobi._
import testing.mutable.NictaSimpleJobs

class SideEffectsAreBadSpec extends NictaSimpleJobs {

  "Filtering with side effects" >> {
    "Filtering random" >> { implicit c: SC =>
      val xs = (1 to 5000).toDList
      val zs = (xs filter (x => { if (x == 100) println("running"); scala.util.Random.nextDouble() > 0.5})) 
      val as = (zs.size join zs.map(x => "" + x)).materialise
      val ys = as.run
      println(" total: " + ys.size + " vs " + ys.head._1 )
      true
    }
  }
}

// running
// running
// total: 2538 vs 2479
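
The mismatch between the two totals is what one would expect if the random filter ran once for `zs.size` and again for `zs.map`. A minimal sketch of that effect using plain Scala collections rather than Scoobi (the names `RecomputeSketch`, `randomKeep`, `recomputed` and `calls` are hypothetical, and a `def` stands in for an op that gets executed on every use):

```scala
import scala.util.Random

object RecomputeSketch {
  // Counts predicate invocations; plays the role of the println in the report.
  var calls = 0

  // Side-effecting predicate: bumps the counter and draws a random number.
  def randomKeep(x: Int): Boolean = { calls += 1; Random.nextDouble() > 0.5 }

  val xs: Vector[Int] = (1 to 5000).toVector

  // A def is re-evaluated on every reference, like an op executed more than once.
  def recomputed: Vector[Int] = xs.filter(randomKeep)

  def main(args: Array[String]): Unit = {
    // Two consumers, two executions: the sizes will most likely differ.
    val sizeA = recomputed.size
    val sizeB = recomputed.size
    println(s"recomputed: $sizeA vs $sizeB, predicate calls: $calls")

    // Evaluate once and share the result: both consumers agree.
    calls = 0
    val once = recomputed
    println(s"computed once: ${once.size} vs ${once.size}, predicate calls: $calls")
  }
}
```

In the first case the predicate runs 10000 times, in the second 5000 times. The second case is the behaviour the issue asks for from DList ops.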

markhibberd commented 11 years ago

Concurrent bug reporting. See #282.

blever commented 11 years ago

Duplicate of #282.