NICTA / scoobi

A Scala productivity framework for Hadoop.
http://nicta.github.com/scoobi/

Nodes are getting recomputed #213

Closed by espringe 11 years ago

espringe commented 11 years ago

If a node is allowed to be recomputed, it would make Scoobi unusable for anything non-pure, as well as being extremely unpredictable. So I think there needs to be a guarantee that a node is only computed a single time.

Being pragmatic, it's not entirely unreasonable to want side effects (randomness, using services, calling processes). And one of the computations may be ridiculously expensive, so it shouldn't be recomputed frivolously.

Here's a minimal example that illustrates the point:

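    // Impure step: every evaluation of this node yields different values.
    val numbers = (1 to 100).toDList.map { x => scala.util.Random.nextInt() }
    // Identity map, so numbers2 should hold exactly the same elements as numbers.
    val numbers2 = numbers.map(x => x)

    // Both sums should be equal if the `numbers` node is computed only once.
    println( run(numbers.sum, numbers2.sum) )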

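Sample output: (-857987464,428986714)

The two sums differ even though numbers2 is just numbers passed through an identity map, so the random node must have been evaluated more than once.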
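For anyone who wants to see the same effect without a Hadoop setup, here is a rough analogy using only plain Scala collections. It is not Scoobi code and says nothing about how the Scoobi optimiser works internally; a lazy `view` simply stands in for a node that gets re-evaluated by each consumer. The names `RecomputeSketch`, `impure` and `evaluations` are made up for illustration.

```scala
import scala.util.Random

object RecomputeSketch {
  // Counts how often the impure step actually runs (hypothetical helper state).
  var evaluations = 0

  // Stand-in for the impure map function from the example above.
  def impure(x: Int): Int = { evaluations += 1; Random.nextInt(1000) }

  // A lazy view re-runs `impure` on every traversal, so each consumer
  // sees freshly generated values, much like a recomputed node.
  def recomputedSums(): (Int, Int) = {
    val numbers  = (1 to 100).view.map(impure)
    val numbers2 = numbers.map(x => x)
    (numbers.sum, numbers2.sum)
  }

  // Forcing the view once means both consumers share the same values.
  def materialisedSums(): (Int, Int) = {
    val numbers  = (1 to 100).view.map(impure).toVector
    val numbers2 = numbers.map(x => x)
    (numbers.sum, numbers2.sum)
  }
}
```

With `recomputedSums()` the impure function runs once per traversal, so the two sums will generally differ. With `materialisedSums()` it runs once per element and the sums always match, which is the guarantee I'm asking for at the node level.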

etorreborre commented 11 years ago

I think this is a problem with the optimiser but I don't know how to write the fix right now. I'll try to fix this tomorrow.