NICTA / scoobi

A Scala productivity framework for Hadoop.
http://nicta.github.com/scoobi/
482 stars 97 forks

Make DList.partition work with non-pure funcs #264

Closed espringe closed 11 years ago

espringe commented 11 years ago

DList.partition should call the function only once per element in the DList. This is very important for predicting performance (e.g. when the function is super duper expensive) and, even more importantly, for correctness when the function is non-pure. For instance, if you used partition to split your data into two parts based on a random number generator, then with the current code, which evaluates the function more than once per element, you would get duplicates (and missing elements) in the final results. This is extremely unexpected.

This unit test fails (for which I'll create a bug), but the pull request introduces no regressions per se, so I think it's safe to pull.
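
To make the duplicate/missing-element problem concrete, here is a minimal sketch on plain Scala `List`s. It is not Scoobi's `DList` code and not the code from this pull request; `PartitionSketch`, `partitionTwice`, and `partitionOnce` are hypothetical names used only for illustration. The first version evaluates the predicate twice per element (once per output), the second evaluates it once and splits on the recorded result.

```scala
object PartitionSketch {
  // Two-pass version (hypothetical): the predicate runs twice per element,
  // once for each output, so a non-pure predicate can answer differently
  // for the same element in the two passes.
  def partitionTwice[A](xs: List[A])(p: A => Boolean): (List[A], List[A]) =
    (xs.filter(p), xs.filterNot(p))

  // Single-pass version (hypothetical): evaluate the predicate exactly once
  // per element, keep the result next to the element, then split on it.
  def partitionOnce[A](xs: List[A])(p: A => Boolean): (List[A], List[A]) = {
    val tagged = xs.map(x => (x, p(x)))
    (tagged.collect { case (x, true) => x },
     tagged.collect { case (x, false) => x })
  }

  // A deterministic stand-in for a non-pure function: it ignores its
  // argument and alternates true, false, true, ... on successive calls.
  def alternating(): Int => Boolean = {
    var flip = false
    _ => { flip = !flip; flip }
  }

  def main(args: Array[String]): Unit = {
    val xs = List(1, 2, 3)
    // Two passes: 1 and 3 land in both halves, 2 lands in neither.
    println(partitionTwice(xs)(alternating())) // (List(1, 3),List(1, 3))
    // One pass: every element lands in exactly one half.
    println(partitionOnce(xs)(alternating()))  // (List(1, 3),List(2))
  }
}
```

With a random predicate the same effect appears, only less predictably; the alternating predicate is used here so that the outcome is reproducible.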