NICTA / scoobi

A Scala productivity framework for Hadoop.
http://nicta.github.com/scoobi/

Make BridgeStoreIterator lazy. #286

Closed blever closed 11 years ago

blever commented 11 years ago

Fix BridgeStoreIterator to construct SequenceFileReader objects lazily using a Stream. Fixes #285.

BridgeStoreIterator constructs multiple SequenceFileReader objects under the hood for each part file underlying a materialised DList. These objects are constructed eagerly and consume file descriptors (socket connections to HDFS). If the number of part files exhausts the file descriptors available to the client-side Scoobi process, bad things ensue.
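To illustrate the difference between eager and lazy construction, here is a minimal sketch. It is not the actual BridgeStoreIterator code: `LazyReadersSketch`, `FakeReader`, the `opened` counter, and the part file names are hypothetical stand-ins, with the counter playing the role of consumed file descriptors.

```scala
// Illustrative only: FakeReader stands in for a SequenceFile reader and
// `opened` counts constructions, i.e. simulated file descriptors.
object LazyReadersSketch {
  var opened = 0

  final class FakeReader(part: String) {
    opened += 1 // simulates acquiring a file descriptor on construction
    def records: Iterator[String] = Iterator(part + "-a", part + "-b")
  }

  // Hypothetical part files underlying a materialised DList.
  val parts = List("part-00000", "part-00001", "part-00002")

  // Eager: mapping over a List constructs every reader immediately.
  def eagerReaders(): List[FakeReader] = parts.map(new FakeReader(_))

  // Lazy: mapping over a Stream constructs the head reader up front and
  // each remaining reader only when iteration first reaches it.
  def lazyReaders(): Stream[FakeReader] = parts.toStream.map(new FakeReader(_))

  // Reads all records, walking the readers in order.
  def readAll(readers: Seq[FakeReader]): Iterator[String] =
    readers.iterator.flatMap(_.records)
}
```

With the eager version, the number of readers held open grows with the number of part files before a single record is read. With the Stream version, readers come into existence as the iterator advances. The sketch only shows deferred construction; closing each reader once it is exhausted is a separate concern that it does not cover. On Scala 2.13 and later, `Stream` is deprecated in favour of `LazyList`.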

Add a Spec that, without this fix, should fail in cluster mode (but will still pass in in-memory and local mode).

blever commented 11 years ago

@etorreborre - I haven't been able to replicate the error directly in the specs, as it needs to run in cluster mode. I wrote a small ScoobiApp that contains the contents of the spec in this PR and got it to fail on the cluster here, but it was too tricky to do with the spec in the Scoobi code base.

So, when you merge, can you first run the spec on the cluster without the BridgeStoreIterator fix and check that it fails? Then apply the fix and check that it passes.

If it all looks good, it would be good to get a SNAPSHOT published with this included, and it would be nice to have it in 0.7.2 as well.

Thanks!

blever commented 11 years ago

Now merged.