NICTA / scoobi

A Scala productivity framework for Hadoop.
http://nicta.github.com/scoobi/

DList#materialise creates file handles eagerly, not lazily #285

Closed: blever closed this issue 11 years ago

blever commented 11 years ago

When DList#materialise is used, a SequenceFileReader object is created eagerly for every part file in the underlying bridge store, so all of their file descriptors are held open at the same time. This becomes a problem when the number of part files exceeds the file descriptor limit set on the client machine: accessing the resulting DObject can then fail with an HDFS MissingBlockException, even though the block is present.
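To make the failure mode concrete, here is a minimal, self-contained sketch of the eager pattern. It is not Scoobi's actual BridgeStoreIterator code: `PartReader` and `HandleCounter` are hypothetical stand-ins for a per-part-file reader (such as Hadoop's `SequenceFile.Reader`) and for the client's file descriptors. They only count how many readers are open at once, so no HDFS is involved.

```scala
// Hypothetical bookkeeping: how many "file descriptors" are open now and at peak.
final class HandleCounter {
  var open = 0
  var peak = 0
}

// Hypothetical stand-in for a per-part-file reader; opening it "acquires" a descriptor.
final class PartReader(records: Seq[Int], counter: HandleCounter) {
  counter.open += 1
  counter.peak = math.max(counter.peak, counter.open)
  private val it = records.iterator
  def read(): Option[Int] = if (it.hasNext) Some(it.next()) else None
  def close(): Unit = counter.open -= 1 // "release" the descriptor
}

object EagerDemo {
  // Eager: every part file's reader is opened before the first record is read,
  // and nothing here ever closes them, so the number of open handles
  // equals the number of part files.
  def eagerIterator(parts: Seq[Seq[Int]], counter: HandleCounter): Iterator[Int] = {
    val readers = parts.toList.map(p => new PartReader(p, counter)) // strict: opens all now
    readers.iterator.flatMap { r =>
      Iterator.continually(r.read()).takeWhile(_.isDefined).map(_.get)
    }
  }

  def main(args: Array[String]): Unit = {
    val counter = new HandleCounter
    val records = eagerIterator((1 to 5).map(i => Seq(i)), counter)
    println(s"open before reading: ${counter.open}")
    println(records.toList)
    println(s"peak open handles: ${counter.peak}")
  }
}
```

With thousands of part files instead of five, the peak would be thousands of simultaneously open handles, which is how a per-process descriptor limit can be exceeded.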

To avoid this problem, it should be possible to refactor BridgeStoreIterator to create SequenceFileReader objects lazily, so that once each part file has been iterated over, its SequenceFileReader (and the underlying file descriptor) is released.
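A minimal sketch of the proposed lazy approach, assuming the same kind of hypothetical `PartReader`/`HandleCounter` stand-ins (not Scoobi or Hadoop APIs). `LazyPartIterator` is likewise a hypothetical name, not the actual BridgeStoreIterator refactoring. A reader is opened only when iteration reaches its part file and is closed as soon as that part is exhausted, so at most one handle is open at any time.

```scala
// Hypothetical bookkeeping: how many "file descriptors" are open now and at peak.
final class HandleCounter {
  var open = 0
  var peak = 0
}

// Hypothetical stand-in for a per-part-file reader; opening it "acquires" a descriptor.
final class PartReader(records: Seq[Int], counter: HandleCounter) {
  counter.open += 1
  counter.peak = math.max(counter.peak, counter.open)
  private val it = records.iterator
  def read(): Option[Int] = if (it.hasNext) Some(it.next()) else None
  def close(): Unit = counter.open -= 1 // "release" the descriptor
}

// Lazy: open each part's reader on demand and close it once that part is exhausted.
final class LazyPartIterator(parts: Seq[Seq[Int]], counter: HandleCounter) extends Iterator[Int] {
  private val remaining = parts.iterator
  private var current: Option[PartReader] = None
  private var buffered: Option[Int] = None

  // Advance until one record is buffered or every part file has been consumed.
  private def fill(): Unit =
    while (buffered.isEmpty && (current.isDefined || remaining.hasNext)) {
      current match {
        case None => current = Some(new PartReader(remaining.next(), counter))
        case Some(reader) =>
          reader.read() match {
            case Some(v) => buffered = Some(v)
            case None =>
              reader.close() // release this part's handle before opening the next
              current = None
          }
      }
    }

  override def hasNext: Boolean = { fill(); buffered.isDefined }

  override def next(): Int = {
    if (!hasNext) throw new NoSuchElementException("no more records")
    val v = buffered.get
    buffered = None
    v
  }
}

object LazyDemo {
  def main(args: Array[String]): Unit = {
    val counter = new HandleCounter
    val records = new LazyPartIterator((1 to 5).map(i => Seq(i)), counter)
    println(s"open before reading: ${counter.open}")
    println(records.toList)
    println(s"peak open handles: ${counter.peak}, still open: ${counter.open}")
  }
}
```

One design caveat: in this sketch, if a consumer stops iterating part-way through, the current reader stays open until the iterator is driven to the end. A real refactoring may also want an explicit close hook for early termination.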