When DList#materialize is used, SequenceFileReader objects are created eagerly for every part file in the underlying bridge store. This becomes a problem when the number of part files exceeds the file descriptor limit on the client machine: accessing the resulting DObject can then fail with an HDFS MissingBlockException, even though the block is present.
To avoid this, BridgeStoreIterator could be refactored to create SequenceFileReader objects lazily, so that once a part file has been iterated over, its SequenceFileReader, and the underlying file descriptor, is released.
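A minimal sketch of the proposed lazy iteration follows, written in Scala because Scoobi is a Scala library. It is not Scoobi's actual BridgeStoreIterator. `PartReader`, `LazyBridgeStoreIterator` and the `open` factory are hypothetical names standing in for SequenceFileReader construction over the bridge store's part files. The iterator holds at most one reader open. It opens the next part file only when the previous one is exhausted, and it closes each reader as soon as that reader returns no more records.

```scala
// Hypothetical stand-in for a SequenceFileReader: reads the records of one
// part file and holds one open file descriptor until close() is called.
trait PartReader[A] {
  def readNext(): Option[A]
  def close(): Unit
}

// Hypothetical lazy replacement for BridgeStoreIterator. `open` is an assumed
// factory; in Scoobi it would wrap the creation of a SequenceFileReader.
final class LazyBridgeStoreIterator[P, A](parts: Seq[P], open: P => PartReader[A])
    extends Iterator[A] {

  private val remaining = parts.iterator
  private var current: Option[PartReader[A]] = None
  private var buffered: Option[A] = None

  // Advance until a record is buffered or every part file is exhausted.
  private def fill(): Unit =
    while (buffered.isEmpty && (current.isDefined || remaining.hasNext)) {
      current match {
        case None =>
          // Open the next part file only when a record is actually needed.
          current = Some(open(remaining.next()))
        case Some(reader) =>
          reader.readNext() match {
            case some @ Some(_) =>
              buffered = some
            case None =>
              // Part file finished: release its reader (and descriptor) now.
              reader.close()
              current = None
          }
      }
    }

  def hasNext: Boolean = { fill(); buffered.isDefined }

  def next(): A = {
    fill()
    val record = buffered.getOrElse(throw new NoSuchElementException("no more records"))
    buffered = None
    record
  }
}
```

Because each reader is closed before the next one is opened, the client holds at most one descriptor for the bridge store, however many part files it contains. A production version would also need to close the current reader if iteration is abandoned early or a read throws.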