fullcontact / hadoop-sstable

Splittable Input Format for Reading Cassandra SSTables Directly
Apache License 2.0

EOF if all columns not iterated #5

Open clohfink-blackbirdit opened 10 years ago

clohfink-blackbirdit commented 10 years ago

If you don't iterate through all of a row's columns, the next call to nextKeyValue() throws an exception:

        java.io.EOFException
            at com.fullcontact.cassandra.io.util.RandomAccessReader.readFully(RandomAccessReader.java:259)
            at com.fullcontact.cassandra.io.util.RandomAccessReader.readFully(RandomAccessReader.java:250)
            at com.fullcontact.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:481)
            at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392)
            at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:371)
            at com.fullcontact.sstable.hadoop.mapreduce.SSTableRowRecordReader.nextKeyValue(SSTableRowRecordReader.java:43)
            at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
            at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
            at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
            at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
            at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
            at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

Easy to work around by draining the remaining columns in the mapper:

        // Consume any columns the mapper didn't read so the record reader
        // is positioned at the start of the next row.
        while (value.hasNext()) {
            value.next();
        }
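If several mappers need this workaround, one option is to wrap the row's column iterator so that closing it drains whatever was left unread. The sketch below is a minimal illustration only: DrainingIterator is a hypothetical helper, not part of hadoop-sstable, and it assumes the mapper's value behaves like a plain java.util.Iterator (as the hasNext()/next() calls above suggest).

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical helper (not part of hadoop-sstable): wraps a row's column
// iterator and discards whatever the caller left unread when it is closed.
final class DrainingIterator<T> implements Iterator<T>, AutoCloseable {
    private final Iterator<T> delegate;

    DrainingIterator(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        return delegate.next();
    }

    /** Consumes and discards the remaining elements; returns how many were skipped. */
    int drain() {
        int skipped = 0;
        while (delegate.hasNext()) {
            delegate.next();
            skipped++;
        }
        return skipped;
    }

    @Override
    public void close() {
        drain();
    }
}

public class DrainDemo {
    public static void main(String[] args) {
        Iterator<String> columns = List.of("a", "b", "c", "d").iterator();
        try (DrainingIterator<String> it = new DrainingIterator<>(columns)) {
            System.out.println(it.next()); // the "mapper" only looks at the first column
        }
        System.out.println(columns.hasNext()); // the rest were drained on close
    }
}
```

In a mapper this would be used as try (DrainingIterator<OnDiskAtom> cols = new DrainingIterator<>(value)) { ... }, so the leftover columns are consumed even if the mapper returns early. It still reads every column, so it costs the same I/O as the manual loop; it only makes the workaround harder to forget.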

It's a bit of a corner case, so fairly low priority, but the reader could detect that it hasn't reached the end of the current row and skip to the end before reading the next one.
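The suggested fix amounts to remembering where the current row ends and seeking there before reading the next row key, rather than relying on the mapper to leave the reader positioned correctly. The toy model below illustrates both the failure and that fix. Everything in it is hypothetical: the byte layout is a simplified stand-in (not the real SSTable format), RowSkipDemo is not part of hadoop-sstable, and it assumes the row's length is known up front (here via a length prefix; a real reader might have to derive the row end some other way, e.g. from the index).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Toy model, NOT the real SSTable format. Hypothetical layout:
 *   row    = [unsigned short keyLen][key][int dataLen][columns...]
 *   column = [unsigned short nameLen][name]
 */
public class RowSkipDemo {
    private final byte[] data;
    private final boolean skipUnreadColumns; // true = the proposed fix
    private int pos = 0;    // single shared read position, like a file pointer
    private int rowEnd = 0; // offset just past the current row

    RowSkipDemo(byte[] data, boolean skipUnreadColumns) {
        this.data = data;
        this.skipUnreadColumns = skipUnreadColumns;
    }

    private byte[] readFully(int len) throws EOFException {
        if (pos + len > data.length) {
            throw new EOFException("need " + len + " bytes at offset " + pos);
        }
        byte[] out = Arrays.copyOfRange(data, pos, pos + len);
        pos += len;
        return out;
    }

    private String readWithShortLength() throws EOFException {
        int len = ByteBuffer.wrap(readFully(2)).getShort() & 0xFFFF;
        return new String(readFully(len), StandardCharsets.UTF_8);
    }

    /** Next row key, or null at end of data. */
    String nextKey() throws EOFException {
        if (skipUnreadColumns) {
            pos = rowEnd; // jump past any columns the caller left unread
        }
        if (pos >= data.length) {
            return null;
        }
        String key = readWithShortLength();
        int dataLen = ByteBuffer.wrap(readFully(4)).getInt();
        rowEnd = pos + dataLen;
        return key;
    }

    /** Next column name in the current row, or null at the end of the row. */
    String nextColumn() throws EOFException {
        return pos < rowEnd ? readWithShortLength() : null;
    }

    static void writeWithShortLength(DataOutputStream out, String s) throws IOException {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.writeShort(b.length);
        out.write(b);
    }

    /** Each row is {key, column1, column2, ...}. */
    static byte[] buildRows(String[][] rows) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(file);
        for (String[] row : rows) {
            ByteArrayOutputStream cols = new ByteArrayOutputStream();
            DataOutputStream colOut = new DataOutputStream(cols);
            for (int i = 1; i < row.length; i++) {
                writeWithShortLength(colOut, row[i]);
            }
            writeWithShortLength(out, row[0]);
            out.writeInt(cols.size());
            cols.writeTo(out);
        }
        return file.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] rows = buildRows(new String[][] {{"k1", "a", "b"}, {"k2", "c"}});
        RowSkipDemo naive = new RowSkipDemo(rows, false);
        try {
            for (String key; (key = naive.nextKey()) != null; ) {
                System.out.println("naive key: " + key); // columns never read
            }
        } catch (EOFException e) {
            System.out.println("naive reader failed: " + e);
        }
        RowSkipDemo fixed = new RowSkipDemo(rows, true);
        for (String key; (key = fixed.nextKey()) != null; ) {
            System.out.println("fixed key: " + key);
        }
    }
}
```

With the columns of k1 left unread, the naive reader treats column bytes as the next row, returning the bogus key "a", then reads a garbage key length and throws EOFException, much like the readWithShortLength and readFully frames in the trace above. The skipping reader returns k1 and k2 no matter how many columns the caller consumed.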