fullcontact / hadoop-sstable

Splittable Input Format for Reading Cassandra SSTables Directly
Apache License 2.0

Problems running the SimpleExample MR job #6

Closed. java8964 closed this issue 9 years ago.

java8964 commented 10 years ago

I tried to run SimpleExample locally as an MR job.

Here is what I passed in as parameters:

-fs local -jt local path_to_hadoop-sstable/sstable-core/src/test/resources/data output_path

First, here is the error I got:

Exception in thread "main" java.io.FileNotFoundException: File file:/hadoop-sstable/sstable-core/src/test/resources/data/Keyspace1-Standard1-ic-0-Index.db.Index does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)

I am not sure why the code looks for an index file named like "*-Index.db.Index"; as far as I know, Cassandra does not name its index files that way. So, after reading the code, I changed SSTABLE_INDEX_SUFFIX from ".Index" to an empty string.

Now I get a new error:

Exception in thread "main" java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:180)
    at java.io.DataInputStream.readLong(DataInputStream.java:399)
    at com.fullcontact.sstable.index.SSTableIndexIndex.readIndex(SSTableIndexIndex.java:63)
    at com.fullcontact.sstable.hadoop.mapreduce.SSTableInputFormat.listStatus(SSTableInputFormat.java:85)
    at com.fullcontact.sstable.hadoop.mapreduce.SSTableInputFormat.getSplits(SSTableInputFormat.java:139)

The code failed while reading a long from the index file's input stream: readLong hit the end of the stream.
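
For readers hitting the same trace, here is a minimal sketch of how DataInputStream.readLong produces an EOFException. It assumes (this is not confirmed anywhere in this thread) that the reader expects a file made up of whole 8-byte values and was pointed at a file with a different layout. The class and method names are hypothetical and are not part of hadoop-sstable.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical demo class, not part of hadoop-sstable.
class ReadLongEofDemo {

    // Reads 8-byte longs until the stream is exhausted and returns how many were read.
    // Throws EOFException if the data ends in the middle of a long.
    static int countLongs(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = 0;
            while (in.available() > 0) {
                in.readLong(); // needs 8 bytes; fewer remaining bytes means EOFException
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        // 16 bytes hold exactly two longs.
        System.out.println("longs read: " + countLongs(new byte[16]));

        // 12 bytes hold one long plus 4 stray bytes, so the second readLong fails.
        try {
            countLongs(new byte[12]);
        } catch (EOFException e) {
            System.out.println("EOFException on a partial long");
        }
    }
}
```

If that assumption holds, emptying the suffix only makes the reader parse Cassandra's own Index.db file as if it were the generated index, which would explain why the first error went away and this one appeared.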

Am I going about this completely the wrong way? If so, what is the correct way to run SimpleExample as a local MR job using the test data that ships with the project?

Thanks

Yong

clohfink-blackbirdit commented 10 years ago

You need to run the index creation step first:

https://github.com/fullcontact/hadoop-sstable/wiki/Getting-Started#index-the-sstable-files-on-your-hadoop-cluster
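
Before launching the job, it can help to confirm which SSTable index files still lack their generated companion. The following is a rough sketch for the local-filesystem case only. It assumes the companion file sits next to each *-Index.db file and carries the ".Index" suffix seen in the error message above. The class and method names are hypothetical and are not part of hadoop-sstable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical pre-flight check, not part of hadoop-sstable.
class MissingIndexCheck {

    // Suffix inferred from the FileNotFoundException in this thread ("...-Index.db.Index").
    static final String SSTABLE_INDEX_SUFFIX = ".Index";

    // Returns every *-Index.db file in dir that has no "<name>.Index" file beside it.
    static List<Path> findUnindexed(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files
                .filter(p -> p.getFileName().toString().endsWith("-Index.db"))
                .filter(p -> !Files.exists(
                    p.resolveSibling(p.getFileName().toString() + SSTABLE_INDEX_SUFFIX)))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Directory to scan; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> missing = findUnindexed(dir);
        if (missing.isEmpty()) {
            System.out.println("Every Index.db file has a companion index.");
        } else {
            missing.forEach(p -> System.out.println("Not indexed yet: " + p));
        }
    }
}
```

Any file this reports would need the indexing step from the wiki page above run against it before the input format can compute splits.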

bvanberg commented 10 years ago

That's exactly right! Thanks @clohfink-blackbirdit

@java8964 let me know if you have further questions.

Thanks,

Ben.