fullcontact / hadoop-sstable

Splittable Input Format for Reading Cassandra SSTables Directly
Apache License 2.0

Q: Cass columns in multiple sstables #25

Open LannyRipple opened 9 years ago

LannyRipple commented 9 years ago

How does hadoop-sstable deal with Cassandra spreading a row's columns over multiple SSTables? When you query Cassandra, it finds the ranges you are querying, reads the relevant SSTables and memtables, merges them to give you the "latest" data and apply tombstones, and then returns the result. Are you doing a full compaction to avoid needing to look in multiple SSTables? (It didn't sound like it, unless Priam does so during backup of your ring.)

Cheers

bvanberg commented 9 years ago

When implementing a solution using hadoop-sstable you'll need both a mapper and a reducer implementation. The columns are stitched together in the reducer: every value emitted for a given key, whichever SSTable it came from, arrives at the same reducer, so there you have everything you need to resolve the latest data for that key.
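
To make the reducer-side step concrete, here is a minimal sketch of that reconciliation in plain Java. It is not hadoop-sstable's API: `ColumnReconciler`, `ColumnVersion`, and `reconcile` are hypothetical names used only for illustration, and the rule that a tombstone wins a timestamp tie is an assumption of this sketch. Row-level tombstones and TTLs are left out.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: not part of hadoop-sstable or Cassandra.
final class ColumnReconciler {

    /** One version of a column as read from a single SSTable (hypothetical type). */
    static final class ColumnVersion {
        final String name;
        final String value;       // ignored when tombstone is true
        final long timestamp;     // write timestamp of this version
        final boolean tombstone;  // true if this version marks a deletion

        ColumnVersion(String name, String value, long timestamp, boolean tombstone) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
            this.tombstone = tombstone;
        }
    }

    /**
     * Collapses all versions of one row's columns into the live columns.
     * Keeps the version with the highest timestamp per column name; on a
     * timestamp tie a tombstone replaces a live value (assumption of this sketch).
     * Columns whose winning version is a tombstone are dropped.
     */
    static Map<String, String> reconcile(Iterable<ColumnVersion> versions) {
        Map<String, ColumnVersion> newest = new TreeMap<>();
        for (ColumnVersion v : versions) {
            ColumnVersion current = newest.get(v.name);
            boolean wins = current == null
                    || v.timestamp > current.timestamp
                    || (v.timestamp == current.timestamp && v.tombstone);
            if (wins) {
                newest.put(v.name, v);
            }
        }

        Map<String, String> live = new TreeMap<>();
        for (ColumnVersion v : newest.values()) {
            if (!v.tombstone) {
                live.put(v.name, v.value);
            }
        }
        return live;
    }
}
```

In a real job, logic like this would sit inside the reducer's `reduce` method and be applied to the values grouped under one row key. For example, given two versions of an `email` column written at timestamps 100 and 200, plus a `phone` value at 150 and a `phone` tombstone at 300, the result contains only `email` with the value written at 200.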