fullcontact / hadoop-sstable

Splittable Input Format for Reading Cassandra SSTables Directly
Apache License 2.0

Handling deletion inside collection types in Cassandra 2 #30

Open abstract-karshit opened 8 years ago

abstract-karshit commented 8 years ago

Hi, first of all, thanks for this great implementation. I have a question about using Hadoop SSTable on Cassandra 2 with collection types. Let's say one of the columns is a map type with data {1: 'yes', 2: 'no', 3: 'true'}. Now if {2: 'no'} is deleted from the map, the incremental SSTables give me this output: {"", "d"}, as if the entire column had been deleted, which is not the case here. How should this be handled? Or how have you handled this case?

bvanberg commented 8 years ago

Hey Karshit,

To help answer your question, we do handle deletes for our sstables, but not with the map type. The code may be slightly different. We usually mark columns as deleted as we process them, and then do the work to handle that in the reducer where we have all the available data to make decisions about deleting, etc. LMK if you'd like more details on what we're doing and I can probably conjure up some sample code.
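A minimal sketch of the reducer-side idea described above, applied to a map column. It is not the project's actual code. It assumes each map element reaches the reducer as its own record carrying the map key, a write timestamp, and a deleted flag. The `Cell` class, the `resolve` method, and the tie-break rule (a deletion wins on equal timestamps) are all hypothetical names and choices made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapCellMerge {

    /** One observed version of a single map element (hypothetical shape, not a library type). */
    static final class Cell {
        final String mapKey;
        final String value;      // null when the element was deleted
        final long timestamp;    // write timestamp of this version
        final boolean deleted;   // true if this version is a deletion marker

        Cell(String mapKey, String value, long timestamp, boolean deleted) {
            this.mapKey = mapKey;
            this.value = value;
            this.timestamp = timestamp;
            this.deleted = deleted;
        }
    }

    /**
     * Keeps the newest version of each map element, then drops the elements
     * whose newest version is a deletion marker. Assumed tie-break: on equal
     * timestamps the deletion wins.
     */
    static Map<String, String> resolve(List<Cell> cells) {
        Map<String, Cell> newest = new TreeMap<>();
        for (Cell c : cells) {
            Cell seen = newest.get(c.mapKey);
            boolean newer = seen == null
                    || c.timestamp > seen.timestamp
                    || (c.timestamp == seen.timestamp && c.deleted);
            if (newer) {
                newest.put(c.mapKey, c);
            }
        }
        Map<String, String> live = new TreeMap<>();
        for (Cell c : newest.values()) {
            if (!c.deleted) {
                live.put(c.mapKey, c.value);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        // The map from the question, followed by a later deletion of key 2.
        List<Cell> cells = Arrays.asList(
                new Cell("1", "yes", 100L, false),
                new Cell("2", "no", 100L, false),
                new Cell("3", "true", 100L, false),
                new Cell("2", null, 200L, true));
        System.out.println(resolve(cells)); // prints {1=yes, 3=true}
    }
}
```

In a real job the body of `resolve` would sit inside the reducer, with the row key (plus the column name) as the reduce key, so that every version of every element of one map arrives in the same call.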

abstract-karshit commented 8 years ago

Sure, let me generate a sample dataset for you. I'll get back to you on this over the weekend.