Since we care about the payload of old blocks (and thus can't rely on Merkle trees to reduce chain size), we need to implement some form of garbage collection. One approach to this would be:
Pick a block in the chain as the "checkpoint". You'd want this to be a block that is sufficiently committed, e.g. 6 blocks deep
Make a snapshot of the current state of the KV store
Starting from block 0 up to the checkpoint block, go through each operation of each block and compare the operation's info to the state of the KV store to see if it is still relevant (see the sketch after these steps). Some examples:
PUT: Check whether its key still exists; if it doesn't, it must have eventually been deleted, so throw this operation away; if it does, keep it
UPDATE: Check whether its key still exists; if it doesn't, it must have eventually been deleted, so throw this operation away. If it does, compare the values: if they differ, a later update must have superseded this one, so throw it away; if they're the same, keep it
DELETE: Throw deletes away outright, since the PUTs and UPDATEs they cancelled will also have been thrown away
Pack as many of the surviving operations into each block as the config parameters allow, then rehash the block and move on to the next one
Once done, broadcast the new compressed chain (as well as which block was its checkpoint block) to everyone else to update their chains
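To make the filtering and repacking concrete, here's a minimal sketch in Python. It assumes blocks carry a flat list of (op, key, value) tuples, a single max_ops_per_block config parameter, and a SHA-256 hash over the block payload; Block, rehash, GENESIS_HASH, and kv_snapshot are illustrative names and assumptions, not the actual implementation.

```python
import hashlib
import json
from dataclasses import dataclass

# Assumed placeholder for the hash that precedes block 0.
GENESIS_HASH = "0" * 64

@dataclass
class Block:
    operations: list    # list of (op, key, value) tuples -- assumed shape
    prev_hash: str
    hash: str = ""

    def rehash(self):
        payload = json.dumps({"ops": self.operations, "prev": self.prev_hash},
                             sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

def op_is_still_relevant(op, key, value, kv_snapshot):
    """Compare one old operation against the snapshotted KV state."""
    if op == "PUT":
        # Key gone => it was eventually deleted, so the PUT is dead weight.
        return key in kv_snapshot
    if op == "UPDATE":
        # Dead if the key is gone, or if a later update changed the value.
        return key in kv_snapshot and kv_snapshot[key] == value
    if op == "DELETE":
        # Its matching PUTs/UPDATEs are dropped above, so drop it too.
        return False
    return True  # keep anything unrecognized, to be safe

def compact_chain(chain, checkpoint_index, kv_snapshot, max_ops_per_block):
    """Rebuild blocks 0..checkpoint_index-1 with only the relevant operations."""
    surviving = [
        (op, key, value)
        for block in chain[:checkpoint_index]
        for (op, key, value) in block.operations
        if op_is_still_relevant(op, key, value, kv_snapshot)
    ]

    # Repack the survivors as densely as the config allows, rehashing and
    # re-linking prev_hash as we go.
    new_chain = []
    prev_hash = GENESIS_HASH
    for i in range(0, len(surviving), max_ops_per_block):
        block = Block(operations=surviving[i:i + max_ops_per_block],
                      prev_hash=prev_hash)
        block.hash = block.rehash()
        prev_hash = block.hash
        new_chain.append(block)
    return new_chain
```

Note that the filter only ever consults the snapshot taken in the earlier step, so the live KV store can keep changing while GC runs.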
There are definitely some subtleties to consider here, like the long-term uniqueness of the keys used (the approach above assumes keys will always be unique). This approach also has the implication that, while before we could see that a data entry was created, maybe updated, and then deleted even though none of that is reflected in the KV store at present, afterwards we will no longer know that that data ever existed, and it will be truly unrecoverable.
One nice thing is that while this node goes into GC mode to do all this, the other nodes can continue doing work uninterrupted; then, once they get the <new_chain, checkpoint> packet, they simply cut off the old chain before the checkpoint, point the prevHash of the checkpoint block at the end of the new chain, and carry on. The GC node then just needs to hear from someone about the work that was done while it was GC'ing and catch up.
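A rough sketch of what a receiving node might do with the <new_chain, checkpoint> packet, reusing the Block shape from the sketch above. Identifying the checkpoint by its hash, and only repointing prevHash (without rehashing the checkpoint block itself), are assumptions drawn from the description rather than settled design decisions.

```python
def splice_compacted_chain(local_chain, new_chain, checkpoint_hash):
    """Adopt a compacted prefix received in a <new_chain, checkpoint> packet."""
    # Locate the checkpoint block in our local chain (raises if it's unknown).
    idx = next(i for i, b in enumerate(local_chain) if b.hash == checkpoint_hash)

    # Cut off everything before the checkpoint, then point the checkpoint's
    # prevHash at the end of the compacted chain, as described above.
    suffix = local_chain[idx:]
    suffix[0].prev_hash = new_chain[-1].hash if new_chain else GENESIS_HASH
    return new_chain + suffix
```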