Open kostasrim opened 5 days ago
@chakaz Regarding the comment that CVCUponInsert calls WriteBucket multiple times, I think that ok.. because what we care is that we dont break the bucket into multiple pieces but I dont think we should care if CVCUponInsert serializes one bucket we will preempt , serialize onother bucket from different flow and than go back to continue serialize the next buckets from CVCUponInsert. Is that right?
I'm afraid of some edge case that will cause us to lose data being sent to replica/RDB file. For example, touching k1
triggers OnDbChange
and also causes a bucket/segment split. We start serializing the bucket, and yield to k2
(same bucket/segment) which splits the bucket and moves keys around, and when we return to k1
's OnDbChange
we missed stuff.
Also, when do we set the bucket version? If before we start serializing the bucket in OnDbChange
, then a parallel OnDbChange
on another key in the same bucket will be missed, but if after then we will process the bucket twice (which is also not good because we can't save the same key twice, as it produces corrupted RDB files).
I think that locking the entire operation instead of individual keys can solve these (and potentially other more complex issues we didn't think about), and really there's no price to pay for it
@chakaz Regarding the comment that CVCUponInsert calls WriteBucket multiple times, I think that ok.. because what we care is that we dont break the bucket into multiple pieces but I dont think we should care if CVCUponInsert serializes one bucket we will preempt , serialize onother bucket from different flow and than go back to continue serialize the next buckets from CVCUponInsert. Is that right?
I'm afraid of some edge case that will cause us to lose data being sent to replica/RDB file. For example, touching
k1
triggersOnDbChange
and also causes a bucket/segment split. We start serializing the bucket, and yield tok2
(same bucket/segment) which splits the bucket and moves keys around, and when we return tok1
'sOnDbChange
we missed stuff. Also, when do we set the bucket version? If before we start serializing the bucket inOnDbChange
, then a parallelOnDbChange
on another key in the same bucket will be missed, but if after then we will process the bucket twice (which is also not good because we can't save the same key twice, as it produces corrupted RDB files). I think that locking the entire operation instead of individual keys can solve these (and potentially other more complex issues we didn't think about), and really there's no price to pay for it
I think I agree with what you wrote but if this is the case that we have a problem for example the CVCUponBump flow which call the OnDbChange in a loop, so in this case we can preempt between transactions and this will cause a bug
@kostasrim I believe we can use CondVarAny here instead of the mutex, this will be faster without locks and in most use cases we will not have any preeptions as the values will not be big so we will not need to preempt. Take an example with util::fb2::CondVarAny pipeline_cnd; in our code
@chakaz draft so we can discuss :)