eBay / HomeObject

Replicated BLOB Store built upon HomeStore
Apache License 2.0
6 stars 15 forks source link

SealShard state change in pre-commit #99

Closed xiaoxichen closed 8 months ago

xiaoxichen commented 1 year ago

Now we update shard state in commit, which introduce racing beween sealshard vs putBlob.

Marking shard as sealed in pre-commit , failing following putBlob(whose LSN >X) , and do roll_back if consensus cannot reach

JacksonYao287 commented 8 months ago

I revisit the idea here and have a new thought.

if we rollback seal_shard and change the shard status from sealed to open in some corner case , who can seal this shard again?

when dm open a shard , it will store the shard info in the kv store of cm. when it closes the shard, it will remove that kv item from cm. if dm send seal_shard message to sm leader, and sm leader will pre_commit the message, which will change the status of the shard to sealed , and send a success response to dm before it sends the log entry to followers. if dm receives the ok response, it will remove the kv item from cm and konw nothing about that shard. if the message is rollbacked in some corner case , the shard will be changed from sealed to open, and it will be an orphan open shard since nobody knows it is in open state, and thus nobody will try to seal it.

do we need a separate scrubber to find all the orphan open shards and seal them?

@xiaoxichen @szmyd

xiaoxichen commented 8 months ago

I am not sure if I understand it correctly. The rollback will be called only when the quorum cannot reach and decide to rollback this commit . In this case, why client(DM ) will get a success response?

szmyd commented 8 months ago

I'm trying to remember this issue and reconcile with the changes we've made recently. So the problem is that a Blob can be received on the data path referencing a Shard that has been sealed already right? This could happen because of ordering issues between the DataPath and RaftPath or when a member is resyncing and receiving Blobs and log entries out-of-order?

Obviously we need to receive all Blob data over the data path prior to processing any SealShard in the log the blobs reference, or at the very least before putting it back in the heap allocator...

JacksonYao287 commented 8 months ago

@xiaoxichen according to the design and workflow of nuraft , in precommit case , it will response to client after appending log but before sending to the followers. it means precommit will response to client before reaching quorum consensus and that is why precommit will achieve low latency by sacrificing consistency and resolving conflicts manually

xiaoxichen commented 8 months ago

@xiaoxichen according to the design and workflow of nuraft , in precommit case , it will response to client after appending log but before sending to the followers. it means precommit will response to client before reaching quorum consensus and that is why precommit will achieve low latency by sacrificing consistency and resolving conflicts manually

are we using async mode? I think we are using regular flow https://github.com/eBay/NuRaft/blob/master/docs/replication_flow.md

JacksonYao287 commented 8 months ago

@xiaoxichen yes, you are right. then, the problem does not exist, we can go ahead. thx!