apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store
https://apple.github.io/foundationdb/
Apache License 2.0

Degraded transaction logs are not removed during recovery #5679

Open sfc-gh-abeamon opened 3 years ago

sfc-gh-abeamon commented 3 years ago

When a transaction log is unable to commit, or fails at a few other operations during its local recovery, it gets marked degraded. This status is reported to the cluster controller, which then attempts to recruit a new transaction subsystem without any degraded logs.

If a log gets reported degraded during recovery, though, and that degradation prevents the recovery from completing, then the cluster controller will not try to replace it. If I understand correctly, this is because betterMasterExists does not attempt to reevaluate the cluster layout if it is not sufficiently recovered:

https://github.com/apple/foundationdb/blob/5a5f724d9c7f1c1fac47a610264effc4b44d300e/fdbserver/ClusterController.actor.cpp#L2223

This behavior was observed in 6.2, and while the line above still exists, I'm not sure whether this is affected by some of the newer changes to the degradation logic.
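To make the suspected behavior concrete, here is a simplified model of the early exit described above. It is only a sketch, not the actual ClusterController code: `RecoveryPhase`, `LogProcess`, and `shouldReRecruit` are hypothetical names, and the exact recovery threshold is an assumption.

```cpp
#include <vector>

// Hypothetical, simplified model. None of these names come from
// ClusterController.actor.cpp; they only illustrate the suspected control flow.
enum class RecoveryPhase { Recruiting, Recovering, AcceptingCommits, FullyRecovered };

struct LogProcess {
    bool degraded;
};

// Returns true if the controller would try to recruit a new transaction subsystem.
bool shouldReRecruit(RecoveryPhase phase, const std::vector<LogProcess>& logs) {
    // Assumed early exit: until recovery has progressed far enough, the layout
    // is not reevaluated at all, so the degraded flags below are never consulted.
    if (phase < RecoveryPhase::AcceptingCommits) {
        return false;
    }
    for (const LogProcess& log : logs) {
        if (log.degraded) {
            return true;
        }
    }
    return false;
}
```

In this model, a degraded log that stalls recovery keeps `phase` below the threshold, so `shouldReRecruit` keeps returning false and the log is never replaced.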

sfc-gh-etschannen commented 3 years ago

I believe this was already fixed with the ioDegradedOrTimeoutError function. It will throw an io_error if the tlog cannot commit for a long time.

sfc-gh-abeamon commented 3 years ago

I agree that does seem like it would help, and I think the situation we saw here may have been on a version prior to this change. That said, the timeout for that error is much longer, at 2 minutes, so it would still be beneficial to improve our reaction time to this process being degraded.
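As a rough illustration of the gap I mean, the sketch below classifies a stalled commit against two thresholds. It is not the real ioDegradedOrTimeoutError implementation: `CommitHealth`, `classifyStalledCommit`, and the degraded threshold are hypothetical, and only the 2 minute error timeout comes from the discussion above.

```cpp
#include <chrono>

// Hypothetical model of the two thresholds; not FoundationDB code.
enum class CommitHealth { Healthy, Degraded, Failed };

// Classifies a commit that has been stalled for `stalledFor`.
// `degradedAfter` is an assumed, shorter threshold at which the log is marked degraded.
// `errorAfter` is the longer timeout after which an error is thrown.
CommitHealth classifyStalledCommit(std::chrono::seconds stalledFor,
                                   std::chrono::seconds degradedAfter,
                                   std::chrono::seconds errorAfter) {
    if (stalledFor >= errorAfter) {
        return CommitHealth::Failed;
    }
    if (stalledFor >= degradedAfter) {
        return CommitHealth::Degraded;
    }
    return CommitHealth::Healthy;
}
```

With an error timeout of 120 seconds, any stall between the degraded threshold and 120 seconds leaves the log marked degraded but still in place, which is the window where a faster reaction from the cluster controller would help.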

sfc-gh-abeamon commented 3 years ago

Also, it seems that we don't use ioDegradedOrTimeoutError in all of the old tlog implementations. It is only used in the current one and in 6.0.