apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store
https://apple.github.io/foundationdb/
Apache License 2.0

ConfigIncrementChangeCoordinators.toml nightly failure #8349

Closed by sfc-gh-ljoswiak 1 year ago

sfc-gh-ljoswiak commented 1 year ago

The latest nightly shows a timeout in fast/ConfigIncrementChangeCoordinators.toml. It reproduces with the following:

Commit: df2c1374cb923e8da5aa9949839ef62ee0d36b91
Seed: 2893897391
Buggify: on
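For anyone trying to reproduce this, here is a rough sketch of how the simulation command line could be assembled from the values above. It assumes the given commit is checked out and built, that the binary sits at `bin/fdbserver` inside the build directory, and that the test file lives under `tests/fast/`. Both paths and the `repro_command` helper are illustrative assumptions, not something stated in this issue.

```python
import shlex


def repro_command(
    fdbserver="bin/fdbserver",  # assumed location of the built binary
    test_file="tests/fast/ConfigIncrementChangeCoordinators.toml",  # assumed path
    seed=2893897391,
    buggify=True,
):
    """Build the argv for a single simulation run (hypothetical helper)."""
    return [
        fdbserver,
        "-r", "simulation",  # run in simulation mode
        "-f", test_file,     # test specification to execute
        "-s", str(seed),     # random seed from the failing nightly
        "-b", "on" if buggify else "off",  # buggify setting
    ]


# Print a copy-pasteable shell command.
print(shlex.join(repro_command()))
```

The run is only expected to match the nightly when the binary is built from the commit listed above, since simulation results depend on the exact code.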

The test run appears to get stuck while moving the cstate at https://github.com/apple/foundationdb/blob/772a9ab9fc1800d7dfaacb38dcf94ec41a9b7c3b/fdbserver/CoordinatedState.actor.cpp#L350-L351. All old coordinators have their configuration nodes locked successfully, but a majority of ForwardRequest replies are never received. A recovery takes place at exactly this moment, leaving the database in an unhealthy state: the configuration nodes are all locked, which prevents any further configuration database transactions and causes the test to time out.
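To make the suspected stuck state concrete, here is a toy model of the sequence described above. It is not FoundationDB code: `ConfigNode`, `move_cstate`, and the `replies_delivered` parameter are hypothetical names, and the model only assumes that every old node is locked first and that the move then needs forward replies from a majority.

```python
from dataclasses import dataclass


@dataclass
class ConfigNode:
    """Toy stand-in for a configuration node on an old coordinator."""
    locked: bool = False
    forwarded: bool = False

    def commit(self) -> bool:
        # A locked node rejects configuration database transactions.
        return not self.locked


def move_cstate(old_nodes, replies_delivered):
    """Lock every old node, then record the forward replies that arrive.

    Returns True only if a majority of forward replies was received.
    """
    for node in old_nodes:
        node.locked = True
    # Only the first `replies_delivered` replies arrive before the interruption.
    for node in old_nodes[:replies_delivered]:
        node.forwarded = True
    majority = len(old_nodes) // 2 + 1
    return sum(node.forwarded for node in old_nodes) >= majority


nodes = [ConfigNode() for _ in range(3)]
# A recovery interrupts the move after a single reply (1 of 3, no majority).
moved = move_cstate(nodes, replies_delivered=1)
# The move did not complete, yet every node still refuses commits.
stuck = not moved and not any(node.commit() for node in nodes)
print(moved, stuck)
```

In this model the move fails while all nodes stay locked, which matches the symptom reported here: nothing unlocks the nodes, so later configuration transactions cannot make progress.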

sfc-gh-ljoswiak commented 1 year ago

I think the issue here was actually contention between clients. This issue should be fixed by #8879.