Issue
During sharding integration tests, sharding components occasionally go into an error state when executing update_status. However, when the hook fires again, the component leaves the error state. The errors are either ServerSelectionError or OperationFailure (code 18), indicating that the cluster is still syncing either the password or internal membership. The current check for cluster_password_synced is not robust enough to catch these.
Solution
Update cluster_password_synced and its dependent functions to catch these errors right away.
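A minimal sketch of the kind of check described here, not the actual change in this PR. The exception classes are local stand-ins for the driver's errors so the example runs on its own, and the check_connection callable and the SYNC_PENDING_CODE constant are hypothetical names introduced for illustration. Only the two error kinds and code 18 come from the description above.

```python
# Local stand-ins for the driver exceptions (assumption: the real code
# would import the equivalent classes from the MongoDB driver).
class ServerSelectionError(Exception):
    """Raised when no suitable server can be reached."""


class OperationFailure(Exception):
    """Raised when the server rejects an operation."""

    def __init__(self, message, code=None):
        super().__init__(message)
        self.code = code


# Error code named in the description as a sign the cluster is still syncing.
SYNC_PENDING_CODE = 18


def cluster_password_synced(check_connection):
    """Return True if the cluster answers, False if it is still syncing.

    check_connection is a hypothetical zero-argument callable that talks
    to the cluster and raises on failure.
    """
    try:
        check_connection()
    except ServerSelectionError:
        # Members are not reachable yet: treat as "not synced", not as an error.
        return False
    except OperationFailure as err:
        if err.code == SYNC_PENDING_CODE:
            # Credentials have not propagated yet.
            return False
        # Any other failure is unexpected and should still surface.
        raise
    return True


def _unreachable():
    raise ServerSelectionError("no servers available")


def _auth_pending():
    raise OperationFailure("authentication failed", code=SYNC_PENDING_CODE)


def _healthy():
    return None


results = [
    cluster_password_synced(_unreachable),
    cluster_password_synced(_auth_pending),
    cluster_password_synced(_healthy),
]
```

With a check shaped like this, update_status could report a waiting status and return early when the function yields False, instead of letting the exception put the unit into an error state.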