apache / helix

Mirror of Apache Helix
Apache License 2.0

Fix flaky updateInstance(org.apache.helix.rest.server.TestPerInstanceAccessor) #2825

Closed zpinto closed 1 month ago

zpinto commented 1 month ago

Issues

Description

The test case is flaky because it switches the resources from SEMI_AUTO to FULL_AUTO while the cluster is in MaintenanceMode.

When a resource is SEMI_AUTO, the MM rebalancer is not used, because it could change the preferenceList in a way that never recovers to its previous value. The test case switched the resources from SEMI_AUTO to FULL_AUTO, which caused the MM rebalancer to be used. This creates a race condition between two writers: the controller, which computes a new IdealState that drops the offline instances from the preferenceList (making the IdealState invalid for SEMI_AUTO), and the test, which sets the resources back to SEMI_AUTO. If the controller wins, persisting the IdealState again with SEMI_AUTO throws an exception.
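For reviewers who want to see the two orderings side by side, here is a minimal toy model of the race. It is a sketch, not Helix code: `ToyIdealState`, `controllerRebalance`, and `persistAsSemiAuto` are hypothetical stand-ins, and the validity rule (a SEMI_AUTO preferenceList must list exactly `replicas` instances) is an assumption made for illustration. The interleaving is replayed deterministically instead of with real threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model only: every type here is a hypothetical stand-in, not a Helix class.
class EvacuateRaceSketch {
    enum RebalanceMode { SEMI_AUTO, FULL_AUTO }

    // Minimal stand-in for a persisted IdealState with a single partition.
    static final class ToyIdealState {
        RebalanceMode mode;
        final List<String> preferenceList;
        final int replicas;

        ToyIdealState(RebalanceMode mode, List<String> preferenceList, int replicas) {
            this.mode = mode;
            this.preferenceList = new ArrayList<>(preferenceList);
            this.replicas = replicas;
        }

        // Assumed rule: a SEMI_AUTO preference list must name exactly `replicas` instances.
        boolean isValid() {
            return mode != RebalanceMode.SEMI_AUTO || preferenceList.size() == replicas;
        }
    }

    // Stand-in for the controller: it only recomputes FULL_AUTO resources,
    // and the recomputation drops offline instances from the preference list.
    static void controllerRebalance(ToyIdealState idealState, Set<String> offline) {
        if (idealState.mode == RebalanceMode.FULL_AUTO) {
            idealState.preferenceList.removeIf(offline::contains);
        }
    }

    // Stand-in for the test writing the resource back as SEMI_AUTO.
    // Rejects the write (and restores the mode) if the result would be invalid.
    static void persistAsSemiAuto(ToyIdealState idealState) {
        RebalanceMode previous = idealState.mode;
        idealState.mode = RebalanceMode.SEMI_AUTO;
        if (!idealState.isValid()) {
            idealState.mode = previous;
            throw new IllegalStateException(
                "IdealState is not valid for SEMI_AUTO: " + idealState.preferenceList);
        }
    }

    // Replays one interleaving; returns true if switching back to SEMI_AUTO succeeds.
    static boolean switchBackSucceeds(boolean controllerWins) {
        ToyIdealState idealState = new ToyIdealState(
            RebalanceMode.SEMI_AUTO, List.of("host_0", "host_1", "host_2"), 3);
        Set<String> offline = Set.of("host_2");

        idealState.mode = RebalanceMode.FULL_AUTO; // the step the old test performed
        if (controllerWins) {
            controllerRebalance(idealState, offline); // controller runs before the switch back
        }
        try {
            persistAsSemiAuto(idealState);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("test wins:       " + switchBackSucceeds(false));
        System.out.println("controller wins: " + switchBackSucceeds(true));
    }
}
```

In this model the switch back succeeds only when the test writes before the controller recomputes, which matches the intermittent failure described above.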

This change removes that logic, so the test now only checks that isEvacuateFinished is true, since all resources are SEMI_AUTO. isEvacuateFinished on FULL_AUTO resources is already tested in other places, such as TestZkHelixAdmin and TestInstanceOperation.

Tests

Changes that Break Backward Compatibility (Optional)

NA

Commits

Code Quality