apache / helix

Mirror of Apache Helix
Apache License 2.0
461 stars 224 forks source link

Fix BestPossibleExternalViewVerifier to use a ZkClient that has the serializer set to ByteArraySerializer #2776

Closed zpinto closed 5 months ago

zpinto commented 6 months ago

Issues

Description

PR #2180 introduced a bug to BestPossibleExternalViewVerifier by reusing the zkClient passed to the builder. This can be an issue because the serializer is often set to ZNRecordSerializer instead of ByteArraySerializer. This failing verifier was not accurate since it was unable to read from the persisted best possible state in the assignment metadata store.

PR #2447 moves the logic for handling min active replicas into the emergency rebalance method. This will make the DryRunWagedRebalancer inaccurate because it would previously just return the ZK persisted best possible state for due to the override of computeBestPossibleAssignment. We are removing this method because we need to recompute the idealAssignment in addition to just reading the ZK persisted best possible state in order to retain the overwrites for handling min active replicas.

handleDelayedRebalanceMinActiveReplica modifies the currentResourceAssignment map passed into it. Because the reference actually points to map in the assignmentMetaStore, hanledDelayedRebalanceMinActiveReplica is actually modifying the cache. This causes isBestPossibleChanged to evaluate to true whenever hanledDelayedRebalanceMinActiveReplica is kicking in and causes continuous writes of the same best possible produced by partial rebalance over and over.

Tests

NA, issue with hanledDelayedRebalanceMinActiveReplica was caught after fixing BestPossibleExternalViewVerifier. Current tests are sufficient.

Changes that Break Backward Compatibility (Optional)

Commits

Code Quality

zpinto commented 5 months ago

This PR is ready to be merged!

Final Commit Message: Fix BestPossibleExternalViewVerifier to use a ZkClient that has the serializer set to ByteArraySerializer so it can read the assignment meta store best possible state. Fix BestPossibleExternalViewVerifier to actually calculate BEST_POSSIBLE instead of returning last persisted to ZK because we now need to consider handleDelayedRebalanceMinActiveReplica not being persisted to ZK(#2447). Fix handleDelayedRebalanceMinActiveReplica modifying in-memory _bestPossibleState in the _assignmentMetadataStore which was causing best possible state to continuously be persisted until handleDelayedRebalanceMinActiveReplica wasn't kicking in anymore.