apache / helix

Mirror of Apache Helix
Apache License 2.0

Metaclient updater retry logic #2805

Closed GrantPSpencer closed 1 month ago

GrantPSpencer commented 2 months ago

Issues

Retry logic for metaclient updater

Description

The MetaClient updater currently does not retry when the node does not exist or when the version check fails. This change adds logic to create the node if it does not exist and to retry on version mismatch.
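The behavior described above can be sketched roughly as follows. This is a minimal illustration against a hypothetical in-memory store, not the actual `ZkMetaClient` code: the class `RetryingUpdaterSketch`, its exception types, and the `maxRetries` parameter are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

/** Hypothetical in-memory versioned store; a stand-in, not the Helix or ZooKeeper API. */
public class RetryingUpdaterSketch {
    /** A value together with the version at which it was written. */
    static final class Versioned {
        final String data;
        final int version;
        Versioned(String data, int version) { this.data = data; this.version = version; }
    }

    // Unchecked, hypothetical exception types that mirror the three failure modes.
    static class NoNodeException extends RuntimeException {}
    static class NodeExistsException extends RuntimeException {}
    static class BadVersionException extends RuntimeException {}

    private final Map<String, Versioned> nodes = new ConcurrentHashMap<>();

    Versioned get(String key) {
        Versioned v = nodes.get(key);
        if (v == null) throw new NoNodeException();
        return v;
    }

    void create(String key, String data) {
        if (nodes.putIfAbsent(key, new Versioned(data, 0)) != null) throw new NodeExistsException();
    }

    /** Conditional write: succeeds only if the stored version still equals expectedVersion. */
    void set(String key, String data, int expectedVersion) {
        Versioned current = get(key);
        Versioned next = new Versioned(data, expectedVersion + 1);
        if (current.version != expectedVersion || !nodes.replace(key, current, next)) {
            throw new BadVersionException();
        }
    }

    /** Read-modify-write that creates a missing node and retries on version mismatch. */
    String update(String key, UnaryOperator<String> updater, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                Versioned current = get(key);
                String updated = updater.apply(current.data);
                set(key, updated, current.version);
                return updated;
            } catch (NoNodeException e) {
                try {
                    // Node is missing: create it from the updater applied to null.
                    String initial = updater.apply(null);
                    create(key, initial);
                    return initial;
                } catch (NodeExistsException raced) {
                    // Another writer created the node first; fall through and retry.
                }
            } catch (BadVersionException e) {
                // Another writer changed the node after our read; retry with fresh data.
            }
        }
        throw new IllegalStateException("update failed after " + (maxRetries + 1) + " attempts");
    }

    public static void main(String[] args) {
        RetryingUpdaterSketch store = new RetryingUpdaterSketch();
        UnaryOperator<String> appendOne = old -> old == null ? "1" : old + "1";
        System.out.println(store.update("/node", appendOne, 3)); // node is created
        System.out.println(store.update("/node", appendOne, 3)); // node is updated
    }
}
```

Bounding the loop with a retry count is one possible design choice; whether the real updater bounds retries, and how, is not stated in this description.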

Please take a look at the test case as well: is there a better way to reproduce the version mismatch race condition?

Tests

meta-client/src/main/java/org/apache/helix/metaclient/impl/zk/ZkMetaClient.java #testUpdate

$ mvn test -o -pl=meta-client

[INFO] Tests run: 68, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 755.799 s - in TestSuite
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 68, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] 
[INFO] --- jacoco:0.8.6:report (generate-code-coverage-report) @ meta-client ---
[INFO] Loading execution data file /Users/gspencer/Desktop/git-repos/helix/meta-client/target/jacoco.exec
[INFO] Analyzed bundle 'Apache Helix :: Meta Client' with 78 classes
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  12:41 min
[INFO] Finished at: 2024-05-30T21:50:37-07:00
[INFO] ------------------------------------------------------------------------

Changes that Break Backward Compatibility (Optional)

N/A

Commits

Code Quality

GrantPSpencer commented 2 months ago

11bf562 Refactored updater tests to remove use of threads.
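One way to trigger a version mismatch deterministically without threads is to let the updater function itself perform a conflicting write the first time it runs. The sketch below illustrates that idea only; it is not necessarily what this commit does, and `VersionMismatchRepro` with its single-node store is a hypothetical stand-in rather than the MetaClient API.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.UnaryOperator;

/** Hypothetical single-node versioned store used to illustrate the technique. */
public class VersionMismatchRepro {
    String data = "a";
    int version = 0;

    /** Conditional write: applies only if the node is still at expectedVersion. */
    boolean compareAndSet(String newData, int expectedVersion) {
        if (version != expectedVersion) {
            return false;
        }
        data = newData;
        version++;
        return true;
    }

    /** Read-modify-write loop that retries when the conditional write is rejected. */
    String update(UnaryOperator<String> updater, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            int readVersion = version;
            String updated = updater.apply(data);
            if (compareAndSet(updated, readVersion)) {
                return updated;
            }
        }
        throw new IllegalStateException("update failed after " + (maxRetries + 1) + " attempts");
    }

    public static void main(String[] args) {
        VersionMismatchRepro node = new VersionMismatchRepro();
        AtomicInteger calls = new AtomicInteger();
        String result = node.update(old -> {
            if (calls.getAndIncrement() == 0) {
                // Simulate a competing writer landing between the read and the write.
                node.compareAndSet("b", node.version);
            }
            return old + "!";
        }, 3);
        // The first attempt is rejected, the second succeeds on the competing writer's data.
        System.out.println(result + " after " + calls.get() + " updater calls");
        // prints "b! after 2 updater calls"
    }
}
```

Because the conflicting write happens inside the updater, the mismatch occurs on exactly the first attempt, so the test does not depend on thread scheduling.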

$ mvn test -o -Dtest=TestZkMetaClient.java -pl=meta-client
[INFO] Tests run: 20, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.671 s - in org.apache.helix.metaclient.impl.zk.TestZkMetaClient
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 20, Failures: 0, Errors: 0, Skipped: 0
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  21.240 s
[INFO] Finished at: 2024-06-05T15:41:14-07:00
[INFO] ------------------------------------------------------------------------

GrantPSpencer commented 1 month ago

https://github.com/apache/helix/actions/runs/9392174978 failed due to #2788; there were no failures in the metaclient module.

2024-06-06T01:14:54.1015466Z [info] ./helix-core/target/surefire-reports/TestSuite.txt: Tests run: 1420, Failures: 1, Errors: 0, Skipped: 17, Time elapsed: 6,000.559 s <<< FAILURE! - in TestSuite
2024-06-06T01:14:54.1042519Z ##[error] Test failed: testNodeSwap(org.apache.helix.integration.rebalancer.TestInstanceOperation)  Time elapsed: 18.28 s  <<< FAILURE!
2024-06-06T01:14:54.1045991Z [info] ./zookeeper-api/target/surefire-reports/TestSuite.txt: Tests run: 99, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 234.346 s - in TestSuite
2024-06-06T01:14:54.1048431Z [info] ./metrics-common/target/surefire-reports/TestSuite.txt: Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.247 s - in TestSuite
2024-06-06T01:14:54.1050817Z [info] ./meta-client/target/surefire-reports/TestSuite.txt: Tests run: 70, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 674.295 s - in TestSuite
2024-06-06T01:14:54.1053642Z [info] ./metadata-store-directory-common/target/surefire-reports/TestSuite.txt: Tests run: 31, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.88 s - in TestSuite
2024-06-06T01:14:54.1056149Z [info] ./helix-common/target/surefire-reports/TestSuite.txt: Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.265 s - in TestSuite
2024-06-06T01:14:54.1133664Z Post job cleanup.

GrantPSpencer commented 1 month ago

Pull request approved by @xyuanlu, @junkaixue

Commit message: Add retry logic to MetaClient Updater