apache / celeborn

Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
https://celeborn.apache.org/
Apache License 2.0

[CELEBORN-1702] Bump Ratis version from 3.1.1 to 3.1.2 #2897

Status: Closed (SteNicholas closed 2 weeks ago)

SteNicholas commented 2 weeks ago

What changes were proposed in this pull request?

Bump Ratis version from 3.1.1 to 3.1.2.

Why are the changes needed?

Bump Ratis version from 3.1.1 to 3.1.2. Ratis has released v3.1.2; see the 3.1.2 release notes. Version 3.1.2 is a minor release with multiple improvements and bug fixes, including [RATIS-2179] Fix NPE in RaftServerImpl.getLogInfo. See the changes between the 3.1.1 and 3.1.2 releases.

Version 3.1.2 fixes the following NullPointerException, which appeared in the CI log:

[info] Test org.apache.celeborn.service.deploy.master.clustermeta.ha.MasterRatisServerSuiteJ.testIsLeader started
24/10/24 08:16:30,295 ERROR [pool-1-thread-1] HARaftServer: Failed to retrieve RaftPeerRole. Setting cached role to UNRECOGNIZED and resetting leader info.
java.io.IOException: java.lang.NullPointerException
    at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:56)
    at org.apache.ratis.server.impl.RaftServerImpl.waitForReply(RaftServerImpl.java:1148)
    at org.apache.ratis.server.impl.RaftServerProxy.getGroupInfo(RaftServerProxy.java:607)
    at org.apache.celeborn.service.deploy.master.clustermeta.ha.HARaftServer.getGroupInfo(HARaftServer.java:599)
    at org.apache.celeborn.service.deploy.master.clustermeta.ha.HARaftServer.updateServerRole(HARaftServer.java:514)
    at org.apache.celeborn.service.deploy.master.clustermeta.ha.HARaftServer.isLeader(HARaftServer.java:489)
    at org.apache.celeborn.service.deploy.master.clustermeta.ha.MasterRatisServerSuiteJ.testIsLeader(MasterRatisServerSuiteJ.java:47)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at org.junit.runners.Suite.runChild(Suite.java:128)
    at org.junit.runners.Suite.runChild(Suite.java:27)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
    at com.novocode.junit.JUnitTask.execute(JUnitTask.java:64)
    at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:414)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.NullPointerException
    at org.apache.ratis.server.impl.RaftServerImpl.getLogInfo(RaftServerImpl.java:665)
    at org.apache.ratis.server.impl.RaftServerImpl.getGroupInfo(RaftServerImpl.java:658)
    at org.apache.ratis.server.impl.RaftServerProxy.lambda$getGroupInfoAsync$23(RaftServerProxy.java:613)
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
    at java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:457)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
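
The trace above shows a NullPointerException raised inside an asynchronous `CompletableFuture` stage and then reported to the caller as `java.io.IOException: java.lang.NullPointerException`. The following is a minimal sketch of that wrapping pattern using only the JDK. It is not the Ratis implementation: `lookupLogInfo` and `waitForReply` are hypothetical stand-ins named after the frames in the trace, and the null input merely simulates uninitialized state.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class WrappedNpeSketch {
    // Hypothetical stand-in for a lookup that dereferences state which may be null.
    static String lookupLogInfo(Object state) {
        return state.toString(); // throws NullPointerException when state is null
    }

    // Hypothetical stand-in for a blocking wait that reports async failures as IOException.
    static <T> T waitForReply(CompletableFuture<T> future) throws IOException {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException(e);
        } catch (ExecutionException e) {
            // Unwrap the ExecutionException so the original failure becomes the cause.
            throw new IOException(e.getCause());
        }
    }

    public static void main(String[] args) {
        // The async stage fails with an NPE because its input is null (an assumption
        // made here only to reproduce the shape of the failure).
        CompletableFuture<String> reply =
            CompletableFuture.supplyAsync(() -> (Object) null)
                .thenApply(WrappedNpeSketch::lookupLogInfo);
        try {
            waitForReply(reply);
        } catch (IOException e) {
            // The caller sees an IOException whose cause is the original NPE.
            System.out.println(e.getCause() instanceof NullPointerException); // prints "true"
        }
    }
}
```

Under this pattern the caller can only catch and log the failure, which is why the fix has to come from the library side, here by upgrading to the release that contains RATIS-2179.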

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI.

codecov[bot] commented 2 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 32.12%. Comparing base (165e914) to head (c887037). Report is 24 commits behind head on main.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #2897      +/-   ##
==========================================
- Coverage   32.35%   32.12%   -0.22%
==========================================
  Files         329      329
  Lines       19511    19677     +166
  Branches     1752     1774      +22
==========================================
+ Hits         6311     6320       +9
- Misses      12855    13011     +156
- Partials      345      346       +1
```


SteNicholas commented 2 weeks ago

Ping @FMX, @mridulm.

SteNicholas commented 2 weeks ago

@FMX, I have rebased onto the main branch. PTAL.