apache / celeborn

Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
https://celeborn.apache.org/
Apache License 2.0
862 stars, 351 forks

[CELEBORN-1572] Celeborn CLI initial REST API support #2699

Closed: akpatnam25 closed 1 week ago

akpatnam25 commented 3 weeks ago

What changes were proposed in this pull request?

This PR introduces the Celeborn CLI (based on this CPIP). The first iteration adds support for querying the existing REST API endpoints. A later iteration will add a layer for external cluster manager support. Further improvements, such as pretty printing, can be added in subsequent PRs.

Why are the changes needed?

See the CPIP.

Does this PR introduce any user-facing change?

Yes, a new CLI tool.

How was this patch tested?

Added UTs and also tested internally.

akpatnam25 commented 3 weeks ago

cc @waitinfuture @FMX @SteNicholas

akpatnam25 commented 3 weeks ago

++ @turboFei @RexXiong @mridulm

FMX commented 3 weeks ago

Thanks. I'll review this PR in the next week.

akpatnam25 commented 2 weeks ago

thanks for the review @RexXiong, I addressed your comments!! gentle ping for @FMX @SteNicholas @waitinfuture to review as well :)

FMX commented 2 weeks ago

There is some work to be done for this PR.

  1. Build CLI in make-distribution.
  2. Add a description of how to use the CLI in README.

This PR is interesting. Can you provide a doc about how to use this CLI, including the commands and their descriptions?

BTW, the UT TestCelebornCliCommands failed on my Mac. Is there something I missed to run the tests?

akpatnam25 commented 2 weeks ago

Thanks @FMX for reviewing!! TestCelebornCliCommands runs on my Mac without any issues, and it seems to run fine on the GitHub checks as well. What's the error you are facing on your Mac?

Additionally, I have updated the make-distribution.sh script. If it all looks good, I can add the documentation to the README. PTAL, thanks!! cc @FMX

FMX commented 1 week ago

@akpatnam25 Please add a doc about how to use the Celeborn CLI. After building and trying to use the Celeborn CLI, I got the following error:

/Users/ethanfeng/codes/wcode/apache-repo/main/apache-celeborn-0.6.0-SNAPSHOT-bin/bin/celeborn-class: line 100: exec: -X: invalid option
exec: usage: exec [-cl] [-a name] file [redirection ...]
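
The message above is what bash prints when the first word after `exec` starts with a dash that is not one of its own options (`-c`, `-l`, `-a`). One way this can happen, and this is only an assumption since the thread does not state the root cause, is that the variable holding the Java executable expands to nothing, which leaves a JVM flag such as `-Xmx1g` as the first word. A minimal sketch of that situation, where `RUNNER` and the flag values are hypothetical and not taken from `celeborn-class`:

```shell
#!/bin/sh
# Hypothetical reproduction; RUNNER is an illustrative name, not taken from celeborn-class.
RUNNER=""   # assumption: the path to the java executable resolved to an empty string

# With RUNNER empty, bash sees `exec -Xmx1g -version` and parses -X as an option to exec.
# Running it through `bash -c` keeps the failure out of the current shell.
out=$(bash -c "exec $RUNNER -Xmx1g -version" 2>&1)
status=$?

echo "$out"
echo "exit status: $status"

# A defensive guard a launcher script could use: refuse to exec when the runner is empty.
guard=""
if [ -z "$RUNNER" ]; then
  guard="RUNNER is empty; check that a java executable can be located"
  echo "$guard"
fi
```

Checking that the runner variable is non-empty before calling `exec` turns the confusing usage message into an explicit error about the missing Java executable.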

I ran the UT and got the following error:

ApiException{code=0, responseHeaders=null, responseBody='null'}
    at org.apache.celeborn.rest.v1.worker.invoker.ApiClient.invokeAPI(ApiClient.java:1002)
    at org.apache.celeborn.rest.v1.worker.WorkerApi.getWorkerInfo(WorkerApi.java:102)
    at org.apache.celeborn.rest.v1.worker.WorkerApi.getWorkerInfo(WorkerApi.java:59)
    at org.apache.celeborn.cli.worker.WorkerSubcommandImpl.runShowWorkerInfo(WorkerSubcommandImpl.scala:43)
    at org.apache.celeborn.cli.worker.WorkerSubcommandImpl.run(WorkerSubcommandImpl.scala:29)
    at picocli.CommandLine.executeUserObject(CommandLine.java:2030)
    at picocli.CommandLine.access$1500(CommandLine.java:148)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2457)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2419)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
    at picocli.CommandLine$RunLast.execute(CommandLine.java:2421)
    at picocli.CommandLine.execute(CommandLine.java:2174)
    at org.apache.celeborn.cli.CelebornCli$.main(CelebornCli.scala:45)
    at org.apache.celeborn.cli.TestCelebornCliCommands.$anonfun$captureOutputAndValidateResponse$1(TestCelebornCliCommands.scala:237)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at scala.Console$.withOut(Console.scala:167)
    at org.apache.celeborn.cli.TestCelebornCliCommands.captureOutputAndValidateResponse(TestCelebornCliCommands.scala:237)
    at org.apache.celeborn.cli.TestCelebornCliCommands.$anonfun$new$1(TestCelebornCliCommands.scala:74)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
    at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
    at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
    at org.scalatest.Transformer.apply(Transformer.scala:22)
    at org.scalatest.Transformer.apply(Transformer.scala:20)
    at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
    at org.apache.celeborn.CelebornFunSuite.withFixture(CelebornFunSuite.scala:157)
    at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
    at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
    at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
    at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
    at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
    at org.apache.celeborn.CelebornFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(CelebornFunSuite.scala:35)
    at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
    at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
    at org.apache.celeborn.CelebornFunSuite.runTest(CelebornFunSuite.scala:35)
    at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
    at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
    at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
    at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
    at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
    at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
    at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
    at org.scalatest.Suite.run(Suite.scala:1114)
    at org.scalatest.Suite.run$(Suite.scala:1096)
    at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
    at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
    at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
    at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
    at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
    at org.apache.celeborn.CelebornFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(CelebornFunSuite.scala:35)
    at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
    at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
    at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
    at org.apache.celeborn.CelebornFunSuite.run(CelebornFunSuite.scala:35)
    at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:47)
    at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13(Runner.scala:1321)
    at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13$adapted(Runner.scala:1315)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1315)
    at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24(Runner.scala:992)
    at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24$adapted(Runner.scala:970)
    at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1481)
    at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:970)
    at org.scalatest.tools.Runner$.run(Runner.scala:798)
    at org.scalatest.tools.Runner.run(Runner.scala)
    at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2or3(ScalaTestRunner.java:43)
    at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:26)
Caused by: org.apache.hc.core5.http.NoHttpResponseException: 30.221.117.156:1995 failed to respond
    at org.apache.hc.core5.http.impl.io.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:301)
    at org.apache.hc.core5.http.impl.io.HttpRequestExecutor.execute(HttpRequestExecutor.java:175)
    at org.apache.hc.core5.http.impl.io.HttpRequestExecutor.execute(HttpRequestExecutor.java:218)
    at org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager$InternalConnectionEndpoint.execute(PoolingHttpClientConnectionManager.java:717)
    at org.apache.hc.client5.http.impl.classic.InternalExecRuntime.execute(InternalExecRuntime.java:216)
    at org.apache.hc.client5.http.impl.classic.MainClientExec.execute(MainClientExec.java:116)
    at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
    at org.apache.hc.client5.http.impl.classic.ConnectExec.execute(ConnectExec.java:188)
    at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
    at org.apache.hc.client5.http.impl.classic.ProtocolExec.execute(ProtocolExec.java:192)
    at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
    at org.apache.hc.client5.http.impl.classic.HttpRequestRetryExec.execute(HttpRequestRetryExec.java:113)
    at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
    at org.apache.hc.client5.http.impl.classic.ContentCompressionExec.execute(ContentCompressionExec.java:152)
    at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
    at org.apache.hc.client5.http.impl.classic.RedirectExec.execute(RedirectExec.java:116)
    at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
    at org.apache.hc.client5.http.impl.classic.InternalHttpClient.doExecute(InternalHttpClient.java:170)
    at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:106)
    at org.apache.celeborn.rest.v1.worker.invoker.ApiClient.invokeAPI(ApiClient.java:999)
    ... 70 more

"Using connectionUrl: 30.221.117.156:1995 " was not empty, but "Using connectionUrl: 30.221.117.156:1995 " did not contain "WorkerInfoResponse"
ScalaTestFailureLocation: org.apache.celeborn.cli.TestCelebornCliCommands at (TestCelebornCliCommands.scala:240)
org.scalatest.exceptions.TestFailedException: "Using connectionUrl: 30.221.117.156:1995 " was not empty, but "Using connectionUrl: 30.221.117.156:1995 " did not contain "WorkerInfoResponse"
    at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
    at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
    at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
    at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
    at org.apache.celeborn.cli.TestCelebornCliCommands.captureOutputAndValidateResponse(TestCelebornCliCommands.scala:240)
    at org.apache.celeborn.cli.TestCelebornCliCommands.$anonfun$new$1(TestCelebornCliCommands.scala:74)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
    at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
    at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
    at org.scalatest.Transformer.apply(Transformer.scala:22)
    at org.scalatest.Transformer.apply(Transformer.scala:20)
    at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
    at org.apache.celeborn.CelebornFunSuite.withFixture(CelebornFunSuite.scala:157)
    at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
    at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
    at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
    at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
    at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
    at org.apache.celeborn.CelebornFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(CelebornFunSuite.scala:35)
    at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
    at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
    at org.apache.celeborn.CelebornFunSuite.runTest(CelebornFunSuite.scala:35)
    at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
    at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
    at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
    at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
    at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
    at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
    at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
    at org.scalatest.Suite.run(Suite.scala:1114)
    at org.scalatest.Suite.run$(Suite.scala:1096)
    at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
    at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
    at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
    at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
    at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
    at org.apache.celeborn.CelebornFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(CelebornFunSuite.scala:35)
    at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
    at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
    at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
    at org.apache.celeborn.CelebornFunSuite.run(CelebornFunSuite.scala:35)
    at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:47)
    at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13(Runner.scala:1321)
    at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13$adapted(Runner.scala:1315)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1315)
    at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24(Runner.scala:992)
    at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24$adapted(Runner.scala:970)
    at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1481)
    at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:970)
    at org.scalatest.tools.Runner$.run(Runner.scala:798)
    at org.scalatest.tools.Runner.run(Runner.scala)
    at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2or3(ScalaTestRunner.java:43)
    at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:26)
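
The assertion message above suggests that only the connection banner reached the captured standard output, so the substring check for "WorkerInfoResponse" failed once the HTTP request got no response. A minimal shell sketch of that failure mode follows. `fake_cli` is a hypothetical stand-in for the CLI, the address is a placeholder, and the assumption that the exception text goes to standard error is not confirmed by the thread:

```shell
#!/bin/sh
# fake_cli is a hypothetical stand-in for the CLI under test, not the real tool.
fake_cli() {
  # The banner goes to stdout, as in the captured output quoted in the failure.
  echo "Using connectionUrl: $1"
  # Assumption: when the request fails, the error is reported on stderr only.
  echo "ApiException: no response from $1" >&2
  return 1
}

# Capture stdout only, the way the test captures the CLI's console output.
captured=$(fake_cli "127.0.0.1:1995" 2>/dev/null)

# The check mirrors the test's substring assertion on the captured text.
case "$captured" in
  *WorkerInfoResponse*) result="pass" ;;
  *) result="fail" ;;
esac

echo "captured: $captured"
echo "result: $result"
```

Under these assumptions the capture is non-empty but lacks the expected response text, which matches the shape of the "was not empty, but ... did not contain" message.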

akpatnam25 commented 1 week ago

@FMX, the issue you are seeing when running the CLI should be fixed. Can you please try running the CLI again on your Mac? I have also added a simple user guide for now in this doc: https://docs.google.com/document/d/1Cq8g63m0PWzOeDIe75nVoEbq_Z2YJCedZ9MSk2D8Nc0/edit#heading=h.nxagrvjpzdtd At the end of development, after we add more features, I will add this to the website/README as well. For now, the doc is very basic, just enough to help set up the CLI.

Regarding the unit tests, I am not sure what is happening on your machine, but they pass on the GitHub checks, so I think it is OK. Please take a look and let me know, thanks @FMX !!

mridulm commented 1 week ago

The CI failures are unrelated to this PR, merging to master.

mridulm commented 1 week ago

Merged to master. Thanks for adding this @akpatnam25 ! Thanks for reviews @FMX, @RexXiong, @turboFei :)