getkyo / kyo

Toolkit for Scala Development
https://getkyo.io
Apache License 2.0
559 stars 47 forks source link

gRPC Support #390

Open ghostdogpr opened 6 months ago

ghostdogpr commented 6 months ago

Support for protobuf and gRPC would drive adoption (at least for me).

It's definitely not easy as it involves code generation but maybe https://scalapb.github.io/docs/writing-plugins could be used like it is for zio-grpc and fs2-grpc.

There is currently the option of using zio-grpc with the zio interop but its performance is not great to start with (https://github.com/LesnyRumcajs/grpc_bench/discussions/441) so there might be an opportunity to shine.

He-Pin commented 6 months ago

Yes, looking forward to it, grpc_bench is a great place to show .

fwbrasil commented 6 months ago

/bounty $200

algora-pbc[bot] commented 6 months ago

💎 $200 bounty • Kyo

💎 $250 bounty • Kaizen Solutions

Steps to solve:

  1. Start working: Comment /attempt #390 with your implementation plan
  2. Submit work: Create a pull request including /claim #390 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to getkyo/kyo!

Add a bounty • Share on socials

Attempt Started (GMT+0) Solution
🟢 @steinybot May 30, 2024, 12:31:58 PM WIP
steinybot commented 6 months ago

/attempt #390

I've had a little look at this.

The code generation will take a little while but there is enough documentation and prior art that it should be easy enough once we know what it should look like.

I tried a naive implementation:

package com.example.helloworld

import io.grpc.*
import io.grpc.examples.helloworld.helloworld.*
import io.grpc.stub.{ServerCalls, StreamObserver}
import kyo.*

import java.util.concurrent.Executors
import scala.util.Try
import scala.util.chaining.*

type Grpcs >: Grpcs.Effects <: Grpcs.Effects

object Grpcs:
  type Effects = Fibers & Aborts[StatusException]

  def init[T: Flat](v: => T < Grpcs): Fiber[T] < IOs =
    def pendingFibers: Try[T] < Fibers = Aborts.run[StatusException].apply[StatusException, T, Fibers, StatusException, Any](v).map(_.toTry)

    val pendingIOs: Fiber[Try[T]] < IOs = Fibers.init(pendingFibers)
    pendingIOs.map(_.transform(_.fold(Fiber.fail, Fiber.value)))

  def run[T: Flat](v: Fiber[T] < IOs): Fiber[T] =
    IOs.run(v)

trait KyoGreeter:
  def sayHello(request: HelloRequest): HelloReply < Grpcs

object GreeterService extends KyoGreeter:
  override def sayHello(request: HelloRequest): HelloReply < Grpcs =
    for {
      _ <- Consoles.run(Consoles.println(s"Got request: $request"))
    } yield HelloReply(s"Hello, ${request.name}")

object HelloWorldServer extends KyoApp:

  private def buildGreeterService(serviceImpl: KyoGreeter): ServerServiceDefinition =
    ServerServiceDefinition.builder(SERVICE)
      .addMethod(
        METHOD_SAY_HELLO,
        // TODO: When to use a different type of call?
        // TODO: Is there any kind of backpressure or do these closures keep building up?
        ServerCalls.asyncUnaryCall((request: HelloRequest, observer: StreamObserver[HelloReply]) => {
          val fiber = Grpcs.run(Grpcs.init(serviceImpl.sayHello(request)))
          IOs.run(fiber.onComplete { reply =>
            IOs.attempt(reply).map(scalapb.grpc.Grpc.completeObserver(observer))
          })
        }))
      .build()

  private def buildServer(port: Int, services: Seq[ServerServiceDefinition]): Server =
    val builder = services.foldLeft(ServerBuilder.forPort(port))(_.addService(_))

    /**
     * Allow customization of the Executor with two environment variables:
     *
     * <p>
     * <ul>
     * <li>JVM_EXECUTOR_TYPE: direct, workStealing, single, fixed, cached</li>
     * <li>GRPC_SERVER_CPUS: integer value.</li>
     * </ul>
     * </p>
     *
     * The number of Executor Threads will default to the number of
     * availableProcessors(). Only the workStealing and fixed executors will use
     * this value.
     */
    val threads = System.getenv("GRPC_SERVER_CPUS")
    var i_threads = Runtime.getRuntime.availableProcessors
    if (threads != null && !threads.isEmpty) i_threads = threads.toInt
    val value = System.getenv.getOrDefault("JVM_EXECUTOR_TYPE", "workStealing")
    val builderWithExecutor = value match {
      case "direct" => builder.directExecutor
      case "single" => builder.executor(Executors.newSingleThreadExecutor)
      case "fixed" => builder.executor(Executors.newFixedThreadPool(i_threads))
      case "workStealing" => builder.executor(Executors.newWorkStealingPool(i_threads))
      case "cached" => builder.executor(Executors.newCachedThreadPool)
    }

    builderWithExecutor.build()

  private val port: Int = 50051

  private val services = Seq(
    buildGreeterService(GreeterService)
  )

  run {
    for {
      // TODO: Get the shutdown working properly.
      _ <- Consoles.println(s"Server is running on port $port. Press Ctrl-C to stop.")
      server <- Resources.acquireRelease(IOs(buildServer(port, services).start())) { (server: Server) =>
        IOs.run(Consoles.run(Consoles.print("Shutting down...")))
        // TODO: Add a timeout.
        server.shutdown().awaitTermination()
        IOs.run(Consoles.run(Consoles.println("Done.")))
      }
      _ <- Fibers.sleep(Duration.Infinity)
    } yield {
      "Goodbye!"
    }
  }

end HelloWorldServer

It didn't perform too great which is not that surprising:

-----------------------------------------------------------------------------------------------------------------------------------------
| name                        |   req/s |   avg. latency |        90 % in |        95 % in |        99 % in | avg. cpu |   avg. memory |
-----------------------------------------------------------------------------------------------------------------------------------------
| scala_akka                  |   23998 |       31.39 ms |       81.50 ms |       88.65 ms |      102.31 ms |   52.49% |    216.35 MiB |
| scala_pekko                 |   23810 |       32.40 ms |       82.83 ms |       90.08 ms |      102.21 ms |   44.11% |    190.85 MiB |
| scala_fs2                   |   23270 |       34.14 ms |       79.12 ms |       86.97 ms |      105.47 ms |   81.39% |    241.85 MiB |
| scala_kyo                   |   20744 |       40.89 ms |       85.64 ms |       98.11 ms |      178.73 ms |   81.58% |     244.2 MiB |
| scala_zio                   |   19573 |       43.24 ms |       89.10 ms |      103.96 ms |      200.26 ms |  101.32% |    249.95 MiB |
-----------------------------------------------------------------------------------------------------------------------------------------
Benchmark Execution Parameters:
160be4a Thu, 25 Apr 2024 16:18:16 +0200 GitHub Update sbt-assembly to 2.2.0 (#446)
- GRPC_BENCHMARK_DURATION=20s
- GRPC_BENCHMARK_WARMUP=5s
- GRPC_SERVER_CPUS=1
- GRPC_SERVER_RAM=512m
- GRPC_CLIENT_CONNECTIONS=50
- GRPC_CLIENT_CONCURRENCY=1000
- GRPC_CLIENT_QPS=0
- GRPC_CLIENT_CPUS=1
- GRPC_REQUEST_SCENARIO=complex_proto
- GRPC_GHZ_TAG=0.114.0
-----------------------------------------------------------------------------------------------------------------------------------------
| name                        |   req/s |   avg. latency |        90 % in |        95 % in |        99 % in | avg. cpu |   avg. memory |
-----------------------------------------------------------------------------------------------------------------------------------------
| scala_akka                  |   55639 |       14.39 ms |       50.75 ms |       62.50 ms |       92.22 ms |   96.89% |    241.54 MiB |
| scala_pekko                 |   55609 |       14.59 ms |       52.69 ms |       62.66 ms |       84.92 ms |   96.83% |    258.92 MiB |
| scala_fs2                   |   44931 |       19.97 ms |       58.33 ms |       67.13 ms |       91.93 ms |   99.48% |    261.63 MiB |
| scala_zio                   |   34658 |       26.98 ms |       67.05 ms |       77.79 ms |      164.29 ms |   98.73% |    336.51 MiB |
| scala_kyo                   |   33819 |       27.80 ms |       67.40 ms |       77.69 ms |      149.42 ms |   99.02% |    308.01 MiB |
-----------------------------------------------------------------------------------------------------------------------------------------
Benchmark Execution Parameters:
160be4a Thu, 25 Apr 2024 16:18:16 +0200 GitHub Update sbt-assembly to 2.2.0 (#446)
- GRPC_BENCHMARK_DURATION=120s
- GRPC_BENCHMARK_WARMUP=30s
- GRPC_SERVER_CPUS=1
- GRPC_SERVER_RAM=512m
- GRPC_CLIENT_CONNECTIONS=50
- GRPC_CLIENT_CONCURRENCY=1000
- GRPC_CLIENT_QPS=0
- GRPC_CLIENT_CPUS=9
- GRPC_REQUEST_SCENARIO=complex_proto
- GRPC_GHZ_TAG=0.114.0

Anything obviously wrong with this?

steinybot commented 4 months ago

This is still a WIP. I made some good progress on the code gen side of things. I haven't had much time in the last 2 weeks. I hope to get back to this in a week or so.

steinybot commented 4 months ago

Not sure what I changed but after implementing it as a library it is now significantly slower:

-----------------------------------------------------------------------------------------------------------------------------------------
| name                        |   req/s |   avg. latency |        90 % in |        95 % in |        99 % in | avg. cpu |   avg. memory |
-----------------------------------------------------------------------------------------------------------------------------------------
| scala_zio                   |   31949 |       29.93 ms |       66.57 ms |       76.40 ms |      132.59 ms |   100.5% |    315.89 MiB |
| scala_kyo                   |    7056 |      140.20 ms |      214.61 ms |      502.63 ms |         1.39 s |  101.77% |    333.34 MiB |
-----------------------------------------------------------------------------------------------------------------------------------------
DamianReeves commented 2 months ago

Any updates on this front?

steinybot commented 2 months ago

I haven't managed to dig into the performance issues. I created some benchmarks but that is as far as I got. I'm in the middle of merging the new core changes in.

Unfortunately I'm unexpectedly out of a job so I actually have less free time while I apply for jobs and do interviews etc. Happy to share what I have done and split the bounty if you want to help out.

Luka-J9 commented 2 months ago

Very first run of a naive implementation

-----------------------------------------------------------------------------------------------------------------------------------------
| name                        |   req/s |   avg. latency |        90 % in |        95 % in |        99 % in | avg. cpu |   avg. memory |
-----------------------------------------------------------------------------------------------------------------------------------------
| scala_pekko                 |   13605 |       61.45 ms |      100.15 ms |      105.57 ms |      198.43 ms |   63.48% |     179.7 MiB |
| scala_akka                  |   13575 |       62.84 ms |      100.16 ms |      105.76 ms |      194.06 ms |   67.18% |     216.9 MiB |
| scala_kyo                   |   11866 |       79.19 ms |      112.26 ms |      180.91 ms |      199.23 ms |  100.22% |    253.25 MiB |
| scala_zio                   |    7446 |      133.35 ms |      209.98 ms |      289.29 ms |      399.28 ms |  101.19% |     238.6 MiB |
| scala_fs2                   |    7229 |      136.40 ms |      207.54 ms |      282.75 ms |      392.14 ms |  101.11% |    213.25 MiB |
-----------------------------------------------------------------------------------------------------------------------------------------
Benchmark Execution Parameters:
38875a7 Thu, 5 Sep 2024 12:40:14 +0200 GitHub Add the `wtx` project (#473)
- GRPC_BENCHMARK_DURATION=20s
- GRPC_BENCHMARK_WARMUP=5s
- GRPC_SERVER_CPUS=1
- GRPC_SERVER_RAM=512m
- GRPC_CLIENT_CONNECTIONS=50
- GRPC_CLIENT_CONCURRENCY=1000
- GRPC_CLIENT_QPS=0
- GRPC_CLIENT_CPUS=1
- GRPC_REQUEST_SCENARIO=complex_proto
- GRPC_GHZ_TAG=0.114.0
All done.

Tweaked some settings and got a little bit of feedback from @fwbrasil on discord

The main change the RAM settings - I think the reason for the large delta is due to the fact that the average usage was above the default 512m of RAM

-----------------------------------------------------------------------------------------------------------------------------------------
| name                        |   req/s |   avg. latency |        90 % in |        95 % in |        99 % in | avg. cpu |   avg. memory |
-----------------------------------------------------------------------------------------------------------------------------------------
| scala_pekko                 |   27278 |       30.43 ms |       75.85 ms |       81.83 ms |       94.50 ms |   49.65% |    197.42 MiB |
| scala_akka                  |   27056 |       31.11 ms |       75.19 ms |       81.90 ms |       96.86 ms |   54.33% |    258.42 MiB |
| scala_kyo                   |   24169 |       38.39 ms |       85.40 ms |       93.77 ms |      105.02 ms |   92.18% |    577.13 MiB |
| scala_zio                   |   23312 |       40.67 ms |       87.79 ms |       95.59 ms |      110.96 ms |   98.52% |    642.67 MiB |
| scala_fs2                   |   23161 |       40.22 ms |       89.31 ms |       98.35 ms |      112.02 ms |   95.25% |    526.33 MiB |
-----------------------------------------------------------------------------------------------------------------------------------------
Benchmark Execution Parameters:
38875a7 Thu, 5 Sep 2024 12:40:14 +0200 GitHub Add the `wtx` project (#473)
- GRPC_BENCHMARK_DURATION=60s
- GRPC_BENCHMARK_WARMUP=15s
- GRPC_SERVER_CPUS=1
- GRPC_SERVER_RAM=2048m
- GRPC_CLIENT_CONNECTIONS=50
- GRPC_CLIENT_CONCURRENCY=1000
- GRPC_CLIENT_QPS=0
- GRPC_CLIENT_CPUS=2
- GRPC_REQUEST_SCENARIO=complex_proto
- GRPC_GHZ_TAG=0.114.0
All done.
steinybot commented 2 months ago

I'm back onto this now. I hope to have a draft PR up in a day or two.