LesnyRumcajs / grpc_bench

Various gRPC benchmarks
MIT License

gRPC benchmarks for large data (~656 KB) between Python and Java are way off #378

Closed: yawningphantom closed this issue 8 months ago

yawningphantom commented 1 year ago

Hello @LesnyRumcajs

I have a use case where I need to build a service capable of handling large payloads, ranging from roughly 600 KB to 1 MB. I'm currently deciding between Java and Python for this task. To make an informed decision, I ran benchmarks using python_grpc_bench and java_quarkus_bench. Surprisingly, the results showed the Python implementation performing as well as, or even slightly better than, the Java implementation.

I ran the benchmarks on a Linux machine, and while this doesn't seem to be a critical issue, I wanted to ask whether you have also experimented with larger payloads for these benchmarks and whether you ran into any challenges or noteworthy observations. Your insights would be valuable in guiding my decision. Thank you!
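
For reference, the run can be reproduced roughly like this (a sketch, assuming bench.sh is still the repository's entry point and accepts the benchmark names as arguments; the values mirror the execution parameters reported below):

# Run from the repository root; parameters as printed in the results below.
GRPC_BENCHMARK_DURATION=20s \
GRPC_BENCHMARK_WARMUP=5s \
GRPC_SERVER_CPUS=1 \
GRPC_SERVER_RAM=512m \
GRPC_CLIENT_CONNECTIONS=50 \
GRPC_CLIENT_CONCURRENCY=1000 \
GRPC_CLIENT_QPS=0 \
GRPC_CLIENT_CPUS=1 \
GRPC_REQUEST_SCENARIO=complex_proto \
./bench.sh java_quarkus_bench python_grpc_bench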

My machine configuration:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz

/0                              bus         Motherboard
/0/0                            memory      32GiB System memory
/0/1                            processor   Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
/0/100                          bridge      440BX/ZX/DX - 82443BX/ZX/DX Host bridge (AGP disabled)
/0/100/7                        bridge      82371AB/EB/MB PIIX4 ISA
/0/100/7.1        scsi1         storage     82371AB/EB/MB PIIX4 IDE
/0/100/7.1/0.0.0  /dev/cdrom    disk        Virtual CD/ROM

Sample payload: a ~24 KB JSON file, attached here -
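
(As an aside, for the larger ~650 KB case from the title, a throwaway JSON payload of comparable size can be generated with something like the snippet below; the data field name and the output filename are just placeholders, not necessarily what the benchmark's proto or scenario expects.)

# Build a ~650 KB JSON file consisting of a single repeated-character field.
printf '{"data":"%s"}\n' "$(head -c 650000 /dev/zero | tr '\0' 'x')" > payload_650k.json
ls -lh payload_650k.json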

Result

==> Running benchmark for java_quarkus_bench...
Waiting for server to come up... ready.
Warming up the service for 5s... done.
Benchmarking now...
        done.
        Results:
            Requests/sec:   218.12
==> Running benchmark for python_grpc_bench...
Waiting for server to come up... ready.
Warming up the service for 5s... done.
Benchmarking now...
        done.
        Results:
            Requests/sec:   215.17
-----
Benchmark finished. Detailed results are located in: results/231907T133929
-----------------------------------------------------------------------------------------------------------------------------------------
| name                        |   req/s |   avg. latency |        90 % in |        95 % in |        99 % in | avg. cpu |   avg. memory |
-----------------------------------------------------------------------------------------------------------------------------------------
| python_grpc                 |     185 |         3.98 s |         6.61 s |         7.59 s |         8.80 s |   15.11% |     58.72 MiB |
| java_quarkus                |     183 |         4.06 s |        11.30 s |        14.57 s |        20.51 s |   41.71% |     169.9 MiB |
-----------------------------------------------------------------------------------------------------------------------------------------
Benchmark Execution Parameters:
03e7e70 Mon, 17 Jul 2023 20:22:42 +0200 GitHub [feat] updated micronaut to 4.0 (#370)
- GRPC_BENCHMARK_DURATION=20s
- GRPC_BENCHMARK_WARMUP=5s
- GRPC_SERVER_CPUS=1
- GRPC_SERVER_RAM=512m
- GRPC_CLIENT_CONNECTIONS=50
- GRPC_CLIENT_CONCURRENCY=1000
- GRPC_CLIENT_QPS=0
- GRPC_CLIENT_CPUS=1
- GRPC_REQUEST_SCENARIO=complex_proto
- GRPC_GHZ_TAG=0.114.0
All done.
LesnyRumcajs commented 1 year ago

Hey @yawningphantom Using 1 CPU for both the server and the client is too low; that's only good for checking whether the service works at all.

Looking at your specs, I recommend setting GRPC_CLIENT_CPUS to perhaps 5. Moreover, Java tends to perform significantly better when given at least 2 CPUs. I'm not sure your CPU is powerful enough for such a test, though. In general, I tend to use at least 3 client cores per server core, but you may try. See https://github.com/LesnyRumcajs/grpc_bench/blob/master/example_benchmark.sh
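
On your 8-CPU machine, something along these lines should give a more representative run (just a sketch, not a tuned configuration; bench.sh as the entry point is assumed):

# Give the server 2 CPUs and the client 5, leaving one core for the OS.
GRPC_SERVER_CPUS=2 \
GRPC_CLIENT_CPUS=5 \
GRPC_CLIENT_CONNECTIONS=50 \
GRPC_CLIENT_CONCURRENCY=1000 \
GRPC_CLIENT_QPS=0 \
./bench.sh java_quarkus_bench python_grpc_bench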

Does it help?

yawningphantom commented 1 year ago

Hey @LesnyRumcajs Thanks for the reply. I ran the benchmark twice: once with only GRPC_CLIENT_CPUS=5, and once with both the client and server CPUs set to 5. I still did not see any major difference between Python and Java, which seems a bit weird. I'm not sure why that is.

==> Running benchmark for java_quarkus_bench...
Waiting for server to come up... ready.
Warming up the service for 10s... done.
Benchmarking now...
        done.
        Results:
            Requests/sec:   9.98
==> Running benchmark for python_grpc_bench...
Waiting for server to come up... ready.
Warming up the service for 10s... done.
Benchmarking now...
        done.
        Results:
            Requests/sec:   9.98
-----
Benchmark finished. Detailed results are located in: results/231907T051744
-----------------------------------------------------------------------------------------------------------------------------------------
| name                        |   req/s |   avg. latency |        90 % in |        95 % in |        99 % in | avg. cpu |   avg. memory |
-----------------------------------------------------------------------------------------------------------------------------------------
| java_quarkus                |      10 |       67.29 ms |       80.64 ms |       99.35 ms |      134.18 ms |   24.43% |     148.6 MiB |
| python_grpc                 |      10 |       53.15 ms |       61.81 ms |       67.81 ms |       77.80 ms |    7.58% |     32.92 MiB |
-----------------------------------------------------------------------------------------------------------------------------------------
Benchmark Execution Parameters:
03e7e70 Mon, 17 Jul 2023 20:22:42 +0200 GitHub [feat] updated micronaut to 4.0 (#370)
- GRPC_BENCHMARK_DURATION=60s
- GRPC_BENCHMARK_WARMUP=10s
- GRPC_SERVER_CPUS=1
- GRPC_SERVER_RAM=5120m
- GRPC_CLIENT_CONNECTIONS=10
- GRPC_CLIENT_CONCURRENCY=20
- GRPC_CLIENT_QPS=10
- GRPC_CLIENT_CPUS=5
- GRPC_REQUEST_SCENARIO=complex_proto
==> Running benchmark for java_quarkus_bench...
Waiting for server to come up... ready.
Warming up the service for 5s... done.
Benchmarking now...
        done.
        Results:
            Requests/sec:   0.95
==> Running benchmark for python_grpc_bench...
Waiting for server to come up... ready.
Warming up the service for 5s... done.
Benchmarking now...
        done.
        Results:
            Requests/sec:   0.95
-----
Benchmark finished. Detailed results are located in: results/231907T051324
-----------------------------------------------------------------------------------------------------------------------------------------
| name                        |   req/s |   avg. latency |        90 % in |        95 % in |        99 % in | avg. cpu |   avg. memory |
-----------------------------------------------------------------------------------------------------------------------------------------
| java_quarkus                |       1 |       64.77 ms |       82.63 ms |       85.83 ms |       85.83 ms |    3.89% |    145.53 MiB |
| python_grpc                 |       1 |       62.92 ms |       76.76 ms |      114.02 ms |      114.02 ms |    1.54% |     15.75 MiB |
-----------------------------------------------------------------------------------------------------------------------------------------
Benchmark Execution Parameters:
03e7e70 Mon, 17 Jul 2023 20:22:42 +0200 GitHub [feat] updated micronaut to 4.0 (#370)
- GRPC_BENCHMARK_DURATION=20s
- GRPC_BENCHMARK_WARMUP=5s
- GRPC_SERVER_CPUS=5
- GRPC_SERVER_RAM=5120m
- GRPC_CLIENT_CONNECTIONS=10
- GRPC_CLIENT_CONCURRENCY=20
- GRPC_CLIENT_QPS=1
- GRPC_CLIENT_CPUS=5
- GRPC_REQUEST_SCENARIO=complex_proto
- GRPC_GHZ_TAG=0.114.0
All done.
LesnyRumcajs commented 1 year ago

@yawningphantom You imposed rather harsh limits on the client. GRPC_CLIENT_QPS=1 means that the client will send only one request per second, which you can confirm in the results. If that, or 10, is your expected load, then Python may well be the better choice for you. Python will definitely be outperformed at higher loads, though, as the published results in this repository show.
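
If the goal is to compare maximum throughput rather than behaviour under a fixed load, drop the rate limit entirely; a sketch reusing the CPU split from your second run (GRPC_CLIENT_QPS=0 means no rate limit, as in your very first run):

GRPC_CLIENT_QPS=0 \
GRPC_SERVER_CPUS=5 \
GRPC_CLIENT_CPUS=5 \
GRPC_CLIENT_CONNECTIONS=50 \
GRPC_CLIENT_CONCURRENCY=1000 \
./bench.sh java_quarkus_bench python_grpc_bench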

gcnyin commented 1 year ago

Could you provide the test cases? Then I can run them again to verify.

LesnyRumcajs commented 8 months ago

Stale.