Closed by drinkbeer 2 years ago
Hi Jianbin,
very impressive work so far! I will tell you what I know and what I do not know.
Facts that I know:
10ms (is it avg? 99th?) is also an awful number for an in-memory store. You should not get there. Now, it's hard for me to say what causes this based on the data you put here, because a) I do not have hands-on experience with K8S as a deployment system, and b) there is some missing data.
Now, from analyzing your test setup I assume that you benchmarked both of them on the same node concurrently? am I correct? If yes, then it's a really bad idea.
When you put multiple pods like Redis/KeyDB/DF on the same node, do not expect that they all get dedicated networking capacity: you are bounded by limitations of the underlying hardware but now it's divided between two hungry pods.
You did not write where you benchmark them from. Is it a different node? The same node? The same zone?
What I would do is the following: play with the --threads argument in memtier. --clients in the 10-40 range is fine. --threads should be set in such a way that the server under test gives you the highest throughput while still keeping latency low. For DF it's easy to see - if it uses more than 95% of the total CPU, it has reached the limits of the underlying machine. For KeyDB it depends on the server-threads argument you pass (they suggest 4, but I used 8 in my tests). In any case, if you see in htop that KeyDB's K CPUs are at 100% (K = server-threads), it won't go higher either. If you see that avg latency is above 1ms, the server is overloaded and you should probably decrease --threads in your memtier. By running on a raw GCP instance, you will learn the "normal" performance ranges of each server, the normal latencies, and the optimal configurations for memtier.
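To check the "more than 95% of the total CPU" condition without staring at htop, one option is to sample /proc/stat twice. A rough Linux-only sketch (approximate: it counts only user+nice+system ticks as busy):

```shell
# Sample aggregate CPU busy% over one second from /proc/stat (Linux).
read -r _ u1 n1 s1 i1 rest < /proc/stat
sleep 1
read -r _ u2 n2 s2 i2 rest < /proc/stat
busy=$(( (u2 + n2 + s2) - (u1 + n1 + s1) ))
idle=$(( i2 - i1 ))
pct=$(( 100 * busy / (busy + idle) ))
echo "total CPU busy: ${pct}%"
```

Run this on the server node while memtier is hammering it; if it stays pinned above 95%, raising --threads further won't buy you more throughput.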
Once you have this, you may start working your way to K8S. But I would not jump straight there. I would first run your favorite configuration above, but running the servers from a container instead of a native binary.
Now, there are some options here too. You can run docker run --network=host or with port mappings. I suspect that port mapping will degrade your numbers greatly, but maybe --network=host will also affect them.
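For reference, the two containerized variants would look roughly like this. The image path is the published Dragonfly registry name; treat it as an assumption and substitute whatever image/tag you actually build:

```shell
# Variant 1: host networking - no NAT layer, closest to native performance.
docker run --network=host docker.dragonflydb.io/dragonflydb/dragonfly

# Variant 2: port mapping - traffic goes through docker-proxy/NAT,
# which is the variant most likely to degrade the benchmark numbers.
docker run -p 6379:6379 docker.dragonflydb.io/dragonflydb/dragonfly
```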
If you use pipelining, be ready to reduce the --clients parameter to 5-10. Pipelining affects latency as well.
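The arithmetic behind that advice: the number of in-flight requests scales as threads × clients × pipeline depth, so when you raise --pipeline you lower --clients to keep load comparable. A back-of-the-envelope sketch (the thread/client counts are illustrative):

```shell
# In-flight requests ~= threads * clients * pipeline depth.
threads=30
clients=10
unpipelined=$(( threads * clients * 1 ))   # --pipeline=1  -> 300 in flight
pipelined=$(( threads * 5 * 10 ))          # --clients=5 --pipeline=10 -> 1500
echo "unpipelined: $unpipelined, pipelined: $pipelined"
```

Even after cutting --clients to 5, a pipeline depth of 10 still multiplies the outstanding work fivefold, which is why latency numbers from pipelined runs are not comparable to unpipelined ones.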
To summarize, signs of a good benchmark:
Regarding (2): for some instance types you won't be able to reach full CPU utilization (i.e. 16 cores working at 100%) if they are network bound. But you should probably still see well above 1M qps on DF on an n2 with 16 cores.
And do not forget to drink beer!
Just noticed your other memtier parameters. You can be a bit more frisky with the keyspace lengths; you use big instances, so it's ok: --key-maximum=10000000.
I do not know if --key-pattern=S:R was a deliberate choice. I used R:R but also used --distinct-client-seed. Otherwise, each client connection goes through exactly the same route...
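To see why a shared seed makes every connection replay the same keys, here is a toy illustration using bash's seedable RANDOM (a stand-in, not memtier's actual generator):

```shell
# Same seed -> identical pseudo-random key stream per "client connection".
RANDOM=42; a="key:$RANDOM key:$RANDOM key:$RANDOM"
RANDOM=42; b="key:$RANDOM key:$RANDOM key:$RANDOM"
# Distinct seeds (what --distinct-client-seed provides) -> distinct streams.
RANDOM=7;  c="key:$RANDOM key:$RANDOM key:$RANDOM"
echo "seed 42: $a"
echo "seed 7:  $c"
```

With identical seeds, a and b are the same sequence, so every client exercises exactly the same keys in the same order; distinct seeds spread the load across the keyspace.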
Thank you so much for your reply. I think your suggestion of testing on GCP instances is great. I will follow the steps and try to get an ideal P99 latency. I will update the results here once I finish the tests.
Now, from analyzing your test setup I assume that you benchmarked both of them on the same node concurrently? am I correct? If yes, then it's a really bad idea.
No. They are running on two different nodes in the same cluster, so same region (us-east1) and same zone (us-east1-d). The memtier jobs are also running on the same nodes (the memtier job for KeyDB runs on the same node as the KeyDB cache; the memtier job for Dragonfly runs on the same node as the Dragonfly cache). And the two memtier jobs run sequentially to avoid saturating the network.
I do not know if --key-pattern=S:R was a deliberate choice. I used R:R but also used --distinct-client-seed. Otherwise, each client connection goes through exactly the same route...
I first run a pure set memtier job that does only sets from key:1 to key:10000000, to test set operations only; then I run a pure get memtier job, which has a 100% hit rate since all the keys in the keyspace are filled; then I run a mixed-set-get memtier job with 25% sets and 75% gets, whose hit rate is 100% as well (check the Misses/sec metric: it is 0, which means the hit rate is 100%). I think --key-pattern=S:R only affects the third memtier job (the mixed-ops one). Because the hit rate is 100%, I don't think the key-pattern affects performance much. But I could try the --distinct-client-seed option in my tests on GCP instances.
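The 100%-hit-rate claim can be sanity-checked with a toy in-memory store: prefill the entire keyspace, and random gets can never miss. A small sketch (needs bash 4+ for associative arrays; N stands in for --key-maximum):

```shell
declare -A store
N=1000
# "pure set" phase: fill key:1 .. key:N, mirroring the prefill job.
for ((i = 1; i <= N; i++)); do
  store["key:$i"]=value
done
# "pure get" phase: random keys drawn from the same keyspace.
hits=0; misses=0
for ((j = 0; j < 500; j++)); do
  k="key:$(( RANDOM % N + 1 ))"
  if [[ -n "${store[$k]:-}" ]]; then hits=$((hits + 1)); else misses=$((misses + 1)); fi
done
echo "hits=$hits misses=$misses"
```

Because every random key falls inside the prefilled range, misses stays at 0 - the same reason Misses/sec is 0 in the memtier tables below.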
I would run memtier separately from the server node as well. Not that it's impossible to get 1M running both on the same machine but it greatly affects benchmark numbers when reaching high throughput ranges.
A good point. I originally thought that running them on the same node would save some network traffic. I will run them on separate instances when benchmarking with GCP instances.
@drinkbeer Any chance you have a reproducible bash script that would cover the tests you are trying between DF and KeyDB?
These are wonderful bits of feedback; it would be interesting to make a canonical test script and deployment yaml so that different users on different platforms can execute the same test suite.
While I don't have any better feedback than what @romange provided, I could take a swing at dockerizing a test script to make it more consistently reproducible, and include other platforms as well in the future.
@ryanrussell in terms of priority for the project, writing canonical benchmarking scripts is less important right now. You have great knowledge of how to improve the maintainability and manageability of the project. I think those areas will have the highest ROI if tackled sooner.
Hey, @romange @ryanrussell, I followed your suggestions and re-ran all the tests on GCP VM instances. Dragonfly overwhelms KeyDB in P99 latency (1.24700 ms vs 1.99100 ms), throughput (578167.18 ops/sec vs 322822.64 ops/sec), and memory used (2.84GiB vs 3.70G). But in the machine observability dashboard, I found that the peak CPU usage for Dragonfly is much higher than KeyDB's (50% vs 10%).
My next step is to benchmark with Docker and Kubernetes. And will update the results in this issue.
Dragonfly
===============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
-------------------------------------------------------------------------------------------------------------------------------
Sets 580235.73 0.00 0.00 0.52789 0.48700 1.34300 2.36700 194854.33
Gets 585411.39 585411.39 0.00 0.51945 0.47900 1.27900 2.71900 193733.96
Mixed 578167.18 433625.38 0.00 0.52565 0.48700 1.24700 1.71900 192042.35
Memory Usage: 2.84GiB
KeyDB
===============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
-------------------------------------------------------------------------------------------------------------------------------
Sets 288430.42 0.00 0.00 1.03785 0.89500 2.71900 7.16700 96860.48
Gets 380243.32 380243.32 0.00 0.80936 0.68700 1.48700 2.03100 125836.37
Mixed 322822.64 242116.98 0.00 0.95814 0.80700 1.99100 6.01500 107227.84
Memory Usage: 3.70G
I provisioned three VM instances:
Dragonfly:
jchome@dragonfly-worker:~/dragonfly/build-opt$ ./dragonfly --alsologtostderr
I20220609 19:01:26.782763 22665 init.cc:56] ./dragonfly running in opt mode.
I20220609 19:01:26.783080 22665 dfly_main.cc:179] maxmemory has not been specified. Deciding myself....
I20220609 19:01:26.783149 22665 dfly_main.cc:184] Found 234.06GiB available memory. Setting maxmemory to 187.25GiB
I20220609 19:01:26.783819 22666 proactor.cc:456] IORing with 1024 entries, allocated 102720 bytes, cq_entries is 2048
I20220609 19:01:26.787039 22665 proactor_pool.cc:66] Running 30 io threads
I20220609 19:01:26.797847 22665 server_family.cc:198] Data directory is "/home/jchome/dragonfly/build-opt"
I20220609 19:01:26.797976 22665 server_family.cc:122] Checking "/home/jchome/dragonfly/build-opt/dump"
I20220609 19:01:26.798053 22669 listener_interface.cc:79] sock[96] AcceptServer - listening on port 6379
KeyDB:
jchome@keydb-worker:~/KeyDB/src$ ./keydb-server --server-threads 4 --maxmemory 188G --port 6379 --protected-mode no
97236:97236:C 10 Jun 2022 02:00:30.422 # oO0OoO0OoO0Oo KeyDB is starting oO0OoO0OoO0Oo
97236:97236:C 10 Jun 2022 02:00:30.422 # KeyDB version=255.255.255, bits=64, commit=aa032d30, modified=0, pid=97236, just started
97236:97236:C 10 Jun 2022 02:00:30.422 # Configuration loaded
97236:97236:M 10 Jun 2022 02:00:30.423 * Increased maximum number of open files to 10032 (it was originally set to 1024).
97236:97236:M 10 Jun 2022 02:00:30.423 * monotonic clock: POSIX clock_gettime
_
_-(+)-_
_-- / \ --_
_-- / \ --_ KeyDB 255.255.255 (aa032d30/0) 64 bit
__-- / \ --__
(+) _ / \ _ (+) Running in standalone mode
| -- / \ -- | Port: 6379
| /--_ _ _--\ | PID: 97236
| / -(+)- \ |
| / | \ | https://docs.keydb.dev
| / | \ |
| / | \ |
(+)_ -- -- -- | -- -- -- _(+)
--_ | _--
--_ | _--
-(+)- KeyDB has now joined Snap! See the announcement at: https://docs.keydb.dev/news
97236:97236:M 10 Jun 2022 02:00:30.424 # Server initialized
97236:97236:M 10 Jun 2022 02:00:30.424 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
97236:97236:M 10 Jun 2022 02:00:30.424 * Loading RDB produced by version 255.255.255
97236:97236:M 10 Jun 2022 02:00:30.424 * RDB age 11 seconds
97236:97236:M 10 Jun 2022 02:00:30.424 * RDB memory usage when created 2.97 Mb
97236:97236:M 10 Jun 2022 02:00:30.424 # Done loading RDB, keys loaded: 0, keys expired: 0.
97236:97236:M 10 Jun 2022 02:00:30.424 * DB loaded from disk: 0.000 seconds
97236:97249:M 10 Jun 2022 02:00:30.424 * Thread 0 alive.
97236:97250:M 10 Jun 2022 02:00:30.424 * Thread 1 alive.
97236:97251:M 10 Jun 2022 02:00:30.424 * Thread 2 alive.
97236:97252:M 10 Jun 2022 02:00:30.424 * Thread 3 alive.
jchome@memtier-worker:~/memtier_benchmark$ DRAGONFLY_SERVER="10.128.0.21" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 105 secs] 0 threads: 60000000 ops, 1283453 (avg: 570846) ops/sec, 420.91MB/sec (avg: 187.21MB/sec), 0.23 (avg: 0.52) msec latency
[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 106 secs] 0 threads: 60000000 ops, 1055065 (avg: 565307) ops/sec, 346.00MB/sec (avg: 185.39MB/sec), 0.28 (avg: 0.53) msec latency
30 Threads
10 Connections per thread
200000 Requests per client
BEST RUN RESULTS
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 592927.98 --- --- 0.52517 0.48700 1.27100 1.84700 199116.64
Gets 0.00 0.00 0.00 --- --- --- --- 0.00
Waits 0.00 --- --- --- --- --- --- ---
Totals 592927.98 0.00 0.00 0.52517 0.48700 1.27100 1.84700 199116.64
WORST RUN RESULTS
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 567543.49 --- --- 0.53062 0.48700 1.42300 2.67100 190592.01
Gets 0.00 0.00 0.00 --- --- --- --- 0.00
Waits 0.00 --- --- --- --- --- --- ---
Totals 567543.49 0.00 0.00 0.53062 0.48700 1.42300 2.67100 190592.01
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 580235.73 --- --- 0.52789 0.48700 1.34300 2.36700 194854.33
Gets 0.00 0.00 0.00 --- --- --- --- 0.00
Waits 0.00 --- --- --- --- --- --- ---
Totals 580235.73 0.00 0.00 0.52789 0.48700 1.34300 2.36700 194854.33
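The "60000000 ops" reported per run above is just the product of the memtier sizing flags used in the command:

```shell
threads=30         # --threads=30
clients=10         # --clients=10
requests=200000    # -n 200000 (requests per client connection)
total=$(( threads * clients * requests ))
echo "total ops per run: $total"   # 60000000, matching the run log
```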
jchome@memtier-worker:~/memtier_benchmark$ DRAGONFLY_SERVER="10.128.0.21" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=0:1
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 105 secs] 0 threads: 60000000 ops, 1076305 (avg: 569751) ops/sec, 347.84MB/sec (avg: 184.13MB/sec), 0.28 (avg: 0.53) msec latency
[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 102 secs] 0 threads: 60000000 ops, 942359 (avg: 584836) ops/sec, 304.55MB/sec (avg: 189.01MB/sec), 0.32 (avg: 0.51) msec latency
30 Threads
10 Connections per thread
200000 Requests per client
BEST RUN RESULTS
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 0.00 --- --- --- --- --- --- 0.00
Gets 593178.84 593178.84 0.00 0.51283 0.47900 1.25500 1.79900 196304.47
Waits 0.00 --- --- --- --- --- --- ---
Totals 593178.84 593178.84 0.00 0.51283 0.47900 1.25500 1.79900 196304.47
WORST RUN RESULTS
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 0.00 --- --- --- --- --- --- 0.00
Gets 577643.93 577643.93 0.00 0.52607 0.47900 1.31100 5.47100 191163.44
Waits 0.00 --- --- --- --- --- --- ---
Totals 577643.93 577643.93 0.00 0.52607 0.47900 1.31100 5.47100 191163.44
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 0.00 --- --- --- --- --- --- 0.00
Gets 585411.39 585411.39 0.00 0.51945 0.47900 1.27900 2.71900 193733.96
Waits 0.00 --- --- --- --- --- --- ---
Totals 585411.39 585411.39 0.00 0.51945 0.47900 1.27900 2.71900 193733.96
jchome@memtier-worker:~/memtier_benchmark$ DRAGONFLY_SERVER="10.128.0.21" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:3
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 105 secs] 0 threads: 60000000 ops, 992211 (avg: 570415) ops/sec, 321.85MB/sec (avg: 185.03MB/sec), 0.30 (avg: 0.52) msec latency
[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 105 secs] 0 threads: 60000000 ops, 1346899 (avg: 570342) ops/sec, 436.90MB/sec (avg: 185.00MB/sec), 0.22 (avg: 0.53) msec latency
30 Threads
10 Connections per thread
200000 Requests per client
BEST RUN RESULTS
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 145404.28 --- --- 0.52710 0.49500 1.24700 1.80700 48829.56
Gets 436212.84 436212.84 0.00 0.52488 0.48700 1.24700 1.80700 144358.73
Waits 0.00 --- --- --- --- --- --- ---
Totals 581617.12 436212.84 0.00 0.52543 0.48700 1.24700 1.80700 193188.29
WORST RUN RESULTS
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 143679.31 --- --- 0.52817 0.49500 1.24700 1.67100 48250.26
Gets 431037.93 431037.93 0.00 0.52511 0.48700 1.24700 1.66300 142646.15
Waits 0.00 --- --- --- --- --- --- ---
Totals 574717.24 431037.93 0.00 0.52587 0.48700 1.24700 1.66300 190896.42
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 144541.79 --- --- 0.52764 0.49500 1.24700 1.71900 48539.91
Gets 433625.38 433625.38 0.00 0.52499 0.48700 1.24700 1.71900 143502.44
Waits 0.00 --- --- --- --- --- --- ---
Totals 578167.18 433625.38 0.00 0.52565 0.48700 1.24700 1.71900 192042.35
jchome@memtier-worker:~/memtier_benchmark$ DRAGONFLY_SERVER="10.128.0.21" && echo "info memory" | nc $DRAGONFLY_SERVER 6379
$462
# Memory
used_memory:3047981240
used_memory_human:2.84GiB
used_memory_peak:3047981240
comitted_memory:3894657024
used_memory_rss:3181318144
used_memory_rss_human:2.96GiB
object_used_memory:2559986176
table_used_memory:480213552
num_buckets:12472320
num_entries:9999947
inline_keys:9999947
strval_bytes:2559986176
listpack_blobs:0
listpack_bytes:0
small_string_bytes:2559986176
maxmemory:201405674291
maxmemory_human:187.57GiB
cache_mode:store
jchome@memtier-worker:~/memtier_benchmark$ KEYDB_SERVER="10.128.0.23" && REDIS_PORT=6379 && memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 211 secs] 0 threads: 60000000 ops, 696657 (avg: 283109) ops/sec, 228.47MB/sec (avg: 92.85MB/sec), 0.43 (avg: 1.06) msec latency
[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 203 secs] 0 threads: 60000000 ops, 605828 (avg: 295104) ops/sec, 198.68MB/sec (avg: 96.78MB/sec), 0.49 (avg: 1.02) msec latency
30 Threads
10 Connections per thread
200000 Requests per client
BEST RUN RESULTS
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 293413.30 --- --- 1.01653 0.83100 2.38300 6.71900 98533.82
Gets 0.00 0.00 0.00 --- --- --- --- 0.00
Waits 0.00 --- --- --- --- --- --- ---
Totals 293413.30 0.00 0.00 1.01653 0.83100 2.38300 6.71900 98533.82
WORST RUN RESULTS
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 283447.54 --- --- 1.05917 0.94300 3.02300 7.58300 95187.15
Gets 0.00 0.00 0.00 --- --- --- --- 0.00
Waits 0.00 --- --- --- --- --- --- ---
Totals 283447.54 0.00 0.00 1.05917 0.94300 3.02300 7.58300 95187.15
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 288430.42 --- --- 1.03785 0.89500 2.71900 7.16700 96860.48
Gets 0.00 0.00 0.00 --- --- --- --- 0.00
Waits 0.00 --- --- --- --- --- --- ---
Totals 288430.42 0.00 0.00 1.03785 0.89500 2.71900 7.16700 96860.48
jchome@memtier-worker:~/memtier_benchmark$ KEYDB_SERVER="10.128.0.23" && REDIS_PORT=6379 && memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=0:1
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 159 secs] 0 threads: 60000000 ops, 924097 (avg: 376554) ops/sec, 298.65MB/sec (avg: 121.69MB/sec), 0.32 (avg: 0.80) msec latency
[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 164 secs] 0 threads: 60000000 ops, 683873 (avg: 364556) ops/sec, 221.01MB/sec (avg: 117.82MB/sec), 0.44 (avg: 0.82) msec latency
30 Threads
10 Connections per thread
200000 Requests per client
BEST RUN RESULTS
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 0.00 --- --- --- --- --- --- 0.00
Gets 388569.90 388569.90 0.00 0.79598 0.66300 1.47100 2.67100 128591.95
Waits 0.00 --- --- --- --- --- --- ---
Totals 388569.90 388569.90 0.00 0.79598 0.66300 1.47100 2.67100 128591.95
WORST RUN RESULTS
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 0.00 --- --- --- --- --- --- 0.00
Gets 371916.73 371916.73 0.00 0.82274 0.70300 1.50300 1.91100 123080.79
Waits 0.00 --- --- --- --- --- --- ---
Totals 371916.73 371916.73 0.00 0.82274 0.70300 1.50300 1.91100 123080.79
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 0.00 --- --- --- --- --- --- 0.00
Gets 380243.32 380243.32 0.00 0.80936 0.68700 1.48700 2.03100 125836.37
Waits 0.00 --- --- --- --- --- --- ---
Totals 380243.32 380243.32 0.00 0.80936 0.68700 1.48700 2.03100 125836.37
jchome@memtier-worker:~/memtier_benchmark$ KEYDB_SERVER="10.128.0.23" && REDIS_PORT=6379 && memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:3
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 195 secs] 0 threads: 60000000 ops, 531805 (avg: 307101) ops/sec, 172.50MB/sec (avg: 99.62MB/sec), 0.56 (avg: 0.98) msec latency
[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 188 secs] 0 threads: 60000000 ops, 744913 (avg: 319107) ops/sec, 241.63MB/sec (avg: 103.51MB/sec), 0.40 (avg: 0.94) msec latency
30 Threads
10 Connections per thread
200000 Requests per client
BEST RUN RESULTS
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 82518.16 --- --- 0.95311 0.80700 1.88700 5.69500 27711.18
Gets 247554.48 247554.48 0.00 0.93559 0.79100 1.87100 5.59900 81924.79
Waits 0.00 --- --- --- --- --- --- ---
Totals 330072.63 247554.48 0.00 0.93997 0.79100 1.87100 5.63100 109635.97
WORST RUN RESULTS
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 78893.16 --- --- 0.99005 0.83100 2.09500 6.30300 26493.84
Gets 236679.48 236679.48 0.00 0.97173 0.81500 2.07900 6.27100 78325.87
Waits 0.00 --- --- --- --- --- --- ---
Totals 315572.64 236679.48 0.00 0.97631 0.82300 2.07900 6.27100 104819.71
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 80705.66 --- --- 0.97158 0.82300 2.00700 6.07900 27102.51
Gets 242116.98 242116.98 0.00 0.95366 0.80700 1.98300 6.01500 80125.33
Waits 0.00 --- --- --- --- --- --- ---
Totals 322822.64 242116.98 0.00 0.95814 0.80700 1.99100 6.01500 107227.84
jchome@memtier-worker:~/memtier_benchmark$ KEYDB_SERVER="10.128.0.23" && echo "info memory" | nc $KEYDB_SERVER 6379
$1190
# Memory
used_memory:3977378208
used_memory_human:3.70G
used_memory_rss:5380833280
used_memory_rss_human:5.01G
used_memory_peak:5496877432
used_memory_peak_human:5.12G
used_memory_peak_perc:72.36%
used_memory_overhead:537329080
used_memory_startup:3113504
used_memory_dataset:3440049128
used_memory_dataset_perc:86.56%
allocator_allocated:3977771688
allocator_active:5311553536
allocator_resident:5373931520
total_system_memory:253563305984
total_system_memory_human:236.15G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:188000000000
maxmemory_human:175.09G
maxmemory_policy:noeviction
allocator_frag_ratio:1.34
allocator_frag_bytes:1333781848
allocator_rss_ratio:1.01
allocator_rss_bytes:62377984
rss_overhead_ratio:1.00
rss_overhead_bytes:6901760
mem_fragmentation_ratio:1.35
mem_fragmentation_bytes:1403517104
mem_not_counted_for_evict:1048576
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.2.1
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
storage_provider:none
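A rough way to compare memory efficiency from the two info dumps above is bytes per key. Dragonfly reports num_entries directly; for KeyDB I assume the same ~10M-key keyspace used in the tests. Integer shell arithmetic over the dumped values:

```shell
# Dragonfly: used_memory / num_entries (both taken from the info dump).
df_bytes_per_key=$(( 3047981240 / 9999947 ))
# KeyDB: used_memory / 10M keys (assumed keyspace size, not reported).
keydb_bytes_per_key=$(( 3977378208 / 10000000 ))
echo "dragonfly: ~${df_bytes_per_key} B/key, keydb: ~${keydb_bytes_per_key} B/key"
```

That works out to roughly 304 vs 397 bytes per key (values included), consistent with the 2.84GiB vs 3.70G totals quoted earlier in the thread.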
@drinkbeer , do not expect to reach anywhere close to 3.8M qps on GCP. AWS networking capabilities are higher than any other public cloud. Having said that, I would expect for c2 to reach higher throughput. I will benchmark GCP and get back to you.
You provided a great reference point with your results! It will take me a week or so. Hope it's ok.
AWS networking capabilities are higher than any other public cloud. Having said that, I would expect for c2 to reach higher throughput.
If you take a look at the two dashboards, you can see that we are not even close to saturating the network, I think. But I can check with the Google guys whether our tests start to drop packets.
I will benchmark GCP and get back to you.
Thank you so much! I will use the results as a baseline and continue testing with Docker and Kubernetes. I hope we can achieve similar results in Docker and Kubernetes (I guess Docker and Kubernetes will introduce overhead, but I am curious how much overhead it is).
It will take me a week or so. Hope it's ok.
It is totally fine. I really appreciate your time and am looking forward to your benchmarking on GCP.
AWS networking capabilities are higher than any other public cloud. Having said that, I would expect for c2 to reach higher throughput.
If you take a look at the two dashboards, you can find that we are not even close to saturate the network I think. But I can check with google guys if our tests start to drop packets.
Yeah, it's not close to saturating the bandwidth. Throughput is another matter and is a bit more complicated.
Hey, don't run high-performance apps inside Kubernetes or Docker, or on AWS or Azure cloud.
Pay for a skilled admin with a performance and security focus and invest the money in a bare-metal server!!!
Even my 9-year-old notebook did more requests - a laboratory notebook.
@drinkbeer preliminary results... I took 2 c2-60 machines as you did and fetched the DF binary v0.2.0 from https://github.com/dragonflydb/dragonfly/releases/download/v0.2.0/dragonfly-x86_64.unstripped.tar.gz
dev@test-c1:~$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
dev@test-c1:~$ uname -a
Linux test-c1 5.15.0-1008-gcp #12-Ubuntu SMP Wed Jun 1 21:29:52 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Disclosure: it's my development image, created via a packer pipeline defined here: https://github.com/romange/image-bakery
After scanning it now, I think the only substantial performance-related change I made there is turning off mitigations:
sed -i 's/\(^GRUB_CMDLINE_LINUX=".*\)"/\1 mitigations=off"/' /etc/default/grub
besides this - it's just convenience configs and utilities.
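The mitigations tweak is just that one-line grub change; here is a sketch of what it does, run against a scratch copy so nothing on the host is touched (on a real machine you would edit `/etc/default/grub`, run `update-grub`, and reboot):

```shell
# Demonstrate the mitigations=off rewrite on a scratch copy of a grub config;
# on the real host this targets /etc/default/grub instead.
tmp="$(mktemp)"
printf 'GRUB_CMDLINE_LINUX=""\n' > "$tmp"
sed -i 's/\(^GRUB_CMDLINE_LINUX=".*\)"/\1 mitigations=off"/' "$tmp"
cat "$tmp"    # GRUB_CMDLINE_LINUX=" mitigations=off"
# After sudo update-grub and a reboot, the effect can be inspected with:
#   cat /sys/devices/system/cpu/vulnerabilities/*
rm -f "$tmp"
```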
I ran only the first SET benchmark - I copy-pasted your command:
DRAGONFLY_SERVER="10.142.0.18" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 1395818.80 --- --- 0.23205 0.23100 0.40700 0.55100 468742.82
Gets 0.00 0.00 0.00 --- --- --- --- 0.00
Waits 0.00 --- --- --- --- --- --- ---
Totals 1395818.80 0.00 0.00 0.23205 0.23100 0.40700 0.55100 468742.82
CPU usage of dragonfly:
Already much better than your result. Let's try improving it.
Rerun dragonfly with:
./dragonfly-x86_64 --logbuflevel=-1 --logtostderr --conn_use_incoming_cpu
(note the last flag).
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 1420922.15 --- --- 0.22131 0.21500 0.38300 0.50300 477173.01
Gets 0.00 0.00 0.00 --- --- --- --- 0.00
Waits 0.00 --- --- --- --- --- --- ---
Totals 1420922.15 0.00 0.00 0.22131 0.21500 0.38300 0.50300 477173.01
but now the CPU usage is:
much lower than before (3360% now vs. 4580% before). Also, p99 is pretty good in both cases. Now let's increase the load a bit by raising the number of clients in memtier to 30:
DRAGONFLY_SERVER="10.142.0.18" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=30 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 1977947.42 --- --- 0.46646 0.43900 1.07100 1.44700 664232.85
Gets 0.00 0.00 0.00 --- --- --- --- 0.00
Waits 0.00 --- --- --- --- --- --- ---
Totals 1977947.42 0.00 0.00 0.46646 0.43900 1.07100 1.44700 664232.85
p99.9 is too high IMHO. Let's take it down a notch: `clients=10, threads=60`:
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 1677309.01 --- --- 0.35636 0.33500 0.71100 1.01500 563272.64
Gets 0.00 0.00 0.00 --- --- --- --- 0.00
Waits 0.00 --- --- --- --- --- --- ---
Totals 1677309.01 0.00 0.00 0.35636 0.33500 0.71100 1.01500 563272.64
Pretty good - p99.9 under 1ms at ~1.68M QPS.
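The tuning procedure above (raise the load until p99.9 crosses ~1ms, then back off) can be scripted as a sweep; this sketch only prints each memtier invocation so they can be launched one at a time (the server address is copied from above, the combinations are the ones tried in this thread):

```shell
# Dry-run sweep over the (clients, threads) combinations tried above.
# Run each printed command; keep the highest-throughput combination whose
# p99.9 latency stays under ~1ms.
SERVER="10.142.0.18"; PORT=6379
for conf in "10 30" "30 30" "10 60"; do
  set -- $conf
  clients=$1 threads=$2
  echo "memtier_benchmark -s $SERVER -p $PORT -n 200000 -d 300 --pipeline=1 \
--clients=$clients --threads=$threads --run-count=2 --hide-histogram \
--key-prefix=key: --distinct-client-seed --key-pattern=R:R --ratio=1:0"
done
```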
Now I see you used a 1 vCPU per core ratio. I use the regular 2 vCPUs per core.
Step 2: I took a plain Ubuntu 22.04 image. The only thing I did before running DF was invoking `ulimit -n 20000`, and then `./dragonfly-x86_64 --logtostderr`.
Client (load-test) instance: took `n2-custom-80-40960`, just to be on the safe side so that we won't have bottlenecks there. I do not think it matters substantially.
DRAGONFLY_SERVER="10.142.0.20" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 100000 -d 300 --pipeline=1 --clients=15 --threads=50 --run-count=1 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 45 secs] 0 threads: 75000000 ops, 2052410 (avg: 1651450) ops/sec, 673.08MB/sec (avg: 541.59MB/sec), 0.36 (avg: 0.45) msec latency
50 Threads
15 Connections per thread
100000 Requests per client
ALL STATS
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 1788894.22 --- --- 0.45224 0.41500 0.84700 1.51900 600745.16
Gets 0.00 0.00 0.00 --- --- --- --- 0.00
Waits 0.00 --- --- --- --- --- --- ---
Totals 1788894.22 0.00 0.00 0.45224 0.41500 0.84700 1.51900 600745.16
Seems that DF works OK on Ubuntu 22.04 out of the box. Next step: check Debian.
Step 3: used BullsEye - projects/debian-cloud/global/images/debian-11-bullseye-v20220519
dragonfly: https://github.com/dragonflydb/dragonfly/releases/download/v0.2.0/dragonfly-x86_64.unstripped.tar.gz
Everything else as before. As you can see, I can confirm that Debian 11 is very bad performance-wise. I suspect it's because you need at least kernel 5.11 to reach good performance, but I am not sure. In any case, Ubuntu provides a simple alternative if performance is what you need.
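If the kernel version really is the culprit (Dragonfly is built on io_uring, whose performance improved substantially in newer kernels; 5.11 is the guess above, not a confirmed floor), a quick preflight check before deploying might look like this:

```shell
# Preflight: warn if the running kernel predates 5.11 (the threshold is the
# guess from the discussion above, not a verified requirement).
ver="$(uname -r | cut -d- -f1)"        # e.g. "5.15.0"
major="$(echo "$ver" | cut -d. -f1)"
minor="$(echo "$ver" | cut -d. -f2)"
if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 11 ]; }; then
  echo "kernel $ver: recent enough"
else
  echo "kernel $ver: older than 5.11, expect degraded performance"
fi
```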
Jianbin, I think there are enough data points here to continue evaluating DF.
dev@test-c1:~$ DRAGONFLY_SERVER="10.142.0.21" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 100000 -d 300 --pipeline=1 --clients=15 --threads=50 --run-count=1 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 184 secs] 0 threads: 75000000 ops, 435703 (avg: 406529) ops/sec, 142.89MB/sec (avg: 133.32MB/sec), 1.72 (avg: 1.84) msec latency
50 Threads
15 Connections per thread
100000 Requests per client
ALL STATS
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 432159.18 --- --- 1.84153 1.45500 7.32700 18.43100 145127.38
Gets 0.00 0.00 0.00 --- --- --- --- 0.00
Waits 0.00 --- --- --- --- --- --- ---
Totals 432159.18 0.00 0.00 1.84153 1.45500 7.32700 18.43100 145127.38
@drinkbeer hey man, did you have a chance to experiment with it?
@drinkbeer I am closing. Feel free to reopen if you have any questions
Thank you @romange, this issue can be closed. As a next step, we will probably build Dragonfly in our staging environment and benchmark it along with Envoy proxy (the proxy used with KeyDB in our prod).
Here are some results of our benchmarking. The performance of Dragonfly looks great.
(updated at July 4th, 2022)
We deployed Dragonfly and KeyDB on `c2-standard-60` machines (30 cores, 240 GB RAM) running `Ubuntu 22.04` out of the box. We used memtier from the Redis community for load generation and benchmarking. We tested performance and resource usage with all-`Set` operations, all-`Get` operations, and `Set-Get Mixed` operations. The conclusion is that Dragonfly achieves much higher throughput (3.5X) as well as much lower latency (~14% of KeyDB's). Dragonfly's resource usage is also impressive: it fully and evenly utilizes multiple CPU cores, while KeyDB cannot run more than 16 `server-threads`, which means it cannot fully utilize the machine's 30 cores; Dragonfly also uses less memory (76.19% of KeyDB's). One thing to note about KeyDB is that adding more threads does not help performance or resource utilization.
| | Dragonfly | KeyDB (4 threads) | KeyDB (16 threads) | Dragonfly (Docker) |
|---|---|---|---|---|
| Set Latency P99.9 (ms) | 0.52700 | 8.63900 | 21.37500 | 0.93500 |
| Get Latency P99.9 (ms) | 0.54300 | 1.60700 | 1.56700 | 0.59900 |
| Set-Get Mixed Latency P99.9 (ms) | 0.57500 | 4.35100 | 7.03900 | 0.60700 |
| Throughput (ops/s) | ~1.4 million | ~400K | ~307K | ~1.25 million |
| Memory (GB) | 3.68 | 4.83 | 6.25 | 3.86 |
| CPU (number of cores) | 22.8 | 4.25 | 15.23 | 27.97 |
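The headline ratios quoted in the summary follow directly from the table; a quick check with awk (numbers copied from the table, with the ~1.4M and ~400K throughput figures rounded as in the text):

```shell
# Recompute the summary ratios from the table's numbers.
awk 'BEGIN {
  printf "throughput: %.1fX (Dragonfly vs KeyDB, 4 threads)\n", 1400000 / 400000
  printf "memory:     %.2f%% of KeyDB\n", 3.68 / 4.83 * 100
}'
```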
Hardware
Dragonfly
jchome@dragonfly-worker-ubuntu:~$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
jchome@dragonfly-worker-ubuntu:~$ uname -a
Linux dragonfly-worker-ubuntu 5.15.0-1010-gcp #15-Ubuntu SMP Fri Jun 10 11:30:24 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
jchome@dragonfly-worker-ubuntu:~$ sudo sed -i 's/\(^GRUB_CMDLINE_LINUX=".*\)"/\1 mitigations=off"/' /etc/default/grub
jchome@dragonfly-worker-ubuntu:~$ cat /etc/default/grub | grep GRUB_CMDLINE_LINUX=
GRUB_CMDLINE_LINUX=" mitigations=off"
cd ~ && \
wget https://github.com/dragonflydb/dragonfly/releases/download/v0.3.1/dragonfly-x86_64.unstripped.tar.gz && \
tar -xvf dragonfly-x86_64.unstripped.tar.gz
jchome@dragonfly-worker-ubuntu:~$ ./dragonfly-x86_64 --logbuflevel=-1 --logtostderr --conn_use_incoming_cpu
KeyDB
jchome@keydb-worker-ubuntu:~$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
jchome@keydb-worker-ubuntu:~$ uname -a
Linux keydb-worker-ubuntu 5.15.0-1010-gcp #15-Ubuntu SMP Fri Jun 10 11:30:24 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
jchome@keydb-worker-ubuntu:~$ sudo sed -i 's/\(^GRUB_CMDLINE_LINUX=".*\)"/\1 mitigations=off"/' /etc/default/grub
jchome@keydb-worker-ubuntu:~$ cat /etc/default/grub | grep GRUB_CMDLINE_LINUX=
GRUB_CMDLINE_LINUX=" mitigations=off"
sudo apt-get update
sudo apt-get install build-essential nasm autotools-dev autoconf libjemalloc-dev tcl tcl-dev uuid-dev libcurl4-openssl-dev git
git clone https://github.com/EQ-Alpha/KeyDB.git
cd KeyDB
make distclean
make test
make
sudo make install
jchome@keydb-worker-ubuntu:~/KeyDB/src$ ./keydb-server --server-threads 4 --maxmemory 188G --port 6379 --protected-mode no
jchome@keydb-worker-ubuntu:~/KeyDB/src$ ./keydb-server --server-threads 16 --maxmemory 188G --port 6379 --protected-mode no &
[1] 102412
Memtier
jchome@memtier-worker-ubuntu:~$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
jchome@memtier-worker-ubuntu:~$ uname -a
Linux memtier-worker-ubuntu 5.15.0-1010-gcp #15-Ubuntu SMP Fri Jun 10 11:30:24 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
jchome@memtier-worker-ubuntu:~$ sudo sed -i 's/\(^GRUB_CMDLINE_LINUX=".*\)"/\1 mitigations=off"/' /etc/default/grub
jchome@memtier-worker-ubuntu:~$ cat /etc/default/grub | grep GRUB_CMDLINE_LINUX=
GRUB_CMDLINE_LINUX=" mitigations=off"
git clone https://github.com/RedisLabs/memtier_benchmark.git && cd memtier_benchmark/
Built following https://github.com/RedisLabs/memtier_benchmark#building-and-installing
DRAGONFLY_SERVER="10.128.0.24" && REDIS_PORT=6379 && ./memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 1423048.66 --- --- 0.22339 0.22300 0.39100 0.52700 477887.13
Gets 0.00 0.00 0.00 --- --- --- --- 0.00
Waits 0.00 --- --- --- --- --- --- ---
Totals 1423048.66 0.00 0.00 0.22339 0.22300 0.39100 0.52700 477887.13
DRAGONFLY_SERVER="10.128.0.24" && REDIS_PORT=6379 && ./memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=0:1
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 0.00 --- --- --- --- --- --- 0.00
Gets 1376543.56 1376543.56 0.00 0.22864 0.22300 0.39900 0.54300 455548.42
Waits 0.00 --- --- --- --- --- --- ---
Totals 1376543.56 1376543.56 0.00 0.22864 0.22300 0.39900 0.54300 455548.42
DRAGONFLY_SERVER="10.128.0.24" && REDIS_PORT=6379 && ./memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:3
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 328923.70 --- --- 0.23965 0.23900 0.42300 0.57500 110458.90
Gets 986771.11 986771.11 0.00 0.23785 0.23100 0.42300 0.57500 326558.53
Waits 0.00 --- --- --- --- --- --- ---
Totals 1315694.82 986771.11 0.00 0.23830 0.23100 0.42300 0.57500 437017.42
DRAGONFLY_SERVER="10.128.0.24" && REDIS_PORT=6379 && ./memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 1305596.15 --- --- 0.23675 0.23100 0.50300 0.93500 438444.31
Gets 0.00 0.00 0.00 --- --- --- --- 0.00
Waits 0.00 --- --- --- --- --- --- ---
Totals 1305596.15 0.00 0.00 0.23675 0.23100 0.50300 0.93500 438444.31
DRAGONFLY_SERVER="10.128.0.24" && REDIS_PORT=6379 && ./memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=0:1
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 0.00 --- --- --- --- --- --- 0.00
Gets 1316826.64 1316826.64 0.00 0.23503 0.23100 0.41500 0.59900 435785.91
Waits 0.00 --- --- --- --- --- --- ---
Totals 1316826.64 1316826.64 0.00 0.23503 0.23100 0.41500 0.59900 435785.91
DRAGONFLY_SERVER="10.128.0.24" && REDIS_PORT=6379 && ./memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:3
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 313124.70 --- --- 0.24759 0.24700 0.43100 0.60700 105153.29
Gets 939374.09 939374.09 0.00 0.24553 0.23900 0.43100 0.59900 310873.12
Waits 0.00 --- --- --- --- --- --- ---
Totals 1252498.79 939374.09 0.00 0.24604 0.23900 0.43100 0.60700 416026.41
KEYDB_SERVER="10.128.0.27" && REDIS_PORT=6379 && ./memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 352644.13 --- --- 0.87533 0.64700 3.71100 8.63900 118424.68
Gets 0.00 0.00 0.00 --- --- --- --- 0.00
Waits 0.00 --- --- --- --- --- --- ---
Totals 352644.13 0.00 0.00 0.87533 0.64700 3.71100 8.63900 118424.68
KEYDB_SERVER="10.128.0.27" && REDIS_PORT=6379 && ./memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=0:1
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 0.00 --- --- --- --- --- --- 0.00
Gets 462801.59 462801.59 0.00 0.66506 0.54300 1.16700 1.60700 153157.91
Waits 0.00 --- --- --- --- --- --- ---
Totals 462801.59 462801.59 0.00 0.66506 0.54300 1.16700 1.60700 153157.91
KEYDB_SERVER="10.128.0.27" && REDIS_PORT=6379 && ./memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:3
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 94887.31 --- --- 0.80495 0.63900 2.33500 4.41500 31864.98
Gets 284661.92 284661.92 0.00 0.79218 0.61500 2.28700 4.35100 94205.00
Waits 0.00 --- --- --- --- --- --- ---
Totals 379549.23 284661.92 0.00 0.79537 0.61500 2.30300 4.35100 126069.98
KEYDB_SERVER="10.128.0.27" && REDIS_PORT=6379 && ./memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 262139.40 --- --- 1.14587 0.86300 8.83100 21.37500 88031.46
Gets 0.00 0.00 0.00 --- --- --- --- 0.00
Waits 0.00 --- --- --- --- --- --- ---
Totals 262139.40 0.00 0.00 1.14587 0.86300 8.83100 21.37500 88031.46
KEYDB_SERVER="10.128.0.27" && REDIS_PORT=6379 && ./memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=0:1
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 0.00 --- --- --- --- --- --- 0.00
Gets 399851.71 399851.71 0.00 0.75064 0.74300 1.05500 1.56700 132325.50
Waits 0.00 --- --- --- --- --- --- ---
Totals 399851.71 399851.71 0.00 0.75064 0.74300 1.05500 1.56700 132325.50
KEYDB_SERVER="10.128.0.27" && REDIS_PORT=6379 && ./memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:3
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type Ops/sec Hits/sec Misses/sec Avg. Latency p50 Latency p99 Latency p99.9 Latency KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets 76642.55 --- --- 0.98195 0.78300 4.86300 7.13500 25738.04
Gets 229927.65 229927.65 0.00 0.97991 0.78300 4.79900 7.03900 76091.44
Waits 0.00 --- --- --- --- --- --- ---
Totals 306570.19 229927.65 0.00 0.98042 0.78300 4.79900 7.03900 101829.48
@drinkbeer These are fantastic results! It really makes me happy 🕺🏼 to see that Dragonfly provides value! Jianbin, I would like to have a quick chat with you on discord or google meet. Will it be possible?
I would like to have a quick chat with you on discord or google meet. Will it be possible?
I would love to. I sent you an invitation through your LinkedIn. Let's chat.
Hey, Dragonfly maintainers,
Thank you for your great work on this fantastic project. My teammates and I are impressed by the benchmark results and are trying to reproduce the benchmarking in Kubernetes (we want to benchmark it in Kubernetes because we use k8s in our production environment).
I followed the setup in the readme and the dashtable doc. My benchmarking results are not as good as yours, so I would like to publish them here and hear your suggestions on how to improve the performance.
Any feedback is greatly appreciated. Thank you!
Test Environment Setup
- Node: `v1.22.9-gke.1500` (kernel `5.10.109+`)
- Dragonfly pod: image `docker.dragonflydb.io/dragonflydb/dragonfly`
- Dragonfly info:
- Dragonfly yaml file:
- KeyDB pod:
- KeyDB info: we are using an internal version of KeyDB. KeyDB yaml file:
- The memtier_benchmark job for Dragonfly:
- The memtier_benchmark job for KeyDB:
Test Result
Here are the results of the tests.
I am impressed by the memory utilization of Dragonfly: it uses only (31.19/117.3*100 =) 26.59% of the memory KeyDB uses. Dragonfly also has better `Get` performance (higher throughput, lower latency). But KeyDB performs better on `Set` throughput and latency, and in the mixed-set-get case KeyDB also has better throughput and latency.
Pure Set
VECache (KeyDB)
[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 404 secs] 0 threads: 10000000 ops, 31802 (avg: 24697) ops/sec, 10.43MB/sec (avg: 8.10MB/sec), 7.78 (avg: 10.07) msec latency
5 Threads
10 Connections per thread
200000 Requests per client
BEST RUN RESULTS
Type        Ops/sec     Hits/sec   Misses/sec     Latency      KB/sec
---------------------------------------------------------------------
Sets       25378.16          ---          ---    10.07100     8522.48
Gets           0.00         0.00         0.00     0.00000        0.00
Waits          0.00          ---          ---     0.00000         ---
Totals     25378.16         0.00         0.00    10.07100     8522.48
WORST RUN RESULTS
Type        Ops/sec     Hits/sec   Misses/sec     Latency      KB/sec
---------------------------------------------------------------------
Sets       25114.38          ---          ---    10.08300     8433.89
Gets           0.00         0.00         0.00     0.00000        0.00
Waits          0.00          ---          ---     0.00000         ---
Totals     25114.38         0.00         0.00    10.08300     8433.89
AGGREGATED AVERAGE RESULTS (2 runs)
Type        Ops/sec     Hits/sec   Misses/sec     Latency      KB/sec
---------------------------------------------------------------------
Sets       25246.27          ---          ---    10.07700     8478.19
Gets           0.00         0.00         0.00     0.00000        0.00
Waits          0.00          ---          ---     0.00000         ---
Totals     25246.27         0.00         0.00    10.07700     8478.19
➜ Documents k logs memtier-dragonfly-j9l8m
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 519 secs] 0 threads: 9999996 ops, 23093 (avg: 19231) ops/sec, 7.58MB/sec (avg: 6.31MB/sec), 10.80 (avg: 12.96) msec latency
[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 522 secs] 0 threads: 10000000 ops, 21434 (avg: 19131) ops/sec, 7.03MB/sec (avg: 6.27MB/sec), 11.61 (avg: 13.03) msec latency
5 Threads
10 Connections per thread
200000 Requests per client
BEST RUN RESULTS
Type        Ops/sec     Hits/sec   Misses/sec     Latency      KB/sec
---------------------------------------------------------------------
Sets       19647.63          ---          ---    12.96000     6598.05
Gets           0.00         0.00         0.00     0.00000        0.00
Waits          0.00          ---          ---     0.00000         ---
Totals     19647.63         0.00         0.00    12.96000     6598.05
WORST RUN RESULTS
Type        Ops/sec     Hits/sec   Misses/sec     Latency      KB/sec
---------------------------------------------------------------------
Sets       18968.02          ---          ---    13.02600     6369.83
Gets           0.00         0.00         0.00     0.00000        0.00
Waits          0.00          ---          ---     0.00000         ---
Totals     18968.02         0.00         0.00    13.02600     6369.83
AGGREGATED AVERAGE RESULTS (2 runs)
Type        Ops/sec     Hits/sec   Misses/sec     Latency      KB/sec
---------------------------------------------------------------------
Sets       19307.83          ---          ---    12.99300     6483.94
Gets           0.00         0.00         0.00     0.00000        0.00
Waits          0.00          ---          ---     0.00000         ---
Totals     19307.83         0.00         0.00    12.99300     6483.94
➜ Documents k logs memtier-vecache-xh6xj
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 400 secs] 0 threads: 10000000 ops, 46369 (avg: 24938) ops/sec, 14.94MB/sec (avg: 8.03MB/sec), 5.37 (avg: 9.97) msec latency
[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 398 secs] 0 threads: 9999999 ops, 57272 (avg: 25116) ops/sec, 18.45MB/sec (avg: 8.09MB/sec), 4.36 (avg: 9.90) msec latency
5 Threads
10 Connections per thread
200000 Requests per client
BEST RUN RESULTS
Type        Ops/sec     Hits/sec   Misses/sec     Latency      KB/sec
---------------------------------------------------------------------
Sets           0.00          ---          ---     0.00000        0.00
Gets       25426.75     25426.75         0.00     9.90000     8387.60
Waits          0.00          ---          ---     0.00000         ---
Totals     25426.75     25426.75         0.00     9.90000     8387.60
WORST RUN RESULTS
Type        Ops/sec     Hits/sec   Misses/sec     Latency      KB/sec
---------------------------------------------------------------------
Sets           0.00          ---          ---     0.00000        0.00
Gets       25288.15     25288.15         0.00     9.97100     8341.88
Waits          0.00          ---          ---     0.00000         ---
Totals     25288.15     25288.15         0.00     9.97100     8341.88
AGGREGATED AVERAGE RESULTS (2 runs)
Type        Ops/sec     Hits/sec   Misses/sec     Latency      KB/sec
---------------------------------------------------------------------
Sets           0.00          ---          ---     0.00000        0.00
Gets       25357.45     25357.45         0.00     9.93550     8364.74
Waits          0.00          ---          ---     0.00000         ---
Totals     25357.45     25357.45         0.00     9.93550     8364.74
➜ Documents k logs memtier-dragonfly-5kzsm
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 365 secs] 0 threads: 9999999 ops, 83523 (avg: 27366) ops/sec, 26.91MB/sec (avg: 8.82MB/sec), 2.77 (avg: 9.11) msec latency
[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 363 secs] 0 threads: 9999999 ops, 84975 (avg: 27502) ops/sec, 27.37MB/sec (avg: 8.86MB/sec), 2.69 (avg: 9.07) msec latency
5 Threads
10 Connections per thread
200000 Requests per client
BEST RUN RESULTS
Type        Ops/sec     Hits/sec   Misses/sec     Latency      KB/sec
---------------------------------------------------------------------
Sets           0.00          ---          ---     0.00000        0.00
Gets       27705.71     27705.71         0.00     9.11100     9139.37
Waits          0.00          ---          ---     0.00000         ---
Totals     27705.71     27705.71         0.00     9.11100     9139.37
WORST RUN RESULTS
Type        Ops/sec     Hits/sec   Misses/sec     Latency      KB/sec
---------------------------------------------------------------------
Sets           0.00          ---          ---     0.00000        0.00
Gets           0.00         0.00         0.00     9.06700        0.00
Waits          0.00          ---          ---     0.00000         ---
Totals         0.00         0.00         0.00     9.06700        0.00
AGGREGATED AVERAGE RESULTS (2 runs)
Type        Ops/sec     Hits/sec   Misses/sec     Latency      KB/sec
---------------------------------------------------------------------
Sets           0.00          ---          ---     0.00000        0.00
Gets       13852.86     13852.86         0.00     9.08900     4569.68
Waits          0.00          ---          ---     0.00000         ---
Totals     13852.86     13852.86         0.00     9.08900     4569.68
➜ Documents k logs memtier-vecache-qvgq6
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 327 secs] 0 threads: 10000000 ops, 36385 (avg: 30555) ops/sec, 11.77MB/sec (avg: 9.88MB/sec), 6.85 (avg: 8.13) msec latency
[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 330 secs] 0 threads: 10000000 ops, 34522 (avg: 30286) ops/sec, 11.16MB/sec (avg: 9.79MB/sec), 7.22 (avg: 8.20) msec latency
5 Threads
10 Connections per thread
200000 Requests per client
BEST RUN RESULTS
Type        Ops/sec     Hits/sec   Misses/sec     Latency      KB/sec
---------------------------------------------------------------------
Sets        8009.01          ---          ---     8.13400     2681.06
Gets       24027.04     24027.04         0.00     8.12800     7925.85
Waits          0.00          ---          ---     0.00000         ---
Totals     32036.06     24027.04         0.00     8.12900    10606.91
WORST RUN RESULTS
Type        Ops/sec     Hits/sec   Misses/sec     Latency      KB/sec
---------------------------------------------------------------------
Sets        7859.33          ---          ---     8.20400     2630.95
Gets       23578.00     23578.00         0.00     8.20000     7777.73
Waits          0.00          ---          ---     0.00000         ---
Totals     31437.33     23578.00         0.00     8.20100    10408.68
AGGREGATED AVERAGE RESULTS (2 runs)
Type        Ops/sec     Hits/sec   Misses/sec     Latency      KB/sec
---------------------------------------------------------------------
Sets        7934.17          ---          ---     8.16900     2656.01
Gets       23802.52     23802.52         0.00     8.16400     7851.79
Waits          0.00          ---          ---     0.00000         ---
Totals     31736.69     23802.52         0.00     8.16500    10507.80
➜ Documents k logs memtier-dragonfly-ws9rr
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 425 secs] 0 threads: 10000000 ops, 25440 (avg: 23479) ops/sec, 8.23MB/sec (avg: 7.59MB/sec), 10.22 (avg: 10.60) msec latency
[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 430 secs] 0 threads: 10000000 ops, 31976 (avg: 23230) ops/sec, 10.34MB/sec (avg: 7.51MB/sec), 7.79 (avg: 10.71) msec latency
5 Threads
10 Connections per thread
200000 Requests per client
BEST RUN RESULTS
Type        Ops/sec     Hits/sec   Misses/sec     Latency      KB/sec
---------------------------------------------------------------------
Sets        5922.85          ---          ---    10.61200     1982.71
Gets       17768.56     17768.56         0.00    10.59100     5861.35
Waits          0.00          ---          ---     0.00000         ---
Totals     23691.41     17768.56         0.00    10.59600     7844.06
WORST RUN RESULTS
Type        Ops/sec     Hits/sec   Misses/sec     Latency      KB/sec
---------------------------------------------------------------------
Sets        5799.37          ---          ---    10.71700     1941.37
Gets       17398.11     17398.11         0.00    10.70500     5739.15
Waits          0.00          ---          ---     0.00000         ---
Totals     23197.48     17398.11         0.00    10.70800     7680.52
AGGREGATED AVERAGE RESULTS (2 runs)
Type        Ops/sec     Hits/sec   Misses/sec     Latency      KB/sec
---------------------------------------------------------------------
Sets        5861.11          ---          ---    10.66450     1962.04
Gets       17583.33     17583.33         0.00    10.64800     5800.25
Waits          0.00          ---          ---     0.00000         ---
Totals     23444.44     17583.33         0.00    10.65200     7762.29