dragonflydb / dragonfly

A modern replacement for Redis and Memcached
https://www.dragonflydb.io/

Seeking help benchmarking Dragonfly and KeyDB in Kubernetes #113

Closed drinkbeer closed 2 years ago

drinkbeer commented 2 years ago

Hey, Dragonfly maintainers,

Thank you for your great work on this fantastic project. My teammates and I are impressed by the benchmark results, and we are trying to reproduce the benchmarks in Kubernetes (we want to benchmark in Kubernetes because we use k8s in our production environment).

I followed the setup in the README and the dashtable doc, but my benchmarking results are not as good as yours, so I would like to publish them here and hear your suggestions on how to improve the performance.

Any feedback is greatly appreciated. Thank you!

Test Environment Setup

Node:

Dragonfly pod:

Dragonfly info:

# Server
redis_version:df-0.1
redis_mode:standalone
arch_bits:64
multiplexing_api:iouring
tcp_port:6379
uptime_in_seconds:301378
uptime_in_days:3

# Clients
connected_clients:1
client_read_buf_capacity:256
blocked_clients:0

Dragonfly yaml file:

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: dragonfly
    type: test
  name: dragonfly
  namespace: jason-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dragonfly
      name: dragonfly
  serviceName: dragonfly
  template:
    metadata:
      annotations:
        ad.datadoghq.com/redis.check_names: '["redisdb"]'
        ad.datadoghq.com/redis.init_configs: '[{}]'
      labels:
        app: dragonfly
        name: dragonfly
    spec:
      automountServiceAccountToken: false
      containers:
      - image: docker.dragonflydb.io/dragonflydb/dragonfly
        command: ["/bin/sh"]
        args: ["-c", "ulimit -l unlimited && dragonfly --logtostderr"]
        imagePullPolicy: IfNotPresent
        name: redis
        ports:
        - containerPort: 6379
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 3
          successThreshold: 3
          tcpSocket:
            port: 6379
          timeoutSeconds: 1
        resources:
          limits:
            cpu: "8"
            memory: 16000Mi
          requests:
            cpu: "8"
            memory: 16000Mi
        securityContext:
          capabilities:
            drop:
            - AUDIT_WRITE
            - CHOWN
            - DAC_OVERRIDE
            - FOWNER
            - FSETID
            - KILL
            - MKNOD
            - NET_BIND_SERVICE
            - SETGID
            - SETFCAP
            - SETPCAP
            - SETUID
            - SYS_CHROOT
          privileged: true
          runAsNonRoot: false
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - sh
        - -c
        - |
          sysctl -w net.core.somaxconn=8192
          sysctl -w net.ipv4.tcp_max_syn_backlog=8192
          echo never > /host-sys/kernel/mm/transparent_hugepage/enabled
          echo never > /host-sys/kernel/mm/transparent_hugepage/defrag
        image: gcr.io/shopify-docker-images/cloud/busybox:1.0
        imagePullPolicy: IfNotPresent
        name: system-init
        resources:
          limits:
            cpu: "1"
            memory: 1Gi
          requests:
            cpu: 200m
            memory: 1Gi
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /host-sys
          name: host-sys
      nodeSelector:
        role: vecache
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 1
      tolerations:
      - effect: NoExecute
        key: app
        operator: Equal
        value: vecache
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 30
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 30
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app: vecache
        maxSkew: 4
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
      - labelSelector:
          matchLabels:
            app: vecache
        maxSkew: 4
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
      volumes:
      - hostPath:
          path: /sys
          type: ""
        name: host-sys
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    namebuddy.shopify.io/cname: dragonfly-0.jason-test.staging-cq-state-us-east1-1.test
    namebuddy.shopify.io/dns: ttl=5
  labels:
    app: dragonfly
    type: test
  name: dragonfly-0
  namespace: jason-test
spec:
  ports:
  - port: 6379
    protocol: TCP
    targetPort: 6379
  selector:
    statefulset.kubernetes.io/pod-name: dragonfly-0
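
To sanity-check that the init container's sysctl/THP tweaks actually took effect, one might run something like the following (the namespace/pod names are taken from the manifest above; the commands are printed rather than executed here, since they need a live cluster):

```shell
# Commands to verify the kernel settings the init container writes.
NS=jason-test
POD=dragonfly-0
CHECKS=$(for f in /proc/sys/net/core/somaxconn \
                  /proc/sys/net/ipv4/tcp_max_syn_backlog \
                  /sys/kernel/mm/transparent_hugepage/enabled; do
  echo "kubectl -n $NS exec $POD -- cat $f"
done)
echo "$CHECKS"
```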

Keydb pod:

KeyDB info

# Server
redis_version:6.0.16
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:4d5c208d4d774a11
redis_mode:standalone
os:Linux 5.10.109+ x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:7.5.0
process_id:1
run_id:8d4e6f7ffe3a90f66aa4ce8f84155d463a74d546
tcp_port:6379
uptime_in_seconds:306392
uptime_in_days:3
hz:10
configured_hz:10
lru_clock:10415189
executable:/usr/local/bin/keydb-server
config_file:/etc/redis/redis.conf

# Clients
connected_clients:1
client_recent_max_input_buffer:4
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
current_client_thread:0
thread_0_clients:1
thread_1_clients:0
thread_2_clients:0
thread_3_clients:0

We are using an internal version of KeyDB. KeyDB yaml file:

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: vecache
    type: test
  name: vecache
  namespace: jason-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vecache
      name: vecache
  serviceName: vecache
  template:
    metadata:
      annotations:
        ad.datadoghq.com/redis.check_names: '["redisdb"]'
        ad.datadoghq.com/redis.init_configs: '[{}]'
      labels:
        app: vecache
        name: vecache
    spec:
      automountServiceAccountToken: false
      containers:
      - args:
        - --maxmemory 8000Mb
        - --server-threads 4
        image: gcr.io/shopify-docker-images/cloud/vecache:1.0-6.0.16
        imagePullPolicy: IfNotPresent
        name: redis
        ports:
        - containerPort: 6379
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 3
          successThreshold: 3
          tcpSocket:
            port: 6379
          timeoutSeconds: 1
        resources:
          limits:
            cpu: "8"
            memory: 16000Mi
          requests:
            cpu: "8"
            memory: 16000Mi
        securityContext:
          capabilities:
            drop:
            - AUDIT_WRITE
            - CHOWN
            - DAC_OVERRIDE
            - FOWNER
            - FSETID
            - KILL
            - MKNOD
            - NET_BIND_SERVICE
            - SETGID
            - SETFCAP
            - SETPCAP
            - SETUID
            - SYS_CHROOT
          privileged: false
          runAsNonRoot: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - sh
        - -c
        - |
          sysctl -w net.core.somaxconn=8192
          sysctl -w net.ipv4.tcp_max_syn_backlog=8192
          echo never > /host-sys/kernel/mm/transparent_hugepage/enabled
          echo never > /host-sys/kernel/mm/transparent_hugepage/defrag
        image: gcr.io/shopify-docker-images/cloud/busybox:1.0
        imagePullPolicy: IfNotPresent
        name: system-init
        resources:
          limits:
            cpu: "1"
            memory: 1Gi
          requests:
            cpu: 200m
            memory: 1Gi
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /host-sys
          name: host-sys
      nodeSelector:
        role: vecache
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 1
      tolerations:
      - effect: NoExecute
        key: app
        operator: Equal
        value: vecache
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 30
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 30
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app: vecache
        maxSkew: 4
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
      - labelSelector:
          matchLabels:
            app: vecache
        maxSkew: 4
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
      volumes:
      - hostPath:
          path: /sys
          type: ""
        name: host-sys
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    namebuddy.shopify.io/cname: vecache-0.jason-test.staging-cq-state-us-east1-1.test
    namebuddy.shopify.io/dns: ttl=5
  labels:
    app: vecache
    type: test
  name: vecache-0
  namespace: jason-test
spec:
  ports:
  - port: 6379
    protocol: TCP
    targetPort: 6379
  selector:
    statefulset.kubernetes.io/pod-name: vecache-0

The memtier_benchmark job for Dragonfly:

---
apiVersion: batch/v1
kind: Job
metadata:
  name: memtier-dragonfly
  namespace: jason-test
spec:
  completions: 1
  parallelism: 1
  template:
    metadata:
      labels:
        app: memtier
    spec:
      containers:
      - name: memtier
        image: redislabs/memtier_benchmark
        args: ["-s", "$(REDIS_SERVER)", "-p", "$(REDIS_PORT)", "-n 200000", "-d 300", "--pipeline=5", "--clients=10", "--threads=5", "--run-count=2", "--hide-histogram", "--key-prefix='key:'", "--key-minimum=1", "--key-maximum=10000", "--key-pattern=S:R", "--ratio=1:3"]
        resources:
          limits:
            cpu: "200m"
            memory: "250Mi"
          requests:
            cpu: "100m"
            memory: "128Mi"
        env:
         - name: REDIS_PORT
           value: "6379"
         - name: REDIS_SERVER
           value: "dragonfly-0.jason-test.svc.cluster.local"  # can be full hostname or just the resource name in k8s
        imagePullPolicy: Always
      tolerations:
      - effect: NoExecute
        key: app
        operator: Equal
        value: vecache
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 30
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 30
      restartPolicy: OnFailure

The memtier_benchmark job for keydb:

---
apiVersion: batch/v1
kind: Job
metadata:
  name: memtier-vecache
  namespace: jason-test
spec:
  completions: 1
  parallelism: 1
  template:
    metadata:
      labels:
        app: memtier
    spec:
      containers:
      - name: memtier
        image: redislabs/memtier_benchmark
        args: ["-s", "$(REDIS_SERVER)", "-p", "$(REDIS_PORT)", "-n 200000", "-d 300", "--pipeline=5", "--clients=10", "--threads=5", "--run-count=2", "--hide-histogram", "--key-prefix='key:'", "--key-minimum=1", "--key-maximum=10000", "--key-pattern=S:R", "--ratio=1:3"]
        resources:
          limits:
            cpu: "200m"
            memory: "250Mi"
          requests:
            cpu: "100m"
            memory: "128Mi"
        env:
         - name: REDIS_PORT
           value: "6379"
         - name: REDIS_SERVER
           value: "vecache-0.jason-test.staging-cq-state-us-east1-1.test"  # can be full hostname or just the resource name in k8s
        imagePullPolicy: Always
      tolerations:
      - effect: NoExecute
        key: app
        operator: Equal
        value: vecache
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 30
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 30
      restartPolicy: OnFailure

Test Result

Here are the results of the tests.

I am impressed by Dragonfly's memory utilization: it uses only 26.59% (31.19/117.3) of the memory KeyDB uses. Dragonfly also has better Get performance (higher throughput, lower latency), but KeyDB performs better in Set throughput and latency. In the mixed set-get case, KeyDB also has better throughput and latency.
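
As a quick back-of-envelope check of the memory figure above (treating the reported MiB/M values as comparable):

```shell
# Dragonfly's used_memory (31.19 MiB) as a percentage of KeyDB's (117.30 MB),
# matching the figure quoted above.
awk 'BEGIN { printf "%.2f%%\n", 31.19 / 117.30 * 100 }'   # → 26.59%
```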


Pure Set

    args: ["-s", "$(REDIS_SERVER)", "-p", "$(REDIS_PORT)", "-n 200000", "-d 300", "--pipeline=5", "--clients=10", "--threads=5", "--run-count=2", "--hide-histogram", "--key-prefix='key:'", "--key-minimum=1", "--key-maximum=10000", "--key-pattern=S:R", "--ratio=1:0"]

VECache (KeyDB)

[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 404 secs] 0 threads: 10000000 ops, 31802 (avg: 24697) ops/sec, 10.43MB/sec (avg: 8.10MB/sec), 7.78 (avg: 10.07) msec latency

5 Threads
10 Connections per thread
200000 Requests per client

BEST RUN RESULTS

Type      Ops/sec      Hits/sec     Misses/sec   Latency    KB/sec
Sets      25378.16     ---          ---          10.07100   8522.48
Gets      0.00         0.00         0.00         0.00000    0.00
Waits     0.00         ---          ---          0.00000    ---
Totals    25378.16     0.00         0.00         10.07100   8522.48

WORST RUN RESULTS

Type      Ops/sec      Hits/sec     Misses/sec   Latency    KB/sec
Sets      25114.38     ---          ---          10.08300   8433.89
Gets      0.00         0.00         0.00         0.00000    0.00
Waits     0.00         ---          ---          0.00000    ---
Totals    25114.38     0.00         0.00         10.08300   8433.89

AGGREGATED AVERAGE RESULTS (2 runs)

Type      Ops/sec      Hits/sec     Misses/sec   Latency    KB/sec
Sets      25246.27     ---          ---          10.07700   8478.19
Gets      0.00         0.00         0.00         0.00000    0.00
Waits     0.00         ---          ---          0.00000    ---
Totals    25246.27     0.00         0.00         10.07700   8478.19


Dragonfly
* Throughput: 19307.83 ops/sec, 6483.94 KB/sec
* Latency: 12.99300 ms
* used_memory_human:31.19MiB

➜ Documents k logs memtier-dragonfly-j9l8m
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 519 secs] 0 threads: 9999996 ops, 23093 (avg: 19231) ops/sec, 7.58MB/sec (avg: 6.31MB/sec), 10.80 (avg: 12.96) msec latency
[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 522 secs] 0 threads: 10000000 ops, 21434 (avg: 19131) ops/sec, 7.03MB/sec (avg: 6.27MB/sec), 11.61 (avg: 13.03) msec latency

5 Threads
10 Connections per thread
200000 Requests per client

BEST RUN RESULTS

Type      Ops/sec      Hits/sec     Misses/sec   Latency    KB/sec
Sets      19647.63     ---          ---          12.96000   6598.05
Gets      0.00         0.00         0.00         0.00000    0.00
Waits     0.00         ---          ---          0.00000    ---
Totals    19647.63     0.00         0.00         12.96000   6598.05

WORST RUN RESULTS

Type      Ops/sec      Hits/sec     Misses/sec   Latency    KB/sec
Sets      18968.02     ---          ---          13.02600   6369.83
Gets      0.00         0.00         0.00         0.00000    0.00
Waits     0.00         ---          ---          0.00000    ---
Totals    18968.02     0.00         0.00         13.02600   6369.83

AGGREGATED AVERAGE RESULTS (2 runs)

Type      Ops/sec      Hits/sec     Misses/sec   Latency    KB/sec
Sets      19307.83     ---          ---          12.99300   6483.94
Gets      0.00         0.00         0.00         0.00000    0.00
Waits     0.00         ---          ---          0.00000    ---
Totals    19307.83     0.00         0.00         12.99300   6483.94


Pure Get

        args: ["-s", "$(REDIS_SERVER)", "-p", "$(REDIS_PORT)", "-n 200000", "-d 300", "--pipeline=5", "--clients=10", "--threads=5", "--run-count=2", "--hide-histogram", "--key-prefix='key:'", "--key-minimum=1", "--key-maximum=10000", "--key-pattern=S:R", "--ratio=0:1"]

VECache (KeyDB)
* Throughput: 25357.45 ops/sec, 8364.74 KB/sec
* Latency: 9.93550 ms
* used_memory_human:117.30M

➜ Documents k logs memtier-vecache-xh6xj
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 400 secs] 0 threads: 10000000 ops, 46369 (avg: 24938) ops/sec, 14.94MB/sec (avg: 8.03MB/sec), 5.37 (avg: 9.97) msec latency
[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 398 secs] 0 threads: 9999999 ops, 57272 (avg: 25116) ops/sec, 18.45MB/sec (avg: 8.09MB/sec), 4.36 (avg: 9.90) msec latency

5 Threads
10 Connections per thread
200000 Requests per client

BEST RUN RESULTS

Type      Ops/sec      Hits/sec     Misses/sec   Latency    KB/sec
Sets      0.00         ---          ---          0.00000    0.00
Gets      25426.75     25426.75     0.00         9.90000    8387.60
Waits     0.00         ---          ---          0.00000    ---
Totals    25426.75     25426.75     0.00         9.90000    8387.60

WORST RUN RESULTS

Type      Ops/sec      Hits/sec     Misses/sec   Latency    KB/sec
Sets      0.00         ---          ---          0.00000    0.00
Gets      25288.15     25288.15     0.00         9.97100    8341.88
Waits     0.00         ---          ---          0.00000    ---
Totals    25288.15     25288.15     0.00         9.97100    8341.88

AGGREGATED AVERAGE RESULTS (2 runs)

Type      Ops/sec      Hits/sec     Misses/sec   Latency    KB/sec
Sets      0.00         ---          ---          0.00000    0.00
Gets      25357.45     25357.45     0.00         9.93550    8364.74
Waits     0.00         ---          ---          0.00000    ---
Totals    25357.45     25357.45     0.00         9.93550    8364.74


Dragonfly
* Throughput: 27705.71 ops/sec, 9139.37 KB/sec
* Latency: 9.11100 ms
* used_memory_human:31.19MiB

➜ Documents k logs memtier-dragonfly-5kzsm
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 365 secs] 0 threads: 9999999 ops, 83523 (avg: 27366) ops/sec, 26.91MB/sec (avg: 8.82MB/sec), 2.77 (avg: 9.11) msec latency
[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 363 secs] 0 threads: 9999999 ops, 84975 (avg: 27502) ops/sec, 27.37MB/sec (avg: 8.86MB/sec), 2.69 (avg: 9.07) msec latency

5 Threads
10 Connections per thread
200000 Requests per client

BEST RUN RESULTS

Type      Ops/sec      Hits/sec     Misses/sec   Latency    KB/sec
Sets      0.00         ---          ---          0.00000    0.00
Gets      27705.71     27705.71     0.00         9.11100    9139.37
Waits     0.00         ---          ---          0.00000    ---
Totals    27705.71     27705.71     0.00         9.11100    9139.37

WORST RUN RESULTS

Type      Ops/sec      Hits/sec     Misses/sec   Latency    KB/sec
Sets      0.00         ---          ---          0.00000    0.00
Gets      0.00         0.00         0.00         9.06700    0.00
Waits     0.00         ---          ---          0.00000    ---
Totals    0.00         0.00         0.00         9.06700    0.00

AGGREGATED AVERAGE RESULTS (2 runs)

Type      Ops/sec      Hits/sec     Misses/sec   Latency    KB/sec
Sets      0.00         ---          ---          0.00000    0.00
Gets      13852.86     13852.86     0.00         9.08900    4569.68
Waits     0.00         ---          ---          0.00000    ---
Totals    13852.86     13852.86     0.00         9.08900    4569.68


Mixed Set-Get (1:3)

        args: ["-s", "$(REDIS_SERVER)", "-p", "$(REDIS_PORT)", "-n 200000", "-d 300", "--pipeline=5", "--clients=10", "--threads=5", "--run-count=2", "--hide-histogram", "--key-prefix='key:'", "--key-minimum=1", "--key-maximum=10000", "--key-pattern=S:R", "--ratio=1:3"]

VECache (KeyDB):
* Throughput: 31736.69 ops/sec, 10507.80 KB/sec
* Latency: 8.16500 ms
* used_memory_human:117.30M

➜ Documents k logs memtier-vecache-qvgq6
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 327 secs] 0 threads: 10000000 ops, 36385 (avg: 30555) ops/sec, 11.77MB/sec (avg: 9.88MB/sec), 6.85 (avg: 8.13) msec latency
[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 330 secs] 0 threads: 10000000 ops, 34522 (avg: 30286) ops/sec, 11.16MB/sec (avg: 9.79MB/sec), 7.22 (avg: 8.20) msec latency

5 Threads
10 Connections per thread
200000 Requests per client

BEST RUN RESULTS

Type      Ops/sec      Hits/sec     Misses/sec   Latency    KB/sec
Sets      8009.01      ---          ---          8.13400    2681.06
Gets      24027.04     24027.04     0.00         8.12800    7925.85
Waits     0.00         ---          ---          0.00000    ---
Totals    32036.06     24027.04     0.00         8.12900    10606.91

WORST RUN RESULTS

Type      Ops/sec      Hits/sec     Misses/sec   Latency    KB/sec
Sets      7859.33      ---          ---          8.20400    2630.95
Gets      23578.00     23578.00     0.00         8.20000    7777.73
Waits     0.00         ---          ---          0.00000    ---
Totals    31437.33     23578.00     0.00         8.20100    10408.68

AGGREGATED AVERAGE RESULTS (2 runs)

Type      Ops/sec      Hits/sec     Misses/sec   Latency    KB/sec
Sets      7934.17      ---          ---          8.16900    2656.01
Gets      23802.52     23802.52     0.00         8.16400    7851.79
Waits     0.00         ---          ---          0.00000    ---
Totals    31736.69     23802.52     0.00         8.16500    10507.80


Dragonfly:
* Throughput: 23444.44 ops/sec, 7762.29 KB/sec
* Latency: 10.65200 ms
* used_memory_human:31.19MiB

➜ Documents k logs memtier-dragonfly-ws9rr
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 425 secs] 0 threads: 10000000 ops, 25440 (avg: 23479) ops/sec, 8.23MB/sec (avg: 7.59MB/sec), 10.22 (avg: 10.60) msec latency
[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 430 secs] 0 threads: 10000000 ops, 31976 (avg: 23230) ops/sec, 10.34MB/sec (avg: 7.51MB/sec), 7.79 (avg: 10.71) msec latency

5 Threads
10 Connections per thread
200000 Requests per client

BEST RUN RESULTS

Type      Ops/sec      Hits/sec     Misses/sec   Latency    KB/sec
Sets      5922.85      ---          ---          10.61200   1982.71
Gets      17768.56     17768.56     0.00         10.59100   5861.35
Waits     0.00         ---          ---          0.00000    ---
Totals    23691.41     17768.56     0.00         10.59600   7844.06

WORST RUN RESULTS

Type      Ops/sec      Hits/sec     Misses/sec   Latency    KB/sec
Sets      5799.37      ---          ---          10.71700   1941.37
Gets      17398.11     17398.11     0.00         10.70500   5739.15
Waits     0.00         ---          ---          0.00000    ---
Totals    23197.48     17398.11     0.00         10.70800   7680.52

AGGREGATED AVERAGE RESULTS (2 runs)

Type      Ops/sec      Hits/sec     Misses/sec   Latency    KB/sec
Sets      5861.11      ---          ---          10.66450   1962.04
Gets      17583.33     17583.33     0.00         10.64800   5800.25
Waits     0.00         ---          ---          0.00000    ---
Totals    23444.44     17583.33     0.00         10.65200   7762.29

romange commented 2 years ago

Hi Jianbin,

very impressive work so far! I will tell you what I know and what I do not know.

Facts that I know:

  1. 10-25K qps is a very poor result; both KeyDB and Dragonfly can do much better than that.
  2. Similarly, 10ms latency (is it avg? 99th?) is an awful number for an in-memory store. You should not get there.

Now, it's hard for me to say what causes this based on the data you put here, because a) I do not have hands-on experience with K8S as a deployment system, and b) there is some missing data.

Now, from analyzing your test setup, I assume that you benchmarked both of them on the same node concurrently? Am I correct? If yes, then it's a really bad idea.

When you put multiple pods like Redis/KeyDB/DF on the same node, do not expect them all to get dedicated networking capacity: you are bounded by the limitations of the underlying hardware, and now it is divided between two hungry pods.

You did not write where you benchmark them from. Is it a different node? The same node? The same zone?

What I would do is the following:

  1. Run a GCP instance (8-16 CPU cores) with plain Ubuntu, say 22.04. Download the DF/KeyDB binaries there and run them. Do not use Docker images at first.
  2. Run memtier on a different node. It should be at least the same size as your software under test. Rule of thumb - it should have more CPUs than your maximal --threads argument in memtier.
  3. Run all your nodes in the same VPC and zone.
  4. Do not benchmark both servers concurrently on the same node! They will just compete over CPU/networking there.
  5. Run memtier without pipelining first. Learn your software first. --clients in the 10-40 range is fine. --threads should be set so that the server under test gives you the highest throughput while still keeping latency low. For DF it's easy to see: if it uses more than 95% of the total CPU, it has reached the limits of the underlying machine. For KeyDB it depends on the server-threads argument you pass (they suggest 4, but I used 8 in my tests). In any case, if you see in htop that KeyDB's K CPUs are at 100%, it won't go higher either. If the avg latency is above 1ms, the server is overloaded and you should probably decrease --threads in your memtier.

By running on a raw GCP instance, you will learn the "normal" performance ranges of each server, the normal latencies, and the optimal configurations for memtier.
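
A sketch of such a first, non-pipelined run from a separate client VM, following the steps above. The server IP is a placeholder and the flag values are only illustrative starting points; the command is printed rather than executed so the sketch stands alone:

```shell
# First-pass memtier run: no pipelining, moderate client count,
# tune --threads against the server's CPU utilization.
SERVER="10.0.0.2"   # placeholder: internal IP of the DF/KeyDB VM
CMD="memtier_benchmark -s $SERVER -p 6379 \
  --clients=20 --threads=8 --ratio=1:3 \
  --hide-histogram --test-time=60"
echo "$CMD"
```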

Once you have this, you may start working your way to K8S. But I would not jump straight there. I would first run your favorite configuration from above, but with the servers running from a container instead of as a native binary. There are some options here too: you can run docker run --network=host or with port mappings. I suspect that port mapping will degrade your numbers greatly, but maybe --network=host will also affect them.

If you use pipelining, be ready to reduce --clients parameter to 5-10. The pipelining affects latency as well.

To summarize, signs of a good benchmark:

  1. low latency with non-pipeline mode (my rule was 99th percentile around 1ms)
  2. High CPU utilization of the instance with server under test.
  3. Try to reduce noise as much as possible and to use dedicated VMs.

Regarding (2) for some instance types you won't be able to reach full CPU utilization (i.e. 16 cores working at 100%) if they are network bound. But you should probably still see well above 1M qps on DF on n2 with 16 cores.

romange commented 2 years ago

And do not forget to drink beer!

romange commented 2 years ago

Just noticed your other memtier parameters. You can be a bit more frisky with the keyspace length; you use big instances, so it's OK: --key-maximum=10000000.

I do not know if --key-pattern=S:R was a deliberate choice. I used R:R but also used --distinct-client-seed. Otherwise, each client connection goes through exactly the same route...
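
A toy illustration of this point using bash's RANDOM (not memtier's actual key generator): clients sharing one seed walk identical pseudo-random key sequences, while distinct seeds (what --distinct-client-seed gives you) diverge.

```shell
# keys <seed> -> five pseudo-random key ids from that seed
keys() {
  RANDOM=$1
  local out="" i
  for i in 1 2 3 4 5; do out="$out $((RANDOM % 10000))"; done
  echo "$out"
}
a=$(keys 42); b=$(keys 42); c=$(keys 7)
[ "$a" = "$b" ]  && echo "same seed: identical key stream"
[ "$a" != "$c" ] && echo "distinct seeds: different key streams"
```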

drinkbeer commented 2 years ago

Thank you so much for your reply. I think your suggestion of testing with GCP instances is great. I will follow the steps and try to get an ideal P99 latency, and I will update the results here once I finish the tests.

Now, from analyzing your test setup I assume that you benchmarked both of them on the same node concurrently? am I correct? If yes, then it's a really bad idea.

No. They are running on two different nodes in the same cluster, so the same region (us-east1) and the same zone (us-east1-d). The memtier jobs also run on those same nodes (the memtier job for KeyDB runs on the same node as the KeyDB cache; the memtier job for Dragonfly runs on the same node as the Dragonfly cache).

And the two memtier jobs run sequentially to avoid saturating the network.

I do not know if --key-pattern=S:R was a deliberate choice. I used R:R but also used --distinct-client-seed. Otherwise, each client connection goes through exactly the same route...

I first run a pure-set memtier job that only sets key:1 through key:10000000, to test set operations alone; then I run a pure-get memtier job, which has a 100% hit rate since all the keys in the keyspace are filled; then I run the mixed set-get memtier job, which is 25% sets and 75% gets, also at a 100% hit rate (check the Misses/sec metric: it is 0, which means the hit rate is 100%). I think --key-pattern=S:R only affects the third (mixed-ops) memtier job, and because the hit rate is 100%, I don't think the key pattern affects performance much. But I could try the --distinct-client-seed option in my tests on a GCP instance.

romange commented 2 years ago

I would run memtier separately from the server node as well. Not that it's impossible to get 1M running both on the same machine but it greatly affects benchmark numbers when reaching high throughput ranges.

drinkbeer commented 2 years ago

I would run memtier separately from the server node as well. Not that it's impossible to get 1M running both on the same machine but it greatly affects benchmark numbers when reaching high throughput ranges.

A good point. I originally thought that running them on the same node would save some network hops. I will run them on separate instances when benchmarking with GCP instances.

ryanrussell commented 2 years ago

@drinkbeer

Any chance you have a reproducible bash script that would cover the tests you are trying between DF and Key?

These are wonderful bits of feedback; it would be interesting to make a canonical test script and deployment yaml so that different users on different platforms can execute the same test suite.

While I don't have any better feedback than what @romange provided, I could take a swing at dockerizing a test script to be more consistently reproducible and include other platforms as well in the future.

romange commented 2 years ago

@ryanrussell in terms of priority for the project, writing canonical benchmarking scripts is less important right now. You have great knowledge of how to improve the maintainability and manageability of the project. I think these areas will have the highest ROI if tackled sooner.

drinkbeer commented 2 years ago

Hey, @romange @ryanrussell, I followed your suggestions and re-ran all the tests on GCP VM instances. Dragonfly beats KeyDB in P99 latency (1.24700 ms vs 1.99100 ms), throughput (578167.18 ops/sec vs 322822.64 ops/sec), and memory used (2.84GiB vs 3.70G). But in the machine observability dashboard, I found that Dragonfly's peak CPU usage is much higher than KeyDB's (50% vs 10%).
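
As a quick sketch, the relative mixed-workload numbers quoted above work out roughly as follows:

```shell
# Dragonfly vs KeyDB on raw GCP VMs, from the summary tables below.
awk 'BEGIN {
  printf "throughput:  %.2fx higher\n", 578167.18 / 322822.64
  printf "p99 latency: %.2fx lower\n",  1.991 / 1.247
  printf "memory:      %.0f%% of KeyDB\n", 2.84 / 3.70 * 100
}'
```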

My next step is to benchmark with Docker and Kubernetes, and I will update the results in this issue.

Update (2022-05-09)

TL;DR

Dragonfly

===============================================================================================================================
Type         Ops/sec     Hits/sec       Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
-------------------------------------------------------------------------------------------------------------------------------
Sets        580235.73         0.00         0.00         0.52789         0.48700         1.34300         2.36700        194854.33
Gets        585411.39    585411.39         0.00         0.51945         0.47900         1.27900         2.71900        193733.96
Mixed       578167.18    433625.38         0.00         0.52565         0.48700         1.24700         1.71900        192042.35

Memory Usage: 2.84GiB

KeyDB

===============================================================================================================================
Type         Ops/sec     Hits/sec       Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
-------------------------------------------------------------------------------------------------------------------------------
Sets       288430.42         0.00         0.00         1.03785         0.89500         2.71900         7.16700         96860.48
Gets       380243.32    380243.32         0.00         0.80936         0.68700         1.48700         2.03100        125836.37
Mixed      322822.64    242116.98         0.00         0.95814         0.80700         1.99100         6.01500        107227.84

Memory Usage: 3.70G

Setup

I provisioned three VM instances:

Dragonfly:

jchome@dragonfly-worker:~/dragonfly/build-opt$ ./dragonfly --alsologtostderr
I20220609 19:01:26.782763 22665 init.cc:56] ./dragonfly running in opt mode.
I20220609 19:01:26.783080 22665 dfly_main.cc:179] maxmemory has not been specified. Deciding myself....
I20220609 19:01:26.783149 22665 dfly_main.cc:184] Found 234.06GiB available memory. Setting maxmemory to 187.25GiB
I20220609 19:01:26.783819 22666 proactor.cc:456] IORing with 1024 entries, allocated 102720 bytes, cq_entries is 2048
I20220609 19:01:26.787039 22665 proactor_pool.cc:66] Running 30 io threads
I20220609 19:01:26.797847 22665 server_family.cc:198] Data directory is "/home/jchome/dragonfly/build-opt"
I20220609 19:01:26.797976 22665 server_family.cc:122] Checking "/home/jchome/dragonfly/build-opt/dump"
I20220609 19:01:26.798053 22669 listener_interface.cc:79] sock[96] AcceptServer - listening on port 6379

KeyDB:

jchome@keydb-worker:~/KeyDB/src$ ./keydb-server --server-threads 4 --maxmemory 188G --port 6379 --protected-mode no
97236:97236:C 10 Jun 2022 02:00:30.422 # oO0OoO0OoO0Oo KeyDB is starting oO0OoO0OoO0Oo
97236:97236:C 10 Jun 2022 02:00:30.422 # KeyDB version=255.255.255, bits=64, commit=aa032d30, modified=0, pid=97236, just started
97236:97236:C 10 Jun 2022 02:00:30.422 # Configuration loaded
97236:97236:M 10 Jun 2022 02:00:30.423 * Increased maximum number of open files to 10032 (it was originally set to 1024).
97236:97236:M 10 Jun 2022 02:00:30.423 * monotonic clock: POSIX clock_gettime

                  _
               _-(+)-_
            _-- /   \ --_
         _--   /     \   --_            KeyDB  255.255.255 (aa032d30/0) 64 bit
     __--     /       \     --__
    (+) _    /         \    _ (+)       Running in standalone mode
     |   -- /           \ --   |        Port: 6379
     |     /--_   _   _--\     |        PID: 97236
     |    /     -(+)-     \    |
     |   /        |        \   |        https://docs.keydb.dev
     |  /         |         \  |
     | /          |          \ |
    (+)_ -- -- -- | -- -- -- _(+)
        --_       |       _--
            --_   |   _--
                -(+)-        KeyDB has now joined Snap! See the announcement at:  https://docs.keydb.dev/news

97236:97236:M 10 Jun 2022 02:00:30.424 # Server initialized
97236:97236:M 10 Jun 2022 02:00:30.424 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
97236:97236:M 10 Jun 2022 02:00:30.424 * Loading RDB produced by version 255.255.255
97236:97236:M 10 Jun 2022 02:00:30.424 * RDB age 11 seconds
97236:97236:M 10 Jun 2022 02:00:30.424 * RDB memory usage when created 2.97 Mb
97236:97236:M 10 Jun 2022 02:00:30.424 # Done loading RDB, keys loaded: 0, keys expired: 0.
97236:97236:M 10 Jun 2022 02:00:30.424 * DB loaded from disk: 0.000 seconds
97236:97249:M 10 Jun 2022 02:00:30.424 * Thread 0 alive.
97236:97250:M 10 Jun 2022 02:00:30.424 * Thread 1 alive.
97236:97251:M 10 Jun 2022 02:00:30.424 * Thread 2 alive.
97236:97252:M 10 Jun 2022 02:00:30.424 * Thread 3 alive.

Dragonfly

Pure Set

jchome@memtier-worker:~/memtier_benchmark$ DRAGONFLY_SERVER="10.128.0.21" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 105 secs]  0 threads:    60000000 ops, 1283453 (avg:  570846) ops/sec, 420.91MB/sec (avg: 187.21MB/sec),  0.23 (avg:  0.52) msec latency

[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 106 secs]  0 threads:    60000000 ops, 1055065 (avg:  565307) ops/sec, 346.00MB/sec (avg: 185.39MB/sec),  0.28 (avg:  0.53) msec latency

30        Threads
10        Connections per thread
200000    Requests per client

BEST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       592927.98          ---          ---         0.52517         0.48700         1.27100         1.84700    199116.64
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     592927.98         0.00         0.00         0.52517         0.48700         1.27100         1.84700    199116.64

WORST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       567543.49          ---          ---         0.53062         0.48700         1.42300         2.67100    190592.01
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     567543.49         0.00         0.00         0.53062         0.48700         1.42300         2.67100    190592.01

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       580235.73          ---          ---         0.52789         0.48700         1.34300         2.36700    194854.33
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     580235.73         0.00         0.00         0.52789         0.48700         1.34300         2.36700    194854.33

Pure Get

jchome@memtier-worker:~/memtier_benchmark$ DRAGONFLY_SERVER="10.128.0.21" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=0:1
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 105 secs]  0 threads:    60000000 ops, 1076305 (avg:  569751) ops/sec, 347.84MB/sec (avg: 184.13MB/sec),  0.28 (avg:  0.53) msec latency

[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 102 secs]  0 threads:    60000000 ops,  942359 (avg:  584836) ops/sec, 304.55MB/sec (avg: 189.01MB/sec),  0.32 (avg:  0.51) msec latency

30        Threads
10        Connections per thread
200000    Requests per client

BEST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets            0.00          ---          ---             ---             ---             ---             ---         0.00
Gets       593178.84    593178.84         0.00         0.51283         0.47900         1.25500         1.79900    196304.47
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     593178.84    593178.84         0.00         0.51283         0.47900         1.25500         1.79900    196304.47

WORST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets            0.00          ---          ---             ---             ---             ---             ---         0.00
Gets       577643.93    577643.93         0.00         0.52607         0.47900         1.31100         5.47100    191163.44
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     577643.93    577643.93         0.00         0.52607         0.47900         1.31100         5.47100    191163.44

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets            0.00          ---          ---             ---             ---             ---             ---         0.00
Gets       585411.39    585411.39         0.00         0.51945         0.47900         1.27900         2.71900    193733.96
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     585411.39    585411.39         0.00         0.51945         0.47900         1.27900         2.71900    193733.96

Mixed Set-Get

jchome@memtier-worker:~/memtier_benchmark$ DRAGONFLY_SERVER="10.128.0.21" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:3
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 105 secs]  0 threads:    60000000 ops,  992211 (avg:  570415) ops/sec, 321.85MB/sec (avg: 185.03MB/sec),  0.30 (avg:  0.52) msec latency

[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 105 secs]  0 threads:    60000000 ops, 1346899 (avg:  570342) ops/sec, 436.90MB/sec (avg: 185.00MB/sec),  0.22 (avg:  0.53) msec latency

30        Threads
10        Connections per thread
200000    Requests per client

BEST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       145404.28          ---          ---         0.52710         0.49500         1.24700         1.80700     48829.56
Gets       436212.84    436212.84         0.00         0.52488         0.48700         1.24700         1.80700    144358.73
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     581617.12    436212.84         0.00         0.52543         0.48700         1.24700         1.80700    193188.29

WORST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       143679.31          ---          ---         0.52817         0.49500         1.24700         1.67100     48250.26
Gets       431037.93    431037.93         0.00         0.52511         0.48700         1.24700         1.66300    142646.15
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     574717.24    431037.93         0.00         0.52587         0.48700         1.24700         1.66300    190896.42

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       144541.79          ---          ---         0.52764         0.49500         1.24700         1.71900     48539.91
Gets       433625.38    433625.38         0.00         0.52499         0.48700         1.24700         1.71900    143502.44
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     578167.18    433625.38         0.00         0.52565         0.48700         1.24700         1.71900    192042.35

Memory

jchome@memtier-worker:~/memtier_benchmark$ DRAGONFLY_SERVER="10.128.0.21" && echo "info memory" | nc $DRAGONFLY_SERVER 6379
$462
# Memory
used_memory:3047981240
used_memory_human:2.84GiB
used_memory_peak:3047981240
comitted_memory:3894657024
used_memory_rss:3181318144
used_memory_rss_human:2.96GiB
object_used_memory:2559986176
table_used_memory:480213552
num_buckets:12472320
num_entries:9999947
inline_keys:9999947
strval_bytes:2559986176
listpack_blobs:0
listpack_bytes:0
small_string_bytes:2559986176
maxmemory:201405674291
maxmemory_human:187.57GiB
cache_mode:store
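As a side note, the flat key:value text that `INFO memory` returns is easy to post-process when comparing runs. A minimal sketch in Python (the `parse_info` helper is hypothetical, not part of Dragonfly or memtier; the sample values are copied from the output above):

```python
def parse_info(raw: str) -> dict:
    """Parse flat INFO-style key:value text into a dict.
    Comment lines ('#'), RESP length lines ('$'), and blanks are skipped."""
    info = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "$")):
            continue
        key, _, value = line.partition(":")
        info[key] = int(value) if value.isdigit() else value
    return info

sample = """# Memory
used_memory:3047981240
num_entries:9999947"""
stats = parse_info(sample)
# Rough per-entry cost for this run: used_memory / num_entries.
print(f"{stats['used_memory'] / stats['num_entries']:.1f} bytes/entry")
```

This gives roughly 305 bytes per entry for the 10M-key data set above, which is handy when comparing memory efficiency between servers.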

Dashboard

(dashboard screenshot)

KeyDB

Pure Set

jchome@memtier-worker:~/memtier_benchmark$ KEYDB_SERVER="10.128.0.23" && REDIS_PORT=6379 && memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 211 secs]  0 threads:    60000000 ops,  696657 (avg:  283109) ops/sec, 228.47MB/sec (avg: 92.85MB/sec),  0.43 (avg:  1.06) msec latency

[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 203 secs]  0 threads:    60000000 ops,  605828 (avg:  295104) ops/sec, 198.68MB/sec (avg: 96.78MB/sec),  0.49 (avg:  1.02) msec latency

30        Threads
10        Connections per thread
200000    Requests per client

BEST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       293413.30          ---          ---         1.01653         0.83100         2.38300         6.71900     98533.82
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     293413.30         0.00         0.00         1.01653         0.83100         2.38300         6.71900     98533.82

WORST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       283447.54          ---          ---         1.05917         0.94300         3.02300         7.58300     95187.15
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     283447.54         0.00         0.00         1.05917         0.94300         3.02300         7.58300     95187.15

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       288430.42          ---          ---         1.03785         0.89500         2.71900         7.16700     96860.48
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     288430.42         0.00         0.00         1.03785         0.89500         2.71900         7.16700     96860.48

Pure Get

jchome@memtier-worker:~/memtier_benchmark$ KEYDB_SERVER="10.128.0.23" && REDIS_PORT=6379 && memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=0:1
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 159 secs]  0 threads:    60000000 ops,  924097 (avg:  376554) ops/sec, 298.65MB/sec (avg: 121.69MB/sec),  0.32 (avg:  0.80) msec latency

[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 164 secs]  0 threads:    60000000 ops,  683873 (avg:  364556) ops/sec, 221.01MB/sec (avg: 117.82MB/sec),  0.44 (avg:  0.82) msec latency

30        Threads
10        Connections per thread
200000    Requests per client

BEST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets            0.00          ---          ---             ---             ---             ---             ---         0.00
Gets       388569.90    388569.90         0.00         0.79598         0.66300         1.47100         2.67100    128591.95
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     388569.90    388569.90         0.00         0.79598         0.66300         1.47100         2.67100    128591.95

WORST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets            0.00          ---          ---             ---             ---             ---             ---         0.00
Gets       371916.73    371916.73         0.00         0.82274         0.70300         1.50300         1.91100    123080.79
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     371916.73    371916.73         0.00         0.82274         0.70300         1.50300         1.91100    123080.79

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets            0.00          ---          ---             ---             ---             ---             ---         0.00
Gets       380243.32    380243.32         0.00         0.80936         0.68700         1.48700         2.03100    125836.37
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     380243.32    380243.32         0.00         0.80936         0.68700         1.48700         2.03100    125836.37

Mixed Set-Get

jchome@memtier-worker:~/memtier_benchmark$ KEYDB_SERVER="10.128.0.23" && REDIS_PORT=6379 && memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:3
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 195 secs]  0 threads:    60000000 ops,  531805 (avg:  307101) ops/sec, 172.50MB/sec (avg: 99.62MB/sec),  0.56 (avg:  0.98) msec latency

[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 188 secs]  0 threads:    60000000 ops,  744913 (avg:  319107) ops/sec, 241.63MB/sec (avg: 103.51MB/sec),  0.40 (avg:  0.94) msec latency

30        Threads
10        Connections per thread
200000    Requests per client

BEST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        82518.16          ---          ---         0.95311         0.80700         1.88700         5.69500     27711.18
Gets       247554.48    247554.48         0.00         0.93559         0.79100         1.87100         5.59900     81924.79
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     330072.63    247554.48         0.00         0.93997         0.79100         1.87100         5.63100    109635.97

WORST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        78893.16          ---          ---         0.99005         0.83100         2.09500         6.30300     26493.84
Gets       236679.48    236679.48         0.00         0.97173         0.81500         2.07900         6.27100     78325.87
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     315572.64    236679.48         0.00         0.97631         0.82300         2.07900         6.27100    104819.71

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        80705.66          ---          ---         0.97158         0.82300         2.00700         6.07900     27102.51
Gets       242116.98    242116.98         0.00         0.95366         0.80700         1.98300         6.01500     80125.33
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     322822.64    242116.98         0.00         0.95814         0.80700         1.99100         6.01500    107227.84

Memory Usage

jchome@memtier-worker:~/memtier_benchmark$ KEYDB_SERVER="10.128.0.23" && echo "info memory" | nc $KEYDB_SERVER 6379
$1190
# Memory
used_memory:3977378208
used_memory_human:3.70G
used_memory_rss:5380833280
used_memory_rss_human:5.01G
used_memory_peak:5496877432
used_memory_peak_human:5.12G
used_memory_peak_perc:72.36%
used_memory_overhead:537329080
used_memory_startup:3113504
used_memory_dataset:3440049128
used_memory_dataset_perc:86.56%
allocator_allocated:3977771688
allocator_active:5311553536
allocator_resident:5373931520
total_system_memory:253563305984
total_system_memory_human:236.15G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:188000000000
maxmemory_human:175.09G
maxmemory_policy:noeviction
allocator_frag_ratio:1.34
allocator_frag_bytes:1333781848
allocator_rss_ratio:1.01
allocator_rss_bytes:62377984
rss_overhead_ratio:1.00
rss_overhead_bytes:6901760
mem_fragmentation_ratio:1.35
mem_fragmentation_bytes:1403517104
mem_not_counted_for_evict:1048576
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.2.1
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
storage_provider:none

Dashboard

(dashboard screenshot)
romange commented 2 years ago

@drinkbeer, do not expect to reach anywhere close to 3.8M qps on GCP. AWS's networking capabilities are higher than any other public cloud's. Having said that, I would expect c2 instances to reach higher throughput. I will benchmark GCP and get back to you.

You provided a great reference point with your results! It will take me a week or so. Hope it's ok.

drinkbeer commented 2 years ago

AWS's networking capabilities are higher than any other public cloud's. Having said that, I would expect c2 instances to reach higher throughput.

If you look at the two dashboards, I don't think we are anywhere close to saturating the network. But I can check with the Google folks whether our tests start dropping packets.

I will benchmark GCP and get back to you.

Thank you so much! I will use these results as a baseline and continue testing with Docker and Kubernetes. I hope we can achieve similar results there (I expect Docker and Kubernetes to introduce some overhead, but I am curious how much it is).

It will take me a week or so. Hope it's ok.

It is totally fine. I really appreciate your time and look forward to your benchmarking on GCP.

romange commented 2 years ago

AWS's networking capabilities are higher than any other public cloud's. Having said that, I would expect c2 instances to reach higher throughput.

If you look at the two dashboards, I don't think we are anywhere close to saturating the network. But I can check with the Google folks whether our tests start dropping packets.

Yeah, it's not close to saturating the bandwidth. Throughput is another matter and is a bit more complicated.

  1. Clouds do not disclose this publicly, but they all cap packets per second (PPS) for their VMs. They have to, because I/O is a resource shared by all VMs on a server, unlike CPUs or memory, which are dedicated wholly to each VM.
  2. Redis has a very naive, ping-pong-style protocol that usually incurs a lot of interrupt overhead. Essentially, a client sends a small packet and waits for a reply. The server receives a hardware interrupt from its NIC once the packet arrives, and maybe spins a bit to see whether more packets arrive so that its interrupt handler won't process just a single tiny packet. But nothing else arrives, because the client on the other side is waiting for the response. So the server triggers a software interrupt for a single packet, and likewise the send flow is triggered for a single response. (Pipeline mode improves things, because the client sends N requests in a row that the interrupt handler can process at once.)
  3. Cloud providers make various optimizations to offload network processing from your VM's CPU onto their own custom processors on the server, but the quality of this work differs vastly between providers. They all proudly advertise 100 Gbps of network capacity, but in practice AWS has done a great job of making it accessible to applications like DF or Memcached. Even on AWS, though, a benchmark like this will bottleneck on packet throughput rather than on bandwidth.
  4. Benchmarking and tuning software to hardware is a very complicated job. One of the more interesting recent pieces I've read on this is https://talawah.io/blog/extreme-http-performance-tuning-one-point-two-million/, just to show how hard it is and how much black magic it requires.
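To make point 2 concrete, here is a sketch of why pipelining helps at the wire level. It serializes commands in RESP (the public Redis wire protocol, which Dragonfly and KeyDB also speak); the helper names are mine, not from any library:

```python
def encode_command(*parts: bytes) -> bytes:
    """Serialize one command in RESP: an array header ('*N')
    followed by one bulk string ('$len\\r\\n<data>\\r\\n') per argument."""
    buf = b"*%d\r\n" % len(parts)
    for p in parts:
        buf += b"$%d\r\n%s\r\n" % (len(p), p)
    return buf

def pipeline(commands) -> bytes:
    """Concatenate many commands into a single payload: one write()
    and, ideally, one packet instead of N ping-pong round trips,
    so the server's interrupt handler processes a batch at once."""
    return b"".join(encode_command(*cmd) for cmd in commands)

# With --pipeline=1, each SET is its own round trip; with a deeper
# pipeline, all three SETs below travel in one buffer.
batch = pipeline([(b"SET", b"key:%d" % i, b"x" * 8) for i in range(3)])
```

This is exactly what memtier's `--pipeline=N` flag controls on the client side.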


osevan commented 2 years ago

Thank you so much for your reply. I think your suggestion of testing with a GCP instance is great. I will follow the steps and try to get an ideal P99 latency, and I will update the results here once I finish the tests.

Now, from analyzing your test setup, I assume you benchmarked both of them on the same node concurrently. Am I correct? If yes, that's a really bad idea.

No, they are running on two different nodes in the same cluster, in the same region (us-east1) and the same zone (us-east1-d). The memtier jobs also run on those nodes (the memtier job for KeyDB runs on the same node as the KeyDB cache; the memtier job for Dragonfly runs on the same node as the Dragonfly cache).

And the two memtier jobs run sequentially to avoid saturating the network.

I do not know if --key-pattern=S:R was a deliberate choice. I used R:R but also used --distinct-client-seed. Otherwise, each client connection goes through exactly the same route...

I first run the pure-set memtier job, which only sets keys key:1 through key:10000000, to test set operations alone; then I run the pure-get job, which has a 100% hit rate because the whole key space is already filled; then I run the mixed set-get job (25% SET, 75% GET), whose hit rate is also 100% (the Misses/sec column is 0). I think --key-pattern=S:R only affects the third (mixed) job, and because the hit rate is 100%, I don't think the key pattern affects performance much. But I can try the --distinct-client-seed option in my GCP instance test.
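The sequence described above amounts to three sequential memtier invocations that differ only in `--ratio` (a sketch; the server address and flags are copied from the runs earlier in this thread, so adjust them to your environment — this builds and prints the commands rather than executing them):

```shell
SERVER="10.128.0.21"; PORT=6379
COMMON="-s $SERVER -p $PORT -n 200000 -d 300 --pipeline=1 --clients=10 \
--threads=30 --run-count=2 --hide-histogram --key-prefix=key: \
--distinct-client-seed --key-pattern=R:R"
# pure SET (fills the key space), pure GET (100% hit rate), mixed 1:3
for RATIO in 1:0 0:1 1:3; do
  echo "memtier_benchmark $COMMON --ratio=$RATIO"
done
```

Pipe the output to `sh` (or drop the `echo`) to actually run the three benchmarks back to back.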

Hey, don't run high-performance apps inside Kubernetes, Docker, AWS, or Azure cloud.

Pay for a skilled admin with a performance and security focus, and invest the money in a bare-metal server!!!

Even my 9-year-old lab notebook handled more requests.

romange commented 2 years ago

@drinkbeer preliminary results... I took two c2-60 machines, as you did. (screenshot)

fetched DF binary v0.2.0 from https://github.com/dragonflydb/dragonfly/releases/download/v0.2.0/dragonfly-x86_64.unstripped.tar.gz

ev@test-c1:~$ cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy

dev@test-c1:~$ uname -a
Linux test-c1 5.15.0-1008-gcp #12-Ubuntu SMP Wed Jun 1 21:29:52 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Disclosure: it's my development image, created via the packer pipeline defined here: https://github.com/romange/image-bakery

After scanning it now, I think the only substantial performance-relevant change I made there is turning off CPU mitigations: sed -i 's/\(^GRUB_CMDLINE_LINUX=".*\)"/\1 mitigations=off"/' /etc/default/grub

Besides that, it's just convenience configs and utilities.
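For completeness, the sed line alone does not take effect until grub regenerates its config and the machine reboots; the sequence below is my assumption of the standard Ubuntu procedure around the quoted sed. Note that mitigations=off disables CPU vulnerability mitigations, trading security for throughput, so it should only be used on isolated benchmark machines.

```shell
# Assumed full sequence for applying mitigations=off on Ubuntu; the sed line
# is the one quoted above, the rest is the usual grub update/reboot dance.
sudo sed -i 's/\(^GRUB_CMDLINE_LINUX=".*\)"/\1 mitigations=off"/' /etc/default/grub
sudo update-grub       # regenerate /boot/grub/grub.cfg with the new cmdline
sudo reboot            # kernel cmdline changes only apply after a reboot
# after the reboot, verify the flag is active:
grep -o 'mitigations=off' /proc/cmdline
```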

I ran only the first SET benchmark, copy-pasting your command:

DRAGONFLY_SERVER="10.142.0.18" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0
AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
----------------------------------------------------------------------------------------------------------------------------
Sets      1395818.80          ---          ---         0.23205         0.23100         0.40700         0.55100    468742.82 
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00 
Waits           0.00          ---          ---             ---             ---             ---             ---          --- 
Totals    1395818.80         0.00         0.00         0.23205         0.23100         0.40700         0.55100    468742.82 

CPU usage of Dragonfly: (screenshot attached)

Already much better than your result. Let's try improving it. Rerun Dragonfly with: ./dragonfly-x86_64 --logbuflevel=-1 --logtostderr --conn_use_incoming_cpu (note the last flag).

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
----------------------------------------------------------------------------------------------------------------------------
Sets      1420922.15          ---          ---         0.22131         0.21500         0.38300         0.50300    477173.01 
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00 
Waits           0.00          ---          ---             ---             ---             ---             ---          --- 
Totals    1420922.15         0.00         0.00         0.22131         0.21500         0.38300         0.50300    477173.01 

but now the CPU usage is: (screenshot attached)

much lower than before (3360% vs 4580% before). p99 is also pretty good in both cases. Now let's increase the load a bit by raising the number of memtier clients to 30:

DRAGONFLY_SERVER="10.142.0.18" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=30 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
----------------------------------------------------------------------------------------------------------------------------
Sets      1977947.42          ---          ---         0.46646         0.43900         1.07100         1.44700    664232.85 
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00 
Waits           0.00          ---          ---             ---             ---             ---             ---          --- 
Totals    1977947.42         0.00         0.00         0.46646         0.43900         1.07100         1.44700    664232.85 

p99.9 is too high IMHO. Let's take it down a notch: clients=10, threads=60:

============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
----------------------------------------------------------------------------------------------------------------------------
Sets      1677309.01          ---          ---         0.35636         0.33500         0.71100         1.01500    563272.64 
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00 
Waits           0.00          ---          ---             ---             ---             ---             ---          --- 
Totals    1677309.01         0.00         0.00         0.35636         0.33500         0.71100         1.01500    563272.64 

Pretty good: p99.9 under 1 ms at 1.6M QPS.

romange commented 2 years ago

Now I see you used a 1 vCPU per core ratio. I used the regular 2 vCPUs per core.

romange commented 2 years ago

Step 2: I took a plain Ubuntu 22.04 image. The only thing I did before running DF was invoke ulimit -n 20000 and then ./dragonfly-x86_64 --logtostderr

Client (loadtest) instance: I took an n2-custom-80-40960 just to be on the safe side, so we won't have bottlenecks there. I do not think it matters substantially.

DRAGONFLY_SERVER="10.142.0.20" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 100000 -d 300 --pipeline=1 --clients=15 --threads=50 --run-count=1 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0                                                                                                                           
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%,  45 secs]  0 threads:    75000000 ops, 2052410 (avg: 1651450) ops/sec, 673.08MB/sec (avg: 541.59MB/sec),  0.36 (avg:  0.45) msec latency

50        Threads
15        Connections per thread
100000    Requests per client

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
----------------------------------------------------------------------------------------------------------------------------
Sets      1788894.22          ---          ---         0.45224         0.41500         0.84700         1.51900    600745.16 
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00 
Waits           0.00          ---          ---             ---             ---             ---             ---          --- 
Totals    1788894.22         0.00         0.00         0.45224         0.41500         0.84700         1.51900    600745.16 

Seems that DF works OK on Ubuntu 22.04 out of the box. Next step: check Debian.

romange commented 2 years ago

Step 3: used Bullseye - projects/debian-cloud/global/images/debian-11-bullseye-v20220519. Dragonfly: https://github.com/dragonflydb/dragonfly/releases/download/v0.2.0/dragonfly-x86_64.unstripped.tar.gz

Everything else as before. As you can see, I can confirm that Debian 11 performs very poorly. I suspect it's because you need at least kernel 5.11 to reach good performance, but I am not sure. In any case, Ubuntu provides a simple alternative if performance is what you need.
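Since the kernel version seems to be the deciding factor, a quick sanity check before benchmarking could look like the following. The 5.11 threshold is the guess from above, not an official requirement:

```shell
# kernel_ok VERSION -> succeeds if VERSION is at least 5.11 (assumed
# threshold for good io_uring performance, per the guess above).
kernel_ok() {
  major=${1%%.*}          # e.g. "5.15.0-1010-gcp" -> "5"
  rest=${1#*.}            # -> "15.0-1010-gcp"
  minor=${rest%%.*}       # -> "15"
  [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 11 ]; }
}

if kernel_ok "$(uname -r)"; then
  echo "kernel $(uname -r): should be fine for Dragonfly/io_uring"
else
  echo "kernel $(uname -r): older than 5.11; expect reduced performance"
fi
```

On the Debian 11 image above (kernel 5.10) this prints the warning branch, consistent with the poor numbers below.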

Jianbin, I think there are enough data points here to continue evaluating DF.

dev@test-c1:~$ DRAGONFLY_SERVER="10.142.0.21" && REDIS_PORT=6379 && memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 100000 -d 300 --pipeline=1 --clients=15 --threads=50 --run-count=1 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0                                                                                                                           
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 184 secs]  0 threads:    75000000 ops,  435703 (avg:  406529) ops/sec, 142.89MB/sec (avg: 133.32MB/sec),  1.72 (avg:  1.84) msec latency

50        Threads
15        Connections per thread
100000    Requests per client

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
----------------------------------------------------------------------------------------------------------------------------
Sets       432159.18          ---          ---         1.84153         1.45500         7.32700        18.43100    145127.38 
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00 
Waits           0.00          ---          ---             ---             ---             ---             ---          --- 
Totals     432159.18         0.00         0.00         1.84153         1.45500         7.32700        18.43100    145127.38 
romange commented 2 years ago

@drinkbeer hey man, did you have a chance to experiment with it?

romange commented 2 years ago

@drinkbeer I am closing. Feel free to reopen if you have any questions

drinkbeer commented 2 years ago

Thank you @romange, this issue can be closed. As a next step, we will probably deploy Dragonfly in our staging environment and benchmark it along with Envoy proxy (the proxy used with KeyDB in our prod).

Here are some results of our benchmarking. The performance of Dragonfly looks great.

(updated at July 4th, 2022)

TL;DR

We deployed Dragonfly and KeyDB on c2-standard-60 machines (30 cores, 240 GB RAM) running Ubuntu 22.04 out of the box, and used memtier from the Redis community for load generation and benchmarking. We tested performance and resource usage with all-SET, all-GET, and mixed SET-GET workloads. The conclusion: Dragonfly achieves much higher throughput (3.5X) and much lower latency (roughly 14% of KeyDB's) than KeyDB. Dragonfly's resource usage is also impressive: it fully and evenly utilizes all CPU cores, while KeyDB cannot run with more than 16 server threads and therefore cannot fully utilize the machine's 30 cores; Dragonfly also uses less memory (76.19% of KeyDB's). One thing to note about KeyDB: adding more threads does not improve performance or resource utilization.

|  | Dragonfly | KeyDB (4 threads) | KeyDB (16 threads) | Dragonfly (Docker) |
|---|---|---|---|---|
| Set latency p99.9 (ms) | 0.52700 | 8.63900 | 21.37500 | 0.93500 |
| Get latency p99.9 (ms) | 0.54300 | 1.60700 | 1.56700 | 0.59900 |
| Set-Get mixed latency p99.9 (ms) | 0.57500 | 4.35100 | 7.03900 | 0.60700 |
| Throughput (ops/s) | ~1.4 million | ~400K | ~307K | ~1.25 million |
| Memory (GB) | 3.68 | 4.83 | 6.25 | 3.86 |
| CPU (number of cores) | 22.8 | 4.25 | 15.23 | 27.97 |
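As a quick arithmetic check on the summary claims, the headline ratios can be recomputed from the table's raw numbers; the "3.5X" and "76.19%" figures compare Dragonfly against KeyDB with 4 threads, and the "14%" latency figure is close to the mixed-workload p99.9 ratio (0.575 / 4.351 ≈ 13%):

```python
# Recompute the headline ratios from the table's raw numbers.
dragonfly_ops = 1_400_000      # ~1.4M ops/s
keydb4_ops = 400_000           # ~400K ops/s (KeyDB, 4 threads)
print(f"throughput: {dragonfly_ops / keydb4_ops:.1f}x")              # 3.5x

dragonfly_mem_gb = 3.68
keydb4_mem_gb = 4.83
print(f"memory: {dragonfly_mem_gb / keydb4_mem_gb:.2%} of KeyDB's")  # 76.19%
```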

Setup

Hardware

Dragonfly

jchome@dragonfly-worker-ubuntu:~$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
jchome@dragonfly-worker-ubuntu:~$ uname -a
Linux dragonfly-worker-ubuntu 5.15.0-1010-gcp #15-Ubuntu SMP Fri Jun 10 11:30:24 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
jchome@dragonfly-worker-ubuntu:~$ sudo sed -i 's/\(^GRUB_CMDLINE_LINUX=".*\)"/\1 mitigations=off"/' /etc/default/grub
jchome@dragonfly-worker-ubuntu:~$ cat /etc/default/grub | grep GRUB_CMDLINE_LINUX=
GRUB_CMDLINE_LINUX=" mitigations=off"

cd ~ && \
    wget https://github.com/dragonflydb/dragonfly/releases/download/v0.3.1/dragonfly-x86_64.unstripped.tar.gz && \
    tar -xvf dragonfly-x86_64.unstripped.tar.gz
jchome@dragonfly-worker-ubuntu:~$ ./dragonfly-x86_64 --logbuflevel=-1 --logtostderr --conn_use_incoming_cpu

KeyDB

jchome@keydb-worker-ubuntu:~$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
jchome@keydb-worker-ubuntu:~$ uname -a
Linux keydb-worker-ubuntu 5.15.0-1010-gcp #15-Ubuntu SMP Fri Jun 10 11:30:24 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
jchome@keydb-worker-ubuntu:~$ sudo sed -i 's/\(^GRUB_CMDLINE_LINUX=".*\)"/\1 mitigations=off"/' /etc/default/grub
jchome@keydb-worker-ubuntu:~$ cat /etc/default/grub | grep GRUB_CMDLINE_LINUX=
GRUB_CMDLINE_LINUX=" mitigations=off"

sudo apt-get update
sudo apt-get install build-essential nasm autotools-dev autoconf libjemalloc-dev tcl tcl-dev uuid-dev libcurl4-openssl-dev git
git clone https://github.com/EQ-Alpha/KeyDB.git
cd KeyDB
make distclean
make test
make
sudo make install

jchome@keydb-worker-ubuntu:~/KeyDB/src$ ./keydb-server --server-threads 4 --maxmemory 188G --port 6379 --protected-mode no

jchome@keydb-worker-ubuntu:~/KeyDB/src$ ./keydb-server --server-threads 16 --maxmemory 188G --port 6379 --protected-mode no &
[1] 102412

Memtier

jchome@memtier-worker-ubuntu:~$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
jchome@memtier-worker-ubuntu:~$ uname -a
Linux memtier-worker-ubuntu 5.15.0-1010-gcp #15-Ubuntu SMP Fri Jun 10 11:30:24 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
jchome@memtier-worker-ubuntu:~$ sudo sed -i 's/\(^GRUB_CMDLINE_LINUX=".*\)"/\1 mitigations=off"/' /etc/default/grub
jchome@memtier-worker-ubuntu:~$ cat /etc/default/grub | grep GRUB_CMDLINE_LINUX=
GRUB_CMDLINE_LINUX=" mitigations=off"
git clone https://github.com/RedisLabs/memtier_benchmark.git && cd memtier_benchmark/
Built and installed per https://github.com/RedisLabs/memtier_benchmark#building-and-installing

Dragonfly

Resource Usage

(screenshot attached)

Set

DRAGONFLY_SERVER="10.128.0.24" && REDIS_PORT=6379 && ./memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets      1423048.66          ---          ---         0.22339         0.22300         0.39100         0.52700    477887.13
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals    1423048.66         0.00         0.00         0.22339         0.22300         0.39100         0.52700    477887.13

Get

DRAGONFLY_SERVER="10.128.0.24" && REDIS_PORT=6379 && ./memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=0:1

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets            0.00          ---          ---             ---             ---             ---             ---         0.00
Gets      1376543.56   1376543.56         0.00         0.22864         0.22300         0.39900         0.54300    455548.42
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals    1376543.56   1376543.56         0.00         0.22864         0.22300         0.39900         0.54300    455548.42

Mixed

DRAGONFLY_SERVER="10.128.0.24" && REDIS_PORT=6379 && ./memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:3

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       328923.70          ---          ---         0.23965         0.23900         0.42300         0.57500    110458.90
Gets       986771.11    986771.11         0.00         0.23785         0.23100         0.42300         0.57500    326558.53
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals    1315694.82    986771.11         0.00         0.23830         0.23100         0.42300         0.57500    437017.42

Dragonfly (Docker)

Resource Usage

(screenshot attached)

Set

DRAGONFLY_SERVER="10.128.0.24" && REDIS_PORT=6379 && ./memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets      1305596.15          ---          ---         0.23675         0.23100         0.50300         0.93500    438444.31
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals    1305596.15         0.00         0.00         0.23675         0.23100         0.50300         0.93500    438444.31

Get

DRAGONFLY_SERVER="10.128.0.24" && REDIS_PORT=6379 && ./memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=0:1

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets            0.00          ---          ---             ---             ---             ---             ---         0.00
Gets      1316826.64   1316826.64         0.00         0.23503         0.23100         0.41500         0.59900    435785.91
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals    1316826.64   1316826.64         0.00         0.23503         0.23100         0.41500         0.59900    435785.91

Mixed

DRAGONFLY_SERVER="10.128.0.24" && REDIS_PORT=6379 && ./memtier_benchmark -s "$DRAGONFLY_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:3

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       313124.70          ---          ---         0.24759         0.24700         0.43100         0.60700    105153.29
Gets       939374.09    939374.09         0.00         0.24553         0.23900         0.43100         0.59900    310873.12
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals    1252498.79    939374.09         0.00         0.24604         0.23900         0.43100         0.60700    416026.41

KeyDB (4 threads)

Resource Usage

(screenshot attached)

Set

KEYDB_SERVER="10.128.0.27" && REDIS_PORT=6379 && ./memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       352644.13          ---          ---         0.87533         0.64700         3.71100         8.63900    118424.68
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     352644.13         0.00         0.00         0.87533         0.64700         3.71100         8.63900    118424.68

Get

KEYDB_SERVER="10.128.0.27" && REDIS_PORT=6379 && ./memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=0:1

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets            0.00          ---          ---             ---             ---             ---             ---         0.00
Gets       462801.59    462801.59         0.00         0.66506         0.54300         1.16700         1.60700    153157.91
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     462801.59    462801.59         0.00         0.66506         0.54300         1.16700         1.60700    153157.91

Mixed

KEYDB_SERVER="10.128.0.27" && REDIS_PORT=6379 && ./memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:3

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        94887.31          ---          ---         0.80495         0.63900         2.33500         4.41500     31864.98
Gets       284661.92    284661.92         0.00         0.79218         0.61500         2.28700         4.35100     94205.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     379549.23    284661.92         0.00         0.79537         0.61500         2.30300         4.35100    126069.98

KeyDB (16 threads)

Resource Usage

(screenshot attached)

Set

KEYDB_SERVER="10.128.0.27" && REDIS_PORT=6379 && ./memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:0

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       262139.40          ---          ---         1.14587         0.86300         8.83100        21.37500     88031.46
Gets            0.00         0.00         0.00             ---             ---             ---             ---         0.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     262139.40         0.00         0.00         1.14587         0.86300         8.83100        21.37500     88031.46

Get

KEYDB_SERVER="10.128.0.27" && REDIS_PORT=6379 && ./memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=0:1

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets            0.00          ---          ---             ---             ---             ---             ---         0.00
Gets       399851.71    399851.71         0.00         0.75064         0.74300         1.05500         1.56700    132325.50
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     399851.71    399851.71         0.00         0.75064         0.74300         1.05500         1.56700    132325.50

Mixed

KEYDB_SERVER="10.128.0.27" && REDIS_PORT=6379 && ./memtier_benchmark -s "$KEYDB_SERVER" -p "$REDIS_PORT" -n 200000 -d 300 --pipeline=1 --clients=10 --threads=30 --run-count=2 --hide-histogram --key-prefix='key:' --distinct-client-seed --key-pattern=R:R --ratio=1:3

AGGREGATED AVERAGE RESULTS (2 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        76642.55          ---          ---         0.98195         0.78300         4.86300         7.13500     25738.04
Gets       229927.65    229927.65         0.00         0.97991         0.78300         4.79900         7.03900     76091.44
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     306570.19    229927.65         0.00         0.98042         0.78300         4.79900         7.03900    101829.48
romange commented 2 years ago

@drinkbeer These are fantastic results! It really makes me happy 🕺🏼 to see that Dragonfly provides value! Jianbin, I would like to have a quick chat with you on discord or google meet. Will it be possible?

drinkbeer commented 2 years ago

I would like to have a quick chat with you on discord or google meet. Will it be possible?

I would love to. I sent you an invitation through your LinkedIn. Let's chat.