OpenAtomFoundation / pika

Pika is a Redis-compatible database developed by Qihoo's infrastructure team.
BSD 3-Clause "New" or "Revised" License

pika cannot make full use of machine resources; benchmark performance won't go up #992

Closed debuger6 closed 1 year ago

debuger6 commented 3 years ago

Problem description: machine spec: 32c256g, 4*SSD. Benchmark tool: vire-benchmark. Benchmark results:

# vire-benchmark -h 10.xx.134.3  -p 9221  -n 10000000 -c 1024  -d  128  -k 1 -r 10000000 -e  -t get -q
GET: 170395.48 requests per second

CPU usage: 1200%; system load: 14; essentially no I/O pressure. The benchmark results fall far short of expectations. A 16c128g machine tested earlier performed even better than this one, which I cannot understand. Please help analyze.

The pika configuration is as follows:

# Pika port
port : 9221
# Thread Number
thread-num : 32
# Thread Pool Size
thread-pool-size : 64
# Sync Thread Number
sync-thread-num : 6
# Pika log path
log-path : /data1/log/
# Pika db path
db-path : /data2/db/
# Pika write-buffer-size
write-buffer-size : 536870912
# size of one block in arena memory allocation.
# If <= 0, a proper value is automatically calculated
# (usually 1/8 of write-buffer-size, rounded up to a multiple of 4KB)
arena-block-size :
# Pika timeout
timeout : 60
# Requirepass
requirepass :
# Masterauth
masterauth :
# Userpass
userpass :
# User Blacklist
userblacklist :
# if this option is set to 'classic', pika supports multiple DBs; in this
# mode, the 'databases' option takes effect
# if this option is set to 'sharding', pika supports multiple tables, and you
# can specify the slot number for each table; in this mode, the 'default-slot-num' option takes effect
# Pika instance mode [classic | sharding]
instance-mode : classic
# Set the number of databases. The default database is DB 0, you can select
# a different one on a per-connection basis using SELECT <dbid> where
# dbid is a number between 0 and 'databases' - 1, limited in [1, 8]
databases : 8
# default slot number each table in sharding mode
default-slot-num : 1024
# replication-num defines how many followers are in a single raft group; only [0, 1, 2, 3, 4] is valid
replication-num : 0
# consensus-level defines how many confirms the leader gets before committing this log and replying to the client;
#                 only [0, ...replication-num] is valid
consensus-level : 0
# Dump Prefix
dump-prefix :
# daemonize  [yes | no]
#daemonize : yes
# Dump Path
dump-path : /data3/dump/
# Expire-dump-days
dump-expire : 0
# pidfile Path
pidfile : ./pika.pid
# Max Connection
maxclients : 50000
# the per-file size of SST files to compact, default is 20M
target-file-size-base : 67108864
# Expire-logs-days
expire-logs-days : 7
# Expire-logs-nums
expire-logs-nums : 10
# Root-connection-num
root-connection-num : 2
# Slowlog-write-errorlog
slowlog-write-errorlog : no
# Slowlog-log-slower-than
slowlog-log-slower-than : 10000
# Slowlog-max-len
slowlog-max-len : 128
# Pika db sync path
db-sync-path : /data2/dbsync/
# db sync speed (MB); max is 1024MB, min is 0; if set below 0 or above 1024, the value will be adjusted to 1024
db-sync-speed : -1
# The slave priority
slave-priority : 100
# network interface
#network-interface : eth1
# replication
#slaveof : master-ip:master-port

# CronTask, format 1: start-end/ratio, like 02-04/60: pika will check whether to schedule compaction between 2 and 4 o'clock
#                   every day if freesize/disksize > 60%.
#           format 2: week/start-end/ratio, like 3/02-04/60: pika will check whether to schedule compaction between 2 and 4 o'clock
#                   every Wednesday if freesize/disksize > 60%.
#           NOTICE: if compact-interval is set, compact-cron will be masked and disabled.
#
#compact-cron : 3/02-04/60

# Compact-interval, format: interval/ratio, like 6/60: pika will check whether to schedule compaction every 6 hours
#                           if freesize/disksize > 60%. NOTICE: compact-interval takes priority over compact-cron;
#compact-interval :

# the size of the flow-control window while syncing binlog between master and slave. Default is 9000 and the maximum is 90000.
sync-window-size : 9000
# max value of connection read buffer size: configurable value 67108864(64MB) or 268435456(256MB) or 536870912(512MB)
#                                           default value is 268435456(256MB)
#                                           NOTICE: master and slave should share exactly the same value
max-conn-rbuf-size : 536870912

###################
## Critical Settings
###################
# write_binlog  [yes | no]
write-binlog : yes
# binlog file size: default is 100M,  limited in [1K, 2G]
binlog-file-size : 104857600
# Automatically trigger a small compaction according to statistics
# Use the cache to store up to 'max-cache-statistic-keys' keys
# if 'max-cache-statistic-keys' is set to '0', the statistics function is turned off
# and the automatic small-compaction feature is disabled as well
max-cache-statistic-keys : 0
# When a specific multi-data-structure key is deleted or overwritten 'small-compaction-threshold' times,
# a small compaction is triggered automatically, default is 5000, limited in [1, 100000]
small-compaction-threshold : 5000
# If the total size of all live memtables of all the DBs exceeds
# the limit, a flush will be triggered in the next DB to which the next write
# is issued.
max-write-buffer-size : 137438953472
# The maximum number of write buffers that are built up in memory for one ColumnFamily in DB.
# The default and the minimum number is 2, so that when 1 write buffer
# is being flushed to storage, new writes can continue to the other write buffer.
# If max-write-buffer-number > 3, writing will be slowed down
# if we are writing to the last write buffer allowed.
max-write-buffer-number : 256
# Limit the response size of some commands, like Scan, Keys*
max-client-response-size : 1073741824
# Compression type supported [snappy, zlib, lz4, zstd]
compression : snappy
# max-background-flushes: default is 1, limited in [1, 4]
max-background-flushes : 4
# max-background-compactions: default is 2, limited in [1, 8]
max-background-compactions : 2
# maximum value of Rocksdb cached open file descriptors
max-cache-files : 5000
# max_bytes_for_level_multiplier: default is 10, you can change it to 5
max-bytes-for-level-multiplier : 10
# BlockBasedTable block_size, default 4k
# block-size: 4096
# block LRU cache, default 8M, 0 to disable
block-cache: 68719476736 
# whether the block cache is shared among the RocksDB instances, default is per CF
# share-block-cache: no
# whether or not index and filter blocks are stored in block cache
# cache-index-and-filter-blocks: no
# when set to yes, bloomfilter of the last level will not be built
# optimize-filters-for-hits: no
# https://github.com/facebook/rocksdb/wiki/Leveled-Compaction#levels-target-size
# level-compaction-dynamic-level-bytes: no
kernelai commented 3 years ago

Try reducing thread-pool-size. This parameter is the number of threads that actually execute commands. When it is set too large, the threads all contend for a single task, and the time is spent in system calls.
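The effect kernelai describes can be illustrated with a toy model (Python for brevity; this sketch is not pika's actual C++ worker pool): many workers pulling near-free tasks from one shared queue mostly contend for the queue's lock, much like an oversized thread pool burning time in system calls instead of work.

```python
import queue
import threading
import time

def drain(n_threads, n_tasks=50_000):
    """Have n_threads workers drain one shared queue of trivial tasks.

    Returns (elapsed_seconds, total_tasks_processed). Each task is nearly
    free, like a GET answered from memory, so with many workers most time
    goes into contending for the shared queue's lock, not useful work.
    """
    q = queue.Queue()
    for i in range(n_tasks):
        q.put(i)

    counts = []
    counts_lock = threading.Lock()

    def worker():
        n = 0
        while True:
            try:
                q.get_nowait()   # the contended point: one lock, many threads
            except queue.Empty:
                break
            n += 1
        with counts_lock:
            counts.append(n)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, sum(counts)

if __name__ == "__main__":
    for size in (4, 64):
        elapsed, done = drain(size)
        print(f"{size:3d} workers: {done} tasks in {elapsed:.3f}s")
```

All tasks get processed either way; the point is that the larger pool adds lock contention without adding throughput when each task is cheap.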

debuger6 commented 3 years ago

> Try reducing thread-pool-size. This parameter is the number of threads that actually execute commands. When it is set too large, the threads all contend for a single task, and the time is spent in system calls.

I tuned it down to 32 and tested: running two benchmark programs at the same time pushed CPU to about 2200% and load to about 30, with a total QPS of 380k, so the effect is quite noticeable. The best-practices doc says thread-num should be set to the machine's core count and thread-pool-size to twice thread-num. It seems the right thread-pool-size may depend on the workload: if most operations complete in memory, the thread pool must not be too large; if there is substantial disk I/O, the value can follow the best practices.
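For reference, the adjustment described above corresponds to something like the following in pika.conf (the 32-thread values come from this thread and this 32-core machine, not from an official recommendation):

```
# Thread Number -- best practice: the machine's core count
thread-num : 32
# Thread Pool Size (threads that actually execute commands); for a workload
# served mostly from memory, keeping this near the core count avoided the
# contention observed in this issue
thread-pool-size : 32
```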

kernelai commented 3 years ago

vire-benchmark -T sets the number of concurrent benchmark threads. In theory, performance could go a bit higher.
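A multi-threaded run might then look like the sketch below; the -T flag is the one mentioned above, the other flags are copied from the original benchmark command, and the thread count is a placeholder to adapt (check `vire-benchmark --help` on your build):

```
# 16 benchmark threads instead of one, same connection/key parameters as before
vire-benchmark -h 10.xx.134.3 -p 9221 -T 16 -n 10000000 -c 1024 -d 128 -k 1 -r 10000000 -e -t get -q
```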

AlexStocks commented 1 year ago

> vire-benchmark

vire-benchmark supports multi-threaded Redis benchmarking.