OpenAtomFoundation / pika

Pika is a Redis-Compatible database developed by Qihoo's infrastructure team.
BSD 3-Clause "New" or "Revised" License

pika throughput cannot be pushed any higher [sharding mode] #985

Closed: debuger6 closed this issue 1 year ago

debuger6 commented 3 years ago

Problem description:

A 3-master / 3-slave sharding cluster; each node is 8c / 64 GB / 1.8 TB SSD. The pika cluster is fronted by codis, with two codis-proxy instances at 8c / 32 GB each. When the cluster held only a few GB of data, benchmark performance for get, set, hget and hset was very good (set reached 100k+ QPS and get around 300k QPS). After syncing in a batch of production data, each node's data directories look like this:

36G /data/pika/db
28K /data/pika/dbsync
288G    /data/pika/log

When re-running the benchmark, set performance drops sharply. Neither the codis-proxy nor the benchmark tool can drive its CPU any higher, and pika's CPU usage and load are not high either (pika CPU usage around 300%, load around 3.5). codis occasionally reports backend connection failures [Error from server: ERR handle response, backend conn failure, read tcp 10.21.134.xxx:28558->10.26.134.xxx:9221: read: connection reset by peer]. The benchmark results are as follows:

|># vire-benchmark -h 10.23.139.xxx -p 19000  -T 10 -n 5000000 -d 128 -k 1 -r 100000000 -e -q -t get,set,hget,hset
SET: 8034.61 requests per second
GET: 126020.77 requests per second
HSET: 19966.62 requests per second
HGET: 139089.80 requests per second

The benchmark command is exactly the same before and after, so why is the gap so large? An explanation would be appreciated!

kernelai commented 3 years ago

Please provide the versions and the pika and codis configuration files.

debuger6 commented 3 years ago

> Please provide the versions and the pika and codis configuration files.

pika version: 3.3.6, codis: 3.2.2. pika configuration:

# Pika port
port : 9221
# Thread Number
thread-num : 8
# Thread Pool Size
thread-pool-size : 16
# Sync Thread Number
sync-thread-num : 4
# Pika log path
log-path : /data/pika/log/
# Pika db path
db-path : /data/pika/db/
# Pika write-buffer-size
write-buffer-size : 268435456
# size of one block in arena memory allocation.
# If <= 0, a proper value is automatically calculated
# (usually 1/8 of write-buffer-size, rounded up to a multiple of 4KB)
arena-block-size : 33554432
# Pika timeout
timeout : 60
# Requirepass
requirepass : 
# Masterauth
masterauth : 
# Userpass
userpass : 
# User Blacklist
userblacklist : 
# if this option is set to 'classic', pika supports multiple DBs; in this
# mode, the 'databases' option takes effect
# if this option is set to 'sharding', pika supports multiple tables; you can
# specify the slot number for each table, and in this mode the 'default-slot-num' option takes effect
# Pika instance mode [classic | sharding]
instance-mode : sharding
# Set the number of databases. The default database is DB 0, you can select
# a different one on a per-connection basis using SELECT <dbid> where
# dbid is a number between 0 and 'databases' - 1, limited in [1, 8]
databases : 1
# default slot number each table in sharding mode
default-slot-num : 1024
# replication num defines how many followers in a single raft group, only [0, 1, 2, 3, 4] is valid
replication-num : 0
# consensus level defines how many confirmations the leader must receive before committing a log entry and replying to the client,
#                 only [0, ...replication-num] is valid
consensus-level : 0
# Dump Prefix
dump-prefix : 
# daemonize  [yes | no]
#daemonize : yes
# Dump Path
dump-path : /data/pika/dump/
# Expire-dump-days
dump-expire : 0
# pidfile Path
pidfile : ./pika.pid
# Max Connection
maxclients : 20000
# the per file size of sst to compact, default is 20M
target-file-size-base : 20971520
# Expire-logs-days
expire-logs-days : 7
# Expire-logs-nums
expire-logs-nums : 10
# Root-connection-num
root-connection-num : 2
# Slowlog-write-errorlog
slowlog-write-errorlog : no
# Slowlog-log-slower-than
slowlog-log-slower-than : 10000
# Slowlog-max-len
slowlog-max-len : 128
# Pika db sync path
db-sync-path : /data/pika/dbsync/
# db sync speed (MB): max is 1024 MB, min is 0; if set below 0 or above 1024, the value will be adjusted to 1024
db-sync-speed : 1024
# The slave priority
slave-priority : 100
# network interface
#network-interface : eth1
# replication
#slaveof : master-ip:master-port

# CronTask, format 1: start-end/ratio, like 02-04/60, pika will check to schedule compaction between 2 to 4 o'clock everyday
#                   if the freesize/disksize > 60%.
#           format 2: week/start-end/ratio, like 3/02-04/60, pika will check to schedule compaction between 2 to 4 o'clock
#                   every wednesday, if the freesize/disksize > 60%.
#           NOTICE: if compact-interval is set, compact-cron will be masked and disabled.
#
#compact-cron : 3/02-04/60

# Compact-interval, format: interval/ratio, like 6/60, pika will check to schedule compaction every 6 hours,
#                           if the freesize/disksize > 60%. NOTICE: compact-interval takes priority over compact-cron;
#compact-interval :

# the size of the flow-control window while syncing binlog between master and slave. Default is 9000 and the maximum is 90000.
sync-window-size : 9000
# max value of connection read buffer size: configurable value 67108864(64MB) or 268435456(256MB) or 536870912(512MB)
#                                           default value is 268435456(256MB)
#                                           NOTICE: master and slave should share exactly the same value
max-conn-rbuf-size : 268435456

###################
## Critical Settings
###################
# write_binlog  [yes | no]
write-binlog : yes
# binlog file size: default is 100M,  limited in [1K, 2G]
binlog-file-size : 104857600
# Automatically trigger a small compaction according to statistics
# Use the cache to store up to 'max-cache-statistic-keys' keys
# if 'max-cache-statistic-keys' is set to '0', the statistics function is turned off
# and small compactions are no longer triggered automatically
max-cache-statistic-keys : 0
# When a specific multi-data-structure key is deleted or overwritten 'small-compaction-threshold' times,
# a small compaction is triggered automatically; default is 5000, limited in [1, 100000]
small-compaction-threshold : 5000
# If the total size of all live memtables of all the DBs exceeds
# the limit, a flush will be triggered in the next DB to which the next write
# is issued.
max-write-buffer-size : 34359738368
# The maximum number of write buffers that are built up in memory for one ColumnFamily in DB.
# The default and the minimum number is 2, so that when 1 write buffer
# is being flushed to storage, new writes can continue to the other write buffer.
# If max-write-buffer-number > 3, writing will be slowed down
# if we are writing to the last write buffer allowed.
max-write-buffer-number : 8
# Limit some command response size, like Scan, Keys*
max-client-response-size : 1073741824
# Compression type supported [snappy, zlib, lz4, zstd]
compression : snappy
# max-background-flushes: default is 1, limited in [1, 4]
max-background-flushes : 1
# max-background-compactions: default is 2, limited in [1, 8]
max-background-compactions : 2
# maximum value of Rocksdb cached open file descriptors
max-cache-files : 10000
# max_bytes_for_level_multiplier: default is 10, you can change it to 5
max-bytes-for-level-multiplier : 10
# BlockBasedTable block_size, default 4k
# block-size: 4096
# block LRU cache, default 8M, 0 to disable
# block-cache: 8388608
# whether the block cache is shared among the RocksDB instances, default is per CF
# share-block-cache: no
# whether or not index and filter blocks are stored in block cache
# cache-index-and-filter-blocks: no
# when set to yes, bloomfilter of the last level will not be built
# optimize-filters-for-hits: no
# https://github.com/facebook/rocksdb/wiki/Leveled-Compaction#levels-target-size
# level-compaction-dynamic-level-bytes: no

codis configuration (two codis-proxy instances, identically configured):

##################################################
#                                                #
#                  Codis-Proxy                   #
#                                                #
##################################################

# Set Codis Product Name/Auth.
product_name = "codis-pika"
product_auth = ""

# Set auth for client session
#   1. product_auth is used for auth validation among codis-dashboard,
#      codis-proxy and codis-server.
#   2. session_auth is different from product_auth, it requires clients
#      to issue AUTH <PASSWORD> before processing any other commands.
session_auth = ""

# Set bind address for admin(rpc), tcp only.
admin_addr = "10.20.xxx.xxx:11080"

# Set bind address for proxy, proto_type can be "tcp", "tcp4", "tcp6", "unix" or "unixpacket".
proto_type = "tcp4"
proxy_addr = "10.20.xxx.xxx:19000"

# Set jodis address & session timeout
#   1. jodis_name is short for jodis_coordinator_name, only accept "zookeeper" & "etcd".
#   2. jodis_addr is short for jodis_coordinator_addr
#   3. jodis_auth is short for jodis_coordinator_auth, for zookeeper/etcd, "user:password" is accepted.
#   4. proxy will be registered as node:
#        if jodis_compatible = true (not suggested):
#          /zk/codis/db_{PRODUCT_NAME}/proxy-{HASHID} (compatible with Codis2.0)
#        or else
#          /jodis/{PRODUCT_NAME}/proxy-{HASHID}
jodis_name = "etcd"
jodis_addr = "http://10.20.xxx.xxx:2379"
jodis_auth = ""
jodis_timeout = "20s"
jodis_compatible = false

# Set datacenter of proxy.
proxy_datacenter = ""

# Set max number of alive sessions.
proxy_max_clients = 20000

# Set max offheap memory size. (0 to disable)
proxy_max_offheap_size = "10gb"

# Set heap placeholder to reduce GC frequency.
proxy_heap_placeholder = "16gb"

# Proxy will ping backend redis (and clear 'MASTERDOWN' state) in a predefined interval. (0 to disable)
backend_ping_period = "5s"

# Set backend recv buffer size & timeout.
backend_recv_bufsize = "128kb"
backend_recv_timeout = "3600s"

# Set backend send buffer & timeout.
backend_send_bufsize = "128kb"
backend_send_timeout = "3600s"

# Set backend pipeline buffer size.
backend_max_pipeline = 20480

# Set backend never read replica groups, default is false
backend_primary_only = false

# Set backend parallel connections per server
backend_primary_parallel = 4
backend_replica_parallel = 4

# Set backend tcp keepalive period. (0 to disable)
backend_keepalive_period = "75s"

# Set number of databases of backend.
backend_number_databases = 1

# If there is no request from client for a long time, the connection will be closed. (0 to disable)
# Set session recv buffer size & timeout.
session_recv_bufsize = "128kb"
session_recv_timeout = "30m"

# Set session send buffer size & timeout.
session_send_bufsize = "128kb"
session_send_timeout = "3600s"

# Make sure this is higher than the max number of requests for each pipeline request, or your client may be blocked.
# Set session pipeline buffer size.
session_max_pipeline = 10000

# Set session tcp keepalive period. (0 to disable)
session_keepalive_period = "75s"

# Set session to be sensitive to failures. Default is false: instead of closing the socket, the proxy will send an error response to the client.
session_break_on_failure = false 

# Set metrics server (such as http://localhost:28000), proxy will report json formatted metrics to specified server in a predefined period.
metrics_report_server = ""
metrics_report_period = "1s"

# Set influxdb server (such as http://localhost:8086), proxy will report metrics to influxdb.
metrics_report_influxdb_server = ""
metrics_report_influxdb_period = "1s"
metrics_report_influxdb_username = ""
metrics_report_influxdb_password = ""
metrics_report_influxdb_database = ""

# Set statsd server (such as localhost:8125), proxy will report metrics to statsd.
metrics_report_statsd_server = ""
metrics_report_statsd_period = "1s"
metrics_report_statsd_prefix = ""

kernelai commented 3 years ago

The reason codis cannot push performance to full capacity is that there are too few connections to pika. You can set backend_primary_parallel higher, e.g. to 50.
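
As an illustration only (a minimal sketch of the change in the codis-proxy configuration shown above; 50 is the example value suggested here, not a tested recommendation):

# codis-proxy config: raise the number of parallel backend connections per pika node
# (the config above uses the default of 4; codis-proxy reads this file at startup,
# so the proxies need a restart to pick up the change)
backend_primary_parallel = 50
# if reads are allowed to go to replicas, backend_replica_parallel may need the same treatment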

debuger6 commented 3 years ago

> The reason codis cannot push performance to full capacity is that there are too few connections to pika. You can set backend_primary_parallel higher, e.g. to 50.

OK, I'll give that a try. Also, codis frequently reports this error: [Error from server: ERR handle response, backend conn failure, read tcp 10.21.134.xxx:28558->10.26.134.xxx:9221: read: connection reset by peer]. I don't know what causes it; since neither codis nor pika is under heavy load, the connections shouldn't be dropping.

debuger6 commented 3 years ago

> The reason codis cannot push performance to full capacity is that there are too few connections to pika. You can set backend_primary_parallel higher, e.g. to 50.

I raised backend_primary_parallel to 100, but set performance is still very poor and the client reports connection errors, as follows: |># vire-benchmark -h 10.23.134.243 -p 19000 -T 10 -n 5000000 -d 512 -k 1 -r 10000000 -e -t set

Error from server: ERR handle response, backend conn failure, read tcp 10.23.134.243:57696->10.23.134.247:9221: read: connection reset by peer
Error from server: ERR handle response, backend conn failure, read tcp 10.23.134.243:57356->10.23.134.247:9221: read: connection reset by peer
Error from server: ERR handle response, backend conn reset
Error from server: ERR handle response, backend conn reset
Error from server: ERR handle response, backend conn reset
Error from server: ERR handle response, backend conn reset
Error from server: ERR handle response, backend conn reset
Error from server: ERR handle response, backend conn reset
**SET: 3515.50**

kernelai commented 3 years ago

1. Check whether the codis backend_keepalive_period matches the timeout configured in pika. 2. Use multiple threads in the benchmark tool, e.g. T=1000.
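
A sketch of what "matching" could look like (illustrative values only; the host, port and flags are taken from the earlier benchmark runs, and the intent is to keep pika's idle timeout well above the proxy's keepalive/ping intervals so idle backend connections are not closed by pika):

# pika.conf: idle-connection timeout, in seconds (illustrative value)
timeout : 600
# codis-proxy config: keepalive and ping intervals kept well below pika's timeout
backend_keepalive_period = "75s"
backend_ping_period = "5s"

# benchmark with many more client threads (-T), as suggested:
vire-benchmark -h 10.23.134.243 -p 19000 -T 1000 -n 5000000 -d 512 -k 1 -r 10000000 -e -t set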

debuger6 commented 3 years ago

> 1. Check whether the codis backend_keepalive_period matches the timeout configured in pika. 2. Use multiple threads in the benchmark tool, e.g. T=1000.

pika's timeout is 60s. I set the codis backend_keepalive_period to 60s and then to 30s; after testing there are still connection errors. Here is the codis log:

2020/11/26 17:39:19 backend.go:261: [WARN] backend conn [0xc822b2ede0] to 10.20.134.247:9221, db-0 round-[1]
2020/11/26 17:39:19 backend.go:282: [WARN] backend conn [0xc822b2ede0] to 10.20.134.247:9221, db-0 reader-[0] exit
[error]: backend conn failure, read tcp 10.20.134.243:21752->10.20.134.247:9221: read: connection reset by peer
2020/11/26 17:39:19 backend.go:334: [WARN] backend conn [0xc822ac1680] to 10.20.134.247:9221, db-0 writer-[0] exit
[error]: backend conn failure, set tcp 10.20.134.243:21802: use of closed network connection
2020/11/26 17:39:19 backend.go:261: [WARN] backend conn [0xc822ac1680] to 10.20.134.247:9221, db-0 round-[1]
2020/11/26 17:39:19 backend.go:282: [WARN] backend conn [0xc822ac1680] to 10.20.134.247:9221, db-0 reader-[0] exit
[error]: backend conn failure, read tcp 10.20.134.243:21802->10.20.134.247:9221: read: connection reset by peer
2020/11/26 17:39:19 backend.go:334: [WARN] backend conn [0xc822ac0180] to 10.20.134.247:9221, db-0 writer-[0] exit
[error]: backend conn failure, set tcp 10.20.134.243:21778: use of closed network connection
2020/11/26 17:39:19 backend.go:261: [WARN] backend conn [0xc822ac0180] to 10.20.134.247:9221, db-0 round-[1]
2020/11/26 17:39:19 backend.go:282: [WARN] backend conn [0xc822ac0180] to 10.20.134.247:9221, db-0 reader-[0] exit
[error]: backend conn failure, read tcp 10.20.134.243:21778->10.20.134.247:9221: read: connection reset by peer
2020/11/26 17:39:19 backend.go:334: [WARN] backend conn [0xc822b2e360] to 10.20.134.247:9221, db-0 writer-[0] exit
[error]: backend conn failure, set tcp 10.20.134.243:21710: use of closed network connection
2020/11/26 17:39:19 backend.go:261: [WARN] backend conn [0xc822b2e360] to 10.20.134.247:9221, db-0 round-[1]
2020/11/26 17:39:19 backend.go:282: [WARN] backend conn [0xc822b2e360] to 10.20.134.247:9221, db-0 reader-[0] exit
[error]: backend conn failure, read tcp 10.20.134.243:21710->10.20.134.247:9221: read: connection reset by peer
2020/11/26 17:39:19 backend.go:334: [WARN] backend conn [0xc822b2f4a0] to 10.20.134.247:9221, db-0 writer-[0] exit
[error]: backend conn failure, set tcp 10.20.134.243:21856: use of closed network connection
2020/11/26 17:39:19 backend.go:261: [WARN] backend conn [0xc822b2f4a0] to 10.20.134.247:9221, db-0 round-[1]
2020/11/26 17:39:19 backend.go:282: [WARN] backend conn [0xc822b2f4a0] to 10.20.134.247:9221, db-0 reader-[0] exit
[error]: backend conn failure, read tcp 10.20.134.243:21856->10.20.134.247:9221: read: connection reset by peer
2020/11/26 17:39:19 backend.go:334: [WARN] backend conn [0xc822b2f7a0] to 10.20.134.247:9221, db-0 writer-[0] exit
[error]: backend conn failure, set tcp 10.20.134.243:21896: use of closed network connection
2020/11/26 17:39:19 backend.go:261: [WARN] backend conn [0xc822b2f7a0] to 10.20.134.247:9221, db-0 round-[1]
2020/11/26 17:39:19 backend.go:282: [WARN] backend conn [0xc822b2f7a0] to 10.20.134.247:9221, db-0 reader-[0] exit
[error]: backend conn failure, read tcp 10.20.134.243:21896->10.20.134.247:9221: read: connection reset by peer

debuger6 commented 3 years ago

I think the poor performance is related to the frequent connection drops and reconnects, but the root cause is still unclear.
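
One hedged way to check that hypothesis (assumes standard Linux tooling on the pika host and that pika answers Redis-style CONFIG GET for this parameter; the host/port are the pika node from the logs above):

# ask the pika node for its effective idle timeout
redis-cli -h 10.20.134.247 -p 9221 config get timeout

# on the pika host, watch whether TCP reset counters climb while the benchmark runs
watch -n 1 'netstat -s | grep -i -E "reset|abort"'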

wanghenshui commented 1 year ago

> I think the poor performance is related to the frequent connection drops and reconnects, but the root cause is still unclear.

Too many codis shards lead to low resource utilization.