OpenAtomFoundation / pika

Pika is a Redis-Compatible database developed by Qihoo's infrastructure team.
BSD 3-Clause "New" or "Revised" License

During Pika stress testing, command queuing appeared in the log #2882

Open yakexi07 opened 2 months ago

yakexi07 commented 2 months ago

Is this a regression?

Yes

Description

Stress testing with redis-cli:

time /opt/redis/redis-src/src/redis-benchmark -a xxx -p 9221 -c 100 -r 500000 -n 1000000 -d 128 -t SET,GET,HSET -l --csv

(redis-benchmark version 6.2.6.) This command runs the benchmark continuously; the goal is to see whether Pika can sustain high-IOPS SET, GET, and HSET traffic for an extended period, so it can absorb sudden traffic bursts in production. After roughly five minutes, command-queuing entries start to appear in the Pika log, and this can be reproduced consistently.

E20240826 16:06:57.641274 2084 pika_client_conn.cc:239] ip_port: 172.16.186.22:23334, db: db0, command: "HSET" "myhash" "element:000000486796" "=JkfE^5WSViF:VA1V>6fXU_jBXFe^EHMAfnRC^NN;5ak=ccRS^mW<cRegG;km;CCQQg9P9RkhR7FWUeEh:53OS9n5ed^CI930G5Km;YG3DMJOCc4VY6oY=Q;;ii7", command_size: 161, arguments: 4, total_time(ms): 30, before_queue_time(ms): 0, queue_time(ms): 21, process_time(ms): 9, cmd_time(ms): 0

Occasionally the queue time reaches 60 ms. Could you check whether any configuration options can be tuned, or whether the benchmark model I chose is the problem?

Please provide a link to a minimal reproduction of the bug

No response

Screenshots or videos

(screenshot attached)

Please provide the version you discovered this bug in (check about page for version information)

Test machine: Alibaba Cloud ecs.c7.2xlarge, 8 cores / 16 GB RAM, CentOS Stream 9 (64-bit), 200 GB ESSD PL0 rated at 4200 IOPS (during the test the disk stayed at roughly 300 IOPS, and CPU and memory usage were both normal).
Pika version: 3.5.5
Pika configuration:
port : 9221
thread-num : 6
rtc-cache-read : yes
thread-pool-size: 12
slow-cmd-pool : no
slow-cmd-thread-pool-size : 2
admin-thread-pool-size : 2
slow-cmd-list : info
admin-cmd-list : info, ping, monitor
sync-thread-num : 9
sync-binlog-thread-num : 1
log-path : ./log/
log-retention-time : 7
db-path : ./db/
write-buffer-size : 512M
arena-block-size : 64M
timeout : 0
requirepass : xxx
masterauth : xxx
userpass : xxx
userblacklist : FLUSHALL, FLUSHDB, SHUTDOWN, KEYS, CONFIG
instance-mode : classic
databases : 1
replication-num : 0
consensus-level : 0
dump-prefix : backup-
dump-path : ./dump/
dump-expire : 5
pidfile : ./pika.pid
maxclients : 20000
target-file-size-base : 20M
expire-logs-days : 7
expire-logs-nums : 20
root-connection-num : 2
slowlog-write-errorlog : yes
slowlog-log-slower-than : 30000
slowlog-max-len : 128
db-sync-path : ./dbsync/
db-sync-speed : -1
slave-priority : 100
compact-cron : 03-04/30
disable_auto_compactions : false
max-subcompactions : 3 
sync-window-size : 9000
max-conn-rbuf-size : 256M 
write-binlog : yes
binlog-file-size : 500M
max-cache-statistic-keys : 5000
small-compaction-threshold : 5000
small-compaction-duration-threshold : 10000
max-write-buffer-size : 2G
max-write-buffer-num : 3
max-client-response-size : 100M
compression : snappy
max-background-jobs : 8
max-background-flushes : 4
max-background-compactions : 4
delayed-write-rate : 0
max-compaction-bytes : 5000
max-cache-files : 50000
max-bytes-for-level-multiplier : 10
slotmigrate : no
slotmigrate-thread-num : 8
thread-migrate-keys-num : 64
block-cache: 5G 
share-block-cache: yes
default-slot-num : 128
cache-index-and-filter-blocks: yes
level-compaction-dynamic-level-bytes: yes
rate-limiter-bandwidth : 1099511627776
throttle-bytes-per-second : 207200000
rsync-timeout-ms : 1000
max-rsync-parallel-num : 4
cache-num : 16
cache-model : 1
cache-type: string, set, zset, list, hash, bit
zset-cache-field-num-per-key : 512
zset-cache-start-direction : 0
cache-maxmemory : 10G
cache-maxmemory-policy : 1
cache-maxmemory-samples: 5
cache-lfu-decay-time: 1
rename-command : FLUSHDB tiflushdb
rename-command : FLUSHALL tiflushall
internal-used-unfinished-full-sync :

Anything else?

No response

chejinge commented 2 months ago

Yes, the queue is backing up. thread-pool-size: 12 can be adjusted; you have 8 cores, so set it to 8. Also try max-background-jobs: 5, max-background-flushes: 2, max-background-compactions: 3.

yakexi07 commented 2 months ago

Yes, the queue is backing up. thread-pool-size: 12 can be adjusted; you have 8 cores, so set it to 8. Also try max-background-jobs: 5, max-background-flushes: 2, max-background-compactions: 3.

I adjusted the configuration as you suggested, but the same problem still occurs. Meanwhile, a 3.3.6 instance with the configuration below passes the same test without any queuing. Please take another look; is there some setting in the newer version that needs special handling?

port : 9221
thread-num : 8
thread-pool-size : 16
sync-thread-num : 2
log-path : ./log/
db-path : ./db/
write-buffer-size : 536870912 #512M
arena-block-size : 67108864 #64M
timeout : 0
requirepass : xxx
masterauth : xxx
userpass : xxx
userblacklist : FLUSHALL, FLUSHDB, SHUTDOWN, CONFIG, KEYS, slaveof, client
instance-mode : classic
databases : 8
default-slot-num : 1024
replication-num : 0
consensus-level : 0
dump-prefix : backup-
daemonize : no
dump-path : ./dump/
dump-expire : 5
pidfile : ./pika.pid
maxclients : 20000
target-file-size-base : 52428800 #50M, level 1 size
expire-logs-days : 7
expire-logs-nums : 200
root-connection-num : 2
slowlog-write-errorlog : yes
slowlog-log-slower-than : 20000
slowlog-max-len : 128
db-sync-path : ./dbsync/
db-sync-speed : -1
slave-priority : 100
compact-cron : 03-04/30
compact-interval :
sync-window-size : 9000
max-conn-rbuf-size : 268435456
write-binlog : yes
binlog-file-size : 104857600
max-cache-statistic-keys : 5000
small-compaction-threshold : 5000
max-write-buffer-size : 10737418240 #10GB
max-write-buffer-number : 2
max-client-response-size : 1073741824
compression : snappy
max-background-flushes : 1
max-background-compactions : 2
max-cache-files : 10000
max-bytes-for-level-multiplier : 10

yakexi07 commented 2 months ago

Yes, the queue is backing up. thread-pool-size: 12 can be adjusted; you have 8 cores, so set it to 8. Also try max-background-jobs: 5, max-background-flushes: 2, max-background-compactions: 3.

When version 3.5.5 starts, the log contains "Worker queue limit is 1766". Does that mean the maximum queue length is 1766? Where can this value be changed, and how is it derived? In your documentation at https://github.com/OpenAtomFoundation/pika/blob/4bfb5e788a1a6fd92dff20d5f1fa24e26ee3d128/docs/ops/SetUpDevEnvironment.md?plain=1#L65 the value at startup is over 20,000. Please check whether the queuing is related to this value.
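
A quick arithmetic check on where 1766 might come from: with this configuration, maxclients / thread-pool-size + 100 = 20000 / 12 + 100 = 1766, an exact match. Whether Pika actually computes the worker queue limit this way is an assumption here and should be confirmed against the source; the sketch below only reproduces that arithmetic, and the variable names are illustrative.

#include <iostream>

int main() {
  // Assumption: the worker queue limit is derived from maxclients and the
  // worker thread count; these values come from the reporter's 3.5.5 config.
  const int maxclients = 20000;    // maxclients : 20000
  const int worker_threads = 12;   // thread-pool-size : 12
  const int queue_limit = maxclients / worker_threads + 100;
  std::cout << queue_limit << std::endl;  // prints 1766
  return 0;
}

If that guess holds, the 20,000+ figure in the linked document would simply reflect a different maxclients / thread-pool-size ratio in that environment.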

chejinge commented 2 months ago

Could you post your current configuration again? config get *

yakexi07 commented 2 months ago

Some additional data points for reference: under identical benchmark conditions I tested Redis 6.2.6 and Pika 3.5.5. Pika's HSET QPS is noticeably lower, and the log fills with queuing entries; I suspect the heavy volume of HSET writes is what causes the queuing.

(screenshot attached)

E20240902 16:19:23.452486 91162 pika_client_conn.cc:239] ip_port: 172.16.26.50:26136, db: db0, command: "HSET" "gamex:0:profile:conditionRecords:45:0:d53dbccd-6053-4802-a7c4-184cb4adb3df" "23" "\n\b\b\xca\x01\x12\x03\b\xe3\x0c", command_size: 93, arguments: 4, total_time(ms): 54, before_queue_time(ms): 0, queue_time(ms): 38, process_time(ms): 16, cmd_time(ms): 0

Could you post your current configuration again? config get *

127.0.0.1:9221> config get *
1) "port" 2) "9221"
3) "thread-num" 4) "12"
5) "thread-pool-size" 6) "16"
7) "slow-cmd-thread-pool-size" 8) "1"
9) "admin-thread-pool-size" 10) "2"
11) "userblacklist" 12) "flushall, flushdb, shutdown, keys, config"
13) "slow-cmd-list" 14) ""
15) "admin-cmd-list" 16) "monitor,ping,info"
17) "sync-thread-num" 18) "8"
19) "sync-binlog-thread-num" 20) "1"
21) "log-path" 22) "./log/"
23) "db-path" 24) "./db/"
25) "maxmemory" 26) "268435456"
27) "write-buffer-size" 28) "268435456"
29) "arena-block-size" 30) "33554432"
31) "max-write-buffer-num" 32) "2"
33) "timeout" 34) "0"
35) "requirepass" 36) "xxx"
37) "masterauth" 38) "xxx"
39) "userpass" 40) "xxx"
41) "instance-mode" 42) "classic"
43) "databases" 44) "1"
45) "daemonize" 46) "no"
47) "slotmigrate" 48) "no"
49) "slow-cmd-pool" 50) "no"
51) "slotmigrate-thread-num" 52) "8"
53) "thread-migrate-keys-num" 54) "64"
55) "dump-path" 56) "./dump/"
57) "dump-expire" 58) "5"
59) "dump-prefix" 60) "backup-"
61) "pidfile" 62) "./pika.pid"
63) "maxclients" 64) "20000"
65) "target-file-size-base" 66) "52428800"
67) "max-cache-statistic-keys" 68) "2000"
69) "small-compaction-threshold" 70) "5000"
71) "small-compaction-duration-threshold" 72) "10000"
73) "max-background-flushes" 74) "4"
75) "max-background-compactions" 76) "4"
77) "max-background-jobs" 78) "8"
79) "max-cache-files" 80) "5000"
81) "max-bytes-for-level-multiplier" 82) "10"
83) "block-size" 84) "4096"
85) "block-cache" 86) "15"
87) "share-block-cache" 88) "yes"
89) "cache-index-and-filter-blocks" 90) "yes"
91) "optimize-filters-for-hits" 92) "no"
93) "level-compaction-dynamic-level-bytes" 94) "yes"
95) "expire-logs-days" 96) "7"
97) "expire-logs-nums" 98) "20"
99) "root-connection-num" 100) "2"
101) "slowlog-write-errorlog" 102) "yes"
103) "slowlog-log-slower-than" 104) "30000"
105) "slowlog-max-len" 106) "128"
107) "write-binlog" 108) "yes"
109) "binlog-file-size" 110) "524288000"
111) "max-write-buffer-size" 112) "6442450944"
113) "max-client-response-size" 114) "104857600"
115) "compression" 116) "snappy"
117) "db-sync-path" 118) "./dbsync/"
119) "db-sync-speed" 120) "1024"
121) "compact-cron" 122) "03-04/30"
123) "compact-interval" 124) ""
125) "disable_auto_compactions" 126) "false"
127) "network-interface" 128) ""
129) "slaveof" 130) ""
131) "slave-priority" 132) "100"
133) "save" 134) ""
135) "appendonly" 136) "no"
137) "sync-window-size" 138) "9000"
139) "max-conn-rbuf-size" 140) "67108864"
141) "replication-num" 142) "0"
143) "consensus-level" 144) "0"
145) "rate-limiter-mode" 146) "1"
147) "rate-limiter-bandwidth" 148) "1099511627776"
149) "delayed-write-rate" 150) "0"
151) "max-compaction-bytes" 152) "1310720000"
153) "rate-limiter-refill-period-us" 154) "100000"
155) "rate-limiter-fairness" 156) "10"
157) "rate-limiter-auto-tuned" 158) "yes"
159) "run-id" 160) "e96f227e6214adcbec36ea0817271c802b35df85"
161) "blob-cache" 162) "0"
163) "blob-compression-type" 164) "none"
165) "blob-file-size" 166) "268435456"
167) "blob-garbage-collection-age-cutoff" 168) "0.250000"
169) "blob-garbage-collection-force-threshold" 170) "1.000000"
171) "blob-num-shard-bits" 172) "0"
173) "compression-per-level" 174) ""
175) "default-slot-num" 176) "128"
177) "enable-blob-files" 178) "no"
179) "enable-blob-garbage-collection" 180) "no"
181) "loglevel" 182) ""
183) "min-blob-size" 184) "4096"
185) "pin_l0_filter_and_index_blocks_in_cache" 186) "no"
187) "slave-read-only" 188) "yes"
189) "throttle-bytes-per-second" 190) "207200000"
191) "max-rsync-parallel-num" 192) "4"
193) "replication-id" 194) ""
195) "cache-num" 196) "16"
197) "cache-model" 198) "0"
199) "cache-type" 200) ""
201) "zset-cache-start-direction" 202) "0"
203) "zset-cache-field-num-per-key" 204) "512"
205) "cache-maxmemory" 206) "10737418240"
207) "cache-maxmemory-policy" 208) "1"
209) "cache-maxmemory-samples" 210) "5"
211) "cache-lfu-decay-time" 212) "1"
213) "acl-pubsub-default" 214) "resetchannels"