Tencent / Tendis

Tendis is a high-performance distributed storage system fully compatible with the Redis protocol.
http://tendis.cn
Other
2.95k stars 325 forks source link

cpu占用间歇性的100%,每次持续10分钟左右 #91

Closed perrornet closed 3 years ago

perrornet commented 3 years ago

Description

每次过几个小时tendis的cpu使用率就会到达100%, 持续时间大概在10分钟左右, 总共3台master都会出现这种情况

Context

tendis中有很多的大key, 且可能频繁回去读写这些KEY

Your Environment

集群配置

cluster-enabled true rocks.cache_index_and_filter_blocks 1 rocks.max_open_files -1 rocks.compress_type lz4

* IO/Network used: 

07:39:17 PM tps rtps wtps bread/s bwrtn/s 07:39:18 PM 126.00 126.00 0.00 13504.00 0.00 07:39:19 PM 62.38 60.40 1.98 6376.24 174.26 07:39:20 PM 91.00 91.00 0.00 9728.00 0.00 07:39:21 PM 43.00 40.00 3.00 4464.00 144.00 07:39:22 PM 103.00 103.00 0.00 10232.00 0.00 07:39:23 PM 128.00 128.00 0.00 13856.00 0.00 07:39:24 PM 75.25 75.25 0.00 8079.21 0.00 07:39:25 PM 114.00 114.00 0.00 11336.00 0.00 07:39:26 PM 539.60 107.92 431.68 10875.25 54027.72 07:39:27 PM 131.00 129.00 2.00 10888.00 24.00 Average: 141.58 97.41 44.17 9929.41 5474.78

* tendis 占用资源情况

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5727 root 20 0 95.2g 13.6g 3784 R 31.2 44.2 84:52.12 tx-worker-28_3
5702 root 20 0 95.2g 13.6g 3784 R 30.9 44.2 98:14.43 tx-worker-4_2
5724 root 20 0 95.2g 13.6g 3784 R 29.9 44.2 125:10.20 tx-worker-28_0
5697 root 20 0 95.2g 13.6g 3784 R 27.9 44.2 96:50.90 tx-worker-0_1
5714 root 20 0 95.2g 13.6g 3784 R 26.9 44.2 68:58.07 tx-worker-16_2
5701 root 20 0 95.2g 13.6g 3784 R 25.6 44.2 151:11.01 tx-worker-4_1
5726 root 20 0 95.2g 13.6g 3784 R 25.6 44.2 112:28.25 tx-worker-28_2
5718 root 20 0 95.2g 13.6g 3784 R 25.2 44.2 111:18.38 tx-worker-20_2
5725 root 20 0 95.2g 13.6g 3784 R 25.2 44.2 80:14.05 tx-worker-28_1
5708 root 20 0 95.2g 13.6g 3784 R 23.9 44.2 133:10.27 tx-worker-12_0
5712 root 20 0 95.2g 13.6g 3784 R 23.6 44.2 111:26.81 tx-worker-16_0
5717 root 20 0 95.2g 13.6g 3784 R 23.6 44.2 97:47.30 tx-worker-20_1
5719 root 20 0 95.2g 13.6g 3784 R 22.6 44.2 96:22.12 tx-worker-20_3
5696 root 20 0 95.2g 13.6g 3784 R 20.3 44.2 125:15.25 tx-worker-0_0
5709 root 20 0 95.2g 13.6g 3784 S 2.0 44.2 129:28.06 tx-worker-12_1
5716 root 20 0 95.2g 13.6g 3784 S 2.0 44.2 88:03.50 tx-worker-20_0
5715 root 20 0 95.2g 13.6g 3784 S 1.0 44.2 86:48.52 tx-worker-16_3
5711 root 20 0 95.2g 13.6g 3784 S 0.7 44.2 114:45.02 tx-worker-12_3
5720 root 20 0 95.2g 13.6g 3784 S 0.7 44.2 33:11.08 tx-worker-24_0
5739 root 20 0 95.2g 13.6g 3784 S 0.7 44.2 2:15.67 tx-io-rw-10
5699 root 20 0 95.2g 13.6g 3784 S 0.3 44.2 115:42.11 tx-worker-0_3
5700 root 20 0 95.2g 13.6g 3784 S 0.3 44.2 114:45.46 tx-worker-4_0
5710 root 20 0 95.2g 13.6g 3784 S 0.3 44.2 89:01.11 tx-worker-12_2
5713 root 20 0 95.2g 13.6g 3784 S 0.3 44.2 124:42.65 tx-worker-16_1
5722 root 20 0 95.2g 13.6g 3784 S 0.3 44.2 61:31.75 tx-worker-24_2
5723 root 20 0 95.2g 13.6g 3784 S 0.3 44.2 87:55.40 tx-worker-24_3
5737 root 20 0 95.2g 13.6g 3784 S 0.3 44.2 2:02.39 tx-io-rw-8
5326 root 20 0 95.2g 13.6g 3784 S 0.0 44.2 0:21.40 tendisplus
5327 root 20 0 95.2g 13.6g 3784 S 0.0 44.2 10:16.81 rocksdb:bg0
5328 root 20 0 95.2g 13.6g 3784 S 0.0 44.2 8:30.02 rocksdb:bg1
5329 root 20 0 95.2g 13.6g 3784 S 0.0 44.2 4:42.81 rocksdb:bg2
5330 root 20 0 95.2g 13.6g 3784 S 0.0 44.2 1:48.64 rocksdb:bg3
5331 root 20 0 95.2g 13.6g 3784 S 0.0 44.2 4:49.17 rocksdb:bg4
5332 root 20 0 95.2g 13.6g 3784 S 0.0 44.2 2:19.48 rocksdb:bg5
5333 root 20 0 95.2g 13.6g 3784 S 0.0 44.2 2:31.38 rocksdb:bg6
5334 root 20 0 95.2g 13.6g 3784 S 0.0 44.2 2:24.38 rocksdb:bg7
5335 root 20 0 95.2g 13.6g 3784 S 0.0 44.2 0:27.65 rocksdb:bg0
5336 root 20 0 95.2g 13.6g 3784 S 0.0 44.2 0:20.52 rocksdb:bg1
5698 root 20 0 95.2g 13.6g 3784 S 0.0 44.2 115:43.95 tx-worker-0_2
5703 root 20 0 95.2g 13.6g 3784 S 0.0 44.2 89:12.03 tx-worker-4_3
5704 root 20 0 95.2g 13.6g 3784 S 0.0 44.2 88:23.21 tx-worker-8_0
5705 root 20 0 95.2g 13.6g 3784 S 0.0 44.2 88:07.83 tx-worker-8_1
5706 root 20 0 95.2g 13.6g 3784 S 0.0 44.2 115:00.90 tx-worker-8_2
5707 root 20 0 95.2g 13.6g 3784 S 0.0 44.2 130:00.95 tx-worker-8_3
5721 root 20 0 95.2g 13.6g 3784 S 0.0 44.2 102:51.94 tx-worker-24_1
5728 root 20 0 95.2g 13.6g 3784 S 0.0 44.2 0:02.84 tx-io-accept
5729 root 20 0 95.2g 13.6g 3784 S 0.0 44.2 1:05.51 tx-io-rw-0

* 线程名称 tx-worker-4_2  的堆栈信息

Thread 1 (process 5702):

0 0x00000000007a1e06 in rtree_szind_slab_read_fast (r_slab=, r_szind=, key=140266240921856, rtree_ctx=, rtree=, tsdn=) at include/jemalloc/internal/rtree.h:471

1 free_fastpath (size_hint=false, size=0, ptr=0x7f924776d100) at src/jemalloc.c:2823

2 free (ptr=0x7f924776d100) at src/jemalloc.c:2866

3 0x000000000089a74a in deallocate (this=0x7f92019dd1c0, __p=) at /usr/local/include/c++/5.5.0/ext/new_allocator.h:110

4 deallocate (a=..., n=, __p=) at /usr/local/include/c++/5.5.0/bits/alloc_traits.h:517

5 _M_destroy (__size=, this=0x7f92019dd1c0) at /usr/local/include/c++/5.5.0/bits/basic_string.h:185

6 _M_dispose (this=) at /usr/local/include/c++/5.5.0/bits/basic_string.h:180

7 ~basic_string (this=, __in_chrg=) at /usr/local/include/c++/5.5.0/bits/basic_string.h:559

8 tendisplus::RocksKVCursor::next (this=) at /data/tendis_bin/workspace/tendisplus-master-release/pipeline_248/tendisplus/src/tendisplus/storage/rocks/rocks_kvstore.cpp:79

9 0x0000000000b2fa8b in tendisplus::BasicDataCursor::next (this=) at /data/tendis_bin/workspace/tendisplus-master-release/pipeline_248/tendisplus/src/tendisplus/storage/kvstore.cpp:226

10 0x00000000006d2f57 in tendisplus::KeysCommand::runabi:cxx11 (this=, sess=0x7f91c122e750) at /data/tendis_bin/workspace/tendisplus-master-release/pipeline_248/tendisplus/src/tendisplus/commands/debug.cpp:116

11 0x000000000068a7ac in tendisplus::Command::runSessionCmdabi:cxx11 (sess=sess@entry=0x7f91c122e750) at /data/tendis_bin/workspace/tendisplus-master-release/pipeline_248/tendisplus/src/tendisplus/commands/command.cpp:280

12 0x000000000078d04c in tendisplus::ServerEntry::processRequest (this=0x7f924762e810, sess=sess@entry=0x7f91c122e750) at /data/tendis_bin/workspace/tendisplus-master-release/pipeline_248/tendisplus/src/tendisplus/server/server_entry.cpp:1064

13 0x0000000000814ab5 in tendisplus::NetSession::processReq (this=0x7f91c122e750) at /data/tendis_bin/workspace/tendisplus-master-release/pipeline_248/tendisplus/src/tendisplus/network/network.cpp:857

14 0x0000000000814454 in tendisplus::NetSession::stepState (this=0x7f91c122e750) at /data/tendis_bin/workspace/tendisplus-master-release/pipeline_248/tendisplus/src/tendisplus/network/network.cpp:956

15 0x0000000000818c8c in operator() (__closure=) at /data/tendis_bin/workspace/tendisplus-master-release/pipeline_248/tendisplus/src/tendisplus/network/network.cpp:476

16 operator() (__closure=) at /data/tendis_bin/workspace/tendisplus-master-release/pipeline_248/tendisplus/src/tendisplus/network/worker_pool.h:59

17 asio_handler_invoke<tendisplus::WorkerPool::schedule(fn&&) [with fn = tendisplus::NetSession::schedule()::<lambda()>]::<lambda()> > (function=) at /data/tendis_bin/workspace/tendisplus-master-release/pipeline_248/tendisplus/src/thirdparty/asio/asio/include/asio/handler_invoke_hook.hpp:68

18 invoke<tendisplus::WorkerPool::schedule(fn&&) [with fn = tendisplus::NetSession::schedule()::<lambda()>]::<lambda()>, tendisplus::WorkerPool::schedule(fn&&) [with fn = tendisplus::NetSession::schedule()::<lambda()>]::<lambda()> > (context=, function=) at /data/tendis_bin/workspace/tendisplus-master-release/pipeline_248/tendisplus/src/thirdparty/asio/asio/include/asio/detail/handler_invoke_helpers.hpp:37

19 dispatch<tendisplus::WorkerPool::schedule(fn&&) [with fn = tendisplus::NetSession::schedule()::<lambda()>]::<lambda()>, std::allocator > (this=, f=) at /data/tendis_bin/workspace/tendisplus-master-release/pipeline_248/tendisplus/src/thirdparty/asio/asio/include/asio/impl/system_executor.hpp:38

20 operator() (this=) at /data/tendis_bin/workspace/tendisplus-master-release/pipeline_248/tendisplus/src/thirdparty/asio/asio/include/asio/detail/work_dispatcher.hpp:57

21 asio_handler_invoke<asio::detail::work_dispatcher<tendisplus::WorkerPool::schedule(fn&&) [with fn = tendisplus::NetSession::schedule()::<lambda()>]::<lambda()> > > (function=) at /data/tendis_bin/workspace/tendisplus-master-release/pipeline_248/tendisplus/src/thirdparty/asio/asio/include/asio/handler_invoke_hook.hpp:68

22 invoke<asio::detail::work_dispatcher<tendisplus::WorkerPool::schedule(fn&&) [with fn = tendisplus::NetSession::schedule()::<lambda()>]::<lambda()> >, asio::detail::work_dispatcher<tendisplus::WorkerPool::schedule(fn&&) [with fn = tendisplus::NetSession::schedule()::<lambda()>]::<lambda()> > > (context=, function=) at /data/tendis_bin/workspace/tendisplus-master-release/pipeline_248/tendisplus/src/thirdparty/asio/asio/include/asio/detail/handler_invoke_helpers.hpp:37

23 asio::detail::executor_op<asio::detail::work_dispatcher<tendisplus::WorkerPool::schedule(fn&&) [with fn = tendisplus::NetSession::schedule()::<lambda()>]::<lambda()> >, std::allocator, asio::detail::scheduler_operation>::do_complete(void , asio::detail::scheduler_operation , const asio::error_code &, std::size_t) (owner=, base=) at /data/tendis_bin/workspace/tendisplus-master-release/pipeline_248/tendisplus/src/thirdparty/asio/asio/include/asio/detail/executor_op.hpp:69

24 0x000000000082ee1d in complete (bytes_transferred=0, ec=..., owner=0x7f9203d8b000, this=) at /data/tendis_bin/workspace/tendisplus-master-release/pipeline_248/tendisplus/src/thirdparty/asio/asio/include/asio/detail/scheduler_operation.hpp:39

25 do_run_one (ec=..., this_thread=..., lock=..., this=0x7f9203d8b000) at /data/tendis_bin/workspace/tendisplus-master-release/pipeline_248/tendisplus/src/thirdparty/asio/asio/include/asio/detail/impl/scheduler.ipp:400

26 run (ec=..., this=0x7f9203d8b000) at /data/tendis_bin/workspace/tendisplus-master-release/pipeline_248/tendisplus/src/thirdparty/asio/asio/include/asio/detail/impl/scheduler.ipp:153

27 run (this=) at /data/tendis_bin/workspace/tendisplus-master-release/pipeline_248/tendisplus/src/thirdparty/asio/asio/include/asio/impl/io_context.ipp:61

28 tendisplus::WorkerPool::consumeTasks (this=0x7f9247d5cda0, idx=2) at /data/tendis_bin/workspace/tendisplus-master-release/pipeline_248/tendisplus/src/tendisplus/network/worker_pool.cpp:95

29 0x000000000082ffd5 in operator() (__closure=0x7f9203d9efe8, this=, this=, this=) at /data/tendis_bin/workspace/tendisplus-master-release/pipeline_248/tendisplus/src/tendisplus/network/worker_pool.cpp:136

30 _M_invoke<> (this=0x7f9203d9efe8) at /usr/local/include/c++/5.5.0/functional:1531

31 operator() (this=0x7f9203d9efe8) at /usr/local/include/c++/5.5.0/functional:1520

32 std::thread::_Impl<std::_Bind_simple<tendisplus::WorkerPool::startup(size_t)::<lambda()>()> >::_M_run(void) (this=0x7f9203d9efd0) at /usr/local/include/c++/5.5.0/thread:115

33 0x0000000000bb2df0 in std::execute_native_thread_routine (__p=) at ../../../../../libstdc++-v3/src/c++11/thread.cc:84

34 0x00007f9248912ea5 in start_thread () from /lib64/libpthread.so.0

35 0x00007f92481318dd in clone () from /lib64/libc.so.6

perrornet commented 3 years ago

这是运行 info后得到的结果:

# Server
redis_version:2.1.2-rocksdb-v5.13.4
redis_git_sha1:faadcd0d
redis_git_dirty:26
redis_build_id:12592486004405795862
redis_mode:standalone
os:Linux 3.10.0-1127.18.2.el7.x86_64 x86_64
arch_bits:64
multiplexing_api:asio
gcc_version:5:5:0
process_id:5326
tcp_port:51002
uptime_in_seconds:222111
uptime_in_days:2
config_file:./tendisplus.conf

# Clients
connected_clients:185

# Memory
used_memory:-1
used_memory_human:-1
used_memory_rss:14694027264
used_memory_rss_human:14349636kB
used_memory_peak:-1
used_memory_peak_human:-1
total_system_memory:-1
total_system_memory_human:-1
used_memory_lua:-1
used_memory_vir:102267891712
used_memory_vir_human:99870988kB
used_memory_vir_peak_human:99870992kB
used_memory_rss_peak_human:15802948kB

# Persistence
loading:-1
rdb_changes_since_last_save:-1
rdb_bgsave_in_progress:-1
rdb_last_save_time:-1
rdb_last_bgsave_status:-1
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
aof_enabled:-1
aof_rewrite_in_progress:-1
aof_rewrite_scheduled:-1
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:-1
aof_last_write_status:-1

# Stats
total_connections_received:39809
total_connections_released:39628
total_commands_processed:63168564
instantaneous_ops_per_sec:506
total_commands_cost(ns):1139917954544946
total_commands_workpool_queue_cost(ns):620121158238147
total_commands_workpool_execute_cost(ns):507841980784016
total_commands_send_packet_cost(ns):11954815522783
total_commands_execute_cost(ns):507502043197603
avg_commands_cost(ns):18045652
avg_commands_workpool_queue_cost(ns):9816926
avg_commands_workpool_execute_cost(ns):8039473
avg_commands_send_packet_cost(ns):189252
avg_commands_execute_cost(ns):8034091
commands_in_queue:140
commands_executed_in_workpool:126618111
total_stricky_packets:7466879
total_invalid_packets:0
total_net_input_bytes:14052444973
total_net_output_bytes:15209060646
instantaneous_input_kbps:129.51
instantaneous_output_kbps:74.8125
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
keyspace_hits:37728701
keyspace_misses:21465923
keyspace_wrong_versionep:0
scheduleNum:13268

# Replication
role:master
connected_slaves:0
master_repl_offset:200275427

# CPU
used_cpu_sys:4294.98
used_cpu_user:197165.94
used_cpu_sys_children:0.00
used_cpu_user_children:0.00

# Keyspace
db0:keys=0,expires=0,avg_ttl=0

# Backup
backup-count:0
last-backup-time:0
current-backup-running:no

# Dataset
rocksdb.kvstore-count:10
rocksdb.total-sst-files-size:27717030777
rocksdb.live-sst-files-size:27717030777
rocksdb.estimate-live-data-size:22673558641
rocksdb.estimate-num-keys:511499509
rocksdb.total-memory:10863107968
rocksdb.cur-size-all-mem-tables:125689728
rocksdb.estimate-table-readers-mem:0
rocksdb.blockcache:10737418240
rocksdb.mem-table-flush-pending:0
rocksdb.estimate-pending-compaction-bytes:0
rocksdb.compaction-pending:0

# Compaction
current-compaction-status:stopped
time-since-lastest-compaction:222112
current-compaction-dbid:

# Levelstats
rocksdb0.level-0:bytes=3450424,num_entries=14849,num_deletions=2818,num_files=1
rocksdb0.level-5:bytes=469002186,num_entries=2084580,num_deletions=62466,num_files=7
rocksdb0.level-6:bytes=594731196,num_entries=6576169,num_deletions=0,num_files=11
rocksdb1.level-0:bytes=25423306,num_entries=14862,num_deletions=9,num_files=1
rocksdb1.level-5:bytes=527250076,num_entries=3106891,num_deletions=41265,num_files=8
rocksdb1.level-6:bytes=2382761968,num_entries=40921843,num_deletions=0,num_files=43
rocksdb2.level-0:bytes=41431155,num_entries=33777,num_deletions=366,num_files=2
rocksdb2.level-5:bytes=478135811,num_entries=960310,num_deletions=61340,num_files=10
rocksdb2.level-6:bytes=3606825639,num_entries=68635843,num_deletions=0,num_files=56
rocksdb3.level-0:bytes=15804146,num_entries=264308,num_deletions=63,num_files=1
rocksdb3.level-4:bytes=482442852,num_entries=7643036,num_deletions=45523,num_files=8
rocksdb3.level-5:bytes=905544022,num_entries=17588764,num_deletions=8642,num_files=19
rocksdb3.level-6:bytes=9346277954,num_entries=272483131,num_deletions=0,num_files=139
rocksdb4.level-0:bytes=9898789,num_entries=89414,num_deletions=630,num_files=2
rocksdb4.level-5:bytes=468441236,num_entries=2589510,num_deletions=40043,num_files=7
rocksdb4.level-6:bytes=719385551,num_entries=9439994,num_deletions=0,num_files=14
rocksdb5.level-0:bytes=26014596,num_entries=267241,num_deletions=18648,num_files=3
rocksdb5.level-5:bytes=478060652,num_entries=4412601,num_deletions=235691,num_files=8
rocksdb5.level-6:bytes=971680481,num_entries=12673642,num_deletions=0,num_files=16
rocksdb6.level-0:bytes=13115046,num_entries=75653,num_deletions=26529,num_files=2
rocksdb6.level-5:bytes=517839278,num_entries=2447670,num_deletions=151845,num_files=8
rocksdb6.level-6:bytes=993556159,num_entries=19279225,num_deletions=0,num_files=16
rocksdb7.level-5:bytes=398661738,num_entries=2844114,num_deletions=437776,num_files=6
rocksdb7.level-6:bytes=734375534,num_entries=4612628,num_deletions=0,num_files=18
rocksdb8.level-0:bytes=9815132,num_entries=85088,num_deletions=10590,num_files=2
rocksdb8.level-5:bytes=156980805,num_entries=780249,num_deletions=36174,num_files=3
rocksdb8.level-6:bytes=1163734391,num_entries=17401581,num_deletions=0,num_files=21
rocksdb9.level-0:bytes=18428798,num_entries=44748,num_deletions=298,num_files=3
rocksdb9.level-5:bytes=521193661,num_entries=1233091,num_deletions=16551,num_files=9
rocksdb9.level-6:bytes=1636768195,num_entries=14884073,num_deletions=0,num_files=27

# RocksdbBgError
TendisDev commented 3 years ago

看调用堆栈是存在keys操作,这个是很耗CPU的,确实下是否跟这个相关

#10 0x00000000006d2f57 in tendisplus::KeysCommand::run
perrornet commented 3 years ago

应该就是这个问题, 后续我再关注一下, 将keys命令删除后这几天都没有出现cpu占用过高的问题