Snapchat / KeyDB

A Multithreaded Fork of Redis
https://keydb.dev
BSD 3-Clause "New" or "Revised" License
11.02k stars 564 forks source link

[CRASH] ckeysExpired == db->expireSize() #739

Open splitice opened 7 months ago

splitice commented 7 months ago

This is not the only crash generated for this assertion statement in the issues list. However the other one thats currently open is an AOF crash, this trace is background saving.

Crash report

=== KEYDB BUG REPORT START: Cut & paste starting from here ===
380125:44:C 14 Nov 2023 01:19:22.909 # === ASSERTION FAILED ===
380125:44:C 14 Nov 2023 01:19:22.909 # ==> rdb.cpp:1372 'ckeysExpired == db->expireSize()' is not true

------ STACK TRACE ------

Backtrace:
keydb-rdb-bgsave 0.0.0.0:6379 [cluster](rdbSaveRio(_rio*, redisDbPersistentDataSnapshot const**, int*, int, rdbSaveInfo*)+0x54d) [0x55ce14ae406d]
keydb-rdb-bgsave 0.0.0.0:6379 [cluster](rdbSaveFile(char*, redisDbPersistentDataSnapshot const**, rdbSaveInfo*)+0xc7) [0x55ce14ae4167]
keydb-rdb-bgsave 0.0.0.0:6379 [cluster](rdbSave(redisDbPersistentDataSnapshot const**, rdbSaveInfo*)+0x6e) [0x55ce14ae449e]
keydb-rdb-bgsave 0.0.0.0:6379 [cluster](rdbSaveBackgroundFork(rdbSaveInfo*)+0x104) [0x55ce14ae4b64]
keydb-rdb-bgsave 0.0.0.0:6379 [cluster](launchRdbSaveThread(unsigned long&, rdbSaveInfo*)+0x45) [0x55ce14ae4c25]
keydb-rdb-bgsave 0.0.0.0:6379 [cluster](rdbSaveBackground(rdbSaveInfo*)+0x88) [0x55ce14ae5058]
keydb-rdb-bgsave 0.0.0.0:6379 [cluster](serverCron(aeEventLoop*, long long, void*)+0xcc0) [0x55ce14b3de50]
keydb-rdb-bgsave 0.0.0.0:6379 [cluster](aeProcessEvents+0x224) [0x55ce14b445a4]
keydb-rdb-bgsave 0.0.0.0:6379 [cluster](aeMain+0x3a) [0x55ce14b48cda]
keydb-rdb-bgsave 0.0.0.0:6379 [cluster](workerThreadMain(void*)+0x7e) [0x55ce14b30ade]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7fd82f759609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7fd82f67e133]

Aditional information

Seems to be caused by any database making heavy use of expriring keys.

mr1jingles commented 7 months ago

Hello, I have the same problem I run keydb cluster and master node can't save db dump

=== KEYDB BUG REPORT START: Cut & paste starting from here ===
51634:52441:C 03 Dec 2023 21:54:16.789 # === ASSERTION FAILED ===
51634:52441:C 03 Dec 2023 21:54:16.789 # ==> rdb.cpp:1372 'ckeysExpired == db->expireSize()' is not true

------ STACK TRACE ------

Backtrace:
keydb-rdb-bgsave <lister_ip>:6379 [cluster](rdbSaveRio(_rio*, redisDbPersistentDataSnapshot const**, int*, int, rdbSaveInfo*)+0x54c) [0x557db3aad2cc]
keydb-rdb-bgsave <lister_ip>:6379 [cluster](rdbSaveFile(char*, redisDbPersistentDataSnapshot const**, rdbSaveInfo*)+0xc3) [0x557db3aad3c3]
keydb-rdb-bgsave <lister_ip>:6379 [cluster](rdbSave(redisDbPersistentDataSnapshot const**, rdbSaveInfo*)+0x6a) [0x557db3aad6fa]
keydb-rdb-bgsave <lister_ip>:6379 [cluster](rdbSaveBackgroundFork(rdbSaveInfo*)+0xfc) [0x557db3aaddac]
keydb-rdb-bgsave <lister_ip>:6379 [cluster](launchRdbSaveThread(unsigned long&, rdbSaveInfo*)+0x41) [0x557db3aade61]
keydb-rdb-bgsave <lister_ip>:6379 [cluster](rdbSaveBackground(rdbSaveInfo*)+0x84) [0x557db3aae294]
keydb-rdb-bgsave <lister_ip>:6379 [cluster](startBgsaveForReplication(int)+0x15d) [0x557db3ac5f1d]
keydb-rdb-bgsave <lister_ip>:6379 [cluster](+0x2f1c18) [0x557db3ab8c18]
keydb-rdb-bgsave <lister_ip>:6379 [cluster](checkChildrenDone()+0x103) [0x557db3b04b23]
keydb-rdb-bgsave <lister_ip>:6379 [cluster](serverCron(aeEventLoop*, long long, void*)+0x18b) [0x557db3b0546b]
keydb-rdb-bgsave <lister_ip>:6379 [cluster](aeProcessEvents+0x224) [0x557db3b0c474]
keydb-rdb-bgsave <lister_ip>:6379 [cluster](aeMain+0x3a) [0x557db3b10a5a]
keydb-rdb-bgsave <lister_ip>:6379 [cluster](workerThreadMain(void*)+0x7a) [0x557db3af8e1a]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f733829c609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f73381c1133]

------ INFO OUTPUT ------
# Server
redis_version:6.3.4
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:838dba0add68499b
redis_mode:cluster
os:Linux 5.4.44-1-pve x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:9.3.0
process_id:51634
process_supervised:no
run_id:391a32ef01b6ab02f275d6e835ba2b66f0c75626
tcp_port:6379
server_time_usec:1701629656142514
uptime_in_seconds:2692835
uptime_in_days:31
hz:100
configured_hz:100
lru_clock:7130840
executable:/usr/bin/keydb-server
config_file:/etc/keydb/keydb-6379.conf
availability_zone:
features:cluster_mget

# Clients
connected_clients:20
cluster_connections:10
maxclients:100000
client_recent_max_input_buffer:32
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
current_client_thread:0
thread_0_clients:4
thread_1_clients:5
thread_2_clients:5
thread_3_clients:7

# Memory
used_memory:198954912
used_memory_human:189.74M
used_memory_rss:185204736
used_memory_rss_human:176.62M
used_memory_peak:487311920
used_memory_peak_human:464.74M
used_memory_peak_perc:40.83%
used_memory_overhead:32242776
used_memory_startup:30741256
used_memory_dataset:166712136
used_memory_dataset_perc:99.11%
allocator_allocated:200172096
allocator_active:202801152
allocator_resident:211148800
total_system_memory:540258377728
total_system_memory_human:503.15G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:8000000000
maxmemory_human:7.45G
maxmemory_policy:volatile-lru
allocator_frag_ratio:1.01
allocator_frag_bytes:2629056
allocator_rss_ratio:1.04
allocator_rss_bytes:8347648
rss_overhead_ratio:0.88
rss_overhead_bytes:-25944064
mem_fragmentation_ratio:0.93
mem_fragmentation_bytes:-13748248
mem_not_counted_for_evict:4
mem_replication_backlog:1048576
mem_clients_slaves:20512
mem_clients_normal:409992
mem_aof_buffer:8
mem_allocator:jemalloc-5.2.1
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
storage_provider:none
available_system_memory:unavailable

# Persistence
loading:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:5450
rdb_bgsave_in_progress:0
rdb_last_save_time:1701607058
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:15
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:4939776
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:10
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:err
aof_last_write_status:ok
aof_last_cow_size:59285504
module_fork_in_progress:0
module_fork_last_cow_size:0
aof_current_size:5441911068
aof_base_size:41337430
aof_pending_rewrite:0
aof_buffer_length:0
aof_rewrite_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0

# Stats
total_connections_received:435671
total_commands_processed:5929175
instantaneous_ops_per_sec:0
total_net_input_bytes:653897389514
total_net_output_bytes:661663452127
instantaneous_input_kbps:0.01
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:79
sync_partial_ok:0
sync_partial_err:4
expired_keys:1888
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:1100113
evicted_keys:0
keyspace_hits:21151
keyspace_misses:2034
pubsub_channels:1
pubsub_patterns:0
latest_fork_usec:10394
total_forks:13596
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:80683
dump_payload_sanitizations:0
total_reads_processed:46173190
total_writes_processed:3691823
instantaneous_lock_contention:1
avg_lock_contention:0.562500
storage_provider_read_hits:0
storage_provider_read_misses:0

# Replication
role:master
connected_slaves:1
slave0:ip=<slave_ip>,port=6379,state=wait_bgsave,offset=0,lag=0
master_failover_state:no-failover
master_replid:8b70727c7a6fdc09a7441dd0d3ae0d7584bd3c36
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:653668905680
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:653668905287
repl_backlog_histlen:394

# CPU
used_cpu_sys:0.095338
used_cpu_user:0.508471
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
server_threads:4
long_lock_waits:1593459
used_cpu_sys_main_thread:0.095338
used_cpu_user_main_thread:0.508471
splitice commented 7 months ago

I will note that its possible for later background saves to begin to work, therefore this looks to be either: a) a missing increment of ckeysExpired; or b) a state issue

I need some assistance though finding it, its a bit beyond me.

seii888 commented 7 months ago

seii.88