Snapchat / KeyDB

A Multithreaded Fork of Redis
https://keydb.dev
BSD 3-Clause "New" or "Revised" License
11.43k stars 579 forks source link

use dump from 6.2.2 to 6.3.x #624

Open exodus1984 opened 1 year ago

exodus1984 commented 1 year ago

Hi I'm trying to run 6.3.x (currently 0, but I'm having trouble on each) with dump from 6.2.2 (save, shutdown). I'm working on docker

53:53:C 06 Apr 2023 11:41:58.635 # Configuration loaded
53:53:M 06 Apr 2023 11:41:58.636 * monotonic clock: POSIX clock_gettime
53:53:M 06 Apr 2023 11:41:58.637 * Running mode=standalone, port=6430.
53:53:M 06 Apr 2023 11:41:58.637 # Server initialized
53:53:M 06 Apr 2023 11:41:58.637 * Loading RDB produced by version 6.2.2
53:53:M 06 Apr 2023 11:41:58.637 * RDB age 3291 seconds
53:53:M 06 Apr 2023 11:41:58.637 * RDB memory usage when created 12011.74 Mb

And this is the end. keydb starts eating ram (as much as it gets - now i gave 100GB ) - When the limit given to docker is reached Memory cgroup out of memory: Killed process 1367539 (keydb-server) obraz

I turned loglevel on debug... but nothing else is logging.

Any suggestions? :|

msotheeswaran-sc commented 1 year ago

Hi @exodus1984 can you share your config, and what was happening on server after the RDB is loaded(i.e. what commands) or does this happen with no other activity on the server?

artarik commented 1 year ago

how many keys in your dump?

exodus1984 commented 1 year ago

sorry for the delay...

so...

my db

db0:keys=9299985,expires=9299984,avg_ttl=4983618562
db1:keys=135,expires=0,avg_ttl=0
db10:keys=2,expires=0,avg_ttl=0
db11:keys=1,expires=0,avg_ttl=0

8.9G dump.rdb

and my conf

bind 0.0.0.0
protected-mode yes
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
pidfile /var/run/keydb/keydb-server.pid
loglevel notice
databases 16
always-show-logo yes
set-proc-title yes
proc-title-template "{title} {listen-addr} {server-mode}"
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
rdb-del-sync-files no
dir /opt/app/data
replica-serve-stale-data yes
replica-read-only yes
repl-diskless-sync true
repl-diskless-sync true
repl-diskless-load disabled
repl-disable-tcp-nodelay no
replica-priority 100
acllog-max-len 128
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no
lazyfree-lazy-user-del no
lazyfree-lazy-user-flush no
oom-score-adj no
oom-score-adj-values 0 200 800
disable-thp yes
appendonly no
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
stream-node-max-bytes 4096
stream-node-max-entries 100
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
dynamic-hz yes
aof-rewrite-incremental-fsync yes
rdb-save-incremental-fsync yes
jemalloc-bg-thread yes
server-threads 2
replica-weighting-factor 2
port 6430
enable-motd false
always-show-logo false
replica-read-only false
save ""
save 1800 1000
loglevel debug

Version 6.3.3 came out - it reminded me of the topic.

my test env

first worker with 6.2.2

54:54:C 26 May 2023 09:44:02.552 # Configuration loaded
54:54:M 26 May 2023 09:44:02.554 * monotonic clock: POSIX clock_gettime
54:54:M 26 May 2023 09:44:02.554 * Running mode=standalone, port=6430.
54:54:M 26 May 2023 09:44:02.554 # Server initialized
54:54:M 26 May 2023 09:44:02.555 * Loading RDB produced by version 6.2.2
54:54:M 26 May 2023 09:44:02.555 * RDB age 4667 seconds
54:54:M 26 May 2023 09:44:02.555 * RDB memory usage when created 25421.68 Mb
54:54:M 26 May 2023 09:44:24.802 - Accepted 127.0.0.1:45086
54:54:M 26 May 2023 09:44:25.050 - Client closed connection
54:54:M 26 May 2023 09:45:24.803 - Accepted 127.0.0.1:45134
54:54:M 26 May 2023 09:45:25.045 - Client closed connection
54:54:M 26 May 2023 09:46:24.801 - Accepted 127.0.0.1:45176
54:54:M 26 May 2023 09:46:25.053 - Client closed connection
54:54:M 26 May 2023 09:46:53.595 # Done loading RDB, keys loaded: 9302837, keys expired: 1306.
54:54:M 26 May 2023 09:46:53.595 * DB loaded from disk: 171.040 seconds
54:119:M 26 May 2023 09:46:53.595 * Thread 0 alive.
54:120:M 26 May 2023 09:46:53.595 * Thread 1 alive.
54:119:M 26 May 2023 09:46:53.596 - DB 0: 9302699 keys (9302698 volatile) in 16777216 slots HT.
54:119:M 26 May 2023 09:46:53.596 - DB 1: 135 keys (0 volatile) in 256 slots HT.
54:119:M 26 May 2023 09:46:53.596 - DB 10: 2 keys (0 volatile) in 4 slots HT.
54:119:M 26 May 2023 09:46:53.596 - DB 11: 1 keys (0 volatile) in 4 slots HT.
54:119:M 26 May 2023 09:49:03.382 # User requested shutdown...
54:119:M 26 May 2023 09:49:03.382 * Saving the final RDB snapshot before exiting.
54:119:M 26 May 2023 09:54:13.904 * DB saved on disk
54:119:M 26 May 2023 09:54:13.904 * Removing the pid file.
54:119:M 26 May 2023 09:54:13.904 # KeyDB is now ready to exit, bye bye..

second worker with 6.3.3 - i try to load rdb file - no success - loop

i try migrate data via replication

58:70:S 26 May 2023 12:55:46.016 * MASTER <-> REPLICA sync: receiving 9536979816 bytes from master to disk
[...]
58:70:S 26 May 2023 12:57:54.559 * MASTER <-> REPLICA sync: Flushing old data
58:70:S 26 May 2023 12:57:54.559 * MASTER <-> REPLICA sync: Loading DB in memory
58:70:S 26 May 2023 12:57:54.587 * Loading RDB produced by version 6.2.2
58:70:S 26 May 2023 12:57:54.587 * RDB age 441 seconds
58:70:S 26 May 2023 12:57:54.587 * RDB memory usage when created 13828.46 Mb

and loop

 INFO

# Server
redis_version:6.3.3
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:704d39d025097a6
redis_mode:standalone
os:Linux 5.4.0-122-generic x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:9.3.0
process_id:58
process_supervised:no
run_id:8c3b1a3195af31faf3538e4977ab26446dcf579c
tcp_port:6430
server_time_usec:1685098982184607
uptime_in_seconds:776
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:7377074
executable:/etc/runit/runsvdir/default/keydb/keydb-server
config_file:/etc/keydb/redis.conf

# Clients
connected_clients:4
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:0
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
current_client_thread:0
thread_0_clients:4
thread_1_clients:0

# Memory
used_memory:23641724280
used_memory_human:22.02G
used_memory_rss:2957651968
used_memory_rss_human:2.75G
used_memory_peak:23641724280
used_memory_peak_human:22.02G
used_memory_peak_perc:100.00%
used_memory_overhead:160617736
used_memory_startup:2117032
used_memory_dataset:23481106544
used_memory_dataset_perc:99.33%
allocator_allocated:2834370712
allocator_active:2894028800
allocator_resident:2969296896
total_system_memory:134765133824
total_system_memory_human:125.51G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.02
allocator_frag_bytes:59658088
allocator_rss_ratio:1.03
allocator_rss_bytes:75268096
rss_overhead_ratio:1.00
rss_overhead_bytes:-11644928
mem_fragmentation_ratio:1.04
mem_fragmentation_bytes:124135464
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:20504
mem_aof_buffer:0
mem_allocator:jemalloc-5.2.1
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
storage_provider:none

# Persistence
loading:1
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1685098206
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
loading_start_time:1685098674
loading_total_bytes:9536979816
loading_rdb_used_mem:14500189088
loading_loaded_bytes:377486925
loading_loaded_perc:3.96
loading_eta_seconds:7473

# Stats
total_connections_received:14
total_commands_processed:130
instantaneous_ops_per_sec:0
total_net_input_bytes:9536983795
total_net_output_bytes:233652
instantaneous_input_kbps:89358.64
instantaneous_output_kbps:0.17
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:1
dump_payload_sanitizations:0
total_reads_processed:141
total_writes_processed:130
instantaneous_lock_contention:1
avg_lock_contention:0.828125
storage_provider_read_hits:0
storage_provider_read_misses:0

# Replication
role:slave
master_global_link_status:down
connected_masters:0
master_host:10.254.34.177
master_port:6430
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:1
slave_read_repl_offset:1
slave_repl_offset:1
master_sync_total_bytes:9536979816
master_sync_read_bytes:9536979816
master_sync_left_bytes:0
master_sync_perc:100.00
master_sync_last_io_seconds_ago:308
master_link_down_since_seconds:-1
slave_priority:100
slave_read_only:0
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:7dd33ad014fcb799396e3d224f1eca3c0586f54b
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:49.264244
used_cpu_user:316.339742
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
server_threads:2
long_lock_waits:375
used_cpu_sys_main_thread:40.634321
used_cpu_user_main_thread:291.532902

# Modules

# Errorstats
errorstat_ERR:count=1

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=370909,expires=370909,avg_ttl=4558909507,cached_keys=370909

# KeyDB
mvcc_depth:0

Eats all available ram - finally oom comes in

msotheeswaran-sc commented 1 year ago

I tried reproducing using your config and string data, but was unsuccessful, perhaps the issue is specific to some data type, can you provide what data type and how it is configured(i.e. if you are using hashes, how many subkeys and do those subkeys have subkey expires)

# Keyspace
db0:keys=2000000,expires=2000000,avg_ttl=1313411930630