Snapchat / KeyDB

A Multithreaded Fork of Redis
https://keydb.dev
BSD 3-Clause "New" or "Revised" License
11.29k stars 570 forks

[BUG] Random crash using 6 servers cluster #640

Open crystianluis opened 1 year ago

crystianluis commented 1 year ago

The KeyDB server crashes at random while running in a 6-server cluster with masters and slaves. After the crash, the slave is automatically promoted to the master role.

Apr 19 10:41:22 debian-keydb-01 kernel: [429676.003520] keydb-server[15509]: segfault at 80c0 ip 00005620b63c0267 sp 00007f82d4bfac30 error 4 in keydb-server[5620b6339000+487000]
Apr 19 10:41:22 debian-keydb-01 kernel: [429676.006407] Code: 48 81 ec 38 01 00 00 48 89 7c 24 58 4c 8d b4 24 a0 00 00 00 64 48 8b 04 25 28 00 00 00 48 89 84 24 28 01 00 00 31 c0 4c 89 f0 <48> 8b 96 c0 00 00 00 83 e2 01 0f 85 f1 04 00 00 48 8b 93 c0 00 00
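
For anyone triaging this, the kernel segfault line can be decoded into an offset inside the keydb-server mapping plus the meaning of the page-fault error code. The following is a minimal sketch, not part of the original report: the helper name `decode_segfault` is hypothetical, and it assumes the usual x86 Linux format in which the bracketed values are the start and size (hex) of the mapping that contains the instruction pointer.

```python
import re

# Kernel segfault line copied from the report (timestamp prefix dropped).
LINE = ("keydb-server[15509]: segfault at 80c0 ip 00005620b63c0267 "
        "sp 00007f82d4bfac30 error 4 in keydb-server[5620b6339000+487000]")

# All numeric fields in this kernel message are hexadecimal.
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def decode_segfault(line):
    """Hypothetical helper: split a kernel segfault line into its parts."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)
    return {
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),
        # Offset of the faulting instruction inside the mapping.
        "offset": ip - base,
        # x86 page-fault error code bits: 0 = page present,
        # 1 = write access, 2 = fault raised in user mode.
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }

info = decode_segfault(LINE)
print(info["object"], hex(info["offset"]), hex(info["fault_addr"]))
```

Here error 4 corresponds to a user-mode read of an unmapped page at address 0x80c0. If the mapping starts at the image's load base, the computed offset could be passed to `addr2line -e` against a matching keydb-server binary with debug symbols; otherwise the segment's file offset would have to be added first.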

Attached logs from keydb

logs.txt

msotheeswaran-sc commented 1 year ago

Hi @crystianluis, could you provide more context, such as your config, the commands you are running, the size of your data, and how you are creating the cluster? A cluster created with the create-cluster script found in utils/create-cluster does not just crash on its own, so something specific to your use case must be causing this.

crystianluis commented 1 year ago

The configs, logs, and commands are in logs.txt. Did you check?

msotheeswaran-sc commented 1 year ago

The configuration and the cluster topology are not in logs.txt.

crystianluis commented 1 year ago

192.168.0.61:30000> CLUSTER INFO
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:179
cluster_my_epoch:175
cluster_stats_messages_ping_sent:2141379
cluster_stats_messages_pong_sent:2152255
cluster_stats_messages_fail_sent:65
cluster_stats_messages_auth-req_sent:5
cluster_stats_messages_auth-ack_sent:4
cluster_stats_messages_update_sent:6
cluster_stats_messages_sent:4293714
cluster_stats_messages_ping_received:2152251
cluster_stats_messages_pong_received:2141284
cluster_stats_messages_fail_received:34
cluster_stats_messages_auth-req_received:7
cluster_stats_messages_auth-ack_received:2
cluster_stats_messages_update_received:5
cluster_stats_messages_received:4293583

192.168.0.61:30000> CLUSTER NODES
9ee39aeab46bf4ec4f34a14f2af45097c491cf0e 192.168.0.61:30000@40000 myself,master - 0 1684338051000 175 connected 0-5460
97147caa96441a7f41d7646f8f34b090706c8782 192.168.0.62:30000@40000 master - 0 1684338052438 179 connected 5461-10922
15abc5b3a93f25130d8c38ed0eeb13a34b815a19 192.168.0.66:30000@40000 slave 97147caa96441a7f41d7646f8f34b090706c8782 0 1684338052000 179 connected
35a6871dfd60280c7728282aa5f33675d48f03d0 192.168.0.65:30000@40000 slave 9ee39aeab46bf4ec4f34a14f2af45097c491cf0e 0 1684338053000 175 connected
cb855611162e63e1d1799222a4915db55d6de758 192.168.0.64:30000@40000 master - 0 1684338053452 39 connected 10923-16383
45121c156d9c05f75131d5dae48f02a766e3e352 192.168.0.63:30000@40000 slave cb855611162e63e1d1799222a4915db55d6de758 0 1684338053554 39 connected
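
The CLUSTER NODES output above can be turned into a master-to-replica map, which is the topology information requested earlier in the thread. This is a rough sketch only: the function names `parse_cluster_nodes` and `topology` are hypothetical, and the parsing assumes the standard space-separated layout (id, address, flags, master id, ping sent, pong received, config epoch, link state, slots).

```python
# CLUSTER NODES output copied from the comment above, one node per line.
NODES = """\
9ee39aeab46bf4ec4f34a14f2af45097c491cf0e 192.168.0.61:30000@40000 myself,master - 0 1684338051000 175 connected 0-5460
97147caa96441a7f41d7646f8f34b090706c8782 192.168.0.62:30000@40000 master - 0 1684338052438 179 connected 5461-10922
15abc5b3a93f25130d8c38ed0eeb13a34b815a19 192.168.0.66:30000@40000 slave 97147caa96441a7f41d7646f8f34b090706c8782 0 1684338052000 179 connected
35a6871dfd60280c7728282aa5f33675d48f03d0 192.168.0.65:30000@40000 slave 9ee39aeab46bf4ec4f34a14f2af45097c491cf0e 0 1684338053000 175 connected
cb855611162e63e1d1799222a4915db55d6de758 192.168.0.64:30000@40000 master - 0 1684338053452 39 connected 10923-16383
45121c156d9c05f75131d5dae48f02a766e3e352 192.168.0.63:30000@40000 slave cb855611162e63e1d1799222a4915db55d6de758 0 1684338053554 39 connected
"""

def parse_cluster_nodes(text):
    """Hypothetical helper: index CLUSTER NODES lines by node id."""
    nodes = {}
    for line in text.strip().splitlines():
        f = line.split()
        nodes[f[0]] = {
            "addr": f[1].split("@")[0],        # drop the cluster bus port
            "flags": f[2].split(","),
            "master_id": None if f[3] == "-" else f[3],
            "epoch": int(f[6]),
            "slots": f[8:],                    # empty for replicas
        }
    return nodes

def topology(nodes):
    """Hypothetical helper: map each master address to slots and replicas."""
    topo = {}
    for n in nodes.values():
        if "master" in n["flags"]:
            topo[n["addr"]] = {"slots": n["slots"], "replicas": []}
    for n in nodes.values():
        if n["master_id"] is not None:
            master_addr = nodes[n["master_id"]]["addr"]
            topo[master_addr]["replicas"].append(n["addr"])
    return topo

nodes = parse_cluster_nodes(NODES)
topo = topology(nodes)
for master, info in topo.items():
    print(master, info["slots"], "replicas:", info["replicas"])
```

The per-node config epochs differ widely (175, 179 and 39), which may be worth comparing against the failover events in the attached logs.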

192.168.0.61:30000> CLUSTER SLOTS
1) 1) (integer) 0
   2) (integer) 5460
   3) 1) "192.168.0.61"
      2) (integer) 30000
      3) "9ee39aeab46bf4ec4f34a14f2af45097c491cf0e"
   4) 1) "192.168.0.65"
      2) (integer) 30000
      3) "35a6871dfd60280c7728282aa5f33675d48f03d0"
2) 1) (integer) 5461
   2) (integer) 10922
   3) 1) "192.168.0.62"
      2) (integer) 30000
      3) "97147caa96441a7f41d7646f8f34b090706c8782"
   4) 1) "192.168.0.66"
      2) (integer) 30000
      3) "15abc5b3a93f25130d8c38ed0eeb13a34b815a19"
3) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "192.168.0.64"
      2) (integer) 30000
      3) "cb855611162e63e1d1799222a4915db55d6de758"
   4) 1) "192.168.0.63"
      2) (integer) 30000
      3) "45121c156d9c05f75131d5dae48f02a766e3e352"

crystianluis commented 1 year ago

CONFIG GET *

CONFIG GET.TXT